Thursday, December 26, 2019

Indexes and ranges in C# 8


You would probably hear about new types in C# 8 such as Index and Range. If you answer no, it’s high time to take a look and see how they help us with collections in C#. Also for supporting these types Microsoft added two new operators such as ^ and ..
  • System.Index represents an index into a sequence.
  • The index from end operator ^, which specifies that an index is relative to the end of the sequence.
  • System.Range represents a sub range of a sequence.
  • The range operator .., which specifies the start and end of a range as its operands.

System.Index operator ^(int fromEnd);
System.Range operator ..(Index start = 0, Index end = ^0);
Microsoft came up with the idea of using this sequence quite a long time ago. They have already implemented work with sequences in F#
// Sequence that has an increment.
seq { 0 .. 10 .. 100 }
seq { for i in 1 .. 10 -> i * i }
Probably if you work with Python, you will probably use or see similar functionality there:
>>> "Hello, world!"[3:9]
'lo, wo'
>>> string = "Hello, world!"
>>> string[:5]
'Hello'
>>> string[-6:-1]
'world'
>>> string[-9:]
'o, world!'
>>> string[:-8]
'Hello'
>>> string[:]
'Hello, world!'
So, let’s take a look how it was implemented in C#.
static void Main(string[] args)
{
    var words = new string[]
    {
        // index from start    index from end
        "The",      // 0                   ^9
        "quick",    // 1                   ^8
        "brown",    // 2                   ^7
        "fox",      // 3                   ^6
        "jumped",   // 4                   ^5
        "over",     // 5                   ^4
        "the",      // 6                   ^3
        "lazy",     // 7                   ^2
        "dog"       // 8                   ^1
    };              // 9 (or words.Length) ^0

    Console.WriteLine($"The last word is {words[^1]}");

    var quickBrownFox = words[1..4];
    var lazyDog = words[^2..^0];

    var allWords = words[..]; // contains "The" through "dog".
    var firstPhrase = words[..4]; // contains "The" through "fox"
    var lastPhrase = words[6..]; // contains "the", "lazy" and "dog"

    Range phrase = 1..4;

    var text = words[phrase];
}

This example was grabbed from Microsoft documentation about C# 8 new features. I would suggest checking out how it works under the hood.
private static void Main(string[] args)
{
    string[] obj = new string[9];
    obj[0] = "The";
    obj[1] = "quick";
    obj[2] = "brown";
    obj[3] = "fox";
    obj[4] = "jumped";
    obj[5] = "over";
    obj[6] = "the";
    obj[7] = "lazy";
    obj[8] = "dog";
    string[] array = obj;
    string[] array2 = array;
    Console.WriteLine("The last word is " + array2[new Index(1, true).GetOffset(array2.Length)]);
    string[] subArray = RuntimeHelpers.GetSubArray(array, new Range(1, 4));
    string[] subArray2 = RuntimeHelpers.GetSubArray(array, new Range(new Index(2, true), new Index(0, true)));
    string[] subArray3 = RuntimeHelpers.GetSubArray(array, Range.All);
    string[] subArray4 = RuntimeHelpers.GetSubArray(array, Range.EndAt(4));
    string[] subArray5 = RuntimeHelpers.GetSubArray(array, Range.StartAt(6));
    Range range = new Range(1, 4);
    string[] subArray6 = RuntimeHelpers.GetSubArray(array, range);
}

In this example, you will find how ^ operator works. In original code we have line of code for output the text “The last word is dog”.
Console.WriteLine($"The last word is {words[^1]}");
Compiler just creates extension method using new type Index and function GetOffset.
Console.WriteLine("The last word is " + array2[new Index(1, true).GetOffset(array2.Length)]);
If we decide to take 3 elements from the end, compiler will generate for us something like this.
Original:
Console.WriteLine($"The last word is {words[^3]}");
Compiler generated:
Console.WriteLine("The last word is " + array2[new Index(3, true).GetOffset(array2.Length)]);
Here the compiler creates Index structure with the first parameter index value which has to be more than zero, and the second parameter indicating the index direction, false means from the start and true means that we start from the end of index.
//
// Summary:
//     Initializes a new System.Index with a specified index position and a value that
//     indicates if the index is from the start or the end of a collection.
//
// Parameters:
//   value:
//     The index value. It has to be greater or equal than zero.
//
//   fromEnd:
//     A boolean indicating if the index is from the start (false) or from the end (true)
//     of a collection.
public Index(int value, bool fromEnd = false);

And function GetOffset calculates the offset from the start using the given collection length.
The new indexer will be implemented by converting the argument of type Index into an int and emitting a call to the int based indexer. For discussion purposes, let’s use the example of receiver[expr]. The conversion of expr to int will occur as follows:
  • When the argument is of the form ^expr2 and the type of expr2 is int, it will be translated to receiver.Length - expr2.
  • Otherwise, it will be translated as expr.GetOffset(receiver.Length).

Range:
C# has no syntactic way to access "ranges" or "slices" of collections. Usually users are forced to implement complex structures to filter/operate on slices of memory, or resort to LINQ methods like list.Skip(5).Take(2). With the addition of System.Span<T> and other similar types, it becomes more important to have this kind of operation supported on a deeper level in the language/runtime, and have the interface unified.
The language will introduce a new range operator x..y. It is a binary infix operator that accepts two expressions. Either operand can be omitted (examples below), and they have to be convertible to System.Index. It will be lowered to the appropriate System.Range factory method call.
In example above for creating new Range structure we copied the information from Index(2, true) and Index(0, true) (where Index(0, true) mean .Length of your collection)
var lazyDog = words[^2..^0];
So, compiler generated for you code like this:
string[] subArray2 = RuntimeHelpers.GetSubArray(array, new Range(new Index(2, true), new Index(0, true)));
Here you can find how this GetSubArray function is implemented:
public static T[] GetSubArray<T>(T[] array, Range range)
{
    if (array == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
    }

    (int offset, int length) = range.GetOffsetAndLength(array.Length);

    if (default(T)! != null || typeof(T[]) == array.GetType()) // TODO-NULLABLE: default(T) == null warning (https://github.com/dotnet/roslyn/issues/34757)
    {
        // We know the type of the array to be exactly T[].

        if (length == 0)
        {
            return Array.Empty<T>();
        }

        var dest = new T[length];
        Buffer.Memmove(
            ref Unsafe.As<byte, T>(ref dest.GetRawSzArrayData()),
            ref Unsafe.Add(ref Unsafe.As<byte, T>(ref array.GetRawSzArrayData()), offset),
            (uint)length);
        return dest;
    }
    else
    {
        // The array is actually a U[] where U:T.
        T[] dest = (T[])Array.CreateInstance(array.GetType().GetElementType()!, length);
        Array.Copy(array, offset, dest, 0, length);
        return dest;
    }
}

Net core compiler (roslyn) under the hood uses Array.Copy to clone your information from one array to another.
Range structure has three important properties called All, End, Start.
var allWords = words[..]; // contains "The" through "dog".
var firstPhrase = words[..4]; // contains "The" through "fox"
var lastPhrase = words[6..]; // contains "the", "lazy" and "dog"

And here you will see how compiler uses properties described above for generated code.
string[] subArray3 = RuntimeHelpers.GetSubArray(array, Range.All);
string[] subArray4 = RuntimeHelpers.GetSubArray(array, Range.EndAt(4));
string[] subArray5 = RuntimeHelpers.GetSubArray(array, Range.StartAt(6));
These EndAt() and StartAt() methods are called the property, which I have mentioned above.
/// <summary>Create a Range object starting from start index to the end of the collection.</summary>
public static Range StartAt(Index start) => new Range(start, Index.End);

/// <summary>Create a Range object starting from first element in the collection to the end Index.</summary>
public static Range EndAt(Index end) => new Range(Index.Start, end);
Range constructor has two Index parameters which allow you to setup start and end index range.
/// <summary>Construct a Range object using the start and end indexes.</summary>
/// <param name="start">Represent the inclusive start index of the range.</param>
/// <param name="end">Represent the exclusive end index of the range.</param>
public Range(Index start, Index end)
{
    Start = start;
    End = end;
}
Here are a few examples how you can combine and use your Range in C#.
Range phrase = 1..4;
Range phrase1 = 1..^3;
Range phrase2 = ^3..^1;
Range slice1 = 2..^3;               //Range.Create(2, Index.CreateFromEnd(3))
Range slice2 = ..^3;                //Range.ToEnd(Index.CreateFromEnd(3))
Range slice3 = 2..;                 //Range.FromStart(2)
Range slice4 = ..;                  //Range.All
You need to know some important things about Index and Range support in existing library types.

Implicit Index support
If you want to use Index implicitly in you code, your .net code should meet the following criteria:
  • The type is Countable.
  • The type has an accessible instance indexer which takes a single int as the argument.
  • The type does not have an accessible instance indexer which takes an Index as the first parameter. The Index must be the only parameter or the remaining parameters must be optional.

By Countable type Microsoft means the type which support has property named Length or Count with an accessible getter and a return type of int.

Implicit Range support
If you want to use Range implicitly in you code, you .net code should meet the following criteria:
  • The type is Countable.
  • The type has an accessible member named Slice which has two parameters of type int.
  • The type does not have an instance indexer which takes a single Range as the first parameter. The Range must be the only parameter or the remaining parameters must be optional.

For such types, the language will bind as if there is an index member of the form T this [Range range] where T is the return type of the Slice method including any ref style annotations. The new member will also have matching accessibility with Slice. Below you will find example provided by Microsoft how you can implement your type for supporting Range type.
C# language has special cases of the following known types:
  • string: the method Substring will be used instead of Slice.
  • array: the method System.Reflection.CompilerServices.GetSubArray will be used instead of Slice.

So, if you want to create your own collections and want them to support Index and Range, your collection must be countable and support slicing (methods Slice(), Substring(), GetSubArray() or indexer T this[Range range]).

Conclusions:
The new operators (^ and ..) are syntactic sugar. The functionality can be implemented by explicit calls to Index and Range factory methods, but it will result in a lot more boilerplate code, and the experience will be unintuitive. In addition, I would like to say that C# tries to grab the best things from functional languages to C# to improve development experience. I like this “syntactic” sugar and I will definitely use this syntax to improve existing codebase. 

No comments:

Post a Comment