Wednesday, April 4, 2012

The LINQ SelectMany Operator


Most of you by now are familiar with LINQ, Microsoft’s attempt to bridge the code/data impedance mismatch. Much of what we see in LINQ maps directly onto our knowledge of SQL, since most LINQ queries use semantics very similar to SQL queries. Certain operators, though, don’t look familiar, either because they weren’t present in our SQL lexicon or because they were represented in a fundamentally different way. A while back I posted about one of these operators, the “let” operator, and how to use it effectively. I later followed it up with a post that dug a little deeper into the “let” operator so that you could get a peek at what was going on behind the scenes.
Today I introduce another one of these operators, one that is even more foreign to those coming from a SQL background: the SelectMany operator. This operator doesn’t have its own keyword in C# query syntax, so we will first go over it using the LINQ extension methods and then look at how to achieve the same effect using query syntax. Sound good? Good.
The SelectMany operator is described on MSDN as “Projects each element of a sequence to an IEnumerable<T> and flattens the resulting sequences into one sequence.” That is actually a pretty good description, but until you see it work, it is hard to visualize. Well, at least it was for me!
To give a first example of this operator, we will start with the same list of names that I used in my previous post about the “let” operator, and we will project the list with both the Select and the SelectMany operators to see what happens.
var nameList = new List<string>
                   {
                       "Matt",
                       "Adam",
                       "John",
                       "Peter",
                       "Owen",
                       "Steve",
                       "Richard",
                       "Chris"
                   };
 
var names1 = nameList.Where(n => n.Length == 4)
    .Select(n => n).ToList();
 
names1.ForEach(n => Console.WriteLine(n));
 
var names2 = nameList.Where(n => n.Length == 4)
    .SelectMany(n => n).ToList();                            
 
names2.ForEach(n => Console.WriteLine(n));
So, here you see the same query twice, with only Select and SelectMany swapped. Then we write the results out to the console. What happens when this is executed (with a bit of extra formatting code)?
[Image: console output of the two queries]
What just happened there? The Select just returned the list of names that passed our filter. In fact, the Select is entirely unneeded: if we left it off, the query would return the same results. I only put it in so you can see the difference. The SelectMany, on the other hand, got the list of names with four letters and, since String implements IEnumerable<char>, treated each String as a sequence of characters and then combined those sequences. So, as you can see, we got a single result containing each character of each name in our filtered list.
But how is this useful to us? Actually there are quite a few wonderful uses for this. Let’s look at what we could do if we changed our example above to have several lists of names instead of just one.
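To make the flattening concrete, here is a minimal sketch of roughly what SelectMany does in this example, written out as nested loops and compared against the real operator. The shortened name list and the variable names `flattened` and `viaSelectMany` are made up for illustration, and the snippet assumes a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var nameList = new List<string> { "Matt", "Adam", "John", "Peter" };

// Hand-written stand-in for nameList.Where(n => n.Length == 4).SelectMany(n => n).
var flattened = new List<char>();
foreach (string name in nameList.Where(n => n.Length == 4))
{
    // A string is an IEnumerable<char>, so the inner loop walks its characters.
    foreach (char c in name)
    {
        flattened.Add(c);
    }
}

var viaSelectMany = nameList.Where(n => n.Length == 4).SelectMany(n => n).ToList();

Console.WriteLine(new string(flattened.ToArray()));        // MattAdamJohn
Console.WriteLine(flattened.SequenceEqual(viaSelectMany)); // True
```

The outer loop corresponds to the source sequence and the inner loop to the sequence returned by the lambda; SelectMany simply yields every inner element in order.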
var nameList = new List<List<string>>()
                   {
                       new List<string>
                           {
                               "Matt",
                               "Adam",
                               "John",
                               "Peter",
                               "Owen",
                               "Steve",
                               "Richard",
                               "Chris"
                           },
                       new List<string>
                           {
                               "Tim",
                               "Jim",
                               "Andy",
                               "Fred",
                               "Todd",
                               "Rob",
                               "Richard",
                               "Ted"
                           }
                   };           
Now we have two different lists of names. How do we get the names with exactly four characters? Without using SelectMany we could do this:
var names1 = nameList.Select(l => l.Where(n => n.Length == 4));
foreach (var list in names1)
{
    foreach (string name in list)
    {
        Console.WriteLine(name);
    }
}
Hmmm. You see that we are using a subquery to select the names from each list. We then have a nested IEnumerable<IEnumerable<string>>, which requires nested loops to access our names. But with SelectMany we should be able to select our names from our lists and avoid the nested IEnumerable.
var names2 = nameList.SelectMany(n => n)
    .Where(n => n.Length == 4)
    .Select(n => n).ToList();
 
names2.ForEach(n => Console.WriteLine(n));
Pretty easy! Here we use the SelectMany operator first to flatten the lists, and then we filter the result with a Where clause. Again, the Select here is not required, but I have thrown it in to be more explicit. What I am doing above may hide a little bit of what is happening, though. To better show that the lambda you pass to SelectMany must return an IEnumerable, I am going to show you an example where we split up a few sentences about the people in our list above. We will start with a list of sentences and then split those sentences into words.
var sentences = new List<string> {"Bob is quite excited.", "Jim is very upset."};
This is our list of sentences, and now we need to get our individual words out of this:
var words = sentences.SelectMany(w => w.TrimEnd('.').Split(' ')).ToList();
Here we call SelectMany on our list of strings. For each sentence we remove the period from the end and then split on spaces; the array returned by the call to “Split” is what SelectMany flattens. If we took the Split off, SelectMany would treat each sentence as a sequence of characters, and we would end up enumerating over each character like we did in our first example.
I hope this has cleared up the SelectMany operator for you. We have just one last thing to do, which is to show how to do a SelectMany using C# query syntax. But how do you do a SelectMany if there is no query keyword for it? Well, you do it by chaining from clauses. That may sound weird, but check it out, it actually works quite well.
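As a quick side-by-side, the sketch below runs the same lambda through Select and through SelectMany, and then drops the Split to show the character-level result. The variable names `nested`, `words`, and `chars` are mine, and the snippet assumes a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentences = new List<string> { "Bob is quite excited.", "Jim is very upset." };

// Select keeps one result per sentence: a list of two string arrays.
List<string[]> nested = sentences.Select(s => s.TrimEnd('.').Split(' ')).ToList();

// SelectMany concatenates those arrays into one list of eight words.
List<string> words = sentences.SelectMany(s => s.TrimEnd('.').Split(' ')).ToList();

// Without Split, the lambda returns a string, so the result is a list of characters.
List<char> chars = sentences.SelectMany(s => s.TrimEnd('.')).ToList();

Console.WriteLine(nested.Count);            // 2
Console.WriteLine(words.Count);             // 8
Console.WriteLine(string.Join(",", words)); // Bob,is,quite,excited,Jim,is,very,upset
Console.WriteLine(chars.Count);             // 37 (spaces included)
```

The declared types tell the story: Select gives a List<string[]>, while SelectMany gives a flat List<string> or List<char> depending on what the lambda returns.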
var words = from s in sentences
            from w in s.TrimEnd('.').Split(' ')
            select w;
See? You are simply funneling the first “from” into the second “from”. The reason this works so well is that it lets you nest even further very easily. For example, what if we wanted each individual character from the above query?
var characters = from s in sentences
                from w in s.TrimEnd('.').Split(' ')
                from c in w
                select c;
All we did was split the sentences into words and then select each character out of each word. Pretty sweet, and it lets you drill down into multiple levels of data very easily. To me it also gives a good visual flow of what is happening, although nothing in the syntax indicates that SelectMany is involved.
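To see the operator hiding behind the chained from clauses, here is a sketch that writes the same query by hand with extension methods and checks that both produce the same sequence. This is a hand-written equivalent, not the compiler's literal output: the compiler's translation uses the SelectMany overload that also takes a result selector. The names `viaQuery` and `viaMethods` are mine, and the snippet assumes a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentences = new List<string> { "Bob is quite excited.", "Jim is very upset." };

// Query syntax: three chained from clauses.
var viaQuery = from s in sentences
               from w in s.TrimEnd('.').Split(' ')
               from c in w
               select c;

// Hand-written method syntax: one SelectMany per extra from clause.
var viaMethods = sentences
    .SelectMany(s => s.TrimEnd('.').Split(' '))
    .SelectMany(w => w);

Console.WriteLine(viaQuery.SequenceEqual(viaMethods));       // True
Console.WriteLine(new string(viaMethods.Take(5).ToArray())); // Bobis
```

Each additional from clause adds one more level of flattening, which is why the query syntax nests so comfortably.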
Well, I hope this helps clear things up a bit, and I hope that you get some good use out of this!