July05

LINQ Group By with NULL database values

LINQ is fantastic for the ability to write queries that express intent much more clearly than the same SQL, or structured code. One problem that I have run into though is handling NULL database values that are part of a group by statement.

Grouping by ProductSKU

Grouping in LINQ allows you to return sets of data from a collection for a given key value. The group by clause is what the key ends up being in the result set. Let's take a grouping of the Products by the SKU.

Collapse | Copy Code
from p in Products
    group p by p.ProductSKU

Enumerable IGrouping Collection

This results from the group by are enumerable groups (IGrouping<String, Product>) with the String being the Key for the groups (the ProductSKU field from the table). The typical way you walk through this result is a nested for loop.

Collapse | Copy Code
var groups = from p in Products
    group p by p.ProductSKU;
    
foreach( var groupentry in groups )
{
    Console.WriteLine( "Group: {0}", groupentry.Key );
    
    foreach( var groupitem in groupentry )
    {
        Console.WriteLine("Product: {0}", groupitem.ProductSKU);
    }
}  

I end up with a list that looks something like this:

Collapse | Copy Code
Group: VDB4DBA
Product: VDB4DBA
Group: VDB4DMW
Product: VDB4DMW
Group: VDB4PARTNER
Product: VDB4PARTNER

This works, but not what I really wanted. In this case, the first four characters of the SKU are the same per product family (VDB4 for all VistaDB 4 SKUs). I would like to be able to group by only those first four characters instead of the complete ProductSKU. You can do this with the following code:

Collapse | Copy Code
from p in Products
    group p by p.ProductSKU.Substring(0, 4)

What If There Are NULL Entries?

But what happens if there is a NULL entry in the ProductSKU? You get a ConstraintException: The property cannot be set to a null value.

Ternary and Null Coalescing Operators to the Rescue

There are two operators you can use to modify the null values into something you can use. In SQL, you would use the COALESCE or ISNULL operations, these are pretty close matches.

The ternary operator is a shortcut for:

Collapse | Copy Code
if( condition ) then (true code) : (false code)

The null coalescing operator is used to define a default value if the variable is null.

Collapse | Copy Code
variable = ( condition ) ?? ( defaultvalue)

The code to use both of these follows:

Collapse | Copy Code
var groups = from p in Products
    group p by p.ProductSKU == null ? "<null>" : p.ProductSKU.Substring(0, 4);

var groups2 = from p in Products
    group p by p.ProductSKU.Substring(0, 4) ?? "<null>";  // FAILS

In this case, the ternary operator is the only one that will work. This is because the test is independent of the operation. The second example above will crash with the same constraint exception because the ProductSKU.Substring is attempted to be evaluated first, and substring on a null doesn’t work!

The null coalescing operator would work if we only wanted to test if the ProductSKU was null, but in this case the ternary is the only way to get the desired result.

Final Result

So the final result after the ternary operator looks like this:

Collapse | Copy Code
Group: VDB3
Product: VDB3SRC
Group: VDB4
Product: VDB4DMW
Product: VDB4PROB
Product: VDB4CORE
Product: VDB4ASPPAK
Product: VDB4DBA

Now I have cleaner groups like I wanted without having to write string parsing after the query.

Summary

LINQ has a very expressive syntax that allows you to do some amazing queries without resorting to SQL.

Group by can also be used on composite keys (more than one column) by projecting into an anonymous type. Maybe I will leave that for another post.

Comments are closed