January 19

PLINQ and .NET Parallel Programming resources

More and more articles are popping up on PLINQ and on multicore programming in general.  Visual Studio Magazine has a great summary of the current state of affairs in multicore programming.

Key to the article is that just enabling parallel code doesn’t always make it run faster.  There are dependencies, deadlocks, and all sorts of other problems that can occur.  Programmers have to know how to take advantage of the various technologies, and shift some thinking to new design patterns as well.

There are many technologies in .NET 4, including the following list:

More...

July 05

LINQ Group By with NULL database values

LINQ is fantastic for the ability to write queries that express intent much more clearly than the same SQL, or structured code. One problem that I have run into though is handling NULL database values that are part of a group by statement.

Grouping by ProductSKU

Grouping in LINQ allows you to return sets of data from a collection for a given key value. The expression in the group by clause becomes the key of each group in the result set. Let's take a grouping of the Products by the SKU.

from p in Products
    group p by p.ProductSKU

Enumerable IGrouping Collection

The results of the group by are enumerable groups (IGrouping<String, Product>), with the String being the Key for each group (the ProductSKU field from the table). The typical way to walk through this result is a nested foreach loop.

var groups = from p in Products
    group p by p.ProductSKU;
    
foreach( var groupentry in groups )
{
    Console.WriteLine( "Group: {0}", groupentry.Key );
    
    foreach( var groupitem in groupentry )
    {
        Console.WriteLine("Product: {0}", groupitem.ProductSKU);
    }
}  

I end up with a list that looks something like this:

Group: VDB4DBA
Product: VDB4DBA
Group: VDB4DMW
Product: VDB4DMW
Group: VDB4PARTNER
Product: VDB4PARTNER

This works, but it is not what I really wanted. In this case, the first four characters of the SKU are the same per product family (VDB4 for all VistaDB 4 SKUs). I would like to be able to group by only those first four characters instead of the complete ProductSKU. You can do this with the following code:

from p in Products
    group p by p.ProductSKU.Substring(0, 4)

What If There Are NULL Entries?

But what happens if there is a NULL entry in the ProductSKU? You get a ConstraintException: The property cannot be set to a null value.

Ternary and Null Coalescing Operators to the Rescue

There are two operators you can use to turn the null values into something you can use. In SQL, you would use the COALESCE or ISNULL operations; these are pretty close matches.

The ternary operator is a shortcut for an if/else that produces a value:

condition ? (value if true) : (value if false)

The null coalescing operator supplies a default value when the expression on its left is null.

variable = ( possiblyNullValue ) ?? ( defaultvalue )

The code to use both of these follows:

var groups = from p in Products
    group p by p.ProductSKU == null ? "<null>" : p.ProductSKU.Substring(0, 4);

var groups2 = from p in Products
    group p by p.ProductSKU.Substring(0, 4) ?? "<null>";  // FAILS

In this case, the ternary operator is the only one that will work, because the null test runs before, and independently of, the Substring call. The second example above will crash with the same constraint exception because ProductSKU.Substring is evaluated first, and Substring on a null doesn’t work!

The null coalescing operator would work if we only wanted to test if the ProductSKU was null, but in this case the ternary is the only way to get the desired result.
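To make the difference concrete, here is a minimal sketch that runs against an in-memory array instead of the database. The class name, method names and sample SKUs are all made up for illustration. Note that this is LINQ to Objects, where calling Substring on a null would throw a NullReferenceException rather than the ConstraintException described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullKeySketch
{
    // Hypothetical in-memory stand-in for the ProductSKU column, with one null.
    public static readonly string[] Skus = { "VDB4DBA", "VDB4DMW", null, "VDB3SRC" };

    // Ternary: the null test runs first, so Substring never sees a null.
    public static List<string> FamilyKeys(IEnumerable<string> skus)
    {
        var groups = from sku in skus
                     group sku by (sku == null ? "<null>" : sku.Substring(0, 4));
        return groups.Select(g => g.Key).ToList();
    }

    // Null coalescing: enough when the key is the whole, possibly null, value.
    public static List<string> WholeKeys(IEnumerable<string> skus)
    {
        var groups = from sku in skus
                     group sku by (sku ?? "<null>");
        return groups.Select(g => g.Key).ToList();
    }

    public static void Main()
    {
        Console.WriteLine("Family keys: {0}", string.Join(", ", FamilyKeys(Skus)));
        Console.WriteLine("Whole keys:  {0}", string.Join(", ", WholeKeys(Skus)));
    }
}
```

The first method is the shape you need whenever the key is computed from the value; the second is only safe because nothing is called on the value before the ?? is applied.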

Final Result

So the final result after the ternary operator looks like this:

Group: VDB3
Product: VDB3SRC
Group: VDB4
Product: VDB4DMW
Product: VDB4PROB
Product: VDB4CORE
Product: VDB4ASPPAK
Product: VDB4DBA

Now I have cleaner groups like I wanted without having to write string parsing after the query.

Summary

LINQ has a very expressive syntax that allows you to do some amazing queries without resorting to SQL.

Group by can also be used on composite keys (more than one column) by projecting into an anonymous type. Maybe I will leave that for another post.
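As a quick taste of the composite key idea, here is a rough sketch using an in-memory list. The Edition property and the sample rows are hypothetical (they are not part of the Products table discussed above); the point is only that an anonymous type can carry more than one value as the group key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string ProductSKU { get; set; }
    public string Edition { get; set; }   // hypothetical second column
}

public static class CompositeKeySketch
{
    // Made-up sample rows, for illustration only.
    public static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { ProductSKU = "VDB4DBA",  Edition = "Pro" },
            new Product { ProductSKU = "VDB4DMW",  Edition = "Pro" },
            new Product { ProductSKU = "VDB4CORE", Edition = "Std" },
            new Product { ProductSKU = "VDB3SRC",  Edition = "Std" }
        };
    }

    public static List<string> Summaries(IEnumerable<Product> products)
    {
        // Anonymous types compare by value, so two rows with the same
        // Family and Edition land in the same group.
        var groups = from p in products
                     group p by new { Family = p.ProductSKU.Substring(0, 4), p.Edition };

        return groups
            .Select(g => string.Format("{0}/{1}: {2}", g.Key.Family, g.Key.Edition, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Summaries(SampleProducts()))
        {
            Console.WriteLine(line);
        }
    }
}
```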

June 26

Speed up blocking functions with PLINQ

[Figure: PLINQ DOP Speedup Comparison]

I have been studying the new PLINQ and Task Parallel Library in .NET 4, looking for ways to do things that we can’t do in .NET 2.0. PLINQ is huge, and there are a lot of new ways to do multithreaded programming using .NET 4. In this article, I want to cover a particular problem I have had many times over the years: how do you speed up multithreaded apps that are bound by blocking functions or long running I/O operations?

I started looking at this method to speed up some long running file I/O routines deep in the VistaDB engine. Most of the time, we are blocked in reads from disk before we can continue working, but usually we have part of the blocks we need.  So we could start working, and then continue when the rest of the blocks are loaded. Adding that logic is complex and prone to error with traditional threading code. Fortunately PLINQ has a way to make some of these types of operations very simple. 

Reading Multiple Websites

For this example, I am going to read the first page of 8 websites and then act on that information afterwards. This is the type of very simple parallel operation that splits up really well. But these types of long running reads are very similar to what happens in many applications.

Side Note on C# 4.0 In a Nutshell Book

I actually adapted this example from one given in Joseph Albahari’s book C# 4.0 In a Nutshell (he is also the author of the excellent LINQPad). Weighing in at 1000 pages, it is not exactly a nutshell, but it is a fantastic book for developers who already know C# and just want to go through C# and CLR 4. The concepts in the book cover older versions of .NET as well, but the juicy parts for me were all the new changes.

LINQ Expression

Ok, this expression will go to 8 websites in this list and get the first page of each.  The content length of the page and the content type are then stored in a variable to be used outside of the parallel computation later.

// Requires: using System; using System.Diagnostics; using System.Linq; using System.Net;
static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    var results = from site in new[]
    {
        "http://infinitecodex.com",
        "http://www.vistadb.net",
        "http://stackoverflow.com",
        "http://cornerstonedb.com",
        "http://www.bing.com/",
        "http://www.linqpad.net",
        "http://www.cnn.com",
        "http://www.microsoft.com"
     }
     let p = WebRequest.Create( new Uri(site)).GetResponse()
         select new
         {
             site,
             Length = p.ContentLength,
             ContentType = p.ContentType
         };

     foreach (var result in results)
     {
         Console.WriteLine("{0}:{1}:{2}", 
             result.site, result.Length, result.ContentType);
     }

     sw.Stop();

     Console.WriteLine("Total Time: {0}ms", sw.ElapsedMilliseconds);          
}

The initial runs were done with no Parallel extensions being used. The query just goes through each site and gets the first page, holding the response in the temp variable p and selecting its ContentLength and ContentType. Afterwards, I foreach over the results to output them to the command line. If you take this step out, nothing actually happens because of deferred execution in LINQ (you have to do something with the collection before it is really run). I wrapped all of this in a Stopwatch so I would know how long it took. The graph at the top of this article shows the 3 fastest times I received after running each method 10 times.
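The deferred execution point is easy to see in isolation. The sketch below is an illustration only: the names are made up, and a counter stands in for the slow web request so you can see when the work inside the let clause actually happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Counts how many times the "slow" step has actually run.
    public static int Calls;

    // Stand-in for the web request in the article's query.
    static int Measure(string site)
    {
        Calls++;
        return site.Length;
    }

    // Building the query does not run Measure; it only describes the work.
    public static IEnumerable<int> BuildQuery(string[] sites)
    {
        return from site in sites
               let length = Measure(site)
               select length;
    }

    public static void Main()
    {
        var query = BuildQuery(new[] { "a", "bb", "ccc" });
        Console.WriteLine("Calls after building the query: {0}", Calls);

        // Enumerating (foreach, ToList, Count, ...) is what triggers the work.
        var results = query.ToList();
        Console.WriteLine("Calls after enumerating: {0}", Calls);
    }
}
```

This is also why the Stopwatch has to wrap the foreach and not just the query definition; timing only the definition would measure almost nothing.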

Three fastest times normal execution (ms):  1916, 2103, 1992.

Adding Parallel (PLINQ)

Now, let's make this use PLINQ and see if it runs faster.

The only change we have to make is to add a single line of code above the let statement like this.

}
.AsParallel()
let p = WebRequest.Create( new Uri(site)).GetResponse()

That’s it, and the entire LINQ query will now run in parallel.  It is faster, but not as fast as we can get it.

Three fastest times with AsParallel() (ms): 745, 790, 814.

Under the hood, PLINQ is running the query on thread pool threads, using 4 of them on my 4 core machine. But what it doesn’t know is that each of these operations is blocked waiting on I/O from the website. PLINQ assumes that each thread will have a moderate amount of CPU work to do, so it avoids spinning up a lot of threads that would just overwhelm the CPU.

How can we tell the .NET framework that each of these parallel operations is not CPU intensive?

WithDegreeOfParallelism

From the MSDN help:  WithDegreeOfParallelism<TSource> - Degree of parallelism is the maximum number of concurrently executing tasks that will be used to process the query.

Now that doesn’t exactly explain in plain English that you can use this to tell the framework the task is not CPU intensive. Technically you are overriding the default behavior of PLINQ and telling it you know how many of these should be allowed to run concurrently. 

In this case, I am going to set 8 because I know that two of these operations per CPU core is not going to tax my system at all. The maximum you can set is 64. Now PLINQ will attempt to run more than one operation per core at a time. Why can we do this without incurring a lot of task switching overhead? Because the operations are all blocked in I/O. The OS will put them to sleep and release the CPU for other tasks to run anyway; we are just going to give each core more work to keep it a little busier.

Again, a single line change to the first query is all that is needed:

}
.AsParallel().WithDegreeOfParallelism(8) // HERE
let p = WebRequest.Create( new Uri(site)).GetResponse()

Three fastest times with 8 Parallelism set (ms): 543, 578, 589.

That is 3.5 times faster than the original query, with one line of code changed!
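If you want to try the effect without depending on network conditions, something like the following sketch may help. It is not the code I timed above: Thread.Sleep stands in for the blocking web request, and the class and method names are invented for the example. Actual timings will vary by machine, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class BlockingDopSketch
{
    // Stand-in for a blocking web request: the thread sleeps instead of using the CPU.
    static int FakeFetch(int id)
    {
        Thread.Sleep(200);
        return id * 10;
    }

    public static int[] RunSequential(int[] ids)
    {
        return ids.Select(id => FakeFetch(id)).ToArray();
    }

    // dop is the maximum number of concurrently executing tasks for the query.
    public static int[] RunParallel(int[] ids, int dop)
    {
        return ids.AsParallel()
                  .AsOrdered()                    // keep results in source order
                  .WithDegreeOfParallelism(dop)
                  .Select(id => FakeFetch(id))
                  .ToArray();
    }

    static long Time(Func<int[]> run)
    {
        var sw = Stopwatch.StartNew();
        run();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int[] ids = Enumerable.Range(1, 8).ToArray();

        Console.WriteLine("Sequential: {0}ms", Time(() => RunSequential(ids)));
        Console.WriteLine("DOP 2:      {0}ms", Time(() => RunParallel(ids, 2)));
        Console.WriteLine("DOP 8:      {0}ms", Time(() => RunParallel(ids, 8)));
    }
}
```

Because the fake request never touches the CPU, raising the degree of parallelism past the core count should still shorten the run, which is the same reasoning that applies to the blocked web reads.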

Summary

PLINQ and the .NET 4 framework give you a lot of power to speed up parallel operations very easily. In my page manager application, I was able to get a 4.5x improvement in the page cache manager through the techniques listed in this article. And through changing my queue mechanisms over to the new Concurrent classes, I was able to eliminate a lot of dead time wasted on locking and gained even more performance, but that is another blog post at some point in the future.

May 13

Entity Framework 4 New Operations

There are some new operations in the .NET 4 Entity Framework; this is a quick example of each working with VistaDB 4 and Visual Studio 2010.  These changes were mostly made to bring LINQ to Entities in line with the other LINQ providers in .NET.  Of the list below, I think Single() was the one most people were confused about, because using it produced weird errors that didn’t make a lot of sense.

For a complete list of LINQ to Entities operators visit the Supported and Unsupported LINQ Methods on MSDN.

Entity Framework in .Net 4

The operations I want to demonstrate are the new Contains(), Single(), SingleOrDefault(), and DefaultIfEmpty().

These operations are all new in .NET 4, and yes they work with VistaDB 4.  I started with a simple one table database called Feedback.  The only table has 3 columns:

  • int FeedbackID
  • NText FeedbackText
  • DateTime FeedbackDate

I added a few small text entries, including “I like grapes” to search against with the Contains() operator.
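For readers who have not used these four operators before, here is a small sketch of what each one does, run against an in-memory list shaped like the Feedback table. This is LINQ to Objects, not Entity Framework, so it only illustrates the general semantics; the sample rows, dates and method names are made up, Contains() is shown in the text-search sense described above, and the provider's translation to SQL may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Feedback
{
    public int FeedbackID { get; set; }
    public string FeedbackText { get; set; }
    public DateTime FeedbackDate { get; set; }
}

public static class OperatorSketch
{
    // Hypothetical rows standing in for the Feedback table.
    public static List<Feedback> SampleRows()
    {
        return new List<Feedback>
        {
            new Feedback { FeedbackID = 1, FeedbackText = "I like grapes", FeedbackDate = new DateTime(2010, 5, 1) },
            new Feedback { FeedbackID = 2, FeedbackText = "Great product", FeedbackDate = new DateTime(2010, 5, 2) }
        };
    }

    // Contains(): keep the rows whose text includes the search term.
    public static List<int> IdsContaining(IEnumerable<Feedback> rows, string term)
    {
        return rows.Where(f => f.FeedbackText.Contains(term))
                   .Select(f => f.FeedbackID)
                   .ToList();
    }

    // Single(): exactly one match is required; zero or several matches
    // throw InvalidOperationException.
    public static Feedback ExactlyOne(IEnumerable<Feedback> rows, int id)
    {
        return rows.Single(f => f.FeedbackID == id);
    }

    // SingleOrDefault(): returns null when nothing matches
    // (several matches still throw).
    public static Feedback OneOrNull(IEnumerable<Feedback> rows, int id)
    {
        return rows.SingleOrDefault(f => f.FeedbackID == id);
    }

    // DefaultIfEmpty(): an empty result becomes a sequence holding
    // one default (null) element.
    public static int PaddedCount(IEnumerable<Feedback> rows, int id)
    {
        return rows.Where(f => f.FeedbackID == id).DefaultIfEmpty().Count();
    }

    public static void Main()
    {
        var rows = SampleRows();
        Console.WriteLine("Contains 'grapes': {0}", string.Join(", ", IdsContaining(rows, "grapes")));
        Console.WriteLine("Single(1): {0}", ExactlyOne(rows, 1).FeedbackText);
        Console.WriteLine("SingleOrDefault(99) is null: {0}", OneOrNull(rows, 99) == null);
        Console.WriteLine("DefaultIfEmpty count for 99: {0}", PaddedCount(rows, 99));
    }
}
```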

More...

February 25

LINQPad helps you learn LINQ

Have you tried to use LINQ to query a database using Visual Studio?  It can be a frustrating experience: things that compile fail at runtime, and that edit / compile / test cycle can quickly lead to hours of lost time trying to get a single complex query to work correctly.

LINQPad is an editor for LINQ?

LINQPad is sort of like a Notepad; you can write and edit .linq files using it.  But that is where the Notepad similarity stops. You can execute your LINQ queries and see the results without having to run your application.  You truly have to watch it in action to believe how much more productive it can make your LINQ writing.

How I found LINQPad

When I was first working on the VistaDB product store and account manager, writing LINQ queries against my Entity Framework objects was incredibly frustrating.  Most of the documentation and samples I found were for Linq2Sql, which has similar syntax… but not the same. And worse, most of the syntax compiles fine, but blows up at runtime with cryptic error messages.

My typical dev and test cycle was around 5 minutes to compile the DAL / Site, log in to the account, navigate to the correct page, and visit the yellow screen of death from ASP.NET.  It was not fun, so I started working in a standalone tester app I built just for this purpose.  Write the query, compile, debug, step, step, read exception and try to decipher it.

I was reading a post on Stack Overflow about one of those cryptic errors when someone suggested to the poster they use LINQPad to test their queries first before putting them into their apps.  Wow, what a great idea!  Where was this tool?  (You can download it for free from Linqpad.net )

LINQPad to the rescue

LINQPad allows you to execute single LINQ commands against an existing EF model, or even to write .NET code in the editor and execute it like a little dynamic .NET environment.  The main application is free, but there is an auto-complete feature in the editor that you must pay to activate. Believe me, it is worth paying for the license; you also support the author and show him the application is worth money.  The license is very inexpensive and well worth the price in order to get IntelliSense-like behavior on your LINQ queries.

More...