What is a Parallel Query?
Language-Integrated Query (LINQ) was introduced in the .NET Framework version 3.5. It features a unified model for querying any System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> data source. Parallel LINQ (PLINQ) is a parallel implementation of the LINQ pattern.
Through parallel execution, PLINQ can achieve significant performance improvements over sequential code for certain kinds of queries, often just by adding the AsParallel query operation to the data source. However, parallelism can introduce its own complexities, and not all query operations run faster in PLINQ. In fact, parallelization can actually slow down certain queries. Therefore, you should understand how issues such as ordering affect parallel queries.
The remainder of this article gives an overview of the main PLINQ classes and discusses how to create PLINQ queries. Each section contains links to more detailed information and code examples.
The ParallelEnumerable Class
The System.Linq.ParallelEnumerable class exposes almost all of PLINQ’s functionality. It and the rest of the System.Linq namespace types are compiled into the System.Core.dll assembly. The default C# and Visual Basic projects in Visual Studio both reference the assembly and import the namespace.
ParallelEnumerable includes implementations of all the standard query operators that LINQ to Objects supports, although it does not attempt to parallelize each one.
In addition to the standard query operators, the ParallelEnumerable class contains a set of methods that enable behaviors specific to parallel execution.
AsParallel: The entry point for PLINQ. Specifies that the rest of the query should be parallelized, if it is possible.
AsOrdered: Specifies that PLINQ should preserve the ordering of the source sequence for the rest of the query, or until the ordering is changed, for example by the use of an orderby (Order By in Visual Basic) clause.
Aggregate overload: An overload that is unique to PLINQ and enables intermediate aggregation over thread-local partitions, plus a final aggregation function to combine the results of all partitions.
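The PLINQ-specific Aggregate overload can be sketched as follows. This is a minimal example computing a sum of squares: the update function runs inside each partition with its own local accumulator, the combine function merges the per-partition totals, and the result selector shapes the final value. The SumOfSquares name is hypothetical and chosen only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: sum of squares of 1..n using PLINQ's four-function Aggregate.
long SumOfSquares(int n) =>
    Enumerable.Range(1, n).AsParallel().Aggregate(
        0L,                                  // seed for each partition's local accumulator
        (local, x) => local + (long)x * x,   // runs within one partition (thread-local)
        (total, local) => total + local,     // combines the per-partition results
        total => total);                     // final projection of the combined result

Console.WriteLine(SumOfSquares(1000)); // 333833500
```

Because addition is associative and commutative, the result does not depend on how PLINQ splits the range into partitions.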
The Opt-in Model
When you write a query, opt in to PLINQ by invoking the ParallelEnumerable.AsParallel extension method on the data source, as shown in the following example.
var source = Enumerable.Range(1, 10000);

// Opt in to PLINQ with AsParallel.
// Compute stands for a user-defined method.
var evenNums = from num in source.AsParallel()
               where Compute(num) > 0
               select num;
The AsParallel extension method binds the subsequent query operators, in this case, where and select, to the System.Linq.ParallelEnumerable implementations.
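To make the binding visible, the following sketch declares the query result explicitly as ParallelQuery<int>, the type that AsParallel returns and that ParallelEnumerable's operators accept. The Compute method here is a hypothetical stand-in (it simply favors even numbers) so that the example runs on its own.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the document's Compute method.
int Compute(int n) => n % 2 == 0 ? 1 : -1;

var source = Enumerable.Range(1, 10000);

// source is an ordinary IEnumerable<int>. AsParallel wraps it in a
// ParallelQuery<int>, so the where clause below resolves to ParallelEnumerable.Where.
ParallelQuery<int> evenNums = from num in source.AsParallel()
                              where Compute(num) > 0
                              select num;

// Count also binds to the ParallelEnumerable implementation.
Console.WriteLine(evenNums.Count()); // 5000
```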
By default, PLINQ is conservative. At run time, the PLINQ infrastructure analyzes the overall structure of the query. If the query is likely to yield speedups by parallelization, PLINQ partitions the source sequence into tasks that can be run concurrently. If it is not safe to parallelize a query, PLINQ just runs the query sequentially. If PLINQ has a choice between a potentially expensive parallel algorithm and an inexpensive sequential algorithm, it chooses the sequential algorithm by default. You can use the WithExecutionMode method and the System.Linq.ParallelExecutionMode enumeration to instruct PLINQ to select the parallel algorithm instead. This is useful when testing and measurement show that a particular query runs faster in parallel.
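A minimal sketch of forcing the parallel algorithm with WithExecutionMode. The workload here is deliberately trivial and serves only to show where the call goes; whether forcing parallelism actually helps should be decided by measuring your own query.

```csharp
using System;
using System.Linq;

// Ask PLINQ to use parallel execution even where its analysis
// would otherwise fall back to the sequential algorithm.
int[] squares = Enumerable.Range(1, 100)
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .Select(n => n * n)
    .ToArray();

// Element order is not guaranteed (no AsOrdered), but the sum is.
Console.WriteLine(squares.Sum()); // 338350
```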
Degree of Parallelism
By default, PLINQ uses all of the processors on the host computer up to a maximum of 64. You can instruct PLINQ to use no more than a specified number of processors by using the WithDegreeOfParallelism method. The following query uses at most two processors.
var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;
In cases where a query performs a significant amount of non-compute-bound work, such as file I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
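The sketch below simulates I/O-bound work with Thread.Sleep and oversubscribes the cores with WithDegreeOfParallelism. FetchLength is a hypothetical stand-in for a file or network read, and the multiplier of 4 and the cap of 32 are arbitrary choices for illustration, not recommended values.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical I/O-bound operation: the thread mostly waits rather than computes.
int FetchLength(int id)
{
    Thread.Sleep(10);                 // pretend to wait on a file or network read
    return id.ToString().Length;      // number of digits in id
}

// Because the work is mostly waiting, run more tasks than there are cores.
// The multiplier and the cap are arbitrary for this sketch.
int dop = Math.Min(Environment.ProcessorCount * 4, 32);

int totalLength = Enumerable.Range(1, 200)
    .AsParallel()
    .WithDegreeOfParallelism(dop)
    .Select(id => FetchLength(id))
    .Sum();

Console.WriteLine(totalLength); // 492 (9 one-digit + 90 two-digit + 101 three-digit ids)
```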
Ordered Versus Unordered Parallel Queries
In some queries, a query operator must produce results that preserve the ordering of the source sequence. PLINQ provides the AsOrdered operator for this purpose. AsOrdered is distinct from AsSequential: an AsOrdered sequence is still processed in parallel, but its results are buffered and sorted.
The following code example shows how to opt in to order preservation.
evenNums = from num in numbers.AsParallel().AsOrdered()
           where num % 2 == 0
           select num;
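The difference can be checked directly. In this sketch, the AsOrdered query returns exactly the sequential result in source order; the query without AsOrdered returns the same elements, but PLINQ makes no promise about their order, so the comparison sorts them first.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(1, 1000).ToArray();

// Still processed in parallel, but results come back in source order.
int[] ordered = numbers.AsParallel().AsOrdered()
    .Where(n => n % 2 == 0)
    .ToArray();

// Same elements, but the output order is not guaranteed.
int[] unordered = numbers.AsParallel()
    .Where(n => n % 2 == 0)
    .ToArray();

Console.WriteLine(ordered.SequenceEqual(numbers.Where(n => n % 2 == 0))); // True
Console.WriteLine(unordered.OrderBy(n => n).SequenceEqual(ordered));       // True
```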
Parallel vs. Sequential Queries
Some operations require that the source data be delivered in a sequential manner. The ParallelEnumerable query operators revert to sequential mode automatically when it is required. For user-defined query operators and user delegates that require sequential execution, PLINQ provides the AsSequential method. When you use AsSequential, all subsequent operators in the query are executed sequentially until AsParallel is called again.
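A minimal sketch of AsSequential. The filtering runs in parallel, while the Select after AsSequential is an ordinary LINQ to Objects operator that runs on the enumerating thread, so its delegate can safely append to a List<T>, which is not thread-safe. The log list exists only to make the sequential step observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<int>();   // List<T> is not safe for concurrent writes

List<int> result = Enumerable.Range(1, 100)
    .AsParallel().AsOrdered()
    .Where(n => n % 3 == 0)          // runs in parallel
    .AsSequential()                  // everything after this runs sequentially
    .Select(n => { log.Add(n); return n * 10; })
    .ToList();

Console.WriteLine($"{log.Count} items, first result {result[0]}"); // 33 items, first result 30
```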
Options for Merging Query Results
When a PLINQ query executes in parallel, the results from each worker thread must be merged back onto the main thread for consumption by a foreach loop (For Each in Visual Basic) or for insertion into a list or array. In some cases, it might be beneficial to specify a particular kind of merge operation, for example, to begin producing results more quickly. For this purpose, PLINQ supports the WithMergeOptions method and the ParallelMergeOptions enumeration.
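The following sketch requests ParallelMergeOptions.NotBuffered so that the foreach loop can start receiving results as soon as any worker produces one, rather than after results have been buffered. Merge options act as a hint to PLINQ, and the output order is not guaranteed here.

```csharp
using System;
using System.Linq;

var query = Enumerable.Range(1, 20)
    .AsParallel()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)   // hint: yield results eagerly
    .Select(n => n * n);

int count = 0;
foreach (int square in query)
{
    Console.WriteLine(square);   // may arrive in any order
    count++;
}
```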
The ForAll Operator
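In sequential LINQ queries, execution is deferred until the query is enumerated either in a foreach (For Each in Visual Basic) loop or by invoking a method such as ToList, ToArray, or ToDictionary. In PLINQ, you can also use foreach to execute the query and iterate through the results, but foreach itself does not run in parallel, so the output from all parallel tasks must be merged back onto the thread that runs the loop. When order preservation is not required and the processing of the results can itself be parallelized, use the ForAll method to execute the query instead; ForAll does not perform this final merge step.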
var nums = Enumerable.Range(10, 10000);
var query = from num in nums.AsParallel()
            where num % 10 == 0
            select num;

// Process the results as each thread completes
// and add them to a System.Collections.Concurrent.ConcurrentBag<int>,
// which can safely accept concurrent add operations.
var concurrentBag = new ConcurrentBag<int>();
query.ForAll(e => concurrentBag.Add(Compute(e)));
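A self-contained version of the ForAll pattern, with a hypothetical Compute that simply divides by 10 so the result can be checked. ForAll invokes the action on the worker threads themselves, which is why the target collection must tolerate concurrent Add calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical stand-in for the document's Compute method.
int Compute(int n) => n / 10;

var nums = Enumerable.Range(10, 10000);          // 10 .. 10009
var query = from num in nums.AsParallel()
            where num % 10 == 0                  // 10, 20, ..., 10000
            select num;

// No final merge: each worker thread adds its own results directly.
var concurrentBag = new ConcurrentBag<int>();
query.ForAll(e => concurrentBag.Add(Compute(e)));

Console.WriteLine(concurrentBag.Count); // 1000
```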
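Cancellation
PLINQ is integrated with the cancellation types in .NET Framework 4. (For more information, see Cancellation.) Therefore, unlike sequential LINQ to Objects queries, PLINQ queries can be canceled. To create a cancelable PLINQ query, use the WithCancellation operator on the query and pass a CancellationToken as the argument. When cancellation is requested on that token, PLINQ stops processing on all threads and throws an OperationCanceledException.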
It is possible that a PLINQ query might continue to process some elements after the cancellation token is set.
For greater responsiveness, you can also respond to cancellation requests in long-running user delegates.
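A minimal cancellation sketch. To keep it deterministic, the query cancels its own token partway through; in a real application the request would usually come from elsewhere, such as a UI button or a timeout. The caller catches the OperationCanceledException that PLINQ throws.

```csharp
using System;
using System.Linq;
using System.Threading;

var cts = new CancellationTokenSource();
bool canceled = false;

try
{
    _ = Enumerable.Range(1, 1_000_000)
        .AsParallel()
        .WithCancellation(cts.Token)
        .Select(n =>
        {
            if (n == 1000) cts.Cancel();   // simulate an outside cancellation request
            return n * 2;
        })
        .ToArray();
}
catch (OperationCanceledException)
{
    canceled = true;
}

Console.WriteLine(canceled ? "Query was canceled." : "Query completed.");
```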
Exceptions
When a PLINQ query executes, multiple exceptions might be thrown from different threads simultaneously. Also, the code to handle the exception might be on a different thread than the code that threw the exception. PLINQ uses the AggregateException type to encapsulate all the exceptions that were thrown by a query, and marshals those exceptions back to the calling thread. On the calling thread, only one try-catch block is required. However, you can iterate through all of the exceptions that are encapsulated in the AggregateException and catch any that you can safely recover from. In rare cases, some exceptions may be thrown without being wrapped in an AggregateException; ThreadAbortException is also not wrapped.
When exceptions are allowed to propagate back to the joining thread, the query may still process some items after the exception is raised.
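The sketch below shows the single try-catch on the calling thread. Every multiple of 100 throws, so several workers may fail; AggregateException.Handle marks the InvalidOperationException instances as recovered and rethrows anything else in a new AggregateException. Because the query stops early, the number of exceptions collected can vary from run to run.

```csharp
using System;
using System.Linq;

int handled = 0;

try
{
    _ = Enumerable.Range(1, 1000)
        .AsParallel()
        .Select(n => n % 100 == 0
            ? throw new InvalidOperationException($"Bad item {n}")
            : n)
        .ToArray();
}
catch (AggregateException ae)
{
    // Return true for exceptions we can recover from; others are rethrown.
    ae.Handle(ex =>
    {
        if (ex is InvalidOperationException)
        {
            handled++;
            return true;
        }
        return false;
    });
}

Console.WriteLine($"Recovered from {handled} exception(s).");
```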
Custom Partitioners
In some cases, you can improve query performance by writing a custom partitioner that takes advantage of some characteristic of the source data. In the query, the custom partitioner itself is the enumerable object that is queried.
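[Visual Basic]
Dim arr(10000) As Integer
' MyArrayPartitioner is a user-defined Partitioner(Of T); SomeFunction is any user method.
Dim partitioner = New MyArrayPartitioner(Of Integer)(arr)
Dim query = partitioner.AsParallel().Select(Function(x) SomeFunction(x))

[C#]
int[] arr = ...;
// MyArrayPartitioner is a user-defined Partitioner<T>; SomeFunction is any user method.
Partitioner<int> partitioner = new MyArrayPartitioner<int>(arr);
var q = partitioner.AsParallel().Select(x => SomeFunction(x));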
PLINQ supports a fixed number of partitions, although data may be dynamically reassigned to those partitions at run time for load balancing. Parallel.For and Parallel.ForEach support only dynamic partitioning, which means that the number of partitions changes at run time.
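Writing a full Partitioner<T> subclass such as the hypothetical MyArrayPartitioner is beyond a short example, so this sketch swaps in the built-in Partitioner.Create factory instead. It still shows the key point: the partitioner, not the array, is the object on which AsParallel is called. The loadBalance flag chooses between dynamic load balancing and static partitioning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

int[] arr = Enumerable.Range(0, 10000).ToArray();

// loadBalance: true  -> partitions pull chunks dynamically (helps when per-item cost varies)
// loadBalance: false -> the array is partitioned statically up front
OrderablePartitioner<int> partitioner = Partitioner.Create(arr, loadBalance: true);

// The partitioner itself is the source of the parallel query.
long total = partitioner.AsParallel()
    .Select(x => (long)x * 2)
    .Sum();

Console.WriteLine(total); // 99990000
```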
Measuring PLINQ Performance
In many cases, a query can be parallelized, but the overhead of setting up the parallel query outweighs the performance benefit gained. If a query does not perform much computation or if the data source is small, a PLINQ query may be slower than a sequential LINQ to Objects query. You can use the Concurrency Visualizer in Visual Studio to compare the performance of various queries, to locate processing bottlenecks, and to determine whether your query is running in parallel or sequentially.
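For a quick first comparison before reaching for a profiler, a Stopwatch around the sequential and parallel versions of the same query can help. This is a rough sketch: Work is a hypothetical CPU-bound function, a single run includes JIT and thread-pool warm-up, and reliable numbers need repeated runs. The two sums can differ in the last few bits because floating-point addition happens in a different order.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical CPU-bound work per element.
double Work(int n)
{
    double acc = 0;
    for (int i = 1; i <= 200; i++) acc += Math.Sqrt(n * (double)i);
    return acc;
}

int[] source = Enumerable.Range(1, 20_000).ToArray();

var sw = Stopwatch.StartNew();
double sequential = source.Select(n => Work(n)).Sum();
TimeSpan sequentialTime = sw.Elapsed;

sw.Restart();
double parallel = source.AsParallel().Select(n => Work(n)).Sum();
TimeSpan parallelTime = sw.Elapsed;

// Single-run timings; repeat and warm up before drawing conclusions.
Console.WriteLine($"Sequential: {sequentialTime.TotalMilliseconds:F1} ms");
Console.WriteLine($"PLINQ:      {parallelTime.TotalMilliseconds:F1} ms");
```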