DB2 Load and Unload: How to Achieve Parallel Processing

The best way to LOAD is split your input data set into part and load individual data set into part of partition table space.

However, splitting your input data set sometimes is not an easy process and you may need to create a batch job to do the work using DFSORT, REXX, COBOL program.

DB2 11 deliveries a performance improvement for LOAD SHRLEVEL NONE when the input data is in a single input data set with the new PARALLEL option, If you specify the PARALLEL keyword, the LOAD utility can use multiple parallel subtasks, which can reduce the elapsed time for the load.

DB2 V11 LOAD and UNLOAD Enhancements:

LOAD SHRLEVEL NONE with PARALLEL option
LOAD SHRLEVEL CHANGE with PARALLEL option

This kind of parallelism is useful when the utility is CPU bound. It does not increase I/O parallelism. CPU bottlenecks are typical of CCSID conversion, numeric conversion, compression, complex data types and VARCHAR columns.

This kind of parallelism is also useful to overlap the synchronous I/Os of multiple indexes. Hence, the more indexes, the greater the benefit.

SHRLEVEL NONE appends to the end of the end of the table instead of trying to store the rows in cluster sequence. Free space searches are not a consideration. However, there is still overhead to insert keys into the index.

There can be synchronous I/Os and index splits. Performance is sensitive to whether or not the input data is sorted. If the input data is unsorted, parallelism is of greater value than it would be for presorted input.

The CPU overhead of parallelism is also somewhat higher for sorted input. Nevertheless, if the LOAD is the type that uses a lot of CPU time, parallelism will help reduce the elapsed time.

With a high degree of parallelism there will be an extra overhead on CPU time, so that should be factored in when determining the degree of parallelism specified with the PARALLEL keyword.

The recommendation is to specify PARALLEL 0 or PARALLEL without a number specified so DB2 can determine the most optimal degree of parallelism. Whether or not the input data is sorted also affects the performance, because the parallel tasks may suffer from contention.

Measurements show that LOAD SHRLEVEL NONE PARALLEL reduced up to 50% of elapsed time with unsorted data.