Oracle

December 14, 2009

Hash Join – LAST_EXECUTION in V$SQL_WORKAREA and NUMBER_PASSES in V$SQL_WORKAREA_ACTIVE

Filed under: Uncategorized — srivenu @ 1:39 pm

Most of us don’t like to see an n-pass hash join in V$SQL_WORKAREA; we always strive to have our hash joins complete in OPTIMAL or 1-PASS mode. So does a 12-PASS hash join mean that we iterated over the same data 12 times? Or that we reused the hash memory 12 times? As a corollary, does a 12-PASS hash join mean it was 12 times as costly, resource-consuming or time-consuming as a 1-PASS hash join?

This note explains how the number of passes is recorded for a hash join in V$SQL_WORKAREA & V$SQL_WORKAREA_ACTIVE.

LAST_EXECUTION in V$SQL_WORKAREA

The LAST_EXECUTION column in V$SQL_WORKAREA records the number of passes required by the hash join. For a newly loaded SQL, LAST_EXECUTION starts out with the value OPTIMAL. This is how the number of passes is evaluated.
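As a quick way to look at this (just a sketch – '&sql_id' is a placeholder to be replaced with the SQL_ID of your own statement), you can query V$SQL_WORKAREA directly:

-- work area statistics for the hash join operations of one statement
select operation_type,
       operation_id,
       policy,
       last_execution,
       last_memory_used,
       last_tempseg_size
from   v$sql_workarea
where  sql_id = '&sql_id'
and    operation_type like 'HASH%';

LAST_EXECUTION shows whether the last execution of that work area was optimal, one-pass or multi-pass (with the number of passes), as discussed below.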

1) If the build table fits completely in memory, then the hash table is built from the build table and the probe table is scanned once to probe this in-memory hash table. End of story. This type of execution is recorded as OPTIMAL. In this ideal case, we processed the build and probe data only once.

If this is not the case, then

2) The build table and probe table data are split into partitions. If, for each partition pair that the build and probe tables are split into, the smaller of the two fits completely in memory, the join completes in 1-PASS mode, i.e. for each spilled partition the hash table is built from the smaller side and the other side is used to probe into it. In this case, we have processed the data (or at least some of it) twice. The reason I say some of it is that if some partitions of the build table fit in memory during the initial hash table build phase, we build the hash table with those partitions and process those rows while scanning the probe data. We only partition and spill to disk those partitions of the probe table that do not have their matching build-table partition in memory. So only the partitions that spilled to disk are processed twice.

If this is not the case, then

3) The worst case is the nested-loops hash join. In this case, there is at least one partition where the data from both the build and probe tables is too large to fit in memory and hence has to be processed multiple times in parts. I will use an example to illustrate this case.

Let's say we have 2 tables X1 & X2. X1 is our build table and X2 is the probe. Our hash join is fanned out 4 ways (i.e. 4 partitions) into partitions 0, 1, 2 & 3. We start scanning X1 and let's say all partitions except partition 1 spill to disk. We build the hash table for partition 1 and spill the other partitions to disk.

Let's say these are the partition sizes for X1 –

200 blocks for partition 0,

300 blocks for partition 2 &

50 blocks for partition 3.

We start scanning X2 and process all rows from X2 that hash to partition 1.

So processing of partition 1 data completes in OPTIMAL mode.

The data from X2 that falls into partitions 0, 2 & 3 now spills to disk.

Let's say these are the partition sizes for X2 –

150 blocks for partition 0,

200 blocks for partition 2 &

300 blocks for partition 3.

Let's say our hash area size fits 80 blocks. The join will now start processing the smallest of the spilled partitions, so it will start with partition 3. It will be able to fit all 50 blocks in memory, so it will use partition 3 of X1 to create the hash table and will probe this hash table by scanning partition 3 of X2.

So processing of partition 3 completes in 1-Pass.

But this is not the end of the story. It will now start processing partition 0. Since X2 partition 0 (150 blocks) is smaller than X1 partition 0 (200 blocks), a role reversal will occur and it will use partition 0 of X2 to build the hash table. But it will be able to fit only 80 blocks in memory, so it will create a hash table out of the first 80 blocks and will probe this hash table with X1 partition 0. After this it will build a hash table with the remaining 70 blocks of X2 partition 0 and will probe this hash table by scanning X1 partition 0 once more.

So processing of partition 0 completes in 2-Passes.

Similarly, X2 partition 2 (200 blocks) is smaller than X1 partition 2 (300 blocks), so it is again used to build the hash table 80 blocks at a time, and processing of partition 2 completes in 3 passes.

So this is the final status –

Processing of partition 1 completed in OPTIMAL mode.

Processing of partition 3 completed in 1-Pass mode.

Processing of partition 0 completed in 2-Pass mode.

Processing of partition 2 completed in 3-Pass mode.

But what is recorded for LAST_EXECUTION in V$SQL_WORKAREA ?

It is 3.

What is recorded is not the sum of passes across all partitions but the maximum number of passes encountered across all partitions. So the number of passes is neither the number of times we reused the hash area nor the number of times we scanned the whole data set. I think we can say it is the number of times we had to iterate over the largest partition's data.
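The same view also keeps cumulative counters that classify past executions by mode. A sketch (again, '&sql_id' is a placeholder):

-- how many executions of this work area were optimal, one-pass or multi-pass
select operation_type,
       operation_id,
       last_execution,
       optimal_executions,
       onepass_executions,
       multipasses_executions,
       total_executions
from   v$sql_workarea
where  sql_id = '&sql_id';

Note that these counters count executions, not passes – a multi-pass execution bumps MULTIPASSES_EXECUTIONS by one whether it took 3 passes or 12.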

NUMBER_PASSES in V$SQL_WORKAREA_ACTIVE

If you have monitored a long-running hash join, you might sometimes have observed that the value in NUMBER_PASSES changes during the execution. It starts at 0 and steadily increases before falling back to 1 or 2 (once it crosses 2 it doesn't go back below 2), then steadily increases again until it falls back to 1 or 2. This cycle repeats until the join completes, and the peak value reached in each cycle increases.

For example, you might see NUMBER_PASSES progress like this: 0…1…2…3…2…3…4…5…2…3…4…5…6.

What is recorded in V$SQL_WORKAREA_ACTIVE is the number of passes within the partition currently being processed. Since the largest partitions are processed at the end, you will see the values getting larger towards the end of the join.
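To watch this yourself while the join is running, you can poll V$SQL_WORKAREA_ACTIVE for the session. A minimal monitoring sketch (the '&sid' below is a placeholder for the session executing the join):

-- rerun this every few seconds while the hash join is executing
select sid,
       operation_type,
       operation_id,
       actual_mem_used,
       max_mem_used,
       number_passes,
       tempseg_size
from   v$sql_workarea_active
where  sid = &sid
and    operation_type like 'HASH%';

You should see NUMBER_PASSES climb within a partition and drop back as the join moves on to the next spilled partition.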
