Oracle

August 9, 2010

PLAN_HASH_VALUE Limitations – PQ_DISTRIBUTE & skew in data distribution across parallel slaves

Filed under: Uncategorized — srivenu @ 12:04 pm

Recently I read an article entitled “Is plan_hash_value a final say?” on Riyaj Shamsudeen’s blog. As usual in his lucid style, Riyaj Shamsudeen has highlighted the shortcomings of PLAN_HASH_VALUE and why it shouldnt be relied upon while comparing plans.

I wanted to jot down my own experiences and how i came to realsize this fact. A couple of years back I worked in a BI DBA team managing a decently sized datawarehouse of around 200-300TB. I faced some serious performance issues in parallel queries. Mostly we used to have a fixed set of jobs. And somedays our jobs used to take more time than usual – sometimes even 5 – 6 times more than usual.

The tables involved in the sql were composite partitioned with range partitions on CIRCLE & DATE and hash subpartitioned on PRODUCT_ID, CUSTOMER_ID etc. The data in our tables was highly skewed across CIRCLE’s with some old circles having abnormally huge data compared to the new ones. Lets say table A is range partitioned on CIRCLE & DATE and hash sub partitioned on PRODUCT. Lets say B was hash partitioned on PRODUCT. The sql is something like this

select a.col1, a.col2, b.col2
from a, b
where a.circle=b.circle
and a.product_id = b.product_id;

The plan was something like this

           HASH JOIN               |
            PX RECEIVE             |
             PX SEND HASH          | :TQ10000
              PX BLOCK ITERATOR    |
               TABLE ACCESS FULL   | A
            PX RECEIVE             |
             PX SEND HASH          | :TQ10001
              PX BLOCK ITERATOR    |
               TABLE ACCESS FULL   | B

I had scheduled a job which used to capture a lot of session details for all the sql’s running in the database for more than 15 minutes. The information captured included – sql stats, parallel slaves, workarea stats, plan details etc. On comparing the different runs, I did not find anything unsual, both the good and bad runs used to had the same plan hash values, stats (like disk reads, buffer gets etc). Both the good and bad runs showed the same plan as above.

The clincher came from the V$PQ_TQSTAT output. The bad run clearly showed a skew in the data distribution across the parallel slaves while the good runs had a good equal distribution of data. On close observation of the complete plan (using the display option’s dbms_xplan.display(null,null,’all’,null)) I found the issue in the section “Column Projection Information (identified by operation id):”

The good plans had this column projection info

– for the lines “PX RECEIVE”

“A”.”PRODUCT_ID” [VARCHAR2,50], “A”.”CIRCLE” [VARCHAR2,50],
“B”.”PRODUCT_ID” [VARCHAR2,50], “B”.”CIRCLE” [VARCHAR2,50],

for the lines “PX SEND HASH”

(#keys=2) “A”.”PRODUCT_ID” [VARCHAR2,50], “A”.”CIRCLE” [VARCHAR2,50],

for the line “HASH JOIN”

(#keys=2) “A”.”PRODUCT_ID” [VARCHAR2,50], “A”.”CIRCLE” [VARCHAR2,50],

The bad plans had this column projection info

– for the lines “PX RECEIVE”

“A”.”CIRCLE” [VARCHAR2,50], “A”.”PRODUCT_ID” [VARCHAR2,50],
“B”.”CIRCLE” [VARCHAR2,50],”B”.”PRODUCT_ID” [VARCHAR2,50],

for the lines “PX SEND HASH”

(#keys=2) “A”.”CIRCLE” [VARCHAR2,50], “A”.”PRODUCT_ID” [VARCHAR2,50],

for the line “HASH JOIN”

(#keys=2) “A”.”CIRCLE” [VARCHAR2,50], “A”.”PRODUCT_ID” [VARCHAR2,50],

The distribution of data across the slaves in the order of CIRCLE & PREDICATE has caused only 1 or a few slaves to do most of the work as only 1 or few circles had large amount of data. I raised a TAR and got this fixed. But I became aware of the limitation that the way data is distributed across the parallel slaves in case of a multi column table join, the key order is not reflected in the PLAN_HASH_VALUE.

For the benefit of those who are facing the same issue, there is one workaround i stumbled upon.

I faced the same issue recently on 10.2.0.4 on SUN SPARC SOLARIS. I was able to workaround the issue by changing the order of predicates (i did not use any ORDERED_PREDICATES hint)

I had an issue with the sql written like this

select a.col1, a.col2, b.col2
from a, b
where a.circle=b.circle
and a.product_id = b.product_id.

But when I rewrote the sql like this, I got the right order of keys in the join.

select a.col1, a.col2, b.col2
from a, b
where a.product_id = b.product_id
and a.circle=b.circle;

So looks like the order of predicates does have some relevance ! (even without the ORDERED_PREDICATES hint)

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: