Oracle

December 25, 2011

Optimizer might ignore a more suitable superset composite index

Filed under: Uncategorized — srivenu @ 9:56 am

I thought i would post this note after modifying a composite index that was being overruled by the CBO on cost grounds.
The environment is Oracle 64-Bit 10.2.0.4 on Solaris 64-bit.

The sql in question is something like this


select 	.....
from 	X
where 	FILE_ID =:1 
and     STATUS = :2

There are 2 indexes on the table which could be used for this query

- INDEX_FILE_ID 		on X(FILE_ID) 
- INDEX_FILE_ID_STATUS_ISD	on X(FILE_ID, STATUS, ISD)

The table stats are like this


  Avg
 Free    Avg                                          
Space    Row                              
 In A    Len       No Of             Empty
Block    gth        Rows     Blocks Blocks
----- ------ ----------- ---------- ------
    0    172    16858045     589525      0

The column stats are like this


                                                                              Ave
                          NU                                                  Col
                          LL          Num                           Den  Num  umn
                          AB         Dist          Num               si Buck  Len
Column Name               LE         inct        Nulls               ty  ets  gth
------------------------- -- ------------ ------------ ---------------- ---- ----
FILE_ID                   Y       422,169            0   .0000023687196    1    6
STATUS                    N             3            0   .3333333333333    1    2
ISD                       N     4,525,479            0   .0000002209711    1   11

Here are the plans


select 	.....
from 	X
where 	FILE_ID =:1 
and     STATUS = :2

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |    13 |  2236 |     7   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID|             X        |    13 |  2236 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | INDEX_FILE_ID        |    41 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"."STATUS"=:2)
   2 - access("X"."FILE_ID"=TO_NUMBER(:1))



select 	/*+INDEX(X INDEX_FILE_ID_STATUS_ISD)*/
	.....
from 	X
where 	FILE_ID =:1 
and     STATUS = :2


--------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                          |    13 |  2236 |    17   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID|                       X  |    13 |  2236 |    17   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | INDEX_FILE_ID_STATUS_ISD |    13 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"."FILE_ID"=TO_NUMBER(:1) AND "X"."STATUS"=:2)


The range scan costs the same in both the plans.
The Table Access cost is very low for the first plan even though the it has to look up 41 rows compared to 13 in the second plan.
Its easy to figure out that the cost difference has something to do with Clustering Factor.

Following are the index stats

 

                                                                                              Avg     Avg
                                                                                         %   Leaf    Data
                                                 B               Cluste               Dist Blocks  Blocks
                                                Lev     Leaf      -ring     Number    inct    Per     Per
Index Name                Index Type Status     -el   Blocks     factor    Of Rows    keys    Key     Key
------------------------- ---------- ---------- --- -------- ---------- ---------- ------- ------ -------
INDEX_FILE_ID             NORMAL     VALID        3   154199    1237333   22618792    1.78      1       3
INDEX_FILE_ID_STATUS_ISD  NORMAL     VALID        3   249404   20071519   22772761   98.78      1       1

Lets start with the plan which shows index range scan on INDEX_FILE_ID_STATUS_ISD.
The CBO estimates that the after applying the predicate on FILE_ID & STATUS, 13 rows would be retrieved.
Due to the high clustering factor of INDEX_FILE_ID_STATUS_ISD the 13 rows would be located on 13 different table blocks and hence a cost of 13.

Now for the plan which shows index range scan on INDEX_FILE_ID.
The CBO estimated that after applying the predicate on FILE_ID, 41 rows would be retrieved.
Due to the low clustering factor on the INDEX_FILE_ID index, it needs to access only 3 distinct table blocks and hence a cost of 3.

We could easily find out the oddity of this logic.
Thinking logically, if we use the INDEX_FILE_ID index, we have to access all the table blocks (if not more) accessed by INDEX_FILE_ID_STATUS_ISD index.
So the clustering factor should be ignored in cost comparision of these 2 indexes.
We could understand if the composite index is rejected based on higher range scan cost.

The CBO is just a piece of code and can be forgiven for figuring out this logic.
Also its not quite common to see a column repeatedly indexed.
But if such scenarios are common, i can think of an additional logic to be added to the CBO wherein,
if there are composite column predicates with superset/subset possibilities on the same set of columns,
the CBO should ignore clustering factor in determining the index path and will always use the superset index based only on the index access cost.

Hope this note also serves as a caution to DBA’s from cramming more than “needed” columns in an index.
The high clustering factor of INDEX_FILE_ID_STATUS_ISD was actually caused by ISD column.
Dropping ISD column from the index resolved the issue.

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: