I thought i would post this note after modifying a composite index that was being overruled by the CBO on cost grounds.
The environment is Oracle 64-Bit 10.2.0.4 on Solaris 64-bit.
The sql in question is something like this
select ..... from X where FILE_ID =:1 and STATUS = :2
There are 2 indexes on the table which could be used for this query
- INDEX_FILE_ID on X(FILE_ID) - INDEX_FILE_ID_STATUS_ISD on X(FILE_ID, STATUS, ISD)
The table stats are like this
Avg
Free Avg
Space Row
In A Len No Of Empty
Block gth Rows Blocks Blocks
----- ------ ----------- ---------- ------
0 172 16858045 589525 0
The column stats are like this
Ave
NU Col
LL Num Den Num umn
AB Dist Num si Buck Len
Column Name LE inct Nulls ty ets gth
------------------------- -- ------------ ------------ ---------------- ---- ----
FILE_ID Y 422,169 0 .0000023687196 1 6
STATUS N 3 0 .3333333333333 1 2
ISD N 4,525,479 0 .0000002209711 1 11
Here are the plans
select .....
from X
where FILE_ID =:1
and STATUS = :2
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 2236 | 7 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| X | 13 | 2236 | 7 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | INDEX_FILE_ID | 41 | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X"."STATUS"=:2)
2 - access("X"."FILE_ID"=TO_NUMBER(:1))
select /*+INDEX(X INDEX_FILE_ID_STATUS_ISD)*/
.....
from X
where FILE_ID =:1
and STATUS = :2
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 2236 | 17 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| X | 13 | 2236 | 17 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | INDEX_FILE_ID_STATUS_ISD | 13 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("X"."FILE_ID"=TO_NUMBER(:1) AND "X"."STATUS"=:2)
The range scan costs the same in both the plans.
The Table Access cost is very low for the first plan even though the it has to look up 41 rows compared to 13 in the second plan.
Its easy to figure out that the cost difference has something to do with Clustering Factor.
Following are the index stats
Avg Avg
% Leaf Data
B Cluste Dist Blocks Blocks
Lev Leaf -ring Number inct Per Per
Index Name Index Type Status -el Blocks factor Of Rows keys Key Key
------------------------- ---------- ---------- --- -------- ---------- ---------- ------- ------ -------
INDEX_FILE_ID NORMAL VALID 3 154199 1237333 22618792 1.78 1 3
INDEX_FILE_ID_STATUS_ISD NORMAL VALID 3 249404 20071519 22772761 98.78 1 1
Lets start with the plan which shows index range scan on INDEX_FILE_ID_STATUS_ISD.
The CBO estimates that the after applying the predicate on FILE_ID & STATUS, 13 rows would be retrieved.
Due to the high clustering factor of INDEX_FILE_ID_STATUS_ISD the 13 rows would be located on 13 different table blocks and hence a cost of 13.
Now for the plan which shows index range scan on INDEX_FILE_ID.
The CBO estimated that after applying the predicate on FILE_ID, 41 rows would be retrieved.
Due to the low clustering factor on the INDEX_FILE_ID index, it needs to access only 3 distinct table blocks and hence a cost of 3.
We could easily find out the oddity of this logic.
Thinking logically, if we use the INDEX_FILE_ID index, we have to access all the table blocks (if not more) accessed by INDEX_FILE_ID_STATUS_ISD index.
So the clustering factor should be ignored in cost comparision of these 2 indexes.
We could understand if the composite index is rejected based on higher range scan cost.
The CBO is just a piece of code and can be forgiven for figuring out this logic.
Also its not quite common to see a column repeatedly indexed.
But if such scenarios are common, i can think of an additional logic to be added to the CBO wherein,
if there are composite column predicates with superset/subset possibilities on the same set of columns,
the CBO should ignore clustering factor in determining the index path and will always use the superset index based only on the index access cost.
Hope this note also serves as a caution to DBA’s from cramming more than “needed” columns in an index.
The high clustering factor of INDEX_FILE_ID_STATUS_ISD was actually caused by ISD column.
Dropping ISD column from the index resolved the issue.