Oracle

January 30, 2012

SCHED_NOAGE and latch contention

Filed under: Uncategorized — srivenu @ 6:29 am

This was a performance issue I faced on the billing database of one of our telecom clients. The server was experiencing sporadic CPU spikes lasting 2-3 minutes before things came back to normal. The environment was 64-bit Oracle 10.2.0.4 on HP-UX ia64 B.11.31. CPU usage was normally around 40-50%; during the issue, the CPU run queue doubled or trebled and usage peaked at 100%. As per the vendor recommendation, the memory comprised a 100GB buffer cache and a 60GB shared pool!

I did an ASH analysis for the reported issue times and found that the waits were mostly for “cursor: pin S”, “latch: row cache objects”, “log file sync” and “latch: library cache”. I also had the chance to monitor the system during the issue. The “cursor: pin S” waits were spread across several SQLs. Many of the SQLs showing this wait were simple, with only a single child cursor. This raised my curiosity, as it isn’t normal.
V$EVENT_HISTOGRAM had the following data (this is after a recent instance bounce and after the first occurrence of the issue):



           Event
    EVENT# Name                                     WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       617 cursor: pin S                                          1    7052795
       617 cursor: pin S                                          2      16564
       617 cursor: pin S                                          4       1357
       617 cursor: pin S                                          8        438
       617 cursor: pin S                                         16        377
       617 cursor: pin S                                         32        478
       617 cursor: pin S                                         64        436
       617 cursor: pin S                                        128        910
       617 cursor: pin S                                        256        322
       617 cursor: pin S                                        512        118
       617 cursor: pin S                                       1024         81
       617 cursor: pin S                                       2048         52
       617 cursor: pin S                                       4096         30
       617 cursor: pin S                                       8192          8



           Event
    EVENT# Name                                     WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       214 latch: library cache                                   1       3903
       214 latch: library cache                                   2        409
       214 latch: library cache                                   4        436
       214 latch: library cache                                   8        238
       214 latch: library cache                                  16        211
       214 latch: library cache                                  32        185
       214 latch: library cache                                  64        202
       214 latch: library cache                                 128        204
       214 latch: library cache                                 256        232
       214 latch: library cache                                 512        232
       214 latch: library cache                                1024        129
       214 latch: library cache                                2048         59
       214 latch: library cache                                4096          2
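
For reference, histograms like the two above can be pulled straight from V$EVENT_HISTOGRAM; a minimal query along these lines (event names as used on this system):

```sql
select	event#, event, wait_time_milli, wait_count
from	v$event_histogram
where	event in ('cursor: pin S', 'latch: library cache')
order	by event, wait_time_milli
/
```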

A quick search on the net turned up Tanel Poder’s blog entry “cursor: pin S waits, sporadic CPU spikes and systematic troubleshooting”. It matched exactly the problem this instance was facing.

The Oracle processes had widely varying scheduling priorities at the OS level.
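
The per-priority counts below were collected with a one-liner along these lines (HP-UX needs UNIX95 set for the XPG4 `ps -o` options; exact option spelling should be checked against the local man page):

```shell
# Count Oracle processes (ora_* background, oracle* dedicated servers)
# per OS scheduling priority
UNIX95=1 ps -e -o pri= -o comm= \
  | awk '$2 ~ /^ora/ {cnt[$1]++} END {for (p in cnt) print p, cnt[p]}' \
  | sort -n
```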


priority 	count of oracle processes
		having that priority
--------	--------------------
134 		1
154 		1250
179 		1
188 		1
195 		1
196 		1
199 		1
200 		1
201 		1
207 		1
209 		2
213 		1
222 		1
225 		1
227 		1
229 		1
230 		1
232 		1
239 		3
240 		1
241 		4

There were around 50-100 logins per second during the issue. The huge SGA in this instance also made process creation even costlier.

After implementing HPUX_SCHED_NOAGE, the CPU spikes were gone.
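
For the record, the change itself is a single static init parameter; a sketch of the setting (the value 178 is inferred from the post-change priority listing, and per the HP-UX install docs the oracle user also needs the RTSCHED/RTPRIO OS privileges):

```sql
-- static parameter; takes effect for server processes created
-- after the next instance restart
alter system set hpux_sched_noage = 178 scope=spfile;
```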

Following is the data after the change:


priority 	count of oracle processes
		having that priority
--------	--------------------
154 		2
178 		794
999999 		1




           Event
    EVENT# Name                                     WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       617 cursor: pin S                                          1    5657072
       617 cursor: pin S                                          2       2889
       617 cursor: pin S                                          4       1177
       617 cursor: pin S                                          8        571
       617 cursor: pin S                                         16        228
       617 cursor: pin S                                         32         60
       617 cursor: pin S                                         64         32
       617 cursor: pin S                                        128         26



           Event
    EVENT# Name                                     WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       214 latch: library cache                                   1     315419
       214 latch: library cache                                   2      45189
       214 latch: library cache                                   4      61475
       214 latch: library cache                                   8      40559
       214 latch: library cache                                  16      22104
       214 latch: library cache                                  32      14884
       214 latch: library cache                                  64      11129
       214 latch: library cache                                 128       7770
       214 latch: library cache                                 256       3419
       214 latch: library cache                                 512       3475
       214 latch: library cache                                1024       5525
       214 latch: library cache                                2048      12220
       214 latch: library cache                                4096      15814
       214 latch: library cache                                8192        829
       214 latch: library cache                               16384         61

They have yet to reduce their shared pool substantially from the existing value of 60GB.

I really have no words to express my admiration for the genius, work and contribution to the DBA community of the gurus – Jonathan Lewis, Tanel Poder and Tom Kyte.

January 25, 2012

latch: row cache objects

Filed under: Uncategorized — srivenu @ 1:07 pm

I was looking into a system having serious performance issues. The environment was a 4-node RAC of 64-bit Oracle 10.2.0.4 on AIX 6.1.
Every few minutes or so, randomly on one or more nodes, the CPU spiked from its regular ~40% usage to 100% and the system ground to a halt for a minute or two until normalcy returned. A quick look at the session waits during the issue showed around 100 sessions on each instance waiting for “latch: row cache objects”. The SQL they were executing was similar, but not the same, due to literal usage.

The event “latch: row cache objects” was around 33% of the total DB time.

The row cache gets and misses were mainly for dc_histogram_defs & dc_object_ids:


col cache# head "Cache|no" form 999
col parameter head "Parameter" form a25
col type head "Type" form a12
col subordinate# head "Sub|ordi|nate" form 9999
col rcgets head "Cache|Gets" form 999999999999
col rcmisses head "Cache|Misses" form 999999999999
col rcmodifications head "Cache|Modifica|tions" form 999999999999
col rcflushes head "Cache|Flushes" form 999999999999
col kqrstcln head "Child#" form 999
col lagets head "Latch|Gets" form 999999999999
col lamisses head "Latch|Misses" form 999999999999
col laimge head "Latch|Immediate|gets" form 999999999999

select	dc.kqrstcid CACHE#, dc.kqrsttxt PARAMETER, decode(dc.kqrsttyp, 1,'PARENT','SUBORDINATE') type,
	decode(dc.kqrsttyp, 2, kqrstsno, null) subordinate#, 
	dc.kqrstgrq rcgets, dc.kqrstgmi rcmisses, dc.kqrstmrq rcmodifications, dc.kqrstmfl rcflushes, dc.kqrstcln, 
	la.gets lagets, la.misses lamisses, la.immediate_gets laimge
from	x$kqrst 		dc,
	v$latch_children 	la
where	dc.inst_id = userenv('instance')
and 	la.child# = dc.kqrstcln
and 	la.name = 'row cache objects'
order 	by rcgets desc
/                                                                                                                                                           

                                      
                                               Sub                                                       
Cache                                         ordi         Cache         Cache        Latch          Latch
   no Parameter                 Type          nate          Gets        Misses         Gets         Misses
----- ------------------------- ------------ ----- ------------- ------------- ------------- -------------
   16 dc_histogram_defs         PARENT                1163261239        182775    3492413701     862439459
   11 dc_object_ids             PARENT                1091768860         37575    3275630976     221260636
    0 dc_tablespaces            PARENT                 440255133          1310    1320780266       8747309
    7 dc_users                  PARENT                 279457505           747     843421881       9823132
    2 dc_segments               PARENT                 178559327       1192786     545703514       8000261
    3 dc_rollback_segments      PARENT                 146295144          9143     437217211       6452501
    8 dc_objects                PARENT                  43116889         31984     129632484        943641
   15 dc_database_links         PARENT                  21401418            79      64204915         15514
    7 dc_users                  SUBORDINATE      1      10638770           323     843421881       9823132
   10 dc_usernames              PARENT                   8048301           856      24151948         24725
    7 dc_users                  SUBORDINATE      0       2809074           101     843421881       9823132
   19 dc_table_scns             PARENT                   2389350          2103       7208283         12624
   14 dc_profiles               PARENT                   1413630             7       4240943          2677
   16 dc_histogram_data         SUBORDINATE      0        716143         41435    3492413701     862439459
   16 dc_histogram_data         SUBORDINATE      1        315044          4578    3492413701     862439459
   13 dc_sequences              PARENT                    181111         56571       2997694           930
    6 dc_files                  PARENT                     90210         61110        758952           191
   17 dc_global_oids            PARENT                     58365           691        180794             3
   24 outstanding_alerts        PARENT                     40882         38340        582110          1617
    5 dc_tablespace_quotas      PARENT                     29368          1679        336175           316
    8 dc_object_grants          SUBORDINATE      0         27432           941     129632484        943641
   12 dc_constraints            PARENT                      2911          1802         46892            48
   22 dc_awr_control            PARENT                      2378            46          8001             0
   19 dc_partition_scns         SUBORDINATE      0             0             0       7208283         12624
   33 kqlsubheap_object         PARENT                         0             0             0             0
    7 dc_users                  SUBORDINATE      2             0             0     843421881       9823132
    4 dc_used_extents           PARENT                         0             0             0             0
    1 dc_free_extents           PARENT                         0             0             0             0
   34 realm cache               PARENT                         0             0             0             0
   34 realm auth                SUBORDINATE      0             0             0             0             0
   35 Command rule cache        PARENT                         0             0             0             0
   36 Realm Object cache        PARENT                         0             0             0             0
   36 Realm Subordinate Cache   SUBORDINATE      0             0             0             0             0
   40 Rule Set Cache            PARENT                         0             0             0             0
   37 event map                 PARENT                         0             0             0             0
   38 format                    PARENT                         0             0             0             0
   39 audit collector           PARENT                         0             0             0             0
   26 global database name      PARENT                         0             0             0             0
   20 rule_info                 PARENT                         0             0             0             0
   21 rule_or_piece             PARENT                         0             0             0             0
   21 rule_fast_operators       SUBORDINATE      0             0             0             0             0
    9 dc_qmc_cache_entries      PARENT                         0             0             0             0
   23 dc_qmc_ldap_cache_entries PARENT                         0             0             0             0
   27 qmtmrcin_cache_entries    PARENT                         0             0             0             0
   28 qmtmrctn_cache_entries    PARENT                         0             0             0             0
   29 qmtmrcip_cache_entries    PARENT                         0             0             0             0
   30 qmtmrctp_cache_entries    PARENT                         0             0             0             0
   31 qmtmrciq_cache_entries    PARENT                         0             0             0             0
   32 qmtmrctq_cache_entries    PARENT                         0             0             0             0
   25 dc_hintsets               PARENT                         0             0             0             0
   18 dc_outlines               PARENT                         0             0             0             0

51 rows selected.

The misses were mainly at these locations:

                                                     NoWait              Waiter
Latch Name               Where                       Misses     Sleeps   Sleeps
------------------------ -------------------------- ------- ---------- --------
row cache objects        kqrpre: find obj                 0    869,898  937,846
row cache objects        kqreqd: reget                    0    578,411  414,607
row cache objects        kqreqd                           0    407,455  503,634
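
This breakdown comes from V$LATCH_MISSES (the same data an AWR report shows under “Latch Miss Sources”); something along these lines:

```sql
select	parent_name, location, nwfail_count, sleep_count, wtr_slp_count
from	v$latch_misses
where	parent_name = 'row cache objects'
and	sleep_count > 0
order	by sleep_count desc
/
```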

Following is one of the SQLs executed by the sessions contending for the row cache latch. (They were using literal values as shown below.)


select 	......, SUB_STATUS, DOC_STATUS, CRE_STATUS, PHY_STATUS, ...
from 	OWNER.VIEW1
where  	(
	10.0 = 10 
	AND 
	7.07 = 7.07 
	AND 
	'NA' = 'G_60_TO'
	) 
AND 	ISD = 987654321
AND 	ACCOUNT_ID = 1234567
AND 	circle_id = 55555
AND 	ACCOUNT_ID NOT IN (
	SELECT 	ACCOUNT_ID 
	FROM 	OWNER.TAB1
	WHERE 	ID1 = 12345
	GROUP	BY ACCOUNT_ID 
	HAVING 	COUNT(*) >= 1000
	)

There were around 100 sessions executing the same piece of code in a tight loop. The offending SQL was executed roughly 2 times/second by each session. There were other SQLs in the loop, some of which also used literals, but the latch contention showed up only during execution of the above SQL. This piqued my interest to find the root cause.

A trace showed the parse call of the offending SQL to be preceded by a recursive call against view$ and 3 against cdef$.


=====================
select text from view$ where rowid=:1
Bind value=0000D6E8.0000.0001
=====================
select condition from cdef$ where rowid=:1
Bind value=00011BD3.004F.0001
=====================
select condition from cdef$ where rowid=:1
Bind value=00011BD3.0050.0001
=====================
select condition from cdef$ where rowid=:1
Bind value=00011BD3.0051.0001
=====================

I queried view$ to find out what this recursive call related to.


select 	rowid, dbms_rowid.ROWID_TO_RESTRICTED(rowid,1) 
from 	view$ 
where 	dbms_rowid.ROWID_TO_RESTRICTED(rowid,1) = '0000D6E8.0000.0001'
/

ROWID              DBMS_ROWID.ROWID_T
------------------ ------------------
AAAAA/AABAAANboAAA 0000D6E8.0000.0001

select 	obj#
from 	view$ 
where 	rowid = 'AAAAA/AABAAANboAAA'
/

      OBJ#
----------
     58193

select 	owner, object_type, object_name
from	dba_objects
where	object_id = 58193
/

Owner            Object type        Object Name                   
---------------- ------------------ ------------------------------
OWNER            VIEW               VIEW1           

Similar queries on cdef$ revealed that the data related to 3 constraints on a table (let’s call it SUB):
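
The cdef$ lookups followed the same pattern as the view$ queries above; roughly:

```sql
select	rowid, obj#, con#
from	cdef$
where	dbms_rowid.rowid_to_restricted(rowid,1) in
	('00011BD3.004F.0001', '00011BD3.0050.0001', '00011BD3.0051.0001')
/
```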


DOC_STATUS IN ('P', 'S', 'F','N') 
CRE_STATUS IN ('P', 'S', 'F','N')
PHY_STATUS IN ('P', 'S', 'F','N')

I then checked the view definition, which was something like this:


select	.........., SUB_STATUS, DOC_STATUS, CRE_STATUS, PHY_STATUS, ...........
from	SUB,
	X,
	Y
where	SUB.id = x.id
and	x.id = y.id
and	SUB.SUB_STATUS != 'C'

The table SUB also has a check constraint on the SUB_STATUS column – SUB_STATUS IN (‘I’, ‘A’, ‘G1’, ‘G2’, ‘S’, ‘C’). But there was no recursive call to get this constraint definition, even though a predicate on that column was used in the view definition. It looks like Oracle fetches only the constraint definitions of columns used in the query, for a possible rewrite.

This led to the next question: why is it executing these recursive calls again and again?
Wasn’t that the main purpose of the row cache – to cache dictionary data?

As usual, my master comes to the rescue with this note – “Constraints, Inserts and bind”.

The TEXT column in VIEW$ and the CONDITION column in CDEF$ are of LONG datatype, and hence their values are not cached in the row cache.

We resolved the issue by changing CURSOR_SHARING from EXACT to FORCE.
Another workaround is to set event 10195, which disables transitive predicate generation using check constraints.
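
Both fixes are one-liners at the system level (as always with an undocumented event, test before using in production):

```sql
-- what we implemented: share cursors across literal variations
alter system set cursor_sharing = FORCE;

-- alternative workaround: disable transitive predicate generation
-- from check constraints
alter system set events '10195 trace name context forever, level 1';
```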

So using literals in SQLs involving views carries an additional penalty.

January 5, 2012

Session statistics for Parallel Query Coordinator

Filed under: Uncategorized — srivenu @ 10:01 pm

Trying to gauge the statistics of a parallel SQL operation? Take the query coordinator’s session statistics from V$SESSTAT with a pinch of salt.

When a parallel query operation is performed (I use “query” for simplicity), the statistics from all the slave sessions spawned by the coordinator are rolled up (summed) into the coordinator session once each slave completes its operation. (If the parallel operation is cancelled midway, say by pressing Ctrl-C, the slave stats are not rolled up into the coordinator.)

I was tuning some parallel SQL, and in the process I was using statistics from V$SESSTAT for the coordinator session to compare different execution plans. Some of the stats seemed suspect, and I decided to test whether the V$SESSTAT values for the coordinator session are really true and reflect the stats of the slaves it spawned.

The environment is Oracle 10.2.0.4 on HP-UX ia64 B.11.31.

It was a simple test: create a table X with some 10GB of data and do a count(*) on it.

select	/*+parallel(x 2)*/
	count(*)
from	x
where	col1 <> 'xxxxxx'
/

Gather the session stats for the parallel query coordinator at the start of the query. Fire the query and, while it is executing, get the stats for the coordinator and the slaves in a continuous loop. At the end of the operation, get the stats of the coordinator again.
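
The slave-side sampling was done from another session with a join on V$PX_SESSION, along these lines (:qcsid is the coordinator’s SID; note the view also returns a row for the coordinator itself):

```sql
select	sn.name, st.sid, st.value
from	v$px_session px, v$sesstat st, v$statname sn
where	px.qcsid      = :qcsid
and	st.sid        = px.sid
and	sn.statistic# = st.statistic#
and	sn.name       like 'physical read%'
order	by sn.name, st.sid
/
```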

************************
Stats before query start
************************

coordinator stats


   Sid Stat# NAME                                                                         VALUE
------ ----- ------------------------------------------------------------- --------------------
 10695    57 physical read IO requests                                                        4
 10695    58 physical read bytes                                                          32768
 10695    37 physical read total IO requests                                                  4
 10695    39 physical read total bytes                                                    32768
 10695    38 physical read total multi block requests                                         0
 10695    54 physical reads                                                                   4
 10695    55 physical reads cache                                                             4
 10695   114 physical reads cache prefetch                                                    0
 10695    56 physical reads direct                                                            0

**************************************************
Stats at the last iteration just before query ends
**************************************************

Slave 1 stats

   Sid Stat# NAME                                                                         VALUE
------ ----- ------------------------------------------------------------- --------------------
 11092    57 physical read IO requests                                                    93796
 11092    58 physical read bytes                                                     5294833664
 11092    37 physical read total IO requests                                              93662
 11092    39 physical read total bytes                                               5287428096
 11092    38 physical read total multi block requests                                     78927
 11092    54 physical reads                                                              646158
 11092    55 physical reads cache                                                         13915
 11092   114 physical reads cache prefetch                                                    0
 11092    56 physical reads direct                                                       632356


Slave 2 stats


   Sid Stat# NAME                                                                         VALUE
------ ----- ------------------------------------------------------------- --------------------
  9351    57 physical read IO requests                                                    92785
  9351    58 physical read bytes                                                     4911235072
  9351    37 physical read total IO requests                                              92660
  9351    39 physical read total bytes                                               4904845312
  9351    38 physical read total multi block requests                                     72387
  9351    54 physical reads                                                              599208
  9351    55 physical reads cache                                                         19579
  9351   114 physical reads cache prefetch                                                    0
  9351    56 physical reads direct                                                       579734


**********************
Stats after query ends
**********************


coordinator stats


   Sid Stat# NAME                                                                         VALUE
------ ----- ------------------------------------------------------------- --------------------
 10695    57 physical read IO requests                                                   188227
 10695    58 physical read bytes                                                     1710325760
 10695    37 physical read total IO requests                                             188227
 10695    39 physical read total bytes                                               1710325760
 10695    38 physical read total multi block requests                                    152957
 10695    54 physical reads                                                             1257356
 10695    55 physical reads cache                                                         33705
 10695   114 physical reads cache prefetch                                                    0
 10695    56 physical reads direct                                                      1223651

There is a large discrepancy in the “physical read bytes” and “physical read total bytes” values: at the last sampled iteration the two slaves together had read about 10.2GB, yet the coordinator ends up showing only about 1.7GB. (The coordinator’s 1,710,325,760 plus 2 × 2^32 comes to roughly 10.3GB – close to the slaves’ combined reads – which suggests the rolled-up value wrapped at 32 bits.)

There were some bugs reported in older versions of Oracle that could explain some discrepancies.
Bug numbers – 715649, 818237 (the latter seems relevant).

So we should keep this in mind before taking the coordinator stats from V$SESSTAT at face value.

Coming to bugs, I remember (and sometimes follow) Tanel Poder’s practice, mentioned on his blog, of reading Oracle bug notes to get those scraps of good Oracle internals information that Oracle developers occasionally and unintentionally reveal in them.
