This was a performance issue i faced on a billing database of one of our telecom client’s. The server was having sporadic cpu spikes for 2-3 minutes before things come back to normal. The environment was 64-bit Oracle 10.2.0.4 on HP-UX ia64 B.11.31. The normal cpu usage was normally around 40-50% and during issue time, the cpu run queue doubles or trebles and usage peaks at 100%. As per the vendor recommendation, the memory comprised of 100GB buffer cache and 60GB shared pool!
I did an ASH analysis for the reported issue time and found out that the waits were mostly for “cursor: pin S”, “latch: row cache objects”, “log file sync” and “latch: library cache”. I also had the chance of monitoring the system during issue time. The “cursor: pin S” waits were across several sql’s. Many of the sql showing in this wait were simple and with only a single child. This raised my curiosity as this isn’t normal.
V$EVENT_HISTOGRAM had the following data (This data is after a recent instance bounce and after the first occurence of the issue).
Event
EVENT# Name WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
617 cursor: pin S 1 7052795
617 cursor: pin S 2 16564
617 cursor: pin S 4 1357
617 cursor: pin S 8 438
617 cursor: pin S 16 377
617 cursor: pin S 32 478
617 cursor: pin S 64 436
617 cursor: pin S 128 910
617 cursor: pin S 256 322
617 cursor: pin S 512 118
617 cursor: pin S 1024 81
617 cursor: pin S 2048 52
617 cursor: pin S 4096 30
617 cursor: pin S 8192 8
Event
EVENT# Name WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
214 latch: library cache 1 3903
214 latch: library cache 2 409
214 latch: library cache 4 436
214 latch: library cache 8 238
214 latch: library cache 16 211
214 latch: library cache 32 185
214 latch: library cache 64 202
214 latch: library cache 128 204
214 latch: library cache 256 232
214 latch: library cache 512 232
214 latch: library cache 1024 129
214 latch: library cache 2048 59
214 latch: library cache 4096 2
A quick search on the net revealed the blog entry from Tanel Poder “cursor: pin S waits, sporadic CPU spikes and systematic troubleshooting“. This matches exactly with the problem that this instance was facing.
The oracle processes were having different scheduling priorities at the os level.
priority count of oracle processes having that priority -------- -------------------- 134 1 154 1250 179 1 188 1 195 1 196 1 199 1 200 1 201 1 207 1 209 2 213 1 222 1 225 1 227 1 229 1 230 1 232 1 239 3 240 1 241 4
There were around 50-100 logins per second during issue time. Also the huge SGA in this instance made the process creation even more costlier.
After implementing HPUX_SCHED_NOAGE, the cpu spikes were gone.
Following is the data post change
priority count of oracle processes
having that priority
-------- --------------------
154 2
178 794
999999 1
Event
EVENT# Name WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
617 cursor: pin S 1 5657072
617 cursor: pin S 2 2889
617 cursor: pin S 4 1177
617 cursor: pin S 8 571
617 cursor: pin S 16 228
617 cursor: pin S 32 60
617 cursor: pin S 64 32
617 cursor: pin S 128 26
Event
EVENT# Name WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
214 latch: library cache 1 315419
214 latch: library cache 2 45189
214 latch: library cache 4 61475
214 latch: library cache 8 40559
214 latch: library cache 16 22104
214 latch: library cache 32 14884
214 latch: library cache 64 11129
214 latch: library cache 128 7770
214 latch: library cache 256 3419
214 latch: library cache 512 3475
214 latch: library cache 1024 5525
214 latch: library cache 2048 12220
214 latch: library cache 4096 15814
214 latch: library cache 8192 829
214 latch: library cache 16384 61
They have yet to reduce their shared pool substantially from the existing value of 60GB.
I really have no words for my admiration of the genious, work and contribution to the DBA community by the guru’s – Jonathan Lewis, Tanel Poder, Tom Kyte.
[...] Update: Srivenu Kadiyala has experienced the same problem and has written about it here. [...]
Pingback by cursor: pin s waits, high CPU usage troubleshooting | Tanel Poder's blog: IT & Mobile for Geeks and Pros — February 2, 2012 @ 12:39 pm |
Hi,
Good one.But your event histogram listing for ‘latch: library cache’ shows more waits in the buckets ranges from 1 -16 seconds…
Thanks
Comment by antony — October 11, 2012 @ 12:38 pm |
Yes it does. The do have a huge library cache. Unfortunately i didnt have access to that system for much time to analyze further.
Comment by srivenu — October 12, 2012 @ 1:29 am |
[...] Once you have the top database cpu consumer you can connect to it and diagnose more in depth (Example of cpu spike on this post). [...]
Pingback by Real-Time CPU utilization per oracle database « bdt's oracle blog — December 4, 2012 @ 11:22 am |
Hi Srivenu
Excellent post. Can you share with me the command to list the oracle process priority as in your analysis as shown below?
The oracle processes were having different scheduling priorities at the os level.
priority count of oracle processes
having that priority
——– ——————–
134 1
154 1250
179 1
188 1
195 1
196 1
199 1
200 1
201 1
207 1
209 2
213 1
222 1
225 1
227 1
229 1
230 1
232 1
239 3
240 1
241 4
Thank you
-Haris
Comment by Haris Ali — January 6, 2013 @ 11:12 pm |
Sorry for the long delay.
I dont have access to an unix system right now, but i think this is the command i used
ps -u oracle -U oracle -o pri| sort| awk ' BEGIN { prev=0; sum=1 } { if ($1==prev) { sum+=$2 } else { print prev, sum; prev=$1; sum=0 } } END { print prev, sum } 'Comment by srivenu — January 23, 2013 @ 8:44 am |