This was a performance issue I faced on the billing database of one of our telecom clients. The server was having sporadic CPU spikes that lasted 2-3 minutes before things came back to normal. The environment was 64-bit Oracle 10.2.0.4 on HP-UX ia64 B.11.31. CPU usage was normally around 40-50%; during the issue, the CPU run queue doubled or trebled and usage peaked at 100%. As per the vendor recommendation, the memory comprised a 100GB buffer cache and a 60GB shared pool!
I did an ASH analysis for the reported issue time and found that the waits were mostly for “cursor: pin S”, “latch: row cache objects”, “log file sync” and “latch: library cache”. I also had the chance to monitor the system during the issue time. The “cursor: pin S” waits were spread across several SQLs, and many of the SQLs showing this wait were simple statements with only a single child cursor. This raised my curiosity, as it isn’t normal.
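(A minimal sketch of the kind of query that gives this picture from ASH; the time window below is just a placeholder, not the actual issue window.)

-- top wait events sampled by ASH during the issue window
select event, count(*) samples
from   v$active_session_history
where  session_state = 'WAITING'   -- waits only, skip ON CPU samples
and    sample_time between to_date('2012-01-01 10:00', 'yyyy-mm-dd hh24:mi')
                       and to_date('2012-01-01 10:05', 'yyyy-mm-dd hh24:mi')
group  by event
order  by samples desc;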
V$EVENT_HISTOGRAM had the following data (this data is after a recent instance bounce and after the first occurrence of the issue).
    EVENT# EVENT NAME                               WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       617 cursor: pin S                                          1    7052795
       617 cursor: pin S                                          2      16564
       617 cursor: pin S                                          4       1357
       617 cursor: pin S                                          8        438
       617 cursor: pin S                                         16        377
       617 cursor: pin S                                         32        478
       617 cursor: pin S                                         64        436
       617 cursor: pin S                                        128        910
       617 cursor: pin S                                        256        322
       617 cursor: pin S                                        512        118
       617 cursor: pin S                                       1024         81
       617 cursor: pin S                                       2048         52
       617 cursor: pin S                                       4096         30
       617 cursor: pin S                                       8192          8

    EVENT# EVENT NAME                               WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       214 latch: library cache                                   1       3903
       214 latch: library cache                                   2        409
       214 latch: library cache                                   4        436
       214 latch: library cache                                   8        238
       214 latch: library cache                                  16        211
       214 latch: library cache                                  32        185
       214 latch: library cache                                  64        202
       214 latch: library cache                                 128        204
       214 latch: library cache                                 256        232
       214 latch: library cache                                 512        232
       214 latch: library cache                                1024        129
       214 latch: library cache                                2048         59
       214 latch: library cache                                4096          2
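(For reference, the listing above comes straight out of V$EVENT_HISTOGRAM with a query along these lines.)

select event#, event, wait_time_milli, wait_count
from   v$event_histogram
where  event in ('cursor: pin S', 'latch: library cache')
order  by event, wait_time_milli;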
A quick search on the net revealed Tanel Poder’s blog entry “cursor: pin S waits, sporadic CPU spikes and systematic troubleshooting”. It matched exactly the problem this instance was facing.
The Oracle processes were running with different scheduling priorities at the OS level (the ps one-liner used to get this listing is in the comments below).
priority   count of oracle processes
           having that priority
--------   --------------------------
     134            1
     154         1250
     179            1
     188            1
     195            1
     196            1
     199            1
     200            1
     201            1
     207            1
     209            2
     213            1
     222            1
     225            1
     227            1
     229            1
     230            1
     232            1
     239            3
     240            1
     241            4
There were around 50-100 logons per second during the issue time, and the huge SGA in this instance made process creation even more costly.
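(One quick way to cross-check the logon rate is to sample V$SYSSTAT twice, a known number of seconds apart, and divide the delta in ‘logons cumulative’ by the interval.)

-- run twice, N seconds apart; (delta of 'logons cumulative') / N = logons per second
select name, value
from   v$sysstat
where  name in ('logons cumulative', 'logons current');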
After implementing HPUX_SCHED_NOAGE, the cpu spikes were gone.
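The change itself is a one-liner, sketched below assuming an spfile is in use. HPUX_SCHED_NOAGE is a static parameter, so it needs scope=spfile and an instance bounce; the value 178 matches the priority the server processes ended up with in the post-change data. If I remember right, the dba group also needs the RTSCHED and RTPRIO privileges at the OS level for the setting to take effect.

-- run as SYSDBA; takes effect only after an instance restart
alter system set hpux_sched_noage=178 scope=spfile;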
Following is the data post-change:

priority   count of oracle processes
           having that priority
--------   --------------------------
     154            2
     178          794
  999999            1

    EVENT# EVENT NAME                               WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       617 cursor: pin S                                          1    5657072
       617 cursor: pin S                                          2       2889
       617 cursor: pin S                                          4       1177
       617 cursor: pin S                                          8        571
       617 cursor: pin S                                         16        228
       617 cursor: pin S                                         32         60
       617 cursor: pin S                                         64         32
       617 cursor: pin S                                        128         26

    EVENT# EVENT NAME                               WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
       214 latch: library cache                                   1     315419
       214 latch: library cache                                   2      45189
       214 latch: library cache                                   4      61475
       214 latch: library cache                                   8      40559
       214 latch: library cache                                  16      22104
       214 latch: library cache                                  32      14884
       214 latch: library cache                                  64      11129
       214 latch: library cache                                 128       7770
       214 latch: library cache                                 256       3419
       214 latch: library cache                                 512       3475
       214 latch: library cache                                1024       5525
       214 latch: library cache                                2048      12220
       214 latch: library cache                                4096      15814
       214 latch: library cache                                8192        829
       214 latch: library cache                               16384         61
They have yet to reduce their shared pool substantially from the existing value of 60GB.
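(A quick way to see where that 60GB actually sits is to look at V$SGASTAT; a large ‘free memory’ figure under the shared pool is usually the giveaway that it is oversized.)

select pool, name, round(bytes/1024/1024/1024, 2) gb
from   v$sgastat
where  pool = 'shared pool'
and    name in ('free memory', 'sql area', 'library cache')
order  by bytes desc;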
I really have no words to express my admiration for the genius, work and contribution to the DBA community of the gurus – Jonathan Lewis, Tanel Poder and Tom Kyte.
Hi,
Good one. But your event histogram listing for ‘latch: library cache’ shows more waits in the buckets ranging from 1 to 16 seconds…
Thanks
Comment by antony — October 11, 2012 @ 12:38 pm
Yes it does. They do have a huge library cache. Unfortunately I didn’t have access to that system long enough to analyze it further.
Comment by srivenu — October 12, 2012 @ 1:29 am
Hi Srivenu
Excellent post. Can you share the command you used to list the Oracle process priorities, as shown in your analysis?
Thank you
-Haris
Comment by Haris Ali — January 6, 2013 @ 11:12 pm
Sorry for the long delay.
I don’t have access to a Unix system right now, but I think the command I used was the one posted below.
Comment by srivenu — January 23, 2013 @ 8:44 am
sum+=1 instead of sum+=$2, I assume.
Comment by dik pater — November 1, 2017 @ 4:42 am
You are right, I made a mistake.
Comment by srivenu — November 1, 2017 @ 7:12 am
# On HP-UX, ps needs the UNIX95 variable set for the XPG4 -o option;
# pri= prints the priority column without its header line
UNIX95= ps -u oracle -U oracle -o pri= |
sort -n |
awk '
NR==1 { prev=$1; sum=1; next }   # first line opens the first group
{
    if ($1==prev)
        sum+=1                   # same priority, keep counting
    else {
        print prev, sum          # priority changed, emit the finished group
        prev=$1
        sum=1
    }
}
END { if (NR) print prev, sum }  # emit the last group
'
Comment by srivenu — November 1, 2017 @ 7:14 am