Oracle

November 22, 2010

RAC system hang due to “global cache cr request”

Filed under: Uncategorized — srivenu @ 5:49 am

We run an Oracle CRM application on a 3-node RAC. The system was scheduled for an OS upgrade last weekend. The team doing the upgrade ran into some issues and decided to roll it back.

When I came into the office on Monday morning, fire-fighting was already under way. The RAC database had serious performance issues affecting all users.
At first glance I could see numerous sessions waiting on “global cache cr request”. I cross-checked the network interface using oradebug ipc and verified the network using netstat. The next step was to check the average timings for the Global Cache events. These showed some abnormally high values, notably average receive times of 119.0 ms for CR blocks and 97.1 ms for current blocks.

Global Cache Service - Workload Characteristics
-----------------------------------------------
Ave global cache get time (ms):                           41.9
Ave global cache convert time (ms):                      162.2

Ave build time for CR block (ms):                          0.0
Ave flush time for CR block (ms):                          0.2
Ave send time for CR block (ms):                           0.2
Ave time to process CR block request (ms):                 0.5
Ave receive time for CR block (ms):                      119.0

Ave pin time for current block (ms):                       7.8
Ave flush time for current block (ms):                     0.0
Ave send time for current block (ms):                      0.2
Ave time to process current block request (ms):            8.0
Ave receive time for current block (ms):                  97.1
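To make the slow entries stand out, here is a minimal Python sketch that parses lines in the format of the report above and flags any average above a cutoff. The function names and the 10 ms cutoff are my own assumptions for illustration, not an Oracle-documented threshold; pick a value that suits your interconnect.

```python
import re

# A few lines copied from the report above.
REPORT = """\
Ave global cache get time (ms):                           41.9
Ave global cache convert time (ms):                      162.2
Ave send time for CR block (ms):                           0.2
Ave receive time for CR block (ms):                      119.0
Ave receive time for current block (ms):                  97.1
"""

# Matches "Ave <metric name> (ms):   <value>"
LINE_RE = re.compile(r"^(Ave .+?) \(ms\):\s+([0-9.]+)\s*$")


def parse_metrics(text):
    """Return {metric name: average in ms} for every report line that matches."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def flag_slow(metrics, threshold_ms=10.0):
    """Return the names of metrics above threshold_ms (an assumed cutoff)."""
    return sorted(name for name, ms in metrics.items() if ms > threshold_ms)


if __name__ == "__main__":
    for name in flag_slow(parse_metrics(REPORT)):
        print(name)
```

With the figures above, the get, convert and both receive times are flagged, while the send time is not. That pattern (fast send, slow receive) is what pointed me away from the local instance and towards the remote side serving the blocks.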

I decided to verify the OS scheduler priorities of the LMS processes. While I was checking them on Node 2 and Node 3, I was informed that Node 1 had only 2 LMS processes, whereas Node 2 and Node 3 had 8 each. CPU_COUNT showed 8 on Node 1 but 56 on Nodes 2 and 3. It looks like the CPU boards were hot-plugged on server 1 after database startup. To avoid bouncing Node 1, we tried setting “_cpu_count” to 56 to see whether more LMS processes would start automatically. No luck: that didn’t work, so we had to restart the database on Node 1.
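The mismatch would have been caught sooner with a simple per-node comparison. Below is a rough Python sketch of that check, using the values we observed. The dictionary layout and the function name are hypothetical; the numbers are assumed to be collected by hand from each instance (the CPU_COUNT parameter and a count of LMS processes).

```python
from collections import Counter

# Values observed on each node during the incident.
NODES = {
    1: {"cpu_count": 8, "lms_processes": 2},
    2: {"cpu_count": 56, "lms_processes": 8},
    3: {"cpu_count": 56, "lms_processes": 8},
}


def find_outliers(nodes, key):
    """Return {node: value} for nodes whose value differs from the most common one."""
    counts = Counter(info[key] for info in nodes.values())
    majority_value, _ = counts.most_common(1)[0]
    return {
        node: info[key]
        for node, info in nodes.items()
        if info[key] != majority_value
    }


if __name__ == "__main__":
    for key in ("cpu_count", "lms_processes"):
        print(key, find_outliers(NODES, key))
```

For both keys the only outlier is Node 1, which matches what we found. Note that this majority-vote comparison only makes sense when the nodes are meant to be identical, as they were here.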


2 Comments »

  1. Hi, did you find the actual issue? Why did the system hang?

    Comment by Shankar — February 28, 2011 @ 4:48 pm | Reply

    • It’s due to an insufficient number of LMS processes.

      Comment by ksrivenu — March 2, 2011 @ 5:59 am | Reply

