Oracle

March 18, 2011

Digging in the ASH

Filed under: Uncategorized — srivenu @ 11:11 am

Recently logins to one of our databases had frozen for around 10 minutes from 1:04 PM till around 1:16PM.
New users were not able to login to the database.
By the time i had logged in, i saw several sessions waiting on “gc buffer busy”.
The object involved was found out to be AUD$ table. The issue got resolved immediately after auditing was turned off.
I decided to dig in further for a portmortem using ASH.

Get the latest snap_time and snap_id from dba_hist_snapshot.

select 	* 
from 	dba_hist_snapshot 
where 	snap_id in (
	select 	max(snap_id) 
	from 	dba_hist_snapshot 
	where 	instance_number=1
	)
/

   SNAP_ID       DBID INSTANCE_NUMBER STARTUP_TIME
---------- ---------- --------------- ---------------------------------------------------------------------------
BEGIN_INTERVAL_TIME
---------------------------------------------------------------------------
END_INTERVAL_TIME
---------------------------------------------------------------------------
FLUSH_ELAPSED                                                               SNAP_LEVEL ERROR_COUNT
--------------------------------------------------------------------------- ---------- -----------
     45044 2872548522               1 11-MAR-11 04.52.14.000 AM
18-MAR-11 12.30.35.275 PM
18-MAR-11 01.30.19.435 PM
+00000 00:00:41.6                                                                    1           0

     45044 2872548522               2 11-MAR-11 05.01.23.000 AM
18-MAR-11 12.30.16.417 PM
18-MAR-11 01.30.01.443 PM
+00000 00:00:36.6                                                                    

Get the count of samples from DBA_HIST_ACTIVE_SESS_HISTORY for the previous and current snaps.

— Previous Snap

col sample_id head "Sample ID" form 999999999
col sample_time head "Sample Time" form a28

select 	SAMPLE_ID, SAMPLE_TIME, count(*)
from  	DBA_HIST_ACTIVE_SESS_HISTORY
where 	snap_id = 45043
and   	dbid = 2872548522
and   	instance_number = 1
group 	by SAMPLE_ID, SAMPLE_TIME
order 	by sample_id, SAMPLE_TIME
/


 Sample ID Sample Time                    COUNT(*)
---------- ---------------------------- ----------
 158418490 18-MAR-11 12.27.45.691 PM            74
 158418500 18-MAR-11 12.27.56.181 PM            64
 158418510 18-MAR-11 12.28.06.441 PM            64
 158418520 18-MAR-11 12.28.16.771 PM            83
 158418530 18-MAR-11 12.28.27.121 PM            64
 158418540 18-MAR-11 12.28.37.372 PM            57
 158418550 18-MAR-11 12.28.47.721 PM            74
 158418560 18-MAR-11 12.28.58.211 PM            60
 158418570 18-MAR-11 12.29.08.521 PM            70
 158418580 18-MAR-11 12.29.18.871 PM            71
 158418590 18-MAR-11 12.29.29.231 PM            75
 158418600 18-MAR-11 12.29.39.491 PM            69
 158418610 18-MAR-11 12.29.49.883 PM            75
 158418620 18-MAR-11 12.30.00.412 PM            75

— Current Snap

col sample_id head "Sample ID" form 999999999
col sample_time head "Sample Time" form a28

select	SAMPLE_ID, SAMPLE_TIME, count(*)
from  	DBA_HIST_ACTIVE_SESS_HISTORY
where 	snap_id = 45044
and   	dbid = 2872548522
and   	instance_number = 1
group 	by SAMPLE_ID, SAMPLE_TIME
order 	by sample_id, SAMPLE_TIME
/



 Sample ID Sample Time                    COUNT(*)
---------- ---------------------------- ----------
 158420640 18-MAR-11 01.04.58.204 PM          4493
 158420650 18-MAR-11 01.05.08.781 PM          4565
 158420660 18-MAR-11 01.05.19.481 PM          4629
 158420670 18-MAR-11 01.05.30.311 PM          2480
 158420680 18-MAR-11 01.05.40.881 PM          2400
 158420690 18-MAR-11 01.05.51.321 PM          2148
 158420700 18-MAR-11 01.06.03.851 PM          1731
 158420710 18-MAR-11 01.06.14.401 PM          1608
 158420720 18-MAR-11 01.06.24.960 PM          1511
 158420730 18-MAR-11 01.06.35.391 PM          1419
 158420740 18-MAR-11 01.06.45.890 PM          1412
 158420750 18-MAR-11 01.06.58.430 PM          1437
 158420760 18-MAR-11 01.07.08.870 PM          1373
 158420770 18-MAR-11 01.07.19.394 PM          1323
 158420780 18-MAR-11 01.07.29.960 PM          1294
 158420790 18-MAR-11 01.07.40.470 PM          1287
 158420800 18-MAR-11 01.07.50.855 PM          1288
 158420810 18-MAR-11 01.08.01.570 PM          1281
 158420820 18-MAR-11 01.08.12.071 PM          1281
 158420830 18-MAR-11 01.08.22.423 PM          1287
 158420840 18-MAR-11 01.08.32.921 PM          1283
 158420850 18-MAR-11 01.08.43.431 PM          1275
 158420860 18-MAR-11 01.08.53.811 PM          1298
 158420870 18-MAR-11 01.09.04.541 PM          1286
 158420880 18-MAR-11 01.09.15.084 PM          1282
 158420890 18-MAR-11 01.09.25.581 PM          1271
 158420900 18-MAR-11 01.09.35.951 PM          1292
 158420910 18-MAR-11 01.09.46.481 PM          1284
 158420920 18-MAR-11 01.09.57.221 PM          1279
 158420930 18-MAR-11 01.10.07.601 PM          1274
 158420940 18-MAR-11 01.10.18.111 PM          1300
 158420950 18-MAR-11 01.10.28.641 PM          1279
 158420960 18-MAR-11 01.10.39.031 PM          1267
 158420970 18-MAR-11 01.10.49.541 PM          1287
 158420980 18-MAR-11 01.11.00.341 PM          1271
 158420990 18-MAR-11 01.11.10.846 PM          1284
 158421000 18-MAR-11 01.11.21.251 PM          1281
 158421010 18-MAR-11 01.11.31.791 PM          1275
 158421020 18-MAR-11 01.11.42.331 PM          1276
 158421030 18-MAR-11 01.11.52.701 PM          1270
 158421040 18-MAR-11 01.12.03.441 PM          1278
 158421050 18-MAR-11 01.12.13.972 PM          1264
 158421060 18-MAR-11 01.12.24.361 PM          1263
 158421070 18-MAR-11 01.12.34.881 PM          1260
 158421080 18-MAR-11 01.12.45.410 PM          1279
 158421090 18-MAR-11 01.12.56.191 PM          1275
 158421100 18-MAR-11 01.13.06.571 PM          1266
 158421110 18-MAR-11 01.13.17.111 PM          1286
 158421120 18-MAR-11 01.13.27.641 PM          1256
 158421130 18-MAR-11 01.13.38.021 PM          1259
 158421140 18-MAR-11 01.13.48.531 PM          1263
 158421150 18-MAR-11 01.13.59.403 PM          1264
 158421160 18-MAR-11 01.14.09.981 PM          1276
 158421170 18-MAR-11 01.14.20.381 PM          1271
 158421180 18-MAR-11 01.14.30.951 PM          1268
 158421190 18-MAR-11 01.14.41.524 PM          1286
 158421200 18-MAR-11 01.14.51.941 PM          1261
 158421210 18-MAR-11 01.15.02.841 PM          1257
 158421220 18-MAR-11 01.15.13.396 PM          1264
 158421230 18-MAR-11 01.15.23.821 PM          1276
 158421240 18-MAR-11 01.15.34.431 PM          1267
 158421250 18-MAR-11 01.15.45.033 PM          1265
 158421260 18-MAR-11 01.15.55.871 PM          1258
 158421270 18-MAR-11 01.16.06.291 PM          1253
 158421280 18-MAR-11 01.16.16.901 PM            60
 158421290 18-MAR-11 01.16.27.381 PM            71
 158421300 18-MAR-11 01.16.37.721 PM            74
 158421310 18-MAR-11 01.16.48.204 PM            56
 158421320 18-MAR-11 01.16.58.881 PM            66
 158421330 18-MAR-11 01.17.09.211 PM            63
 158421340 18-MAR-11 01.17.19.681 PM            73
 158421350 18-MAR-11 01.17.30.181 PM            89
 158421360 18-MAR-11 01.17.40.721 PM           159
 158421370 18-MAR-11 01.17.51.104 PM           283
 158421380 18-MAR-11 01.18.02.671 PM           145
 158421390 18-MAR-11 01.18.13.151 PM           128
 158421400 18-MAR-11 01.18.23.501 PM           110
 158421410 18-MAR-11 01.18.33.992 PM            79
 158421420 18-MAR-11 01.18.44.461 PM            91
 158421430 18-MAR-11 01.18.55.161 PM            91
 158421440 18-MAR-11 01.19.05.501 PM            70
 158421450 18-MAR-11 01.19.15.981 PM           384
 158421460 18-MAR-11 01.19.26.463 PM            74
 158421470 18-MAR-11 01.19.36.801 PM            79
 158421480 18-MAR-11 01.19.47.291 PM            55
 158421490 18-MAR-11 01.19.57.961 PM            69
 158421500 18-MAR-11 01.20.08.321 PM            74
 158421510 18-MAR-11 01.20.18.801 PM            62
 158421520 18-MAR-11 01.20.29.281 PM            80
 158421530 18-MAR-11 01.20.39.631 PM            93
 158421540 18-MAR-11 01.20.50.121 PM            73
 158421550 18-MAR-11 01.21.00.791 PM            83
 158421560 18-MAR-11 01.21.11.261 PM            66
 158421570 18-MAR-11 01.21.21.601 PM            72
 158421580 18-MAR-11 01.21.32.111 PM            66
 158421590 18-MAR-11 01.21.42.581 PM            57
 158421600 18-MAR-11 01.21.52.922 PM            65

There is a gap between 12:30 PM and 1:04 PM. (I will try to find out about this later)
We only have data from 1:04 PM.

These are the main wait events at that time.

col name head "Event" form a40

select 	en.NAME, count(*)
from  	DBA_HIST_ACTIVE_SESS_HISTORY  dhas,
      	v$event_name                  en
where 	snap_id = 45044
and   	dbid = 2872548522
and   	instance_number = 1
and   	sample_id = 158420640
and   	dhas.event_id = en.event_id
group 	by en.NAME
order 	by count(*) desc
/


NAME                                                               COUNT(*)
---------------------------------------------------------------- ----------
gc buffer busy                                                         1694
buffer busy waits                                                      1451
log file sync                                                          1193
gc cr request                                                            85
gc current request                                                       26

Now lets find out the object involved in “gc buffer busy”

col name head "Event" form a20

select 	en.NAME, p1, p2, count(*)
from  	DBA_HIST_ACTIVE_SESS_HISTORY  dhas,
      	v$event_name                  en
where 	snap_id = 45044
and   	dbid = 2872548522
and   	instance_number = 1
and   	sample_id = 158420640
and   	en.NAME = 'gc buffer busy'
and   	dhas.event_id = en.event_id
group 	by en.NAME, p1, p2
order 	by count(*) desc
/


Event                        P1         P2   COUNT(*)
-------------------- ---------- ---------- ----------
gc buffer busy              319       6413       1387
gc buffer busy              769     113488         49
gc buffer busy              582     284415         37

They are mostly for the file#/block# 319/6413

Dumping the block shows that its a level 1 BMB for AUD$ table

Start dump data blocks tsn: 20 file#: 319 minblk 6413 maxblk 6413
buffer tsn: 20 rdba: 0x4fc0190d (319/6413)
scn: 0x0a35.c98b07d1 seq: 0x01 flg: 0x00 tail: 0x07d12001
frmt: 0x02 chkval: 0x0000 type: 0x20=FIRST LEVEL BITMAP BLOCK
Hex dump of block: st=0, typ_found=1

When the query is repeated for SAMPLE_ID=158420690, this is the data

Event                        P1         P2   COUNT(*)
-------------------- ---------- ---------- ----------
gc buffer busy              319       2590        421
gc buffer busy              741     630505         18

This is the SEGMENT HEADER for the same TABLE AUD$

Start dump data blocks tsn: 20 file#: 319 minblk 2590 maxblk 2590
buffer tsn: 20 rdba: 0x4fc00a1e (319/2590)
scn: 0x0a35.c3264fe1 seq: 0x01 flg: 0x00 tail: 0x4fe12301
frmt: 0x02 chkval: 0x0000 type: 0x23=PAGETABLE SEGMENT HEADER
Hex dump of block: st=0, typ_found=1

Why so much contention on AUD$. Lets look at the per minute login rate

select      count(*), to_char(cast((from_tz(ntimestamp#,'00:00') at local) as date), 'YY-MM-DD HH24:MI')
from  aud$
group by to_char( cast((from_tz(ntimestamp#,'00:00') at local) as date), 'YY-MM-DD HH24:MI')
order by to_char( cast((from_tz(ntimestamp#,'00:00') at local) as date), 'YY-MM-DD HH24:MI')
/

  COUNT(*) TO_CHAR(CAST((FROM_TZ(NTIMESTA
---------- ---------------------------------------------------------------------------
       377 11-03-18 12:55
       359 11-03-18 12:56
       385 11-03-18 12:57
       338 11-03-18 12:58
       473 11-03-18 12:59
       752 11-03-18 13:00
      1307 11-03-18 13:01
       814 11-03-18 13:02
       596 11-03-18 13:03
       598 11-03-18 13:04
      1095 11-03-18 13:05
      1122 11-03-18 13:06
       935 11-03-18 13:07
       780 11-03-18 13:08
       730 11-03-18 13:09
       437 11-03-18 13:10
       217 11-03-18 13:11
       266 11-03-18 13:12
       161 11-03-18 13:13
       426 11-03-18 13:14
       591 11-03-18 13:15
      1905 11-03-18 13:16
      1640 11-03-18 13:17
      1021 11-03-18 13:18
       603 11-03-18 13:19
       529 11-03-18 13:20
       618 11-03-18 13:21
       642 11-03-18 13:22
       451 11-03-18 13:23
       603 11-03-18 13:24
       576 11-03-18 13:25
       516 11-03-18 13:26
       414 11-03-18 13:27
       438 11-03-18 13:28
       428 11-03-18 13:29
       492 11-03-18 13:30

There are too many Logins at 1 PM (13:01)

The issue at around 1PM was that AUD$ was not able to handle the concurrent inserts at that time.

But what caused the ASH history to disappear between 12:30PM and 1:04PM?
Did something happen around 12:30PM that caused the system freeze and existing user connections to freeze?
Did this cause users to relogin causing a flood of new connections and hence contention at AUD$?
Was there a shift change at around 1PM to cause this flood?
Will need to dig in further into analyze what happened between 12:30PM and 1:04PM using other means as ASH is not available

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: