Srivenu Kadiyala's Blog

December 2, 2020

Create index online hangs with “blocking txn id for DDL”

Filed under: Uncategorized — srivenu @ 5:01 pm

Oracle 12.1.0.2.0 on Solaris 5.11

Recently I was creating an index like this

CREATE      INDEX USER1.INDEX5
ON          USER1.TAB1(COL1, COL2, COL3)
ONLINE      INVISIBLE
PARALLEL    16
TABLESPACE  TBS;

Following were my session details

col sid head "Sid" form 99999
col serial# head "Ser#" form 99999
col pid head "Oracle|PID" form 9999
col module head "Module" form a20
col spid head 'OS|Process' form a8

select s.sid, s.serial#, p.pid, s.module, p.spid
from   v$session s,
       v$process p
where  s.paddr=p.addr(+)
and    sid='6533'
/
SIDSerial#ModuleOracle PIDOS
Process ID
653317538SQL*Plus2206844

After the parallel slaves were done creating the index, the coordinator session got hung up waiting on “blocking txn id for DDL”

col sid head "Wai|ting|SID" form 99999
col blocking_session head "Bloc|king|SID" form 99999
col event form a25 trunc head "Event Waiting For"
col p1 form 999999999999999
col p2 form 999999999999999

select s.sid, s.blocking_session, s.event, s.p1, s.p2
from   v$session  s
where  state in ( 'WAITING', 'WAITED SHORT TIME')
and    wait_class <> 'Idle'
order  by s.state, s.event
/
Waiting
session
Blocking
Session
Event Waiting
for
P1P2
6533blocking txn id for DDL2136496220755420757

The blocking session was null. 213649 is the object id of the table TAB1. I checked if there are any other sessions holding a lock on the table – select * from v$lock where id1=213649; There weren’t any. Might have been a bug. So i thought i would try to give the session a nudge

 sys@DB1> oradebug setospid 6844
 Oracle pid: 220, Unix process pid: 6844, image: oracle@server1 (TNS V1-V3)

 sys@DB1> oradebug suspend
 Statement processed.

 sys@DB1> oradebug resume
 Statement processed.

Voila it worked, the session proceeds and the index creation completes

November 17, 2017

Inconsistent data across RAC standby nodes

Filed under: Uncategorized — srivenu @ 1:31 pm

I had been working on Parallel Server/RAC since 2001 but had not faced this sort of problem till now.
Environment – Oracle 12c EE 12.1.0.2.0, 64bit on RHEL 7.3
One of the CDB databases has a 2-node RAC primary and a 2-node RAC standby.
One of the users complained about getting inconsistent data across his SqlDeveloper sessions.
I saw that his 2 connections had landed on 2 different standby RAC nodes and getting different results for the same query.
Intrigued and to rule out consistency level issues, i opened sqlplus sessions on each of the nodes and following is a summary of what i saw

******
Node 1
******
alter session set container=ABCAPP;

select count(*) from ABC.MYAPP_LOG;
 
  COUNT(*)
----------
    105109

select max(LOG_ID) from ABC.MYAPP_LOG;
 
    MAX(LOG_ID)
---------------
         106115

select current_scn from v$database;
 
                   CURRENT_SCN
------------------------------
                10807769183626
 
******
Node 2
******
alter session set container=ABCAPP;

select count(*) from ABC.MYAPP_LOG;
 
  COUNT(*)
----------
    104955

select max(LOG_ID) from ABC.MYAPP_LOG;
 
    MAX(LOG_ID)
---------------
         105961

select current_scn from v$database;
 
                   CURRENT_SCN
------------------------------
                10807769183563

Following is status of managed recovery running on node 1

col name head "Metric Name" form a25
col value head "Value" form a16
col unit head "Units" form a30
col time_computed head "Metric Computation|Time" form a20
col datum_time head "Metric Data|Receipt Time" form a25
 
select	name, value, unit, time_computed, datum_time
from 	v$dataguard_stats
/
 
                                                                          Metric Computation   Metric Data
Metric Name               Value            Units                          Time                 Receipt Time
------------------------- ---------------- ------------------------------ -------------------- -------------------------
transport lag             +00 00:00:00     day(2) to second(0) interval   10/27/2017 09:20:30  10/27/2017 09:20:27
apply lag                                  day(2) to second(0) interval   10/27/2017 09:20:30
apply finish time                          day(2) to second(3) interval   10/27/2017 09:20:30
estimated startup time    32               second                         10/27/2017 09:20:30

I decided to check the ora_rowscn of the max LOG_ID on primary

select ora_rowscn from ABC.MYAPP_LOG where LOG_ID=106115;
 
                    ORA_ROWSCN
------------------------------
                10807754563045

select scn_to_timestamp(10807754563045) from dual;
 
SCN_TO_TIMESTAMP(10807754563045)
---------------------------------
26-OCT-17 10.32.22.000000000 AM    - That was 1 day ago

Tried flushing buffer cache on node 2

alter system flush BUFFER_CACHE;

System altered.
 
select count(*) from ABC.MYAPP_LOG;
 
  COUNT(*)
----------
    104955  - No Change

Tried flushing shared pool on node 2

alter system flush shared_pool;

System altered.

select count(*) from ABC.MYAPP_LOG;
 
  COUNT(*)
----------
    105109  - Now it works

select max(LOG_ID) from ABC.MYAPP_LOG;
 
    MAX(LOG_ID)
---------------
         106115

I’m eagerly waiting for the issue to recur as i plan to dump row cache at level 8 (oradebug dump row_cache 8) on both the standby rac nodes and check for discrepancies

November 14, 2017

Sessions from an inactive service can show up in a RAC node

Filed under: Uncategorized — srivenu @ 2:02 pm

If you are seeing intermittent sessions in a RAC node, for a service that is not active on that node, don’t worry. It can happen sometimes. This is one such case.
We have a 2 node-RAC system and we run direct some services to specific nodes.
Service app_prod_abc (actual name changed) is one such service which runs on Node 1 (the preferred node).
Environment – Oracle 12c EE 12.1.0.2.0, 64bit on RHEL 7.3

crsctl shows the following status for this service

ora.app01pc_abc.app_prod_abc.svc
      1        ONLINE  ONLINE       app1db01                 STABLE

A listing of active services on the nodes also shows the following

col inst_id head "Inst|ance" form 9999
col service_id head "ID" form 999
col name head "Service Name" form a35
col con_name head "Container" form a10
col network_name head "Network Name|used to connect" form a60
col goal head "Runtime|Load|Balancing|Goal" form a10
col clb_goal head "Connection|Load|Balancing|Goal" form a10
col blocked head "Blocked|on this|Instance" form a10
col stop_option head "Stop Option|for Sessions" form a10

select	inst_id, service_id, name, con_name, network_name, goal, clb_goal, blocked
from 	gv$active_services
where	service_name = 'app_prod_abc'
/

                                                                                                                                 Runtime    Connection
                                                                                                                      Load       Load       Blocked
Inst                                                     Network Name                                                 Balancing  Balancing  on this
ance   ID Service Name                        Container  used to connect                                              Goal       Goal       Instance
----- ---- ----------------------------------- ---------- ------------------------------------------------------------ ---------- ---------- ----------
    1    7 app_prod_abc                        ABC01PP    app_prod_abc                                                 NONE       LONG       NO

So the service is only active on node 1
But i was told that intermittently sessions with service “app_prod_abc” are appearing on node 2.

select  inst_id, con_id, username, schemaname, osuser, machine, program, module, service_name, count(*)
from    gv$session
where   username='APPUSER'
group   by con_id, inst_id, username, schemaname, osuser, machine, program, module, service_name
order   by inst_id, con_id, username, schemaname, osuser, machine, program, module, service_name
/

            Con
            tai                                      Client               Client          Client
   INST_ID  ner USERNAME          SCHEMANAME OSUSER  Machine              Program         Module          Service Name      COUNT(*)
---------- ---- ----------------- ---------- ------- -------------------- --------------- --------------- --------------- ----------
         1    3 APPUSER           APPUSER    tomcat  abapp01.prd.atx.merc JDBC Thin Clien JDBC Thin Clien app_prod_abc            79
         1    3 APPUSER           APPUSER    tomcat  bpapp02.prd.atx.merc JDBC Thin Clien JDBC Thin Clien app_prod_abc            72
         1    3 APPUSER           APPUSER    tomcat  accapp01.merchante-s JDBC Thin Clien JDBC Thin Clien app_prod_abc           397
         1    3 APPUSER           APPUSER    tomcat  accapp02.merchante-s JDBC Thin Clien JDBC Thin Clien app_prod_abc           414
         1    3 APPUSER           SYS        tomcat  accapp02.merchante-s JDBC Thin Clien JDBC Thin Clien app_prod_abc             2
         1    3 APPUSER           SYS        tomcat  accapp02.merchante-s oracle@abc1db01 JDBC Thin Clien app_prod_abc             1
         1    3 APPUSER           SYS        tomcat  accapp02.merchante-s oracle@abc1db01 JDBC Thin Clien app_prod_abc             1
         2    3 APPUSER           SYS        tomcat  accapp02.merchante-s oracle@abc1db02 JDBC Thin Clien app_prod_abc             1
         2    3 APPUSER           SYS        tomcat  accapp02.merchante-s oracle@abc1db02 JDBC Thin Clien app_prod_abc             1

Looking at the schemaname of SYS, for the user sessions, i suspected that these could be recursive sql.
But the sql_id for these sessions in V$SESSION on node1 was showing normal user sql.
Turning sql trace on one of the active sessions showed me the recursive sql.
SQL_ID – frjd8zfy2jfdq. This is a sql on GV$SQL. (This sql is part of 12c Adaptive features and has caused grief for quite a few people)
Any query on GV$’s is going to start slave session’s on the other rac node’s.
I also confirmed that session on node2 is executing the same sql

col sid head "Sid" form 99999
col service_name head "Service|Name" form a12
col sql_id head "Current|SQL ID" form a13
col sql_child_number head "Curr|Child" form 999999
col sql_exec_id head "Current SQL|Execution ID" form 9999999999
col sql_exec_start head "Current SQL|Execution Start" form a15

select	sid, service_name, sql_id, sql_child_number, sql_exec_id, to_char(sql_exec_start, 'dd-mon hh24:mi:dd')sql_exec_start
from	gv$session s
where	s.username='APPUSER'
and	service_name='app_prod_abc'
and	inst_id=2
/

            Service       Current   Curr  Current SQL     Current SQL
   Sid         Name        SQL ID  Child Execution ID Execution Start
------ ------------ ------------- ------ ------------ ---------------
   482 app_prod_abc frjd8zfy2jfdq    376     20820390 14-NOV 09:08:14


col qcinst_id head "Instance|No|of|Co-ordinator|Process" form 999
col qcsid head "Query|Coordinator|Sid" form 99999 trunc
col server_group head "Server|Group #" form 9999
col server_set head "Server|Set|Within|Server|Group" form 9999
col server# head "Server|No|Within|Server|Set" form 999
col sid head "Sid" form 99999 trunc
col degree head "Degree|Of|Parallelism|Being|Used" form 9999
col req_degree head "Requested|Degree|Of|Parallelism" form 9999

select 	qcinst_id, qcsid, server_group, server_set, server#, sid, degree, req_degree 
from 	v$px_session
/


    Instance                     Server Server             Degree
          No                        Set     No                 Of   Requested
          of       Query         Within Within        Parallelism      Degree
Co-ordinator Coordinator  Server Server Server              Being          Of
     Process         Sid Group #  Group    Set    Sid        Used Parallelism
------------ ----------- ------- ------ ------ ------ ----------- -----------
           1        2385       1      1      2    482           2           2

So the above output shows that the slave session is starting on node 2 with the same service name as the parent and the same username of APPUSER (but schemaname is SYS)

As a corollary of the above fact, you might also see non-zero values in v$service_stats for a disabled service on a RAC node.
For example – app_prod_abc is disabled on Node 2 in our case. But still we see stats for that service in that node

col statname form a30
col value form 999999999999

select 	stat_name, value
from 	gv$service_stats
where 	inst_id = 2
and	service_name = 'app_prod_abc'
order 	by 1
/

STAT_NAME                                                  VALUE
-------------------------------------------------- -------------
DB CPU                                               13321667111
DB time                                             110475653452
application wait time                                      53810
cluster wait time                                        3213776
concurrency wait time                                   33527828
db block changes                                             329
execute count                                            3607277
gc cr block receive time                                      12
gc cr blocks received                                        532
gc current block receive time                                585
gc current blocks received                                 11369
logons cumulative                                        3585421
opened cursors cumulative                                3607190
parse count (total)                                      3595651
parse time elapsed                                     712841694
physical reads                                            216292
physical writes                                                0
redo size                                                  34608
session cursor cache hits                                  18645
session logical reads                                   11457700
sql execute elapsed time                              2489650459
user I/O wait time                                      27672040
user calls                                              14342739
user commits                                                   1
user rollbacks                                                 0
workarea executions - multipass                                0
workarea executions - onepass                                  0
workarea executions - optimal                               3142

28 rows selected.

October 30, 2017

expdp and a case of ora-01555

Filed under: Uncategorized — srivenu @ 3:21 pm

If you want the summary and do not want to read the whole thing, please jump to the end. If you have patience and want to find out my mistakes or play detective, please read on
Environment – Oracle 12c EE 12.1.0.2.0, 64bit on RHEL 7.3
Recently in one of our databases we had a flood of ORA-01555 errors.

From the alert log

Fri Oct 27 02:00:33 2017
ORA-01555 caused by SQL statement below (SQL ID: 7wm6rfjv8gmk1, SCN: 0x09d4.60a63d24):
Fri Oct 27 02:00:33 2017
SELECT * FROM RELATIONAL("USER1"."TABLE1") PARTITION ("PART_OCT17")
Fri Oct 27 03:12:02 2017
ORA-01555 caused by SQL statement below (SQL ID: fdatdp8p5b7by, SCN: 0x09d4.60a63d24):
Fri Oct 27 03:12:02 2017
SELECT * FROM RELATIONAL("USER1"."TABLE1") PARTITION ("PART_NOV10")
Fri Oct 27 03:20:13 2017
ORA-01555 caused by SQL statement below (SQL ID: fsqn6ny5bxwdb, SCN: 0x09d4.60aa1ebc):
Fri Oct 27 03:20:13 2017
SELECT * FROM RELATIONAL("USER1"."TABLE2") PARTITION ("PART_JAN13")

Let us look at the first set of message and see what it means

Fri Oct 27 02:00:33 2017
ORA-01555 caused by SQL statement below (SQL ID: 7wm6rfjv8gmk1, SCN: 0x09d4.60a63d24):
Fri Oct 27 02:00:33 2017
SELECT * FROM RELATIONAL("USER1"."TABLE1") PARTITION ("PART_OCT17")

This means that at 2 AM on Fri Oct 27, someone tried to query PART_OCT17 of TABLE1 as of scn 0x09d4.60a63d24. We will shortly see how we can convert that hex scn to timestamp.

My first task was to identify the session facing/causing this error. I immediately checked v$open_cursor to identify the sessions which opened these sql_id’s. Also a quick check in v$sql showed the culprit

col action head "Action" form a25
col module head "Module" form a33
col program_id head "Program|Object id" form 999999999
col program_line# head "Program|Line#" form 999999

select 	s.parsing_schema_name, action, module,
	s.program_id, s.program_line#
from	v$sql 		s
where 	sql_id='7wm6rfjv8gmk1'
order 	by 1, 2
/

Parsing                                                                        Program Program
Username        Action                    Module                             Object id   Line#
--------------- ------------------------- --------------------------------- ---------- -------
SYS             P1_15636                  Data Pump Worker                           0       0

Quick check of the data_pump_sessions showed the job

select	to_char(sysdate,'YYYY-MM-DD HH24:MI:SS') dat, s.sid,
	s.status, s.username, d.job_name, p.spid, s.serial#, p.pid
from	v$session 		s, 
	v$process 		p, 
	dba_datapump_sessions 	d
where	p.addr=s.paddr and s.saddr=d.saddr
/




DAT                     Sid STATUS USERNAME   JOB_NAME             SPID         Ser#        PID
-------------------- ------ ------ ---------- -------------------- ---------- ------ ----------
2017-10-27 06:46:31     500 ACTIVE USER1      P1_191612            192069      61611        738
2017-10-27 06:46:31    3346 ACTIVE USER1      P1_191612            191619      64526        750

Someone has started a data_pump jobs(My case was complicated, there were actually 2 jobs, but for the purpose of this post, one is enough). I quickly checked the contents of export master tables P1_191612. It showed that a full schema export of USER1 schema.

col OBJECT_SCHEMA head "Object|Schema" form a10
col OBJECT_NAME head "Object" form a30
col PARTITION_NAME head "Partition Name" form a25
col PROCESS_ORDER head "Pro|ce|ss|Or|der" form 9,999,999

select	OBJECT_SCHEMA, OBJECT_NAME, PARTITION_NAME, PROCESS_ORDER
from	USER1.P1_191612
where	OBJECT_TYPE_PATH = 'SCHEMA_EXPORT/TABLE/TABLE_DATA'
order	by 1, 2, 3
/

After some digging i identified the script which was used by the user to start the expdp job.
They were something like this

expdp user1@database1 job_name=P1_$$ directory=dp_dir1 \
        dumpfile=data1.dmp    \
        logfile=logfile_$$.log \
        exclude=index_statistics,table_statistics metrics=yes reuse_dumpfiles=yes

I converted the hex scn to decimal and then to time stamps.
You can refer to this website for details – oaktable – SCN – What, why, and how?
The hex 0x09d4.60a63d24 was actually 0x09d460a63d24 – which is 10807759224100 in decimal. (I just used the windows calc – programmer mode for this conversion)
Then using the following i got the timestamp for that scn

select 	scn_to_timestamp(10807759224100) 
from 	dual
/

SCN_TO_TIMESTAMP(10807759478460)
--------------------------------
26-OCT-17 09.44.54.000000000 PM

The job started at around 9:30PM on 26-OCT-17.
The time stamp at which the error occurred was 27-OCT-17 2:00 AM. Work on that partition (PART_OCT17 of TABLE1) started around 2AM and the error message means that the expdp worker is trying to get data for that partition as of timestamp 26-OCT-17 09.44.54.000000000 PM, which is almost at the starting time of the job? This was surprising. The expdp job does not have either the FLASHBACK_SCN or FLASHBACK_TIME parameters or the legacy CONSISTENT parameter set. Then why is it trying to perform a consistent export?

Our UNDO_RETENTION is only for 3 hours and its natural that we can get an error if we try to retrieve old data. Quick check of V$UNDOSTAT (for the time period from the start of the job till the error), did not show any space issue. Also it showed 0 values for the following columns
UNXPSTEALCNT – Number of attempts to obtain undo space by stealing unexpired extents from other transactions
UNXPBLKRELCNT – Number of unexpired blocks removed from certain undo segments so they can be used by other transactions
UNXPBLKREUCNT – Number of unexpired undo blocks reused by transactions from same segment

I thought it might be a LOB space issue or LOB corruption and checked the export log.

ORA-31693: Table data object "USER1"."TABLE1":"PART_OCT17" failed to load/unload and is being skipped due to error:
ORA-02354: error in exporting/importing data
ORA-01555: snapshot too old: rollback segment number 31 with name "_SYSSMU31_2063816273$" too small

For LOB’s, the rollback segment name will be null. So i ruled out LOB issue
For more information, I decided to set event 10442

alter system set events '10442 trace name context forever, level 10';

Then i saw the data pump workers dump diagnostic information in their trace file on the next ORA-01555 error. The alert started showing the trace files from the next error onward.

Fri Oct 27 06:14:22 2017
ORA-01555 caused by SQL statement below (SQL ID: 18zcvw0n081qb, SCN: 0x09d4.60aa1ebc):
Fri Oct 27 06:14:22 2017
SELECT * FROM RELATIONAL("USER1"."TABLE2") PARTITION ("PART_APR10")
Fri Oct 27 06:14:22 2017
Errors in file /oracle/app/oracle/diag/rdbms/app01/app01pc/trace/app01pc1_dw01_15792.trc:

The trace file app01pc1_dw01_15792.trc had contents like this

ORA-01555 occurred - Dump of diagnostics information 

*** 2017-10-27 06:12:20.134
 --- This does not indicate any errors
SSOLD: SQL ID: 18zcvw0n081qb, Statement: 
SELECT * FROM RELATIONAL("USER1"."TABLE2") PARTITION ("PART_APR10")
SSOLD: Scan Time Dump
env [0x7ffc3dda2b70]: (scn: 0x09d4.60aa1ebc   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000.00000000  hi-scn: 0x0000.00000000  ma-scn: 0x0000.00000000  flg: 0x00000000)
 SSOLD: recent SCN: scn: 0x09d4.611d7192
SSOLD: Data Block Dump

The env [0x7ffc3dda2b70]: (scn: 0x09d4.60aa1ebc shows that it is trying to get data from scn 0x09d4.60aa1ebc (which is 10807759478460 in decimal and is 26-OCT-17 10.16.38.000000000 PM).

This scn and timestamp were different from the previous error.
If we think the data pump was performing a consistent export, it should have errored out for the same scn for all tables.
I quickly checked the distinct scn’s that the ORA-01555 errors were reporting in the alert log

grep "ORA-01555 caused by SQL statement below " alert_app01ab1.log|grep -v "Query Duration"|awk '{print $11}'|sort -u
0x09d4.60a63d24):
0x09d4.60aa1ebc):
0x09d4.60f45fd7):

Out of ideas, I was scratching my head as to explain the 3 distinct scn’s. (My case was even more complicated as i had 2 jobs and 8 distinct scn’s)
Finally when i was reading the export log – i saw this at the top

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.

Searching for “FLASHBACK automatically enabled to preserve database integrity” on metalink gave me this docid – 377218.1.
This is from that note

The message: FLASHBACK automatically enabled to preserve database integrity only means that some of the tables will be assigned special SCNs (needed for Streams and Logical Standby). There is no consistency guaranteed between exported tables.

But that message does not explain the whole story. What sort of consistency will expdp try to provide? Why are we trying to select at 3 different scn’s?

Then i had my “aha” light bulb moment. I Patiently ordered the ORA-01555 messages in the alert log as per the table names and this is what i get

TABLE1 - PART_OCT17 - 0x09d4.60a63d24
TABLE1 - PART_NOV10 - 0x09d4.60a63d24
TABLE1 - PART_AUG10 - 0x09d4.60a63d24
TABLE1 - PART_MAR10 - 0x09d4.60a63d24
TABLE1 - PART_OCT10 - 0x09d4.60a63d24
TABLE1 - PART_JAN11 - 0x09d4.60a63d24
TABLE1 - PART_JUL10 - 0x09d4.60a63d24
TABLE1 - PART_APR10 - 0x09d4.60a63d24
TABLE1 - PART_FEB11 - 0x09d4.60a63d24
TABLE1 - PART_JAN10 - 0x09d4.60a63d24

TABLE2 - PART_JAN13 - 0x09d4.60aa1ebc
TABLE2 - PART_MAY11 - 0x09d4.60aa1ebc
TABLE2 - PART_MAY11 - 0x09d4.60aa1ebc
TABLE2 - PART_JUN11 - 0x09d4.60aa1ebc
TABLE2 - PART_JUN14 - 0x09d4.60aa1ebc
TABLE2 - PART_JUN17 - 0x09d4.60aa1ebc
TABLE2 - PART_OCT17 - 0x09d4.60aa1ebc
TABLE2 - PART_DEC10 - 0x09d4.60aa1ebc
TABLE2 - PART_OCT10 - 0x09d4.60aa1ebc
TABLE2 - PART_JUL10 - 0x09d4.60aa1ebc
TABLE2 - PART_APR10 - 0x09d4.60aa1ebc

TABLE3 - PART_OCT17 - 0x09d4.60f45fd7

So expdp is trying to get all partition data for a table at the same scn, even though we do not specify FLASHBACK_SCN or FLASHBACK_TIME parameters.
My next question was how is this scn determined?
To answer this, we need to understand in what order expdp performs the export. With modesty i would refer to my previous post – Cross platform Database upgrade to 12c using Data Pump with NETWORK_LINK
In summary – expdp processed tables and table partitions/subpartitions (table_data_objects) in the decreasing order of their size.
Let us PART_DEC17 is the largest partition of TABLE1. Obviously this partition will be processed ahead of other partitions on TABLE1. If expdp has performed export of PART_DEC17 at 9:40 PM, it will perform a consistent export of the rest of the partitions of TABLE1 as of the same scn(timestamp) ie 9:40 PM. Looks like expdp will record the timestamp of the first export time of each partitioned table and will try to maintain consistency by exporting all partitions of that table as of that timestamp.

Summary – Even if you have not specified FLASHBACK_SCN or FLASHBACK_TIME or the legacy CONSISTENT parameter, expdp might still try to get consistent data, which might cause unexpected ORA-01555 errors.
If you have large partitioned tables, you have more chance of facing the ORA-01555 error. Try to have separate expdp jobs for large partitioned tables to avoid this error

October 27, 2017

Securing and Configuring third party certificates in Oracle Enterprise Manager Cloud Control 13c

Filed under: Uncategorized — srivenu @ 11:40 pm

Following is the how i configured third party certificates for Oracle Enterprise Manager Cloud Control 13c.
The environment I used here is the following

ORACLE_BASE=/opt/oracle
ORACLE_HOME=/opt/oracle/occ/13.1.0.0
ORACLE_INSTANCE_HOME_LOCATION=/opt/oracle/occ/13.1.0.0/gc_inst
ADMIN_SERVER_HTTPS_PORT=7101
EM_CONSOLE_HTTPS_PORT=7799
AGENT_BASE=/opt/oracle/occ/13.1.0.0/agentbase
AGENT_PORT=3872
EM_UPLOAD_HTTPS_PORT==1159

You may need to make changes depending on your environment.

Step 1 – Create Oracle wallet for the OMS

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet create \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-auto_login \
	-pwd walletpassword

Display the wallet contents

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet display \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet

You get the output like this

Requested Certificates:
User Certificates:
Trusted Certificates:
Subject:        OU=Class 2 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        OU=Class 3 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        CN=GTE CyberTrust Global Root,OU=GTE CyberTrust Solutions\, Inc.,O=GTE Corporation,C=US
Subject:        OU=Class 1 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US

Step 2 – Create a key within the wallet(Use larger key size)
Your dn will change based on your organization

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
   	-dn "cn=yourcn, ou=yourou, o=yourorg, c=yourc" \
	-keysize 2048 \
	-pwd 'walletpassword'

Display the wallet contents once more

Requested Certificates:
Subject:        CN=%1,CN=yourcn,CN=yourcn,OU=yourou,O=yourorg,C=yourc
User Certificates:
Trusted Certificates:
Subject:        OU=Class 1 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        OU=Class 3 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        OU=Class 2 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        CN=GTE CyberTrust Global Root,OU=GTE CyberTrust Solutions\, Inc.,O=GTE Corporation,C=US

Step 3 – Create a certificate signing request (CSR) based on this key
Make sure the -dn you specify exactly matches the -dn specified earlier.
Provide a filename in the -request argument in which to store the certificate signing request (CSR).

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet export \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-dn "cn=yourcn, ou=yourou, o=yourorg, c=yourc" \
	-request /tmp/EM12cCSR.txt

Check the CSR – /tmp/EM12cCSR.txt
It will be something like this

-----BEGIN NEW CERTIFICATE REQUEST-----
ABCDEFIICijCCAX3cxDjAMBgNVBAoTBXN3aWZ0MRMwEQYDVQQLEwpzc2xz
.........
ABCDEF98kjdf9s6at1D36mrOtmk4xyz
-----END NEW CERTIFICATE REQUEST-----

Step 4 – Generate third party certificates, from your Certificate Signing Authority, using the above CSR (Certificate Signing Request)
You might need “Reference number” & “Authorization code” for your Signing Authority

You will get 2 certificate files – One root certificate and one user certificate (Sometimes you might also have an intermediate certificate)
Let’s say you placed the certificates in the following locations
root certificate in – /opt/oracle/occ/13.1.0.0/occwallet/ca.cer
user certificate in – /opt/oracle/occ/13.1.0.0/occwallet/cert.cer

Step 5 – Import the root, intermediate, and user certificates into the OMS wallet
root & intermediate certificates must be imported using -trusted_cert

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-trusted_cert -cert /opt/oracle/occ/13.1.0.0/occwallet/ca.cer \
	-pwd 'walletpassword'

You must import the user certificate using -user_cert.

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-user_cert -cert /opt/oracle/occ/13.1.0.0/occwallet/cert.cer \
	-pwd 'walletpassword'

Display the wallet contents

Requested Certificates:
User Certificates:
Subject:        CN=%1,CN=yourcn,CN=yourcn,OU=yourou,O=yourorg,C=yourww
Trusted Certificates:
Subject:        O=yourorg
Subject:        OU=Class 3 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        CN=GTE CyberTrust Global Root,OU=GTE CyberTrust Solutions\, Inc.,O=GTE Corporation,C=US
Subject:        OU=Class 2 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
Subject:        OU=Class 1 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US

Step 6 – Secure the OMS console using the OMS wallet
Now using emctl from the $OMS_HOME, secure the OMS console using the certificate contained in your wallet.

$OMS_HOME/bin/emctl \
	secure console \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-sysman_pwd yoursysmanpassword

Step 7 – Secure the OMS upload port using the OMS wallet

$OMS_HOME/bin/emctl \
	secure oms \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-trust_certs_loc /opt/oracle/occ/13.1.0.0/occwallet/ca.cer \
	-sysman_pwd yoursysmanpassword \
	-reg_pwd youragentregistrationpassword

Step 8 – Re-secure all agents
Apply patch 22568679 – otherwise the agents will not be able to communicate with the oms server

on the agents

- backup the file ewallet.p12 in /opt/oracle/occ/13.1.0.0/agentbase/agent_inst/sysman/config/server

- backup any cwallet.sso files

- Run secure agent command /opt/oracle/occ/13.1.0.0/agentbase/agent_13.1.0.0.0/bin/emctl \
	secure agent youragentregistrationpassword

Step 9 – Create Oracle wallets for agents
The OMS connects to the agents at this URL to submit management requests. At the moment, the agents still use self-signed certificates to secure this URL. For this process we create an Oracle wallet, on the OMS host, using the same ORAPKI command as for the OMS wallet. We will generate a certificate signing request from each agent wallet, submit those CSRs to a certificate authority, and import the received certificates. As with the OMS, the agents must use single-host certificates, not wildcard or subject alternate name (SAN) certificates.

Create directories on OMS server to store agent wallets

mkdir /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets
mkdir /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1
mkdir /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost2

Create the Agent wallet – This time do NOT use -auto_login_local, use only -auto_login, as you will distribute these wallets to the agent hosts after generating them on the OMS host.

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet create \
	-wallet /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1 \
	-auto_login -pwd 'walletpassword'

Add key to the Agent wallet
To determine the correct fully qualified domain name for each agent, execute emctl status agent from the agent home.
/opt/oracle/occ/13.1.0.0/agentbase/agent_13.1.0.0.0/bin/emctl status agent
in the output of above command, use the below part

Agent URL              : https://YOURAGENTHOST:3872/emd/main/
/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1 \
	-dn "cn=yourcn, ou=yourou, o=yourorg, c=yourc" \
	-keysize 2048 \
	-pwd 'walletpassword'

Create certificate signing request (CSR) based on this key

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet export \
	-wallet /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1 \
	-dn "cn=yourcn, ou=yourou, o=yourorg, c=yourc" \
	-request /tmp/CSR_agenthost1.txt

Generate third party certificates, from your Certificate Signing Authority, using the above CSR (Certificate Signing Request)
Get the third party certificate and put them in following locations
root in /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/ca.cer
user in /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1/cert.cer

Import the root, intermediate, and user certificates into the agent wallets
root & intermediate certificates must be imported using -trusted_cert

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1 \
	-trusted_cert -cert /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/ca.cer \
	-pwd 'walletpassword'

You must import the user certificate using -user_cert.

/opt/oracle/occ/13.1.0.0/oraclehome/oracle_common/bin/orapki \
	wallet add \
	-wallet /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1 \
	-user_cert -cert /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1/cert.cer \
	-pwd 'walletpassword'

Step 10 – Configure the agents to use their wallets
Inside the agent wallets you’ve just created on OMS server, you will find a cwallet.sso file. Take this file from each agent’s wallet and copy it to the agent host.

On the agent host
– Stop the agent – emctl stop agent
– Backup existing cwallet.sso files
ex –
mv /opt/oracle/occ/13.1.0.0/agentbase/agent_inst/sysman/config/cwallet.sso /opt/oracle/occ/13.1.0.0/agentbase/agent_inst/sysman/config/cwallet.sso.20161216
– Copy the cwallet.sso file from OMS server (location – /opt/oracle/occ/13.1.0.0/agentbase/occagentwallets/agenthost1/cwallet.sso) and copy it to the agent host to this directory – $AGENT_INSTANCE_DIR/sysman/config/server/
– change permissions on $AGENT_INSTANCE_DIR/sysman/config/server/cwallet.sso to 640
– Start the agent – emctl start agent

Check agent certificate with the following methods
– Check Certificate by accessing url https://agenthost1:3872/emd/main/
– from oms server call this – /opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl secdiag openurl -url https://agenthost1:3872/emd/main/

Step 11 – Secure WebLogic with the OMS wallet
Securing WebLogic with a wallet only works as of EM12c R3, earlier versions must use a Java keystore. See note Metalink note – 1527874.1 for more information.

Import the root and intermediate certificates to the keystore on the OMS host’s agent
Use the default password welcome for the agent keystore, and alias names rootcacert and intercacert.

/opt/oracle/occ/13.1.0.0/agentbase/agent_13.1.0.0.0/bin/emctl \
	secure add_trust_cert_to_jks \
	-trust_certs_loc /opt/oracle/occ/13.1.0.0/occwallet/ca.cer \
	-alias rootcacert \
	-password welcome

Back up some WLS configuration files, just in case, before securing WLS with your certificate.

mkdir /home/ora_occ/wlscertbak/
cp -a /opt/oracle/occ/13.1.0.0/gc_inst/em/EMGC_OMS1/emgc.properties /home/ora_occ/wlscertbak/
cp -a /opt/oracle/occ/13.1.0.0/gc_inst/user_projects/domains/GCDomain/nodemanager/nodemanager.properties /home/ora_occ/wlscertbak/
cp -a /opt/oracle/occ/13.1.0.0/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/components/OHS/ohs1/keystores/proxy /home/ora_occ/wlscertbak/
cp -a /opt/oracle/occ/13.1.0.0/gc_inst/user_projects/domains/GCDomain/config/config.xml /home/ora_occ/wlscertbak

Stop OMS

/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl stop oms

Secure WLS using the OMS wallet created earlier

/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl secure wls \
	-wallet /opt/oracle/occ/13.1.0.0/occwallet \
	-sysman_pwd sysmanpassword

Stop Webtier

/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl stop oms -all

Start OMS

/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl start oms

Check the Certificate
– Check the certificate by accessing admin server port –
ex – this is the default port – https://OMSSERVER:7101/
– from oms server call this – /opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl secdiag openurl -url https://OMSSERVER:7101/

Troubleshooting
To Check certificates
openssl s_client -connect :7799
You can use any used port in place of 7799

To Check certificate dates
openssl s_client -connect :7799| openssl x509 -noout -dates

To Check the certificates used by OMS upload
/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl secdiag openurl -url https://yourserver:1159/empbs/upload

To Check the certificates used by Target Agent
/opt/oracle/occ/13.1.0.0/oraclehome/bin/emctl secdiag openurl -url https://youragentmachine:3872/emd/main/

Reference – EM 12c, EM 13c: Troubleshooting Guide for Enterprise Manager Cloud Control Agent Communication Errors Like: peer Not authenticated (Doc ID 1556543.1)

Next Page »

Create a free website or blog at WordPress.com.