Monday, February 19, 2018

Gather stats slowness on partitioned table

In our environment, we have a common Oracle Home 12.1.0.2 for our Unit and QA databases hosted on the same database server, where we gather stats manually on a partitioned table (156 partitions) using the commands below. The table has incremental stats enabled in both the Unit and QA environments, and there is a partitioned index PK_1894 (156 partitions) on the table. The testing team loads the data in exactly the same way in both environments, and we confirmed that they are not loading or updating data in older partitions.

exec dbms_stats.set_table_prefs('OIDFATM','FCT_COMMON_POLICY_SUMMARY','INCREMENTAL','TRUE');

exec dbms_stats.gather_table_stats(ownname=>'OIDFATM',tabname=>'FCT_COMMON_POLICY_SUMMARY',cascade=>TRUE,degree=>DBMS_STATS.DEFAULT_DEGREE);
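To double-check that the INCREMENTAL preference has actually taken effect on the table, a quick sanity check (a sketch using DBMS_STATS.GET_PREFS; owner and table name as above) is:

select dbms_stats.get_prefs('INCREMENTAL','OIDFATM','FCT_COMMON_POLICY_SUMMARY') as incremental from dual;

This should return TRUE in both environments before comparing gather times.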

In the Unit environment the gather stats job takes 35 minutes, while in the QA environment it completes within 2 minutes. Upon further analysis we found that in Unit the job gathers stats for all 156 partitions of both the table and the index, causing the longer run time. In QA it gathers stats for only 4-5 partitions, which is why it finishes within 2 minutes.

I wanted to check why the optimizer considers the stats stale for all partitions in the Unit database.
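One way to see which partitions are being re-analyzed, and which ones Oracle considers stale, is to compare per-partition LAST_ANALYZED and STALE_STATS in both environments (a sketch against DBA_TAB_STATISTICS):

select partition_name, last_analyzed, stale_stats
from dba_tab_statistics
where owner = 'OIDFATM'
and table_name = 'FCT_COMMON_POLICY_SUMMARY'
and object_type = 'PARTITION'
order by partition_position;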

Further investigation revealed that histograms were being created on a few columns because the development team had been running some aggregation SELECT queries that accessed multiple partitions in the Unit region. Since those histograms were being gathered, the gather stats program was collecting stats on all partitions, causing the longer run time. The testing team was not running those SQLs in the QA region. I decided to stop gathering histograms in the Unit database, and after that the program completed in 2 minutes.
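The column usage that drives those histogram decisions (under the default METHOD_OPT of 'FOR ALL COLUMNS SIZE AUTO') can be reviewed with DBMS_STATS.REPORT_COL_USAGE; a sketch, assuming the function is available in your release (it is in recent releases, including 12.1):

set long 100000
select dbms_stats.report_col_usage('OIDFATM','FCT_COMMON_POLICY_SUMMARY') from dual;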

exec dbms_stats.gather_table_stats(ownname=>'OIDFATM',tabname=>'FCT_COMMON_POLICY_SUMMARY',method_opt=>'for all columns size 1',cascade=>TRUE,degree=>DBMS_STATS.DEFAULT_DEGREE);

METHOD_OPT=>'for all columns size 1' omits histogram creation. In our QA environment, no histograms have been created so far.
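To verify which columns actually carry histograms before and after the change, a query like the following can be used (a sketch against DBA_TAB_COL_STATISTICS):

select column_name, histogram, num_buckets
from dba_tab_col_statistics
where owner = 'OIDFATM'
and table_name = 'FCT_COMMON_POLICY_SUMMARY'
and histogram <> 'NONE';

If you want to make the change permanent, METHOD_OPT can also be set as a table preference with DBMS_STATS.SET_TABLE_PREFS instead of being passed on every gather.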

SQL Plan baseline from AWR

One of the concurrent programs in the EBS Development region was reported as running slow, and the issue was fixed after I created SQL plan baselines using the better execution plan for 2 sql_ids found in the AWR history. Below are the steps.

SQL_ID          PLAN_HASH_VALUE   RESPONSE_IN_SEC   SOURCE          EXECUTIONS
-------------   ---------------   ---------------   -------------   ----------
36qn50vfsgzjh   2709963026        .000257534        AWR history         36,750
36qn50vfsgzjh   584524445         .509160245        cursor memory       23,544
a07y05uya8653   1271677127        .000286352        AWR history         17,356
a07y05uya8653   4293861213        .681914067        cursor memory       14,398
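The AWR figures above can be reproduced with a query along these lines (a sketch against DBA_HIST_SQLSTAT; the averages here are elapsed time per execution, which may differ slightly from however the original numbers were pulled):

select sql_id, plan_hash_value,
       sum(executions_delta) as executions,
       sum(elapsed_time_delta)/nullif(sum(executions_delta),0)/1e6 as avg_elapsed_sec
from dba_hist_sqlstat
where sql_id in ('36qn50vfsgzjh','a07y05uya8653')
group by sql_id, plan_hash_value;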


Create the SQL tuning set (run all the steps below as the SYS user)

exec dbms_sqltune.create_sqlset(sqlset_name => 'a07y05uya8653_sqlset_dev2',description => 'dev2_ebstax_sql2');

Load the SQL plans from AWR; use the filter option if you want a specific plan_hash_value

declare
  baseline_ref_cur DBMS_SQLTUNE.SQLSET_CURSOR;
begin
  open baseline_ref_cur for
    select VALUE(p)
    from table(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(
           &begin_snap_id, &end_snap_id,
           'sql_id='||CHR(39)||'&sql_id'||CHR(39),
           NULL,NULL,NULL,NULL,NULL,NULL,'ALL')) p;
  DBMS_SQLTUNE.LOAD_SQLSET('a07y05uya8653_sqlset_dev2', baseline_ref_cur);
end;
/

OR

declare
  baseline_ref_cur DBMS_SQLTUNE.SQLSET_CURSOR;
begin
  open baseline_ref_cur for
    select VALUE(p)
    from table(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(
           &begin_snap_id, &end_snap_id,
           'sql_id='||CHR(39)||'&sql_id'||CHR(39)||' and plan_hash_value=1271677127',
           NULL,NULL,NULL,NULL,NULL,NULL,'ALL')) p;
  DBMS_SQLTUNE.LOAD_SQLSET('a07y05uya8653_sqlset_dev2', baseline_ref_cur);
end;
/

Verify the SQL plans captured in the tuning set

select * from table(dbms_xplan.display_sqlset('a07y05uya8653_sqlset_dev2','&sql_id'));

Load the plan into a SQL plan baseline from the SQL tuning set

set serveroutput on
declare
  my_int pls_integer;
begin
  my_int := dbms_spm.load_plans_from_sqlset (
              sqlset_name  => 'a07y05uya8653_sqlset_dev2',
              sqlset_owner => 'SYS',
              fixed        => 'NO',
              enabled      => 'YES');
  DBMS_OUTPUT.PUT_LINE(my_int);
end;
/

SQL> SELECT SQL_HANDLE,CREATOR,  PARSING_SCHEMA_NAME, CREATED, ENABLED, ACCEPTED, FIXED, module FROM   DBA_SQL_PLAN_BASELINES order by created;
SQL_HANDLE                     CREATOR    PARSING_SCHEMA_NAME            CREATED                        ENA ACC FIX MODULE
------------------------------ ---------- ------------------------------ ------------------------------ --- --- --- ----------------------------------------------------------------
SQL_8d714864a2f6cad0           SYS        APPS                           14-AUG-17 05.13.28.000000 PM   YES YES NO  ARHGNRW
SQL_851abbb354b962d0           SYS        APPS                           14-AUG-17 05.21.43.000000 PM   YES YES NO  ARHGNRW

SELECT NAME,OWNER,CREATED,STATEMENT_COUNT FROM DBA_SQLSET where name in ('36qn50vfsgzjh_sqlset_dev2','a07y05uya8653_sqlset_dev2');
NAME                           OWNER                          CREATED                        STATEMENT_COUNT
------------------------------ ------------------------------ ------------------------------ ---------------
36qn50vfsgzjh_sqlset_dev2      SYS                            14-AUG-17                                    1
a07y05uya8653_sqlset_dev2      SYS                            14-AUG-17                                    1


select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'SQL_8d714864a2f6cad0',format=>'basic'));
select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'SQL_851abbb354b962d0',format=>'basic'));
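Once the baselines are in place, you can confirm at run time that the cursors are actually using them; a sketch against V$SQL (the SQL_PLAN_BASELINE column is populated when a baseline plan is in use):

select sql_id, plan_hash_value, sql_plan_baseline
from v$sql
where sql_id in ('36qn50vfsgzjh','a07y05uya8653');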

Drop the SQL tuning set if required

BEGIN
  DBMS_SQLTUNE.DROP_SQLSET( sqlset_name => 'a07y05uya8653_sqlset_dev2' );
END;
/

ORA-00607 Error 607 encountered recovering transaction db_block_checking

It was a fine morning, but by afternoon it was an oops!! Usual DBA days :)

In our production Oracle EBS R12.1.3 system, we hit a data corruption issue that caused SMON to fail and the instance to go down; on restart SMON could not recover the corrupt data block, so the instance could not be brought back up. In the DB alert log we found the errors below.

ORACLE Instance EBSXX (pid = 14) - Error 607 encountered while recovering transaction (7, 5) on object 46989.
Errors in file /move/prod/db/tech_st/11.2.0/admin/EBSXX_ebsp-pkg/diag/rdbms/ebsp/EBSXX/trace/EBSXX_smon_16908566.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [65], [], [], [], [], [], [], [], [], []
PMON (ospid: 21430580): terminating the instance due to error 474
Thu Oct 12 07:53:55 2017
opiodr aborting process unknown ospid (23396816) as a result of ORA-1092
Thu Oct 12 07:53:55 2017
ORA-1092 : opitsk aborting process
Thu Oct 12 07:53:55 2017
System state dump requested by (instance=1, osid=21430580 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /move/XXX/trace/EBSXX_diag_30999036_20171012075355.trc
Instance terminated by PMON, pid = 21430580

The root cause was identified as the DB parameter db_block_checking not being set, which allowed the corruption to go undetected. You can follow the steps below to recover the database. Connect as SYSDBA and, using the existing spfile, start the database in nomount mode.

SQL> Startup nomount ;
SQL> Create pfile='/tmp/corrupt.ora' from spfile ;
SQL> Shutdown immediate;

Modify corrupt.ora and add the line below at the end of the file; event 10513 prevents SMON from attempting transaction recovery, so the instance can be opened.
event = '10513 trace name context forever, level 2'

SQL> Startup pfile='/tmp/corrupt.ora'

Oracle Support also suggested another option: set the parameter below dynamically to get the DB started.

set "_smu_debug_mode"=1024 to stop Transaction recovery (it can be done dynamically):
alter system set "_smu_debug_mode"=1024;

The step above allows the database to start up by making SMON skip recovery of the corrupt data block. The SQL below helps to find more details about the rollback segment and transaction affected by the corruption.

SQL> select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status from dba_rollback_segs useg where useg.segment_id = 7;

SQL> select XIDUSN, XIDSLOT, XIDSQN, USED_UBLK, USED_UREC from gv$transaction where XIDUSN = 7;
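The object number reported in the alert log (46989 in the message above) can also be mapped back to the affected object; a sketch against DBA_OBJECTS (the number in the message usually corresponds to the object or data object id):

select owner, object_name, object_type
from dba_objects
where object_id = 46989 or data_object_id = 46989;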

Edit your init<SID>.ora and add the line below

_CORRUPTED_ROLLBACK_SEGMENTS= _SYSSMU7_1302289624$

In our case we were lucky: we just truncated the affected table and ran the EBS concurrent program, which successfully repopulated it. But in most cases you will have to recover the table.

The issue can be prevented by setting the parameter below in the database:
Alter system set db_block_checking = MEDIUM scope=both;
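To confirm the current setting afterwards (a quick sanity check against V$PARAMETER):

select name, value from v$parameter where name = 'db_block_checking';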

NOTE: Setting db_block_checking to TRUE or FULL may add an overhead of 1% to 10% to overall performance.
