Automatic Statistics Gathering Fails #3

Automatic Statistics Gathering Fails #3 July 29, 2009

Posted by mwidlake in internals, performance.
Tags: data dictionary, performance, statistics
trackback

I’m steeling a few minutes from my lunch time to finish off this thread on the automated statistics gathering failing.
<<first post … <second post

There is another way to spot what the automated stats gathering choked on but it is not that reliable. But it does involve digging a little into Oracle’s internals, which is fun.

Under 10.2 onwards, when you gather stats on a segment, either directly or via a schema, database or the automatic stats collection, Oracle stores the stats that are replaced for you automatically {I say 10.2 as I am not sure that segment-level gather_table/index_stats are automatically backed up like this. I have some old notes saying it seemed not but I am not sure if this was 9i, 10.1 or me just getting confused at the time. If anyone knows or can check back, let me know 🙂 }

This means you can get the stats back using the dbma_stats.restore_xxxxxxxx_stats procedures. eg dbms_stats.restore_table_stats(ownname=>user,tabname=>’TEST_P’,as_of_timestamp=>systimestamp-1);
This will restore the stats of my table TEST_P to what they were at this time yesterday. You do not need to create a stattab table and store the prevous stats manually.
I’m not going to blog any more right now about this handy feature, the “PL/SQL packages and types” manual will tell you what you need, but I will comment that by default you can only go back 31 days.

Oracle gives you a table to see the history of stats stored, DBA_TAB_STATS_HIST

desc dba_tab_stats_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                              VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 PARTITION_NAME                                     VARCHAR2(30)
 SUBPARTITION_NAME                                  VARCHAR2(30)
 STATS_UPDATE_TIME                                  TIMESTAMP(6) WITH TIME ZONE

select owner,table_name, partition_name,stats_update_time
from dba_tab_stats_history
where table_name='TEST_P'  and owner=user

OWNER    TABLE_NAME PARTITION_NAME  STATS_UPDATE_TIME
-------- ---------- --------------- --------------------------------
WIDLAKEM TEST_P                     28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 13.59.43.647785 +00:00
WIDLAKEM TEST_P                     28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 13.37.42.043557 +00:00
WIDLAKEM TEST_P                     28-JUL-09 09.11.40.098445 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 09.11.40.098445 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 09.11.40.098445 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 09.11.40.098445 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 09.11.40.098445 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 09.11.40.098445 +00:00

You can see that I probably gathered stats on my test table 3 times yesterday, each time I gathered at a granularity of all, ie global and partition stats. Note also, all partitions and the table get the same timestamp. I think this is because oracle records the timestamp as when the stats for the set of segments was swapped into the data dictionary {note, not started}.

That’s quite useful. However, there is no similar view for indexes. That’s a shame. But you can recover index stats and this view is saying nothing about the stats as they were?…Is there more to be found?… Let’s go see what that view is looking it

@vw_txt
Enter value for vw_name: dba_tab_stats_history
OWNER    VIEW_NAME                      TEXT_LENGTH
-------- ------------------------------ -----------
TEXT
--------------------------------------------------------------------------
SYS      DBA_TAB_STATS_HISTORY                  876
select u.name, o.name, null, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.savtime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
  union all
  -- fixed tables
  select 'SYS', t.kqftanam, null, null, h.savtime
  from  sys.x$kqfta t, sys.wri$_optstat_tab_history h
  where
  t.kqftaobj = h.obj#

Now that is interesting. Let’s go look at that table:

desc sys.wri$_optstat_tab_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLKCNT                                             NUMBER
 AVGRLN                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

Some of those columns look interesting. Unforturnately CACHEDBLK onwards are empty, even in version 11 {but it shows that Oracle have built in the ability to use those yet-to-be used columns about average cached blocks and cache hit ratios for segments you might have spotted in eg DBA_TAB_STATISTICS}.
Could there be an index version? Of course there is :-

desc sys.wri$_optstat_ind_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLEVEL                                             NUMBER
 LEAFCNT                                            NUMBER
 DISTKEY                                            NUMBER
 LBLKKEY                                            NUMBER
 DBLKKEY                                            NUMBER
 CLUFAC                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 GUESSQ                                             NUMBER
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

I’ve written a couple of scripts to extract data from these underlying tables, with the extra data you can grab. Use these links to download the table script and the index script.
Here below is the index script though:-

-- show_ish_full
-- Martin Widlake 14/4/08
-- this gets the index stats history - there is no index equiv of dba_tab_stats_hist
-- that I can see in 10.2 (or 11)
--
col owner form a12
col ind_full_name form a40
col stat_upd_time form a15
col blevel form 99 head bl
select owner,ind_full_name,
to_char(stat_upd_time,'MMDD HH24:MI:SS') stat_upd_time
,numrows,numleafs,blevel,dist_kys,lf_p_ky, dbl_p_ky,clufac,samp_size,obj_hash
,greatest(nvl(samp_size,1),1)/greatest(nvl(numrows,1),1) samp_pct
 from (
select u.name owner, o.name ind_name,o.name ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 1 and o.owner# = u.user#
  union all
  -- partitions
  select u.name owner, o.name ind_name, o.name||'-'||o.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 20 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name owner, osp.name ind_name, osp.name||'-'||ocp.subname||'='|| osp.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.indsubpart$ tsp,
        sys.wri$_optstat_ind_history h
  where h.obj# = osp.obj# and osp.type# = 35 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
) where ind_name like upper(nvl('&indname','whoops')||'%')
and owner like upper('&indown')||'%'
order by owner,ind_name,stat_upd_time desc
/
clear colu

I’ll leave you to play with these as you wish, especially as I ran out of time to finish this blog an hour ago and now owe my current employer some of my weekend in return, but I’ll finish off with a use for the scripts.

One thing to bear in mind, these tables are holding the stats that were replaced, not the stats that were created. So they hold historical information.

If you read the earlier posts, you may remember that the automatic stats job can choke and fail on extremely large segments. Well, sometimes {and I have to admit, only sometimes} you can spot what segments it failed on. That is because, at the end of the window when the job gets stopped, it attempts to clean up after itself. It appears to check for stats that have changed during the run but it failed to process correctly and resets them. It copes the stats back. So you sometimes see this:-

@show_tsh_full
OWNER        TAB_FULL_NAME                            STAT_UPD_TIME           NUMROWS    NUMBLKS AVG_RL  SAMP_SIZE OBJ_HASH                           SAMP_PCT
------------ ---------------------------------------- -------------------- ---------- ---------- ------ ---------- --------                           --------
TABOWNER1   HUGETABLE4                               0522 06:00:01        1473990266   27319245    128    5660375    54714                               .004
TABOWNER1   HUGETABLE4                               0522 05:48:22        1327119100   24544873    128   13271191    54714                               .010

In the above case, you can see that this massive table had stats set at 5:48am and then again at 6:00.01am. That second record is the automated stats swapping the last known good stats back in place. That is the segment it choked on.

How do you stop the automatic stats job chocking on this extent? You do a manual gather of course. You will see the above job attempted a 0.1% sample size that almost worked at 05:48. The value it swapped back is 0.04 – which was the sample size of the manual gather I did a few days earlier to get over the last choke. {I also tweaked the histograms I gathered to save time}. I had not yet finished converting the manual fix to an automated one when I recorded the above.

Why do I think that the automatic job sets the timestamp in the WRI$_OPTSTAT_xxx_HIST tables when it swaps in the new stats and not when it starts? Because of the time of the entries after a large object has been analysed. Why do I think it checks for stats that have changed since the start of the run and replaces them rather than keeping track of the information as it goes? Because, when I first came across this choking issue, I was veryifying the problem after 10pm at night. When I realised there was an issue at around 1am, I started manually gathering stats. It took me until 4am, I checked they were in place and went to bed. Imagine my bad temper the next day when I found out that the automatic job had gone and re-wound most of the stats I had gathered, resetting them to what they had been at 10am the previous night. It was a seriously bad temper.

Addition – I’ve had a comment that may or may not be spam, saying they do not understand “the last bit” (If you are a real person molamola, I apologise). Reading it, it is maybe not clear, so this is what happened and why I was angry 🙂

1) I had realised the auto stats job was failing overnight.
2) I logged on from home at just before 10pm and watched the system
3) at 10pm the auto stats job started and collected stats on some small tables and indexes
4) around 11,11:30pm I realised is was not progressing,
5) from 11:30 to 3am in the morning, I created my own gather_table_stats and gather_index_stats statements and ran them.
6) at around 4am I went to bed, having checked stats were gathered and execution plans were good.
7) Got to work around 9am, a bit tired.
8) Found lots of performance issues due to old stats.
9) Shouted randomly at people and got angry, started re-gathering stats.
10) Worked out that at 6am, when the auto stats job finished, all the tables/indexes IT had intended to gather stats on but had failed to do so, it reset the stats to what they had been set to at 10pm the previous night, when it started.
ie very old and bad stats.

Comments»

1. Amit - July 31, 2009: Excellent post Martin..I feel that if we know that there are big tables to be analyzed then we should lock the table stats so that gather_stats_job doesn’t spend time on these and have a separate job with force option to gather stats on these objects..

Cheers
Amit
http://askdba.org/weblog/

Reply
2. mwidlake - August 1, 2009: You are spot-on Amit, A very valid response is to lock the statistics on the large object, handle gathering stats on the table/partitions manually and also the dependent indexes {don’t forget the indexes, and for large tables, don’t use the cascade option, it under-gathers for indexes – I need to post about that!}.

Many site alter the automatic job to gather stats on only the internal oracle segments and gather stats on user segments with a custom job, but that needs some thought in developing. Locking table stats and either gathering your own stats with the FORCE =>true option or allowing dynamic sampling are both useful alternatices.

Reply
3. Automated Statistics Gathering Silently Fails #2 « Martin Widlake’s Yet Another Oracle Blog - February 20, 2010: […] Automated Statistics Gathering Silently Fails #2 July 23, 2009 Posted by mwidlake in performance. Tags: performance, statistics trackback <previous post…next post> […]

Reply
4. Graham Oakes - June 9, 2010: Martin,

I knew this blog post would come in handy for me down the line. Just used your table script to get some stats history. Worked like a dream.

Cheers
Graham

Reply
mwidlake - June 10, 2010: Hey, someone used one of my scripts! 🙂

I’m glad it was of use Graham.

The script IS copyright though. You now owe me 0.0000001p (which covers you for a lifetime of use)

Reply
5. anon - October 29, 2010: Sloppy kisses Martin. Its late I’m in the office and your query has been a lifesaver. I started writing my own, then google and you came to the rescue and, you know, copy and paste is so much easier.
Being a bloke you might not want to take me up on the first offer. In fact I retract it. But thanks for the query anyway.

Reply
mwidlake - October 30, 2010: Well my personal preferences for kisses is cute brunette ladies, but I appreciate the sentiments. I’m very glad I could help

Reply

Martin Widlake's Yet Another Oracle Blog