jump to navigation

Decrypting Histogram Data #3 – is 7 or 15 Characters Significant? September 15, 2009

Posted by mwidlake in internals, performance.
Tags: , , ,

At last I am going to wrap up about looking at Histogram data and endpoint_value. Sorry about the delay, life is a tad complex at present. This is also a very, very long post. You may want to skip to the end to get the conclusions

I first posted about how to translate endpoint_values.
I then posted in the second post a week ago about how endpoint_value is limited to 7 or so characters for varchar2 columns.
{Addition – this later post discusses the content of endpoinit_value for dates}

Coskan contacted me to say he believed I had missed something, and of course he was right – but I’d laid big hints that I was using theory and not practical test and there was more to know. {There is usually more to know, only practical tests confirm whether your theory works or not. And even after that, it seems there is still more to know!}

OK, I have a test table, created with the following {oh, this is under, Enterprise Edition on linux}:

create table hist_test
as select rownum  id
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )vc_1
,lpad('A',2,'A')||dbms_random.string('U',1) vc_2
,lpad('B',3,'B')||dbms_random.string('U',1) vc_3
,lpad('C',4,'C')||dbms_random.string('U',1) vc_4
,lpad('D',5,'D')||dbms_random.string('U',1) vc_5
,lpad('E',6,'E')||dbms_random.string('U',1) vc_6
,lpad('F',7,'F')||dbms_random.string('U',1) vc_7
,lpad('G',8,'G')||dbms_random.string('U',1) vc_8
,lpad('H',16,'H')||dbms_random.string('U',1)) vc_16
,lpad('I',40,'I')||dbms_random.string('U',1) vc_40
from dba_objects
where rownum < 1001

I need to skew the data in some columns so that different WHERE clauses should see different expected numbers of rows
I now need to add some skew to the data so you can see the histogram in action .

update hist_test set
where mod(id,10)=1

Gather histogram stats, 10 buckets:
,method_opt=>’FOR ALL COLUMNS SIZE 10′)

And looking at ALL_TAB_COLUMNS we can see the general stucture of the data (just the VC_ columns are shown)

COL_NAME NUM_DIST LOW_V                HI_V                 N_BUCK  AVG_L
ID          1,000 1                    1000                     10      4
VC_1            6 AAAAA                FFFFF                     6      6
VC_2           26 AAA                  AAZ                      10      4
VC_3           26 BBBA                 BBBZ                     10      5
VC_4           26 CCCCA                CCCCZ                    10      6
VC_5           26 DDDDDA               DDDDDZ                   10      7
VC_6           26 EEEEEEA              EEEEEEZ                  10      8
VC_7           26 FFFFFFFA             FFFFFFFZ                 10      9
VC_8           26 GGGGGGGGA            GGGGGGGGZ                10     10
VC_16          26 HHHHHHHHHHHHHHHHA    HHHHHHHHHHHHHHHHZ        10     18

I have created a table with varchar2 columns set to a fixed sting of characters the length indicated by the column name with the last column varying from A to Z. So VC_5 contains EEEEEx, where x is A to Z. 26 distinct values each column.

If you look at the above, all columns are showing as having 26 values and 10 buckets EXCEPT VC_40, but it does have 26 distinct values:

select count(distinct(VC_40)) from hist_test;

I have my function to unpack the endpoint_value for varchar2 columns from the number stored to the character string it represents. My function is not perfect, but it is reasonably good. Go checkout post one for the function.
Looking at the histograms with my function you can see the below {I show only some records}

colname                               END_VAL ROWCOUNT MOD_REAL
VC_5     354460798875962000000000000000000000        1 DDDDDA
VC_5     354460798875972000000000000000000000        2 DDDDDD
VC_5     354460798875986000000000000000000000        3 DDDDDG
VC_5     354460798876071000000000000000000000        9 DDDDDY
VC_5     354460798876080000000000000000000000       10 DDDDDZ
VC_7     364886116489977000000000000000000000        1 FFFFFGK
VC_7     364886116489977000000000000000000000        2 FFFFFGK
VC_7     364886116489977000000000000000000000        3 FFFFFGK
VC_7     364886116489977000000000000000000000        4 FFFFFGK
VC_7     364886116489977000000000000000000000        9 FFFFFGK
VC_7     364886116489977000000000000000000000       10 FFFFFGK
VC_16    375311434103976000000000000000000000        1 HHHHHI:
VC_16    375311434103976000000000000000000000        2 HHHHHI:
VC_16    375311434103976000000000000000000000        3 HHHHHI:
VC_16    375311434103976000000000000000000000        9 HHHHHI:
VC_16    375311434103976000000000000000000000       10 HHHHHI:
VC_40    380524092910976000000000000000000000    1,000 IIIIIJM

For VC_5, there are 10 buckets, all with different endpoint_values.
The VC_7 has 10 buckets, but most have the same endpoint_value.
The VC_16 has 10 records in the HISTOGRAM table, all with the same endpoint_value.
VC_40 has only one record in the HISTOGRAM table.

Basically, if I am right, WHERE filters on a column holding values that are the same for the first seven or so characters will not be well supported by the histograms. We should see the estimated cost for such a WHERE filter to be wrong.

I have skewed the data so that Histograms should show more expected values for DDDDDA than DDDDDS, for FFFFFFFA than FFFFFFFS etc

update hist_test
where mod(id,10)=1

And now I’ll see how the CBO estimates the rows coming back for some WHERE clauses.

select count(*) from hist_test
where VC_5 = 'DDDDDA'

Execution Plan
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)|
|   0 | SELECT STATEMENT   |           |     1 |     7 |     8   (0)|
|   1 |  SORT AGGREGATE    |           |     1 |     7 |            |
|*  2 |   TABLE ACCESS FULL| HIST_TEST |   100 |   700 |     8   (0)|

select count(*) from hist_test
where VC_5 = 'DDDDDS'

Execution Plan
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)|
|   0 | SELECT STATEMENT   |           |     1 |     7 |     8   (0)|
|   1 |  SORT AGGREGATE    |           |     1 |     7 |            |
|*  2 |   TABLE ACCESS FULL| HIST_TEST |    49 |   343 |     8   (0)|

For column VC_5 it all works as we would expect. the CBO estimates 100 values for DDDDDA and 49 for DDDDDS. I’ve shown the actual count(*)’s as well to show there is a real variation in numbers. There are good reasons why the estimates do not match reality, I won’t go into them now but this does highlight that histograms can help but do have flaws.

What about the other rows? We are only interested in the WHERE clause and the estimate for the number of rows from the table access, so I’ll show only those.

where VC_7 = 'FFFFFFFA'
|*  2 |   TABLE ACCESS FULL| HIST_TEST |   100 |   900 |     8   (0)|

where VC_7 = 'FFFFFFFS'
|*  2 |   TABLE ACCESS FULL| HIST_TEST |    49 |   441 |     8   (0)|

|*  2 |   TABLE ACCESS FULL| HIST_TEST |   100 |  7200 |     8   (0)|

|*  2 |   TABLE ACCESS FULL| HIST_TEST |    47 |   216 |     8   (0)|

|*  2 |   TABLE ACCESS FULL| HIST_TEST |  1000 | 42000 |     8   (0)|

|*  2 |   TABLE ACCESS FULL| HIST_TEST |  1000 | 42000 |     8   (0)|

{and just to show there is no typo in the string of 'I's above...
where VC_40 = lpad('I',40,'I')||'S'
|*  2 |   TABLE ACCESS FULL| HIST_TEST |  1000 | 42000 |     8   (0)|

In the above you can see that the CBO estimates the costs for VC_7 exactly the same as it did for VC_5. Despite the endpoint_values no longer distinguishing the buckets.

Even more telling is tha for VC_16 the CBO also is able to predict there are more values ending in A than S.

As for VC_40, the CBO utterly fails to spot the skew. In fact it utterly fails to spot that there is more than one value in the column (look back at the section showing min and max values and number of distinct values), and there is only one histogram bucket. so I supposed it is no surprise.

How does the CBO still detect the skew for VC_7 and VC_16, even though the endpoint_valuess are the same?
Because it swaps to using column ENDPOINT_ACTUAL_VALUE.

ENDPOINT_ACTUAL_VALUE is only populated for varchar2 columns when the precision of endpoint_value is exceeded:

------- ---------------------------------------------
VC_5           7 DDDDDR
VC_5           8 DDDDDV
VC_5           9 DDDDDY
VC_5          10 DDDDDZ
VC_6           0 EEEEEF8           EEEEEEA
VC_6           1 EEEEEF8           EEEEEEC
VC_6           2 EEEEEF8           EEEEEEF
VC_6           3 EEEEEF8           EEEEEEH
VC_6           4 EEEEEF8           EEEEEEK
VC_6           5 EEEEEF8           EEEEEEN
VC_6           6 EEEEEF8           EEEEEEP
VC_6           7 EEEEEFn           EEEEEES
VC_6           8 EEEEEFn           EEEEEEU
VC_6           9 EEEEEFn           EEEEEEX
VC_6          10 EEEEEFn           EEEEEEZ
VC_7           1 FFFFFGK           FFFFFFFA
VC_7           2 FFFFFGK           FFFFFFFC
VC_7           3 FFFFFGK           FFFFFFFF
VC_7           4 FFFFFGK           FFFFFFFI
VC_7           5 FFFFFGK           FFFFFFFK
VC_7           6 FFFFFGK           FFFFFFFO
VC_7           7 FFFFFGK           FFFFFFFR
VC_7           8 FFFFFGK           FFFFFFFT
VC_7           9 FFFFFGK           FFFFFFFW
VC_7          10 FFFFFGK           FFFFFFFZ
VC_16          1 HHHHHI:           HHHHHHHHHHHHHHHHA
VC_16          2 HHHHHI:           HHHHHHHHHHHHHHHHD
VC_16          3 HHHHHI:           HHHHHHHHHHHHHHHHG
VC_16          4 HHHHHI:           HHHHHHHHHHHHHHHHJ
VC_16          5 HHHHHI:           HHHHHHHHHHHHHHHHL
VC_16          6 HHHHHI:           HHHHHHHHHHHHHHHHO
VC_16          7 HHHHHI:           HHHHHHHHHHHHHHHHS
VC_16          8 HHHHHI:           HHHHHHHHHHHHHHHHU
VC_16          9 HHHHHI:           HHHHHHHHHHHHHHHHX
VC_16         10 HHHHHI:           HHHHHHHHHHHHHHHHZ
VC_40      1,000 IIIIIJM

You can see that VC_5 columns have no ENDPOINT_ACTUAL_VALUEs but VC_6, VC_7 and VC_16 do. VC_40 does not.

So, at what point does the CBO stop storing values in ENDPOINT_ACTUAL_VALUES? 32 characters. I created  another test table and these are the low/high values, number of distinct values and number of histogram buckets:

ID 1,000 1 1000 10 4
NUM_1 10 0 9 10 3

The name of the column is the length of the fixed string, so the columns are actually one character longer, as can be seen by the “AVG_L” column.

And just to check {as after all this is just a nice theory until you test}, I skew the number of records ending in ‘A’ for VC_31 and VC_32:-

update hist_test2
where mod(id,10)=1

And see what the CBO thinks the rows to be identified are:-

where VC_31 = lpad(‘B’,31,’B’)||’A’
|* 2 | TABLE ACCESS FULL| HIST_TEST2 | 100 | 3300 |

where VC_31 = lpad(‘B’,31,’B’)||’S’
|* 2 | TABLE ACCESS FULL| HIST_TEST2 | 49 | 1617 |

where VC_32 = lpad(‘C’,32,’C’)||’A’
|* 2 | TABLE ACCESS FULL| HIST_TEST2 | 1000 | 34000 |

where VC_32 = lpad(‘C’,32,’C’)||’S’
|* 2 | TABLE ACCESS FULL| HIST_TEST2 | 1000 | 34000 | [/sourcecode ]

Yes. The CBO can see the skew for the 32 characters of VC_31 and not for the 33 characters of VC_32

So in conclusion

  • endpoint_value is only accurate for varchar2 columns up to around 7 characters.
  • If the varchar2 exceeds the precision of the endpoint_value column then endpoint_actual_value is used.
  • endpoint_actual_value is used for varchar2 columns up to 32 characters and not for columns less than 7 characters.
  • Histograms will not help with columns having many values with the same leading 32 or more characters.
  • If you column has the same leading 32 characters or more, even the number of distinct values is not gathered by DBMS_STATS.
  • You can’t tell how SQL/CBO will work from theory, you have to test. And still consider you knowledge as open to correction after that 🙂

Another Day, Another Obscure Oracle Error September 11, 2009

Posted by mwidlake in internals, performance.
Tags: , ,

I seem to be spending my life hitting odd bugs or errors in Oracle’s performance-related areas at the moment. I guess part of it is because I am using features not widely touched, part is I’m not working as the SYS or SYSTEM user {in my experience a lot of people working on such things do it when connected as sysdba or at least as SYSTEM} and part is that Larry Ellisson personally hates me {he fills in for God when it comes to Oracle}.

I’m seeing this a lot today,and it seems virtually unknown:-

ERROR at line 1:
ORA-20003: Unable to gather system statistics
ORA-06512: at “SYS.DBMS_STATS”, line 15882
ORA-06512: at line 1

I’m also seeing this, but this error is documented:-

ERROR at line 1:
ORA-20000: Unable to gather system statistics : insufficient privileges
ORA-06512: at “SYS.DBMS_STATS”, line 15822
ORA-06512: at line 1

The rest of this post will explain my path to only partial enlightenment. If you have hit the above errors and found this page via Google, this might help you out and you might like to skip towards the end.
If you just read my blog, regard this as a very long “tweet” or whatever the hell they are. Again, feel free to skip to the end. Or just skip.

The task I’m doing which is causing me to hit these issues is that I’m trying to assure myself of the ability to role back gathering system statistics before getting the DBA team to do it on a production system. {I am not in the DBA team on this one}.

No system statistics have been gathered yet so I have the default set, I’m sure many of you will recognise them:

SNAME           PNAME                     PVAL1 PVAL2
SYSSTATS_MAIN   CPUSPEEDNW         1,567.798516
SYSSTATS_MAIN   IOSEEKTIM             10.000000
SYSSTATS_MAIN   IOTFRSPEED         4,096.000000
SYSSTATS_INFO   DSTART                          07-25-2006 12:39
SYSSTATS_INFO   DSTOP                           07-25-2006 12:39
SYSSTATS_INFO   FLAGS                  1.000000
SYSSTATS_INFO   STATUS                          COMPLETED

I decided to pull the current system statistics into a statistics table:-

exec dbms_stats.create_stat_table(‘DWPERF’,’AD_BF_STATS’,’DW_COMMON_MG’)


07-25-2006 12:39 07-25-2006 12:39 CPU_SERIO

SYSSTAT0909101309 S 4 1
10 4096 1567.79852

Two records created, first is the header record about what this statid is all about, the second contains the stored info. I’ve highlighted the relevant records.

All well and good, I can save my stats. Why do I want to? Because we may gather system statistics on an infrequent but regular basis, and I want a record of them. The automtically stored history only goes back a month {by default}.

Of course, there is no need for me to explicitly export the stats to the table, it can be done as part of the gathering process, by stating the statistics table, owner and an ID when I do the gather, so I did:-

exec dbms_stats.gather_system_stats(gathering_mode => ‘INTERVAL’,interval => 15,stattab=> ‘

I went off for a cup of coffee, came back, checked my system stats.

SNAME           PNAME                     PVAL1 PVAL2
SYSSTATS_MAIN   CPUSPEEDNW         1,567.798516
SYSSTATS_MAIN   IOSEEKTIM             10.000000
SYSSTATS_MAIN   IOTFRSPEED         4,096.000000
SYSSTATS_INFO   DSTART                          07-25-2006 12:39
SYSSTATS_INFO   DSTOP                           07-25-2006 12:39
SYSSTATS_INFO   FLAGS                  1.000000
SYSSTATS_INFO   STATUS                          COMPLETED

Oh. No change. No system stats gathered. Damn.

Well, it’s a test box, there may be too little worload to detect anything – like none! I checked the SGA for recent code and the only SQL was that for recording the stats gathering :-).
So, repeated with me running some workload in another window for 15 minutes.

Still no change, still no system stats. Double Damn!

I remembered from reading around that if MREADTIME is less than SREADTIME the stats gathered could be ignored so I tried again.
And again.
And then decided it would not magically work, there was a problem.

I’ll gather the system stats with START and STOP and really hammer the box for a while, ensuring with autotrace
exec dbms_stats.gather_system_stats(gathering_mode => ‘START’,stattab=>’AD_BF_STATS’,statown


exec dbms_stats.gather_system_stats(gathering_mode => ‘STOP’);
BEGIN dbms_stats.gather_system_stats(gathering_mode => ‘STOP’); END;
ERROR at line 1:
ORA-20003: Unable to gather system statistics
ORA-06512: at “SYS.DBMS_STATS”, line 15882
ORA-06512: at line 1


Oh Hell. Well, at least I now know that the chances are my interval systenm stats collection failed with this, but silently.

I bet if I dug around I would find a few failed jobs recorded where the INTERVAL version of the stats collection failed with this. I have not checked, I am just up to my eyes right now.

I found nothing at all via Google or Metalink about this exact error for gathering system stats. BUT maybe I lacked the correct priveleges on my non-SYS account. So I granted ANALYZE AND DICTIONARY and ANALYZE ANY to my user, as those or the priveleges mentioned for many of the DBMS_STATS management procedures.

Nope, did not help.

Blow it, If DBMS_STATS won’t let me keep the stats in my table, I’ll just rely on the default store of them, and not state my stats table in the call:

DWD2_DWPERF> exec dbms_stats.gather_system_stats(gathering_mode => ‘START’)
BEGIN dbms_stats.gather_system_stats(gathering_mode => ‘START’); END;
ERROR at line 1:
ORA-20000: Unable to gather system statistics : insufficient privileges
ORA-06512: at “SYS.DBMS_STATS”, line 15822
ORA-06512: at line 1

Oh. Well, that’s interesting.
If I don’t state a stats table the gather fails when initiated.
If I state a stats table the gather fails when I stop.
How frustrating.

I decided to give up. Then I decided I won’t be beaten and to grant myself DBA role and see if it works.

DWDEV2> exec dbms_stats.gather_system_stats(gathering_mode => ‘START’)
PL/SQL procedure successfully completed.


But will it now error when I stop the gather…

DWDEV2> exec dbms_stats.gather_system_stats(gathering_mode => ‘STOP’)
PL/SQL procedure successfully completed.

Deep Joy.

And it gathered stats.

SNAME           PNAME                     PVAL1 PVAL2
SYSSTATS_MAIN   CPUSPEED           1,545.000000
SYSSTATS_MAIN   CPUSPEEDNW         1,567.798516
SYSSTATS_MAIN   IOSEEKTIM             10.000000
SYSSTATS_MAIN   IOTFRSPEED         4,096.000000
SYSSTATS_MAIN   SREADTIM               5.000000
SYSSTATS_INFO   DSTART                          09-11-2009 13:02
SYSSTATS_INFO   DSTOP                           09-11-2009 13:02
SYSSTATS_INFO   FLAGS                  1.000000
SYSSTATS_INFO   STATUS                          COMPLETED

{ignore the lack of mreadtime and sreadtime, nothing happend on the box in the few seconds I was collecting for}

SO, the lack of DBA role was a problem but had been hidden by my attempts to use a stats table. Now surely I can do what I wanted to and store the stats in my stats table…

exec dbms_stats.gather_system_stats(gathering_mode => ‘START’,stattab=>’AD_BF_STATS’,statown

PL/SQL procedure successfully completed.


exec dbms_stats.gather_system_stats(gathering_mode => ‘STOP’)
BEGIN dbms_stats.gather_system_stats(gathering_mode => ‘STOP’); END;
ERROR at line 1:
ORA-20003: Unable to gather system statistics
ORA-06512: at “SYS.DBMS_STATS”, line 15882
ORA-06512: at line 1


Let’s check back.

exec dbms_stats.gather_system_stats(gathering_mode => ‘START’)
PL/SQL procedure successfully completed.


exec dbms_stats.gather_system_stats(gathering_mode => ‘STOP’)
PL/SQL procedure successfully completed.

That works, because I am not involving a stats table. If you look back at the two errors, they come from different (but very similar) line numbers. I’d missed that for a couple of hours, I blame dyslexia.


Gathering and storing System Stats is as buggy and random as the rest of the dbms_stats functionality. {yes, I am peeved about this}. It Seems..

  • If you gather system stats with the interval method, it can silently fail. This might only be if you involve a stats table.
  • You need the DBA role to gather system stats,  otherwise you get ORA-20000, via ORA-06512 at line 15822
  • analyze and dictionary and analyze any are not enough to allow a non DBA account to gather system stats.
  • If you try and store the previous system stats in your stats table as part of the gather you get ORA-20003, via ORA-06512 at line 15882 {different line}, even if you have DBA role.
  • If you just rely on Oracle preserving the previous version of the stats, you can gather system stats fine.
  • You can work around the issue by exporting the system stats to your stats table as a single step first, then gathering system stats without involving your stats table.

There is no metalink for the ORA-20003 via line 15882 in metalink and no google hit for it, so I reckon it is a damned rare event. And since the last time I blogged about a damned rare error, I see 1 or 2 hits on that blog page every day since :-).

Decrypting Histogram Data #2 September 3, 2009

Posted by mwidlake in internals, performance.
Tags: , , ,

Hint – if you have a column where most or all of the entries have the same 15 plus characters, don’t bother gathering histograms on it. They will not help you a lot. Or at all. eg you have a column that holds “CUSTOMER IDENTIFIER xxxxxxxxx”, “CUSTOMER IDENTIFIER yyyyyyyy”, etc. Of course, good design suggests that the “CUSTOMER IDENTIFIER” bit is probably redundant and can be got rid of, but we live in the real world and may not have the power or ability to enforce that change, or we might have 3 or 4 such “prefix” strings.

Further, histograms on dnumerics longer than 15 significant digits will also potentially behave not as you would expect.

I better justify my assertion.

In the previous post on decrypting histogram data I covered how Oracle turns a varchar2 value into a numeric value that is held in the ENDPOINT_VALUE of DBA_TAB_HISTOGRAMS and I also gave you a cunning (but slightly flawed) function to convert it back. I use it in this post so you might want to go look at the prior post. Sorry it is long, I can’t stop rambling.

First, I’ll create some test data {Oh, this is on on Linux}. The below script create a table HIST_TEST with columns NUM_1 through to NUM7, which hold numbers padded out to greater lengths and then 0-9 as the last value. Similarlry columns VC_2 to VC_8 are padded out and have a random character added. VC_1 is a random 5-character string.

create table hist_test
as select rownum  id
,mod(rownum,10)  num_1
,trunc(dbms_random.value(1,10)) num_2
,1000+trunc(dbms_random.value(1,10)) num_3
,1000000+trunc(dbms_random.value(1,10)) num_4
,1000000000+trunc(dbms_random.value(1,10)) num_5
,1000000000000+trunc(dbms_random.value(1,10)) num_6
,1000000000000000+trunc(dbms_random.value(1,10)) num_7
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )vc_1
,lpad('A',2,'A')||dbms_random.string('U',1) vc_2
,lpad('B',3,'B')||dbms_random.string('U',1) vc_3
,lpad('C',4,'C')||dbms_random.string('U',1) vc_4
,lpad('D',5,'D')||dbms_random.string('U',1) vc_5
,lpad('E',6,'E')||dbms_random.string('U',1) vc_6
,lpad('F',7,'F')||dbms_random.string('U',1) vc_7
,lpad('G',8,'G')||dbms_random.string('U',1) vc_8
--below skews the table to AB.
                                       ,   dbms_random.string('U',1)) vc_16
,lpad('I',40,'I')||dbms_random.string('U',1) vc_40
from dba_objects
where rownum < 1001
  ,method_opt=>'FOR ALL COLUMNS SIZE 10');

The below is a simple select against DBA_TAB_COLUMNS to see the information for the column {Oh, I say simple, but you have to use the functions utl_raw.cat_to_number and utl_raw.cast_to_varchar2 to turn the raw values held in the columns LOW_VALUE and HIGH_VALUE to something we humans can read. Why does Oracle Corp have to make life so difficult? *sigh*.

Oh, click on “view plain” on the box below to get a better layout.

,num_distinct num_dist
                 ,          low_value
       ) low_v
                 ,          high_value
       ) hi_v
,num_nulls    n_nulls
,num_buckets  n_buck
,avg_col_len  avg_l
from dba_tab_columns
where table_name ='HIST_TEST'
and owner=USER
order by owner,column_id

COLUMN_NAME       NUM_DIST LOW_V              HI_V                    N_NULLS N_BUCK  AVG_L
ID                   1,000 1                  1000                          0     10      4
NUM_1                   10 0                  9                             0     10      3
NUM_2                    9 1                  9                             0      9      3
NUM_3                    9 1001               1009                          0      9      4
NUM_4                    9 1000001            1000009                       0      9      6
NUM_5                    9 1000000001         1000000009                    0      9      7
NUM_6                    9 1000000000001      1000000000009                 0      9      9
NUM_7                    9 1000000000000001   1000000000000009              0      9     10
VC_1                     6 AAAAA              FFFFF                         0      6      6
VC_2                    26 AAA                AAZ                           0     10      4
VC_3                    26 BBBA               BBBZ                          0     10      5
VC_4                    26 CCCCA              CCCCZ                         0     10      6
VC_5                    26 DDDDDA             DDDDDZ                        0     10      7
VC_6                    26 EEEEEEA            EEEEEEZ                       0     10      8
VC_7                    26 FFFFFFFA           FFFFFFFZ                      0     10      9
VC_8                    26 GGGGGGGGA          GGGGGGGGZ                     0     10     10
VC_16                   26 HHHHHHHHHHHHHHHHA  HHHHHHHHHHHHHHHHZ             0     10     18
VC_40                    1 IIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIII            0      1     42/

I now select the data out of the DBA_TAB_HISTOGRAMS table to see what is really being stored. For the sake of brevity {which means, “so I can finish this post tonight”} I’ll just show bits, but if you want, download the script right at the end and, if you created the functions from the prior post, you can play with this yourself.

First, here you can see some of the values actually stored in ENDPOINT_VALUE and what they translate into:-

TABLE_NAME   colname                        ENDPOINT_VALUE ROWCOUNT REAL_VAL
HIST_TEST    NUM_5                              1000000007      760 1000000007
HIST_TEST    NUM_5                              1000000008      870 1000000008
HIST_TEST    NUM_5                              1000000009    1,000 1000000009
HIST_TEST    NUM_6                           1000000000001      103 1000000000001
HIST_TEST    NUM_6                           1000000000002      219 1000000000002
HIST_TEST    NUM_6                           1000000000003      328 1000000000003
HIST_TEST    NUM_6                           1000000000004      427 1000000000004
HIST_TEST    VC_2     338824386822815000000000000000000000        8 AAU
HIST_TEST    VC_2     338824624507302000000000000000000000        9 AAW
HIST_TEST    VC_2     338824782963627000000000000000000000       10 AAY
HIST_TEST    VC_3     344035480872391000000000000000000000        0 BBB@
HIST_TEST    VC_3     344035481491361000000000000000000000        1 BBBB
HIST_TEST    VC_3     344035482419816000000000000000000000        2 BBBE
HIST_TEST    VC_3     344035483348271000000000000000000000        3 BBBH
HIST_TEST    VC_3     344035483967241000000000000000000000        4 BBBJ

Note that for numerics the number itself is stored and I do not need to translate it.
For VARCHAR2 columns the value held is the 15-digit number padded with zeros.
Also note, for VC_2 the range covered seems to end at AAY not AAZ and column VC_3 starts at BBB@ not BBBA {I am creating values with the last character set to A-Z}. Also, bucket 8 for VC_2 ends in a control character.

To reduce this I add a fudge to my function {again, see previous post}. It helps:

HIST_TEST    VC_2           9 AAW               AAX
HIST_TEST    VC_2          10 AAY               AAZ
HIST_TEST    VC_3           0 BBB@              BBBA
HIST_TEST    VC_3           1 BBBB              BBBC
HIST_TEST    VC_3           2 BBBE              BBBF
HIST_TEST    VC_3           3 BBBH              BBBI
HIST_TEST    VC_3           4 BBBJ              BBBK
HIST_TEST    VC_3           5 BBBM              BBBN
HIST_TEST    VC_3           6 BBBO              BBBP
HIST_TEST    VC_3           7 BBBR              BBBS
HIST_TEST    VC_3           8 BBBT              BBBU
HIST_TEST    VC_3           9 BBBW              BBBX
HIST_TEST    VC_3          10 BBBY              BBBZ
HIST_TEST    VC_4           0 CCCC@             CCCCA
HIST_TEST    VC_4           1 CCCCB             CCCCC
HIST_TEST    VC_4           2 CCCCD             CCCCE
HIST_TEST    VC_4           3 CCCCH            CCCCH
HIST_TEST    VC_4           4 CCCCJ            CCCCJ

As you can see, I now get a better translation, but it still goes wrong sometimes (see last couple of rows). So, feel free to take my functions and use them, but be aware even the modified version is not perfect. If YOU perfect it, can I have a copy please 🙂

The below shows that Numeric value histograms break when you exceed 15 digits:

colname                  END_VAL ROWCOUNT REAL_VAL
------- ------------------------ -------- --------
NUM_5                 1000000009    1,000 1000000009
NUM_6              1000000000001      103 1000000000001
NUM_6              1000000000002      219 1000000000002
NUM_6              1000000000003      328 1000000000003
NUM_6              1000000000004      427 1000000000004
NUM_6              1000000000005      542 1000000000005
NUM_6              1000000000006      651 1000000000006
NUM_6              1000000000007      771 1000000000007
NUM_6              1000000000008      881 1000000000008
NUM_6              1000000000009    1,000 1000000000009
NUM_7           1000000000000000      133 1000000000000000
NUM_7           1000000000000000      256 1000000000000000
NUM_7           1000000000000000      367 1000000000000000
NUM_7           1000000000000000      467 1000000000000000
NUM_7           1000000000000010      567 1000000000000010
NUM_7           1000000000000010      665 1000000000000010
NUM_7           1000000000000010      784 1000000000000010
NUM_7           1000000000000010      896 1000000000000010
NUM_7           1000000000000010    1,000 1000000000000010

This is the point at which storage of numbers for histograms breaks. You can see that NUM_6 is fine but NUM_7 is not. That is because NUM_6 is below 15 significant digits and NUM_7 is over 15 significant digits and effectively gets truncated.

Histograms on numeric values with more than 15 significant digits will not work as you expect, possible not at all.

With Varchar(2) values, histogram END_VALUES break even sooner, at around 7 digits:

colname                               END_VAL ROWCOUNT MOD_REAL
------- ------------------------------------- -------- --------
VC_5     354460798876024000000000000000000000        5 DDDDDO
VC_5     354460798876038000000000000000000000        6 DDDDDR
VC_5     354460798876047000000000000000000000        7 DDDDDS
VC_5     354460798876061000000000000000000000        8 DDDDDV
VC_5     354460798876071000000000000000000000        9 DDDDDY
VC_5     354460798876080000000000000000000000       10 DDDDDZ
VC_6     359673457682977000000000000000000000        0 EEEEEF8
VC_6     359673457682977000000000000000000000        1 EEEEEF8
VC_6     359673457682977000000000000000000000        2 EEEEEF8
VC_6     359673457682977000000000000000000000        3 EEEEEF8
VC_6     359673457682977000000000000000000000        4 EEEEEF8
VC_6     359673457682977000000000000000000000        5 EEEEEF8
VC_6     359673457682977000000000000000000000        6 EEEEEF8
VC_6     359673457682977000000000000000000000        7 EEEEEF8
VC_6     359673457682978000000000000000000000        8 EEEEEFn
VC_6     359673457682978000000000000000000000        9 EEEEEFn
VC_6     359673457682978000000000000000000000       10 EEEEEFn
VC_7     364886116489977000000000000000000000        0 FFFFFGK
VC_7     364886116489977000000000000000000000        1 FFFFFGK
VC_7     364886116489977000000000000000000000        2 FFFFFGK
VC_7     364886116489977000000000000000000000        3 FFFFFGK
VC_7     364886116489977000000000000000000000        4 FFFFFGK
VC_7     364886116489977000000000000000000000        5 FFFFFGK
VC_7     364886116489977000000000000000000000        6 FFFFFGK
VC_7     364886116489977000000000000000000000        7 FFFFFGK
VC_7     364886116489977000000000000000000000        8 FFFFFGK
VC_7     364886116489977000000000000000000000        9 FFFFFGK
VC_7     364886116489977000000000000000000000       10 FFFFFGK

You can see that for column VC_5 the actual ENDPOINT_VALUE is varying at the 14th and 15th significant digit and my translated value changes. But for VC_6 the numeric ENDPOINT_VALUE is hardly changing. Each row translated to one of two values.
For VC_7 the ENDPOINT_VALUE is static. All histogram END_VALUES are the same.

This means Histograms will not work properly for any VARCHAR(2) columns which do not vary for the first 7 or more characters and any characters after the 7th will be ignored.

Or does it? My logic is correct but tomorrow (or soon after) I’ll try some actual tests over theory… {So don’t ping me with corrections just yet}

I’ve not mentioned VC_16 and VC_40 have I? I will tomorrow. Maybe :-|.

Finally, Here is the selecting script, as promised.

col column_name form a7 head colname
col rowcount form 99,999
col real_val form a17
col end_val form 999999999999999999999999999999999999
col end_string form a10
col endpoint_actual_value form a40
col mod_real form a17
select --owner
,dth.endpoint_value  end_val
,dth.endpoint_number rowcount
                 ,          dth.endpoint_value
       ) real_val

                 ,          dth.endpoint_value
       ) mod_real
from dba_tab_histograms dth
,dba_tab_columns dtc
where dth.table_name ='HIST_TEST'
and  dth.owner=USER
and dth.owner=dtc.owner
and dth.table_name=dtc.table_name
and dth.column_name=dtc.column_name
order by dth.table_name,dth.column_name,dth.endpoint_number

RAC GV$SQL type Bugs August 28, 2009

Posted by mwidlake in internals.

This week, I have mostly been hitting obscure bugs with Oracle RAC and GV$SQL/GV$SQL_PLAN.
ORA-00600 with GV$SQL plus ORA-12085 with GV$SQL_PLAN and ORA-03113 with V$SQL_PLAN. I’ve listed them in the hope anyone else hitting them will pick up this Blog entry (they are obscure but, even so, there is not a lot “out there” about them.)

I’m basically looking at getting a handle on overall Oracle SQL performance variation. This is driven by two areas of consideration:

  • SQL suddenly going rogue (IE a good plan swapping to a very bad plan)
  • The system-wide impact of changes, like gathering system statistics or changing initialisation parameters.

In an ideal world you would have a fully developed test system to try out the latter and your monitoring tools (like OEM/AWR or Toad or whatever you set up) would pick up the former. Back in reality, you can find yourself lacking both and I have had the idea for a looooooong time  to set up something light-weight that checks the V$SQL… views directly to pick these up and now I am implementing something to do this. And finding out some of the pains and aggravations in doing so…

Enough meandering, these are some bugs some other poor sods are going to hit and these are the answers I have found. They may not be the best answers but it is all I have to offer :-).

Select count(*) from GV$SQL or select count(*) form GV$SQLAREA gives an ORA-00600 error and the usual trc file to wonder over. Happens in to on any platform and is fixed on No backport I am aware of.  Check out doc ID  357016.1 (Sorry if this is wrong but I am having major cut n paste issues over crap Virign wireless Broadband sevices, amongst other crap Virgin Wireless Broadband issues). Inclusion of the SQL_FULLTEXT (CLOB) column is suggested as the issue.

select * from gv$sql on a RAC enviornment
ORA-00600: internal error code, arguments: [kolaslGetLength-1], [], [], [], [], [], [], [].
Doc ID: 357016.1 Type: PROBLEM
Modified Date : 10-SEP-2008 Status: PUBLISHED

Our solution was to select the exact columns we wanted and this excluded the SQL_FULLTEXT column. The error went away. I did not pursue any further as it is a known but generally “swept under the Oracle Carpet” error. Heck, it only impacts those nutters using RAC… plus multibyte character set,  like al32utf8.

Next up, selecting * or a set of columns from GV$SQL_PLAN on linux on gives an ORA-12805 error.

ORA-12805: parallel query server died unexpectedly
ORA-06512: at “MWPERF.PKG_XXX_MONITOR”, line 311
ORA-06512: at line 1

Now, we really did NOT expect that as we had turned off parallel processing across the DB but I figure if you query against a GV$ view it is going to run across all nodes and compile the results, so parallel is forced. Well, on on HP hardware/Redhat linux it barfs.

I reproduced the error by pulling the insert statement driven off GV$SQL_PLAN into a simple insert statement, it gave the same error.

Just using V$QL_PLAN you get ORA-03113 end-of-file on communication channel. I think this is occuring on the remote nodes and passing back the 12805 error.

INSERT INTO ad_sql_runtime_plan
ERROR at line 1:
ORA-03113: end-of-file on communication channel

Anyway, google and metalink both threw up no 10.xxxxx errors when I searched but did reveal a 9i errorr when trying to look at FILTER_PREDICATES column, fixed in 10.1 (Bug 3545517). 
Also, I found bug 5166445 linking to 5933643. Which is no bloody {sorry} use to anyone as 5933643 is one of those oh-so-common oracle bug/issue numbers that links to a page saying “you can’t see this as it is private” which is just an insult to people trying to solve their own problems. Bad Oracle Corp, Bad Boy.
{Yes, I am annoyed by this, it is just simply poor by Oracle to reference an SR on public metalink that is not publically viewable and TOO MUCH is not publically viewable – I’ve know incidents when my own damn SR is not publically viewable and I had no idea why}.

It looks like there was a known and specific bug with FILTER_PREDICATES that impacted to Maybe to commemorate it Oracle introduced a similar bug on the same table for to On a different column of course.

SO, the solution. Well, I managed to narrow it down to one column. I could select count(column_name) from GV$SQL_PLAN for every column except OTHER_XML which is a Clob and PROJECTION, which is the problem. THat was the only column that errored and it gave the error codes I stated above.

Do select * or include PROJECTION in your column list and selects against GV$SQL_PLAN and V$SQL_PLAN may well go wrong under on linux or wherever you get the problem. Remove the column and you could be OK. If not, test by using “select count {column-name} from V$SQL_AREA;” for all the other columns one by one and see what errors for you.

I did check to see if GV$sql_plan resolved to a complex view on many x$ objects but it did not, all of the columns (including PROJECTION) come off the X$KQLFXL internal object.

Now, the question is, having solved the problem for my own specific requirement on a RAC/linux platform and done a Blog entry, is it worth spending half a day raising this obscure bug with Oracle Corp…?

Decrypting Histogram Data August 11, 2009

Posted by mwidlake in internals, performance.
Tags: , , ,

{part two is here}

I’m looking at Histograms at the moment and I’ve hit a problem I always hit when looking at histograms. So I’ve decided once and for all to see if I can find a solution.

The value held in DBA_TAB_HISTOGRAMS.ENDPOINT_VALUE for VARCHAR2 columns is a number with no apparent link to reality.

{Before anyone who has read my previous posts suggests it, I’ve already looked at the view and the relevant column resolves to data in either sys.histgrm$ or sys.hist_head$ and the underlying values held are still numbers and not processed in the view, so they are numbers. I think}.
{{I have a sneaking suspicion this could turn into a set of increasingly obscure postings…}}
{{{THIS one might be my most obscure so far!}}}

Let’s have a look:

select --owner
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_P'
and  owner='WIDLAKEM'
order by table_name,column_name,endpoint_number
--------------- --------------- ---------- ---------------
TEST_P          STATUS                   0          41,523
TEST_P          STATUS                   1          42,065
TEST_P          STATUS                   2          42,099
TEST_P          VC_1            3.3882E+35               0
TEST_P          VC_1            3.4951E+35               1
TEST_P          VC_1            3.6487E+35               2
TEST_P          VC_1            3.7558E+35               3
TEST_P          VC_1            3.9095E+35               4
TEST_P          VC_1            4.0162E+35               5
TEST_P          VC_1            4.1697E+35               6
TEST_P          VC_1            4.3235E+35               7
TEST_P          VC_1            4.4305E+35               8
TEST_P          VC_1            4.5841E+35               9
TEST_P          VC_1            4.6914E+35              10

So what in heck is 4.6914E+35 {last value listed}?

Well, it is a number of course. If I set my column heading appropriately I can see this:-

col end_val form 999,999,999,999,999,999,999,999,999,999,999,999
col column_name form a10
select column_name
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_P'
and  owner='WIDLAKEM'
and column_name='VC_1'
order by table_name,column_name,endpoint_number
COLUMN_NAM                                          END_VAL   ROWCOUNT
---------- ------------------------------------------------ ----------
VC_1        338,822,823,410,931,000,000,000,000,000,000,000          0
VC_1        349,512,448,932,628,000,000,000,000,000,000,000          1
VC_1        364,867,102,368,954,000,000,000,000,000,000,000          2
VC_1        375,575,265,424,979,000,000,000,000,000,000,000          3
VC_1        390,949,334,699,583,000,000,000,000,000,000,000          4
VC_1        401,618,352,253,998,000,000,000,000,000,000,000          5
VC_1        416,972,612,321,579,000,000,000,000,000,000,000          6
VC_1        432,345,727,450,272,000,000,000,000,000,000,000          7
VC_1        443,054,364,035,286,000,000,000,000,000,000,000          8
VC_1        458,409,658,691,912,000,000,000,000,000,000,000          9
VC_1        469,139,289,515,351,000,000,000,000,000,000,000         10

The ENDPOINT_VALUE is an encoded representation of the varchar2 value, based on the trick of turning each character into it’s ascii equivalent (0-255) and then multiplying it by 256*no-of-characters-in to the string and adding it all together. Let’s use “Cat” as an example.

  • Last character is ‘t’, ascii value is 116=116;
  • Next-to-last character is ‘b’, ascii value 97, *(256)= 24832;
  • Chr 3 in is ‘C’, ascii value 67, *(256*256) =4390912
  • Total is 4390912+24832+116=4415860

What Oracle actually does is slightly different {and I could have this slightly wrong, my brain waved a white flag a couple of hours back} it takes the first 15 characters and multiplies the first character ascii value by power(256*15), second character ascii value by power(256*14) etc until it runs out of characters or gets to character 15.

How do I know this? I decoded the ‘C’ in the kernel :-). No, I didn’t, I found a link to an encode script Jonathan Lewis had written {it’s also in his latest book, or at least in the set of scripts the book tells you how to download}. I’ve lost the original link I found, and thus a nice little article on the whole topic, but this link will do until I can re-find the original. {NB this referenced article mentions using the hexstr function which may be neater but, as I said earlier, my brain has given up}.

I’ve outrageously stolen Jonathan’s script which encodes a varchar2 string into the relevant numeric and used it as a template to create a decode version too. Just a couple of notes:

– I can imagine this might not work if you have multibyte characters.
– Oracle confuse things by doing something like translating the 15 character into a 36-digit string – and then taking only the first 15 significant digits. This kind of messes up the untranslate.

So, I create three functions:

  • hist_chartonum that converts a varchar2 into something very similar to what is stored in the histogram views. Jonathan will find it incredibly familiar. An optional second paramter turns off the truncation that oracle does for the histogram data, so you can get the full glorious 36 significant digits for 15 characters if you wish.
  • hist_numtochar which converts the numeric back to a string. It gets it slightly wrong due to the truncation of the least significant 20 or so characters. Again, an optional second paramater allows it to not replicate the trunaction and work with all 15 characters.
  • hist_numtochar2 which is an attempt to allow for the truncation errors. I add power(256,9) to the value if it has been truncated. It seems to make eg the numeric representation of AAAAA translate back to AAAAA not AAAA@. Yes, it’s a fudge.

This is the script:-

-- cre_hist_funcs.sql
-- Heavily borrowed from Jonathan Lewis, sep 2003
-- MDW 11/8/09 - all mistakes mine.
-- hist_chartonum converts a string to a number in a very similar way
-- to how Oracle does so for storing in column ENDPOINT_VALUE in XXX_TAB_HISTOGRAMS
-- Optionally get it to not truncate the value in the same way to only 15 digits.
-- hist_numtochar converts the ENDPOINT_VALUE back to human-readable format.
-- Optinally get it to go beyond 15 numerics (7 or 8 characters)
-- hist_numtochar2 attempts to allow for truncation errors.
-- JLs version has been checked against, and
-- I've used on
set timing off
create or replace function hist_chartonum(p_vc varchar2
                                         ,p_trunc varchar2 :='Y') return number
m_vc varchar2(15) := substr(rpad(p_vc,15,chr(0)),1,15);
m_n number := 0;
  for i in 1..15 loop
/*  dbms_output.put(ascii(substr(m_vc,i,1)));
    dbms_output.put_Line(to_char( power(256,15-i) * ascii(substr(m_vc,i,1)),
                        ); */
    m_n := m_n + power(256,15-i) * ascii(substr(m_vc,i,1));
  end loop;
-- this converts it from a 36 digit number to the 15-digit number used
-- in the ENDPOINT_VALUE Column.
  If p_trunc = 'Y' then
    m_n := round(m_n, -21);
  end if;
-- dbms_output.put_line(to_char(m_n,'999,999,999,999,999,999,999,999,999,999,999,999'));
return m_n;

create or replace function hist_numtochar(p_num number
                                         ,p_trunc varchar2 :='Y') return varchar2
  m_vc varchar2(15);
  m_n number :=0;
  m_n1 number;
  m_loop number :=7;
m_n :=p_num;
if length(to_char(m_n))&lt;36 then
--dbms_output.put_line ('input too short');
  m_vc:='num format err';
  if p_trunc !='Y' then
    m_loop :=15;
  end if;
  for i in 1..m_loop loop
--    dbms_output.put_line(to_char(m_n1));
    if m_n1!=0 then m_vc:=m_vc||chr(m_n1);
    end if;
  end loop;
end if;
return m_vc;
create or replace function hist_numtochar2(p_num number
                                         ,p_trunc varchar2 :='Y') return varchar2
  m_vc varchar2(15);
  m_n number :=0;
  m_n1 number;
  m_loop number :=7;
m_n :=p_num;
if length(to_char(m_n))&lt;36 then
--dbms_output.put_line ('input too short');
  m_vc:='num format err';
  if p_trunc !='Y' then
    m_loop :=15;
  end if;
  for i in 1..m_loop loop
--    dbms_output.put_line(to_char(m_n1));
    if m_n1!=0 then m_vc:=m_vc||chr(m_n1);
    end if;
  end loop;
end if;
return m_vc;
rem Sample of use:

col id1 format 999,999,999,999,999,999,999,999,999,999,999,999
col id3 format 999,999,999,999,999,999,999,999,999,999,999,999
  'short'  text
 ,hist_chartonum('short') id1
 ,hist_numtochar(hist_chartonum('short')) id2
 ,hist_chartonum('short','N') id3
 ,hist_numtochar(hist_chartonum('short','N'),'N') id4      
from dual;
  'alongteststring'  text
 ,hist_chartonum('alongteststring') id1
 ,hist_numtochar(hist_chartonum('alongteststring')) id2
 ,hist_chartonum('alongteststring','N') id3
 ,hist_numtochar(hist_chartonum('alongteststring','N'),'N') id4      
from dual;
  'amuchlongerteststring'  text
 ,hist_chartonum('amuchlongerteststring') id1
 ,hist_numtochar(hist_chartonum('amuchlongerteststring')) id2
 ,hist_chartonum('amuchlongerteststring','N') id3
 ,hist_numtochar(hist_chartonum('amuchlongerteststring','N'),'N') id4      
from dual;
spool off

The test at the end produces the following output:-

strng  histgrm_translation
short  599,232,339,077,851,000,000,000,000,000,000,000

alongteststring  505,852,124,009,532,000,000,000,000,000,000,000

amuchlongerteststring  505,872,878,384,947,000,000,000,000,000,000,000

Final comment. Oracle trims the value stored in sys.histgrm$ and knackers up the untranslate. Why? Well, this is only me thinking out aloud, but this reduces the number from 36 to 15 significant digits and numerics are stored as one byte per two digits plus an offset byte {am I missing a length byte?}. So 10 bytes or so. This will keep the size of data stored down. I mean, as I posted here, the sys.histgrm table gets big enough as it is!

Tomorrow {or whenever my brain recovers} I’ll show how I use it and one or two oddities I have found or come across from other people’s blogs.

Friday Philosophy – Should the Software or the User be the Stupid One? August 7, 2009

Posted by mwidlake in internals, performance.
Tags: , ,

Oracle’s performance engine is complex and copes with a lot of database situations automatically – or to be more precise, it tries to cope with lots of database situations automatically.

Over the last few versions, Oracle has added many, many things to allow the database to cope automatically with all sorts of different data volumes, spreads of data, relationships between tables, use of different oracle technologies (By this I mean bitmap tables, index tables, partitions, clusters, external tables). All of these things aim to allow the database to just “cope” with whatever you need it to do, with less and less effort by the users {by users, I mean technical users; DBAs and Developers}. Thus it allows for “stupid” users. {I mean no offence, maybe read “inexperienced” instead of stupid}.

As an example, you can now have some very large tables consisting of several partitions and some status look-ups. You query against them. Oracle’s CBO will automatically ignore partitions it can ignore, use indexes or full table scans to use the least amount of IO,use histograms to spot where clauses are on low-cardinality values, Hash joins rather then nested loops as appropriate depending on memory availability, use bitmap indexes when it thinks it can and merge the results from several bitmap indexes, use function based indexes to support functions in where clauses….
It even self-gathers the information to look after all this. Column usage and table modifications are tracked, statistics are gathered when needed and in ways to support data skew, PGA and SGA can be automonitoring and managing…

It all sounds great. In fact, most of the time, for most people, it is great. {I know, most people reading this post are probably people who have encountered the problem systems and so know it goes wrong and so you need more knowledge to cope – you are a biased set of people. In the nicest way, I should add 🙂 } The idea is, I believe, that you do not neet to be smart to look after Oracle.

If it is not great, if this highly complex system gets it wrong and tries to satisfy SQL statements in sub-optimal ways, then the User has to step in and fix things. ie You.

It is now horrifically complex for us technical users to understand what is going on. You have to not only be “not stupid”, but “not average” either. Sometimes you have to be “not great”, ie brilliant.

In my example, we need to look at if the SQL is constructed to allow the indexes to be used, are functions correctly laid out to use function indexes, are partitions being maintained correctly, when were stats last gathered, did it include histograms and do they help, has oracle missed the need for histograms, are the indexes analyzed at a high enough sample size, are the bitmaps greatly slowing down inserts, have hints been used on the code, are initialisation parameters set to override default fucntionality…

You get the idea, I won’t drone on further. I didn’t even mention memory considerations though {OK, I’ll shut up}.

My point is, the more complex the software, the more “intelligent” it is, the more it is designed to allow for “stupid” users, then the more super-intelligent the user has to be to cope when it breaks.

How about an alternative?

How would it be if we went back to the Rule Based Optimizer and no automatic management of complex situations?

Oracle would maybe need to add a few rules to the RBO for it to cope with later developments, so it would be slightly more complex than V6 but not a lot.
Everything else, the User decides. You only gather stats you decide to gather, on objects you decide need them. No you don’t, it’s a Rule Based Optimizer – no stats gathering! {But see below}.

No automatic memory management. No automatic anything.

The User {the technical user, the DBA and Developer} would have to be smart. Not brilliant, just smart. You would probably have to do more, but most of it would be easier as the levels of complexity and interdependence are reduced. All those tweaks and tricks in the CBO and all the monitoring to cope with “complex” would not exist to go wrong.

Plus it might solve another concern I have. I think there is a chasm growing as there is no need to solve simple problems as Oracle copes but then having to solve complex problems when Orcle does not cope. If you don’t develop skills and experience solving the simple problems, how do you solve the complex ones? I think this is why most Oracle performance and architecture experts are old {Sorry, pleasantly middle-aged}. Young people new to the arena have a massive learning mountain to climb.

So, if we have stupid software, maybe we can get away with more stupid “smart” expert users. ie ALL of us can cope. You cut your teeth on smaller, simpler systems and learn how to cope with the stupid software beast. As you learn more, you learn to cope with more complex situations and they never get that complex as the database is not so “clever”

I’d actually still argue that all the intelligence gathering the Oracle database does should still continue – stats gathered on objects, the ability to gather information on memory usage and thus advice on changes, tracking column usages and table changes. But We, the Stupid Users get to look at it and use it as we see fit for our systems.

I’m sure many systems would not work quite so fast in my senario, but I’d rather have a system working at 75% it’s theoretical fastest all the time rather than one working at 95% and breaking regularly, and in ways so complex it needs weeks to work out and fix.

I now await all the comments to tell me how stupid I am {I can be blindlingly stupid, especially on Fridays}.

Looking inside v$ performance views August 5, 2009

Posted by mwidlake in internals, performance.
Tags: ,

I keep forgetting how to look inside the V$ performance objects, I’ve got a mental block on it. I had to ask a colleague again this week how to do it. So I’m blogging it. This should fix it in my mind.

You use the V$FIXED_VIEW_DEFINITION dynamic view to seem them. You have to use this as the damned things hide in a circular data dictionary black hole.

Here is an example. I want to know what the v$locked_object is actually looking at.

Name                                      Null?    Type
 XIDUSN                                             NUMBER
 XIDSLOT                                            NUMBER
 XIDSQN                                             NUMBER
 OBJECT_ID                                          NUMBER
 SESSION_ID                                         NUMBER
 ORACLE_USERNAME                                    VARCHAR2(30)
 OS_USER_NAME                                       VARCHAR2(30)
 PROCESS                                            VARCHAR2(12)
 LOCKED_MODE                                        NUMBER

OK, what is v$locked_object, is it a table, a view, a synonym…?

select owner,object_name,object_type
from dba_objects
where object_name ='V$LOCKED_OBJECT'

---------- --------------- ---------------

It’s a synonym. Not a view, even though people refer to the v$ views or performance views.

What is it a synonym for?

select synonym_name,table_owner,table_name
from dba_synonyms
where synonym_name='V$LOCKED_OBJECT'

--------------- --------------- --------------------

OK, it’s a synonym on sys.v_$locked_object. What is that?

select owner,object_name,object_type
from dba_objects
where object_name ='V_$LOCKED_OBJECT'

---------- ---------------- ---------------

It’s a view. Good, so what is commonly called a view IS a view, hidden by a synonym. What is also good is that we have been here before, I posted about looking inside data dictionary viewsabout a month ago. You can just extract the SQL of the view…

select owner,view_name,text
from dba_views
where view_name='V_$LOCKED_OBJECT'

--------------- --------------------
S","LOCKED_MODE" from v$locked_object

Bingo! Its a simple view on top of v$locked_object.
Sorry? v$locked_object? That’s where we started.
v$locked_object is a synonym for
v_$locked_object that is a view on top of

This is the circular black hole I mentioned. Grrrr.

The solution is the afore mentioned v$fixed_view_definition.

desc v$fixed_view_definition
 Name                    Null?    Type
 ----------------------- -------- ----------------
 VIEW_NAME                        VARCHAR2(30)
 VIEW_DEFINITION                  VARCHAR2(4000)

select * from v$fixed_view_definition
where view_name = 'V$LOCKED_OBJECT'

select xidusn,xidslot,xidsqn,object_id,session_id,
from gv$locked_object
where inst_id = USERENV('Instance')

It is a view on gv$locked_object. OK, let’s go look at that {anyone getting a bad feeling about this becoming another circular reference?}

select * from v$fixed_view_definition
where view_name = 'GV$LOCKED_OBJECT'

select x.inst_id,x.kxidusn, x.kxidslt, x.kxidsqn, l.ktadmtab,
s.indx, s.ksuudlna, s.ksuseunm, s.ksusepid, l.ksqlkmod
from x$ktcxb x, x$ktadm l, x$ksuse s
where x.ktcxbxba = l.kssobown and x.ktcxbses = s.addr

Now that’s better. It is showing what that view is really looking at.

I don’t know why the v$xxxx synonym to v_$xxxx view on v$xxxx synonym circular reference exists but I suspect it is something to do with allowing Oracle’s internal code to realise it has to look elsewhere to resolve the v$ objects, ie as being built on top of the x$ objects – which are themselves a way of exposing C objects {I think they are C objects} on memory structures… I’ll stop now as I am in deeper than I can swim. Maybe someone more adept with oracle internals has worked this out and blogged about it but I have not found it yet {maybe Jonathan Lewis has, he started mentioning the x$ and v$ views back in Oracle 7! Or René Nyffenegger, who’s pages on the v$ and x$ stuff I find very useful}

Sadly, you can’t look at the x$ objects at all unless you are logged on as SYS. If you have access to the SYS user on your company systems you should know enough to not go poking around on such things on production systems. Install Oracle on your PC and poke around on there. It can be addictive though.

I think I’ll leave it there for tonight.

Why is my SYSTEM Tablespace so Big? August 3, 2009

Posted by mwidlake in internals, VLDB.
Tags: , ,

How big is your SYSTEM tablespace? 512M? 1GB? 4GB? Bigger? Is it bigger than the whole of the first critical database you ever looked after? Why has the SYSTEM tablespace got so big?

You can easily check the size of the SYSTEM tablespace with any oracle development/admin GUI or with this script.

Enter value for ts_name: system
old   8: where tablespace_name like upper('&ts_name'||'%')
new   8: where tablespace_name like upper('system'||'%')

FILE_NAME                                  F_ID  TS_NAME         SIZE_M   THOU_BL
----------------------------------------- ----- ---------------- -------- -------
+DISCG/sid/datafile/system.272.596722817      1 SYSTEM             24,920   3,190

That is 24GB. It is not the biggest one I have seen, but it is up there in the top 3.

Not so many versions ago, it was not unusual to have a system tablespace of less then 128MB in size. In fact, if you did not use stored PL/SQL this was probably the case up to and through 8i and into 9i. So – it’s PL/SQL causing the bloat? No, probably not. Ahhh, what about all that AWR automatic real-time monitoring gubbins you hear so much about? Nope, that goes into SYSAUX.

No, it’s probably down to one or both of two internal tables, SYS.AUD$ and SYS.HISTGRM$.

Let’s check out what are the biggest objects in this particular SYSTEM tablespace:

select owner,segment_name,segment_type
 ,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSTEM'
 and    bytes/(1024*1024) &gt;1
 order by size_m desc

------------ -------------------- ------------------ --------
SYS          C_OBJ#_INTCOL#       CLUSTER              13,313
SYS          AUD$                 TABLE                 5,688
SYS          I_H_OBJ#_COL#        INDEX                 1,863
SYS          I_AUD1               INDEX                 1,606
SYS          HIST_HEAD$           TABLE                   311
SYS          SOURCE$              TABLE                   224
SYS          IDL_UB1$             TABLE                   224
SYS          C_FILE#_BLOCK#       CLUSTER                 208
SYS          INDPART$             TABLE                   160
SYS          OBJ$                 TABLE                   144
SYS          I_HH_OBJ#_COL#       INDEX                   128
SYS          I_HH_OBJ#_INTCOL#    INDEX                   128
SYS          I_OBJ2               INDEX                    80
SYS          I_FILE#_BLOCK#       INDEX                    62
SYS          TABPART$             TABLE                    57
SYS          I_OBJ1               INDEX                    49

To be a little obtuse, I’ll start with the second largest object, SYS.AUD$, which you can see is getting close to 6GB in size {Oh, and SYS.I_AUD1, the fourth largest object, is the PK index on AUD$}.

AUD$ is the underlying table that holds all of the system auditing information that is turned on and off with the SQL AUDIT command. You can use AUDIT to monitor particular users, changes to system priveleges, access on key tables or particular activity like table changes. Go check the SQL manual if you want to know more.
The records for this captured activity all go into this table. On this particular system a few key priveleges and logons by certain users are the only things being audited, but the data has built up over 3 or 4 years.

On version 9 and 10 the options for managing this table are…poor. Bascially, you have to do it yourself. I think I am right to say that it is the only SYS-owned table that Oracle Corp say it is OK for you to move into another tablespace yourself and that you are allowed to delete and truncate data from. If you AUDIT a lot of things or activities you probably need to delete some data. {If you AUDIT a lot of things or activities and you don’t delete any of the data, you probably did not turn on AUDIT very long ago or have not checked the size of your SYSTEM tablespace for a while…}.

Oh, and one final thing on the AUD$ table. It is lacking indexes – it has only one {ignoring LOB indexes}. This is a good thing as indexes slow down inserts quite significantly {I use a rule of thumb that if it takes “N“ms to insert a record into a table, it will take “2.2*N“ms if there is an index on the table, “3.4*N“ms if there are two indexes, “4.6*N“ms if there are three… Get the idea? However, I have not checked this for real in years. Treat this statement as a potential myth}. I would recommend you do not add further indexes to AUD$. If you need to investigate the information in there, I would strongly suggest you extract what you want into a new table with a CTAS statement, then index that copy table and do your work on that. You do not want to slow down further everything that is already slowed down a little by being audited.

What about the largest thing in the SYSTEM tablespace? It’s SYS.C_OBJ#_INTCOL# nd is over 13GB in size. I said it would be SYS.HISTGRM$ didn’t I?

Well, SYS.C_OBJ#_INTCOL# is a cluster. Clusters are structures, segments, that hold one or more tables. All records for the same key across all those tables are held together, in the same block or blocks. The key is either a normal index or a hash. The idea is that it speeds up selecting records from different tables with the same key – they are all in the same block or blocks. It does work too, but they seem to have fallen out of favour with Oracle designers, but are worth considering.

In this particular cluster there is only one table, SYS.HISTGRM$:

select owner,table_name
from dba_tables
where cluster_name='C_OBJ#_INTCOL#'

OWNER                          TABLE_NAME
------------------------------ ------------------------
SYS                            HISTGRM$
1 row selected.

So, SYS.HISTGRM$ is the only table in the cluster. It {and it’s cluster index, on columns OBJ# and INTCOL#} are hiding in this cluster segment. It IS the biggest thing in this particular SYSTEM tablespace. The only index, other than the cluster index, on the table SYS.HISTGRM$ is I_H_OBJ#_COL# and is the third largest object in the SYSTEM tablespace.

INDEX_OWNER                 ind_owner
,INDEX_NAME                  ind_Name
,TABLE_NAME                  tab_Name
,substr(column_position,1,3) Psn
,COLUMN_NAME                 Col_Name
from dba_ind_columns
where table_name like upper(nvl('&tab_name','WHOOPS')||'%')
order by 3,1,2,4,5
Enter value for tab_name: histgrm$

IND_OWNER   IND_NAME           TAB_NAME           PSN       COL_NAME
----------- ------------------ ------------------ --------- --------------
SYS         I_H_OBJ#_COL#      HISTGRM$           1         OBJ#
                                                  2         COL#
2 rows selected.
Elapsed: 00:00:02.07

-- you find the cluster index by looking on DBA_INDEXES and
--DBA_IND_COLUMNS where the "TABLE_NAME" is the name of the cluster
Enter value for tab_name: C_OBJ#_INTCOL#

IND_OWNER   IND_NAME           TAB_NAME           PSN       COL_NAME
----------- ------------------ ------------------ --------- --------------
SYS         I_OBJ#_INTCOL#     C_OBJ#_INTCOL#     1         OBJ#
                                                  2         INTCOL#
2 rows selected.
Elapsed: 00:00:00.93

What is HISTGRM$? Do you think it looks like it could be something to do with column Histograms? Well it is, it’s the underlying data dictionary table that holds column histograms. It started to become common to see a large HISTGRM$ table with Oracle 9 systems with many partitions {and of course subpartitions, if used}, where statistics were gathered on these partitioned tables aggressively and included a METHOD_OPT clause that causes column statistics to be collected. eg FOR ALL INDEXED COLUMNS SIZE 200 or FOR ALL COLUMNS SIZE 100.

Where most people have seen it balloon in size though is with Oracle 10 and the introduction of…the Automated Stats gathering job {so, there is a connection to previous recent posts 🙂 }.
Every night, every weekend, any partition or sub-partiton that is 10% or more different to last time stats were gathered, get stats gathered. The automatic job gathers states with METHOD_OPT=AUTO. Which means your Oracle software decides which columns to gather histogram stats on and how many buckets it collects.

Whoever wrote that part of the stats gathering feature really liked to gather information.

If you have lots of partitions {or sub-partitions} and gather stats with the METHOD_OPT=AUTO, the two factors added together result in a massive HISTGRM$ table and thus a massive cluster.

Is space important? No, of course not, disc acreage is cheap, just buy bigger discs ;-). {That, of course, was a highly questionable suggestion}.
Being serious, I’m personally not too concerend about my SYSTEM tablespace being that big, after all compared to the size of databases that use thousands of partitions, 32GB is usually not a major concern, but I know of people who did not expect their system tablespace to get that big and have issues with simple running out of strage available to them.

I am more concerned about the impact on parse times and general data dictionary perforance of such large objects. I’ve found the data dictionary can be very slow for some queries when there are large numbers of extents, objects, columns…Histograms are just in there with the mix. I’ve had to work around such slow data dictionary access several times in the past and I suspect {though have not had time to in any way support this with evidence} that parsing is slowed down by unneccessary histograms. As an example of this, which actually has nothing to do with SYS.HISTGRM$, if you look back up this post at the code listing the indexes on the cluster and table, the code took between a second and two seconds to run. Normally it takes less than 1/10 of a second.

Can you do anything about the size of these objects? Well, with SYS.AUD$ Oracle allow you to delete from it and move the table to a new tablespace. If you use the SQL AUDIT command, you should include housekeeping the SYS.AUD$ table in your overall implementation.

SYS.C_OBJ#_INTCOL# and SYS.HISTGRM in it are slightly trickier. It is not easy to reduce the space used by these objects and thus the SYSTEM tablespace once they have grown large, but you can stop them getting bigger by collecting fewer column histogram statistics on your partitioned tables. Obviously that is not as simple as it sounds, but many sites with large numbers of objects do turn off the automated stats gathering and run their own code.

Automatic Statistics Gathering Fails #3 July 29, 2009

Posted by mwidlake in internals, performance.
Tags: , ,

I’m steeling a few minutes from my lunch time to finish off this thread on the automated statistics gathering failing.
<<first post<second post

There is another way to spot what the automated stats gathering choked on but it is not that reliable. But it does involve digging a little into Oracle’s internals, which is fun.

Under 10.2 onwards, when you gather stats on a segment, either directly or via a schema, database or the automatic stats collection, Oracle stores the stats that are replaced for you automatically {I say 10.2 as I am not sure that segment-level gather_table/index_stats are automatically backed up like this. I have some old notes saying it seemed not but I am not sure if this was 9i, 10.1 or me just getting confused at the time. If anyone knows or can check back, let me know 🙂 }

This means you can get the stats back using the dbma_stats.restore_xxxxxxxx_stats procedures. eg dbms_stats.restore_table_stats(ownname=>user,tabname=>’TEST_P’,as_of_timestamp=>systimestamp-1);
This will restore the stats of my table TEST_P to what they were at this time yesterday. You do not need to create a stattab table and store the prevous stats manually.
I’m not going to blog any more right now about this handy feature, the “PL/SQL packages and types” manual will tell you what you need, but I will comment that by default you can only go back 31 days.

Oracle gives you a table to see the history of stats stored, DBA_TAB_STATS_HIST

desc dba_tab_stats_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                              VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 PARTITION_NAME                                     VARCHAR2(30)
 SUBPARTITION_NAME                                  VARCHAR2(30)
 STATS_UPDATE_TIME                                  TIMESTAMP(6) WITH TIME ZONE

select owner,table_name, partition_name,stats_update_time
from dba_tab_stats_history
where table_name='TEST_P'  and owner=user

-------- ---------- --------------- --------------------------------
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00

You can see that I probably gathered stats on my test table 3 times yesterday, each time I gathered at a granularity of all, ie global and partition stats. Note also, all partitions and the table get the same timestamp. I think this is because oracle records the timestamp as when the stats for the set of segments was swapped into the data dictionary {note, not started}.

That’s quite useful. However, there is no similar view for indexes. That’s a shame. But you can recover index stats and this view is saying nothing about the stats as they were?…Is there more to be found?… Let’s go see what that view is looking it

Enter value for vw_name: dba_tab_stats_history
OWNER    VIEW_NAME                      TEXT_LENGTH
-------- ------------------------------ -----------
SYS      DBA_TAB_STATS_HISTORY                  876
select u.name, o.name, null, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.savtime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
  union all
  -- fixed tables
  select 'SYS', t.kqftanam, null, null, h.savtime
  from  sys.x$kqfta t, sys.wri$_optstat_tab_history h
  t.kqftaobj = h.obj#

Now that is interesting. Let’s go look at that table:

desc sys.wri$_optstat_tab_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLKCNT                                             NUMBER
 AVGRLN                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

Some of those columns look interesting. Unforturnately CACHEDBLK onwards are empty, even in version 11 {but it shows that Oracle have built in the ability to use those yet-to-be used columns about average cached blocks and cache hit ratios for segments you might have spotted in eg DBA_TAB_STATISTICS}.
Could there be an index version? Of course there is :-

desc sys.wri$_optstat_ind_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLEVEL                                             NUMBER
 LEAFCNT                                            NUMBER
 DISTKEY                                            NUMBER
 LBLKKEY                                            NUMBER
 DBLKKEY                                            NUMBER
 CLUFAC                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 GUESSQ                                             NUMBER
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

I’ve written a couple of scripts to extract data from these underlying tables, with the extra data you can grab. Use these links to download the table script and the index script.
Here below is the index script though:-

-- show_ish_full
-- Martin Widlake 14/4/08
-- this gets the index stats history - there is no index equiv of dba_tab_stats_hist
-- that I can see in 10.2 (or 11)
col owner form a12
col ind_full_name form a40
col stat_upd_time form a15
col blevel form 99 head bl
select owner,ind_full_name,
to_char(stat_upd_time,'MMDD HH24:MI:SS') stat_upd_time
,numrows,numleafs,blevel,dist_kys,lf_p_ky, dbl_p_ky,clufac,samp_size,obj_hash
,greatest(nvl(samp_size,1),1)/greatest(nvl(numrows,1),1) samp_pct
 from (
select u.name owner, o.name ind_name,o.name ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 1 and o.owner# = u.user#
  union all
  -- partitions
  select u.name owner, o.name ind_name, o.name||'-'||o.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 20 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name owner, osp.name ind_name, osp.name||'-'||ocp.subname||'='|| osp.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.indsubpart$ tsp,
        sys.wri$_optstat_ind_history h
  where h.obj# = osp.obj# and osp.type# = 35 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
) where ind_name like upper(nvl('&indname','whoops')||'%')
and owner like upper('&indown')||'%'
order by owner,ind_name,stat_upd_time desc
clear colu

I’ll leave you to play with these as you wish, especially as I ran out of time to finish this blog an hour ago and now owe my current employer some of my weekend in return, but I’ll finish off with a use for the scripts.

One thing to bear in mind, these tables are holding the stats that were replaced, not the stats that were created. So they hold historical information.

If you read the earlier posts, you may remember that the automatic stats job can choke and fail on extremely large segments. Well, sometimes {and I have to admit, only sometimes} you can spot what segments it failed on. That is because, at the end of the window when the job gets stopped, it attempts to clean up after itself. It appears to check for stats that have changed during the run but it failed to process correctly and resets them. It copes the stats back. So you sometimes see this:-

OWNER        TAB_FULL_NAME                            STAT_UPD_TIME           NUMROWS    NUMBLKS AVG_RL  SAMP_SIZE OBJ_HASH                           SAMP_PCT
------------ ---------------------------------------- -------------------- ---------- ---------- ------ ---------- --------                           --------
TABOWNER1   HUGETABLE4                               0522 06:00:01        1473990266   27319245    128    5660375    54714                               .004
TABOWNER1   HUGETABLE4                               0522 05:48:22        1327119100   24544873    128   13271191    54714                               .010

In the above case, you can see that this massive table had stats set at 5:48am and then again at 6:00.01am. That second record is the automated stats swapping the last known good stats back in place. That is the segment it choked on.

How do you stop the automatic stats job chocking on this extent? You do a manual gather of course. You will see the above job attempted a 0.1% sample size that almost worked at 05:48. The value it swapped back is 0.04 – which was the sample size of the manual gather I did a few days earlier to get over the last choke. {I also tweaked the histograms I gathered to save time}. I had not yet finished converting the manual fix to an automated one when I recorded the above.

Why do I think that the automatic job sets the timestamp in the WRI$_OPTSTAT_xxx_HIST tables when it swaps in the new stats and not when it starts? Because of the time of the entries after a large object has been analysed. Why do I think it checks for stats that have changed since the start of the run and replaces them rather than keeping track of the information as it goes? Because, when I first came across this choking issue, I was veryifying the problem after 10pm at night. When I realised there was an issue at around 1am, I started manually gathering stats. It took me until 4am, I checked they were in place and went to bed. Imagine my bad temper the next day when I found out that the automatic job had gone and re-wound most of the stats I had gathered, resetting them to what they had been at 10am the previous night. It was a seriously bad temper.

Addition – I’ve had a comment that may or may not be spam, saying they do not understand “the last bit” (If you are a real person molamola, I apologise). Reading it, it is maybe not clear, so this is what happened and why I was angry 🙂

1) I had realised the auto stats job was failing overnight.
2) I logged on from home at just before 10pm and watched the system
3) at 10pm the auto stats job started and collected stats on some small tables and indexes
4) around 11,11:30pm I realised is was not progressing,
5) from 11:30 to 3am in the morning, I created my own gather_table_stats and gather_index_stats statements and ran them.
6) at around 4am I went to bed, having checked stats were gathered and execution plans were good.
7) Got to work around 9am, a bit tired.
8) Found lots of performance issues due to old stats.
9) Shouted randomly at people and got angry, started re-gathering stats.
10) Worked out that at 6am, when the auto stats job finished, all the tables/indexes IT had intended to gather stats on but had failed to do so, it reset the stats to what they had been set to at 10pm the previous night, when it started.
ie very old and bad stats.

Peeking under the Data Dictionary Hood July 7, 2009

Posted by mwidlake in internals.
Tags: ,

Have you ever wondered how some people seem to be able to find out how the Oracle database does things internally? How they work out where some of the interesting stuff is contained within the data dictionary and can pull it out?

Well, one of the simpler ways to peek under the data dictionary hood is to just look at the objects in the data dictionary itself. I’ve found some interesting things by looking at the data dictionary views. A lot of the dictionary objects areviews, like DBA_TABLES and DBA_COLUMNS. I’ll look at DBA_TABLESPACES {as it’s a bit smaller!}

You can get my script here – vw_txt.sql . As you will see, it is a terribly complex script… The only trick is to make sure that “set long 32767” as the view text is held in column of the ancient LONG datatype.

> @vw_txt
Enter value for vw_name: dba_tablespaces
old 6: where view_name like upper (nvl('&vw_name','WHOOPS')||'%')
new   6: where view_name like upper (nvl('dba_tablespaces','WHOOPS')||'%')
Any Key...>

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
select ts.name, ts.blocksize, ts.blocksize * ts.dflinit,
          decode(bitand(ts.flags, 3), 1, to_number(NULL),
ts.blocksize * ts.dflincr),
          decode(ts.contents$, 1, to_number(NULL), ts.dflmaxext),
decode(bitand(ts.flags, 4096), 4096, ts.affstrength, NULL),
decode(bitand(ts.flags, 3), 1, to_number(NULL), ts.dflextpct),
ts.blocksize * ts.dflminlen,
decode(ts.online$, 1, 'ONLINE', 2, 'OFFLINE',
                 4, 'READ ONLY', 'UNDEFINED'),
decode(ts.contents$, 0, (decode(bitand(ts.flags, 16), 16, 'UNDO',
                 'PERMANENT')), 1, 'TEMPORARY'),
decode(bitand(ts.dflogging, 1), 0, 'NOLOGGING', 1, 'LOGGING'),
decode(bitand(ts.dflogging, 2), 0, 'NO', 2, 'YES'),
          decode(ts.bitmapped, 0, 'DICTIONARY', 'LOCAL'),
decode(bitand(ts.flags, 3), 0, 'USER', 1, 'SYSTEM', 2, 'UNIFORM',
          decode(ts.plugged, 0, 'NO', 'YES'),
decode(bitand(ts.flags,32), 32,'AUTO', 'MANUAL'),
decode(bitand(ts.flags,64), 64,'ENABLED', 'DISABLED'),
decode(bitand(ts.flags,16), 16, (decode(bitand(ts.flags, 512), 512,
                 'GUARANTEE', 'NOGUARANTEE')), 'NOT APPLY'),
decode(bitand(ts.flags,256), 256, 'YES', 'NO'),
decode(tsattr.storattr, 1, 'STORAGE', 'HOST'),
decode(bitand(ts.flags,16384), 16384, 'YES', 'NO'),
decode(bitand(ts.flags,64), 0, null,
decode(bitand(ts.flags,65536), 65536,'FOR ALL OPERATIONS',
                 'DIRECT LOAD ONLY'))
from sys.ts$ ts, sys.x$kcfistsa tsattr
where ts.online$ != 3
and bitand(flags,2048) != 2048
and ts.ts# = tsattr.tsid

1 row selected.

You see a lot of decodes of bitand functions in these internal views. What is more interesting is to see what they are decoded into as it confirms what possible options there are.

Are you curious as to what all the possible object types in the database are? Whether your database happens to have examples of them or not? Peek inside the DBA_OBJECTS view {the following is part of that view}:-

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
select u.name, o.name, o.subname, o.obj#, o.dataobj#,
       decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
                      4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
                      7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
                      11, 'PACKAGE BODY', 12, 'TRIGGER',
                      13, 'TYPE', 14, 'TYPE BODY',
                      19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
                      22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
                      28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
                      32, 'INDEXTYPE', 33, 'OPERATOR',
                      34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
                      40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
                               FROM sum$ s
                               WHERE s.obj#=o.obj#
and bitand(s.xpflags, 8388608) = 8388608),
                              'MATERIALIZED VIEW'),
                      43, 'DIMENSION',
                      44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
                      48, 'CONSUMER GROUP',
                      51, 'SUBSCRIPTION', 52, 'LOCATION',
                      55, 'XML SCHEMA', 56, 'JAVA DATA',
                      57, 'EDITION', 59, 'RULE',
                      60, 'CAPTURE', 61, 'APPLY',
                      62, 'EVALUATION CONTEXT',
                      66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
                      72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
                      81, 'FILE GROUP', 82, 'MINING MODEL', 87, 'ASSEMBLY',
                      90, 'CREDENTIAL', 92, 'CUBE DIMENSION', 93, 'CUBE',
                      94, 'MEASURE FOLDER', 95, 'CUBE BUILD PROCESS',

Scanning the view definitions for something you become interested in is a little more tricky as the text is, as I said, stored in a long and you can’t run sql functions against a long. I use a bit of PL/SQL to get around the problem :-

find_vw_txt.sql .

-- find_vw_txt
-- martin widlake 13/8/08
-- can't search view text as it is in a long.
-- so let's try PL/SQL
set serveroutput on size unlimited
spool find_vw_txt.lst
set trims on  lines 120 pages 32
v_name varchar2(4000) :='&viewname';
v_search varchar2(100) :='&search_txt';
cursor get_vw_txt is
select u.name owner, o.name name, v.textlength textlen, v.text text
from sys.obj$ o
, sys.view$ v
   , sys.user$ u
where o.obj#     = v.obj#
and   o.owner#   = u.user#
and   o.name     like upper(v_name)||'%'
and v.textlength &lt;32000; -- cant think how I stop a numeric
                          -- error trying to pull a larger long
-- into a varchar. bloody longs
v_text varchar2(32000);
v_where number;
  v_name :=upper (v_name);
  v_search := upper(v_search);
for vtr in get_vw_txt loop
--dbms_output.put_line('looking at '||vtr.name);
    v_text := vtr.text;
    v_text := upper(v_text);
    v_where := instr(v_text,v_search);
    if v_where !=0 then
dbms_output.put_line('view '||vtr.owner||'.'||vtr.name||':** '
||substr(v_text,greatest(0,v_where),80)||' **'
    end if;
  end loop;
spool off

It can be fun to dig into the internals this way. And useful. A few times I’ve been able to find out one or two things about how oracle is storing information.

It might occur to you that there are lots of internal pl/sql packages and you can look into them too:-

pkg_txt.sql will show you the contents of a package or stored function/procedure.
find_plsql_txt.sql will scan the stored PL/SQL for a text string.

However, a lot of the internal PL/SQL is wrapped, ie converted into a form you can’t peek into easily {well, in my case, at all, but some people know how}.

One last word of warning. You can find what look like interesting undocumented features when you peek under the covers. In V10 I came across the REVERSE function:

select reverse('Martin was here')
from dual

ereh saw nitraM

Nice :-).

That could be useful, yes?

I’ve checked in the 11g documentation and it is still not documented. I googled “oracle reverse function” and several people have also found it. A couple ask why such a useful thing is not documented…

Well, I was “lucky” enough to find out why it is undocumented. I tried to use the REVERSE function on a number { this wason v10.1 I think}.

My session core-dumped.

I was on a production system when I tried this {Yes I KNOW!!! It was a momentary lapse}!

It seems to work OK on and 11.1 but don’t. You have no idea if it will always work OK and no come-back to Oracle Corp if it blows up on you. My guess is they wrote the REVERSE function for internal use and did not test it for people using it “not as intended” {as I had done}.

So, my warning is, though it is fun to look under the covers and try things out, never, ever ever, ever do so on a production system. Or a test system used for proper testing. Or a development system those touchy developers are using. Or any system that you can’t afford to trash and recreate on a whim. Try it at home on your own PC, OK?