Decoding high_value and low_value

Decoding high_value and low_value January 3, 2010

Posted by mwidlake in internals, performance.
Tags: data dictionary, performance, SQL
trackback

The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us, especially as when the CBO looks at a WHERE clause on a column for a given value, the High and Low values for that column are considered. If the value in the WHERE clause is outside the known range, the expected number of values to be found is decreased in proportion to the distance outside the range.

What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course. Why can’t they at least expose the information in a view in a human-readable format? Tsch.

So here is some code on how to decode low_value and high_value. I pulled most of this together a few years back but always struggled to decode the low and high values for dates, until I found this post by Gary Myers.

-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
col owner        form a6 word wrap
col table_name   form a15 word wrap
col column_name  form a22 word wrap
col data_type    form a12
col M            form a1
col num_vals     form 99999,999
col dnsty        form 0.9999
col num_nulls    form 99999,999
col low_v        form a18
col hi_v         form a18
col data_type    form a10
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--      ,table_name
      column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(low_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(low_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(low_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(low_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(low_value))
  ,'DATE',to_char(1780+to_number(substr(low_value,1,2),'XX')
         +to_number(substr(low_value,3,2),'XX'))||'-'
       ||to_number(substr(low_value,5,2),'XX')||'-'
       ||to_number(substr(low_value,7,2),'XX')||' '
       ||(to_number(substr(low_value,9,2),'XX')-1)||':'
       ||(to_number(substr(low_value,11,2),'XX')-1)||':'
       ||(to_number(substr(low_value,13,2),'XX')-1)
,  low_value
       ) low_v
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(high_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(high_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(high_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(high_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(high_value))
  ,'DATE',to_char(1780+to_number(substr(high_value,1,2),'XX')
         +to_number(substr(high_value,3,2),'XX'))||'-'
       ||to_number(substr(high_value,5,2),'XX')||'-'
       ||to_number(substr(high_value,7,2),'XX')||' '
       ||(to_number(substr(high_value,9,2),'XX')-1)||':'
       ||(to_number(substr(high_value,11,2),'XX')-1)||':'
       ||(to_number(substr(high_value,13,2),'XX')-1)
,  high_value
       ) hi_v
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks

Most of the translation is done via the utl_raw.cast_to_datatype functions but no such function is provided for dates, which is where most of us come unstuck. Gary recognised that the data was being stored in hex format, with an offset for the year.

I hope the script helps some of you.
{There are also functions under DBMS_STATS, DBMS_STATS.CONVERT_RAW_VALUES, that can also be called to translate many datatypes}

If anyone knows how to decode timestamps, I’d love to know as it would save me trying to work it out/find it on the Net somewhere. I’ll add it to the post so that there is one place to find all translatations.

Here is a quick output for a test table

@col_stats
Enter value for tab_own: dwperf
old  40: where owner      like upper('&tab_own')
new  40: where owner      like upper('dwperf')
Enter value for tab_name: ad_sql_exec_p
old  41: and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
new  41: and   table_name like upper(nvl('ad_sql_exec_p','WHOOPS')||'%')
any key> 

COLUMN_NAME            DATA_TYPE  M   NUM_VALS  NUM_NULLS   DNSTY LOW_V              HI_V
---------------------- ---------- - ---------- ---------- ------- ------------------ ---------------
INST_ID                NUMBER     Y          4          0  0.2500 1                  4
SQL_ID                 VARCHAR2   Y     87,104          0  0.0000 008d71chrks14      gzw309ttkx862
PLAN_HASH_VALUE        NUMBER     Y      1,884          0  0.0005 2646229            4294043874
SPC_ID                 NUMBER     N         83          0  0.0120 1076               2269
ADDRESS                RAW        N    118,198          0  0.0000 00000003E33123A0   0000001342FF3FF8
HASH_VALUE             NUMBER     N     87,104          0  0.0000 2758710            4294676643
CREATED_DATE           DATE       N        120          0  0.0083 2009-10-23 8:19:10 2009-12-10 9:19:13
LATEST_FIRST_LOAD_TIME DATE       N     11,791          0  0.0001 2009-9-14 11:55:59 2009-12-13 9:33:24
TOTAL_LOADS            NUMBER     N         98          0  0.0102 1                  55047
TOTAL_PARSE_CALLS      NUMBER     N         92          0  0.0109 0                  2972
TOTAL_EXECUTIONS       NUMBER     N        235          0  0.0043 0                  834624
TOTAL_ROWS_PROCESSED   NUMBER     N        809          0  0.0012 0                  26946123
TOTAL_FETCHES          NUMBER     N        313          0  0.0032 0                  834624
TOTAL_BUFFER_GETS      NUMBER     N      3,016          0  0.0003 0                  3355576809
TOTAL_DISK_READS       NUMBER     N        985          0  0.0010 0                  28189240
TOTAL_DIRECT_WRITES    NUMBER     N         98          0  0.0102 0                  751289
TOTAL_SORTS            NUMBER     N        106          0  0.0094 0                  5283
TOTAL_CPU_TIME         NUMBER     N     94,401          0  0.0000 1337               12183936207
TOTAL_ELAPSED_TIME     NUMBER     N    115,203          0  0.0000 1337               139692482086
TOTAL_OPTIMIZER_COST   NUMBER     N      1,467          0  0.0007 0                  369740902209315000
                                                                                     0000000

Comments»

1. Greg Rahn - January 3, 2010: As mentioned, DBMS_STATS contains a method to decode the values and its pretty simple and clean:
http://structureddata.org/2007/10/16/how-to-display-high_valuelow_value-columns-from-user_tab_col_statistics/

Reply
mwidlake - January 4, 2010: Thanks Greg.
The linked page also includes creating a PL/SQL function to act as a front end to the DBMS_STATS functions, to make using them “cleaner”, so anyone planning to create such a function would do well to check it out.

Reply
2. Maxim - January 4, 2010: Martin, i think, your formula for date datatype doesn’t reflect how it is stored, it yields correct results only per incident for some dates. The raw representation of date is:
1st byte: century + 100
2nd byte: year + 100
So, for year 2010, the first two bytes are 78,6e (hex) or 120,110 (decimal), that means 20 (120 – 100) century and 10 (110 – 100) year, or 2010, according your formula – 1780 + 120 + 110 = 2010 ( the same result). But for 1990 raw representation is 119,190 and your formula would calculate 1780 + 119 + 190 = 2089.
By timestamp first 7 bytes are the same as for date and the rest 4 bytes represent fractional seconds (it is covered in Tom Kyte’s Expert Oracle Database Architecture).

Best regards

Maxim

Reply
mwidlake - January 4, 2010: Thanks for that Maxim, I’ll work through that later and update the post.

Reply
Gary - January 5, 2010: Given the date logic was mine, I’ll own up to being the lazy one. It worked for dates in 2000 to 2099. I’ve added a PS to the blog post you linked to.
The improved (ie working) logic is:

rtrim(
ltrim(to_char(100*(to_number(substr(high_value,1,2),’XX’)-100)
+ (to_number(substr(high_value,3,2),’XX’)-100),’0000′))||’.’||
ltrim(to_char(to_number(substr(high_value,5,2),’XX’),’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,7,2),’XX’),’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,9,2),’XX’)-1,’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,11,2),’XX’)-1,’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,13,2),’XX’)-1,’00’)))
mwidlake - January 5, 2010: Thanks for the update Gary, I’ll update the main post when I get some spare time.
Maxim - January 5, 2010: Gary, Martin, sorry, my bad – i missed that in hurry ( as usual) and put more attention on the code, than on the text, but thank you for correction anyway.
Btw., you can get the rid of all that ltrim’s, if you use to_char(…,’fm00′) – in that case the blank reserved for sign (by positive numbers) will be stripped anyway.

Best regards

Maxim
3. Asif Momen - January 4, 2010: Martin,

Good effort. But it fails when you have numeric data in a character column.

SQL> desc t1
Name Null? Type
—————————————– ——– —————–
SORTCODE VARCHAR2(6)

SQL> desc x
Name Null? Type
—————————————– ——– —————–
SORTCODE NUMBER

SQL> select min(sortcode), max(sortcode) from t1;

MIN(SO MAX(SO
—— ——
0 999999

SQL> select min(sortcode), max(sortcode) from x;

MIN(SORTCODE) MAX(SORTCODE)
————- ————-
0 999999

SQL> @get_column_low_hi
Enter value for tab_own: TEST
old 40: where owner like upper(‘&tab_own’)
new 40: where owner like upper(‘TEST’)
Enter value for tab_name: T1
old 41: and table_name like upper(nvl(‘&tab_name’,’WHOOPS’)||’%’)
new 41: and table_name like upper(nvl(‘T1′,’WHOOPS’)||’%’)

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 997256 0 1.0028E-06 100042 999869

SQL>
SQL> /
Enter value for tab_own: TEST
old 40: where owner like upper(‘&tab_own’)
new 40: where owner like upper(‘TEST’)
Enter value for tab_name: X
old 41: and table_name like upper(nvl(‘&tab_name’,’WHOOPS’)||’%’)
new 41: and table_name like upper(nvl(‘X’,’WHOOPS’)||’%’)

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE NUMBER N 1000727 0 9.9927E-07 2 999877

SQL>

Reply
mwidlake - January 4, 2010: Hi Asif,

It looks to me as though it is working – at least, the raw values are correctly being interpreted back into numbers and characters.

Try adding in the columns for the LOW_VALUE and HIGH_VALUE without modifiying them to the script. I think you will see that numerics are represented by a hex sting starting “C” (eg C102 = 1) whereas varchar2 values are represented by a hex number with twice the digits of the string it represents (unless you have multi-byte character sets).

I think what you are seeing is the impact of the sampling done when the table is analyzed. It will not necessarily get the real high and low values, depending on sample size, and the high and low values could well have changed since the last time the table was analyzed. This is the main reason why I want to be able to see the true High and Low values, so I can get a feel for if the values are significantly different to reality.

Does that make sense?

Reply
4. Asif Momen - January 5, 2010: Hi Martin,

Yes, you are right. ESTIMATE_PERCENT is playing an important role here.

SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 1);

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000590 0 9.9941E-07 100030 999997

SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 10);

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000590 0 9.9941E-07 100030 999997

SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 50);

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000658 0 9.9934E-07 0 999998

SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 100);

COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000000 0 .000001 0 999999

Thanks for the wonderful post.

cheers

Reply
mwidlake - January 5, 2010: Thanks for the demo Asif, it saves me putting one up.

One of the main failings of doing a small sample size gather is that you can end up with inaccurate HIGH and LOW values and also, possibly more significantly, lower NUM_DISTINCT value. I have no proof for this, but I suspect something changed from V9 /V10.1 and V10.2 which makes the NUM_DISTINCT more vulnerable to being incorrect with small sample sizes as I have had recent issues on 10.2.0.3 with 0.5% sample sizes on very, very large tables (hundreds of millions of rows) getting very low NUM_DISTINCT value that I cannot remember being an issue with even 0.1% sample sizes against similar sized table on V9 and V10.1

Why is it a problem? Well, I think I will put up a whole post on this but in short the CBO guesses how many rows will match a value outside the min-max range by how far out the range it is. As an example, if the CBO sees a LOW_VALUE of 100 and a HIGH_VALUE of 1100, there are 1000 NUM_DISTINCT values and there are 5000 rows, it will reckon a value in the range will match 0.1% of the num rows, ie 5 rows. If it sees a value outside the range, let us say 50% outside the range at 1600, it will estimate 50% of that value, ie 3 values (rounded up – and I might be missing a tweak of 1 row as well). I know Mr Lewis covers this well in his book and I am sure there are a few good blog entries out there about it too. I’d want to do some tests and proofs to validate this before I do a proper blog though.

Reply
5. Log Buffer #174: a Carnival of the Vanities for DBAs | The Pythian Blog - January 8, 2010: […] Widlake has been busy decoding high_value and low_value for us. He writes, “The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. […]

Reply
6. Update to Decoding High and Low values « Martin Widlake's Yet Another Oracle Blog - April 28, 2010: […] got a little time to enhance my col_stats code with the extra info people provided as comments to this original posting on it. I also now handle the translation of dates correctly and also […]

Reply
7. Mohamed M Abdelfatah - April 19, 2015: Thanks Martin, great article.

So good, that I republished it under my own name without giving you any credit (except where your name is embedded in the sql).

Let me know what you think of whats is now my work: http (colon)//sqlunv.com/BI/?p=140

{Just a little edit there to prevent any hit counts benefiting the site of a thief}

Mohamed

Reply
mwidlake - April 19, 2015: Thanks for that. I’m assuming you are not actually the real thief but are spoofing a couple of things to point solidly at the site in question (which does not make it at all clear who the owner is, but you can look up the registration)
This happens. You can see by the style of writing that this site is cobbled together from several sources, some of whom have flawed English language skills and others who are fluent and use colloquialisms.

Some of these thieves have no idea that stealing is wrong “it’s in the public domain already” – but taking other people’s work and putting it forward as your own is only announcing to the world you are not good enough to do your own work. Others don’t care as they have psychopathic tendencies. Others are just naive and stupid.

I occasionally challenge them but, meh, whatever you want “Mohamed”.

Reply
8. Decode HIGH_VALUE and LOW_VALUE, including TIMESTAMP data types | EDUARDO CLARO - November 4, 2021: […] You probably have found on the web scripts to decode RAW values from DBA_TAB_COL_STATISTICS’ HIGH_VALUE and LOW_VALUE. Here are two good examples from Jonathan Lewis and Martin Widlake. […]

Reply