Decoding high_value and low_value January 3, 2010
Posted by mwidlake in internals, performance.Tags: data dictionary, performance, SQL
trackback
There is an update to this post here>
The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us, especially as when the CBO looks at a WHERE clause on a column for a given value, the High and Low values for that column are considered. If the value in the WHERE clause is outside the known range, the expected number of values to be found is decreased in proportion to the distance outside the range.
What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course. Why can’t they at least expose the information in a view in a human-readable format? Tsch.
So here is some code on how to decode low_value and high_value. I pulled most of this together a few years back but always struggled to decode the low and high values for dates, until I found this post by Gary Myers.
-- col_stats -- Martin Widlake mdw 21/03/2003 -- MDW 11/12/09 enhanced to include more translations of low_value/high_value -- pilfered from Gary Myers blog col owner form a6 word wrap col table_name form a15 word wrap col column_name form a22 word wrap col data_type form a12 col M form a1 col num_vals form 99999,999 col dnsty form 0.9999 col num_nulls form 99999,999 col low_v form a18 col hi_v form a18 col data_type form a10 set lines 110 break on owner nodup on table_name nodup spool col_stats.lst select --owner -- ,table_name column_name ,data_type ,decode (nullable,'N','Y','N') M ,num_distinct num_vals ,num_nulls ,density dnsty ,decode(data_type ,'NUMBER' ,to_char(utl_raw.cast_to_number(low_value)) ,'VARCHAR2' ,to_char(utl_raw.cast_to_varchar2(low_value)) ,'NVARCHAR2' ,to_char(utl_raw.cast_to_nvarchar2(low_value)) ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(low_value)) ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(low_value)) ,'DATE',to_char(1780+to_number(substr(low_value,1,2),'XX') +to_number(substr(low_value,3,2),'XX'))||'-' ||to_number(substr(low_value,5,2),'XX')||'-' ||to_number(substr(low_value,7,2),'XX')||' ' ||(to_number(substr(low_value,9,2),'XX')-1)||':' ||(to_number(substr(low_value,11,2),'XX')-1)||':' ||(to_number(substr(low_value,13,2),'XX')-1) , low_value ) low_v ,decode(data_type ,'NUMBER' ,to_char(utl_raw.cast_to_number(high_value)) ,'VARCHAR2' ,to_char(utl_raw.cast_to_varchar2(high_value)) ,'NVARCHAR2' ,to_char(utl_raw.cast_to_nvarchar2(high_value)) ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(high_value)) ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(high_value)) ,'DATE',to_char(1780+to_number(substr(high_value,1,2),'XX') +to_number(substr(high_value,3,2),'XX'))||'-' ||to_number(substr(high_value,5,2),'XX')||'-' ||to_number(substr(high_value,7,2),'XX')||' ' ||(to_number(substr(high_value,9,2),'XX')-1)||':' ||(to_number(substr(high_value,11,2),'XX')-1)||':' ||(to_number(substr(high_value,13,2),'XX')-1) , high_value ) hi_v from dba_tab_columns where owner like upper('&tab_own') and table_name like upper(nvl('&tab_name','WHOOPS')||'%') ORDER BY owner,table_name,COLUMN_ID / clear colu spool off clear breaks
Most of the translation is done via the utl_raw.cast_to_datatype functions but no such function is provided for dates, which is where most of us come unstuck. Gary recognised that the data was being stored in hex format, with an offset for the year.
I hope the script helps some of you.
{There are also functions under DBMS_STATS, DBMS_STATS.CONVERT_RAW_VALUES, that can also be called to translate many datatypes}
If anyone knows how to decode timestamps, I’d love to know as it would save me trying to work it out/find it on the Net somewhere. I’ll add it to the post so that there is one place to find all translatations.
Here is a quick output for a test table
@col_stats Enter value for tab_own: dwperf old 40: where owner like upper('&tab_own') new 40: where owner like upper('dwperf') Enter value for tab_name: ad_sql_exec_p old 41: and table_name like upper(nvl('&tab_name','WHOOPS')||'%') new 41: and table_name like upper(nvl('ad_sql_exec_p','WHOOPS')||'%') any key> COLUMN_NAME DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V ---------------------- ---------- - ---------- ---------- ------- ------------------ --------------- INST_ID NUMBER Y 4 0 0.2500 1 4 SQL_ID VARCHAR2 Y 87,104 0 0.0000 008d71chrks14 gzw309ttkx862 PLAN_HASH_VALUE NUMBER Y 1,884 0 0.0005 2646229 4294043874 SPC_ID NUMBER N 83 0 0.0120 1076 2269 ADDRESS RAW N 118,198 0 0.0000 00000003E33123A0 0000001342FF3FF8 HASH_VALUE NUMBER N 87,104 0 0.0000 2758710 4294676643 CREATED_DATE DATE N 120 0 0.0083 2009-10-23 8:19:10 2009-12-10 9:19:13 LATEST_FIRST_LOAD_TIME DATE N 11,791 0 0.0001 2009-9-14 11:55:59 2009-12-13 9:33:24 TOTAL_LOADS NUMBER N 98 0 0.0102 1 55047 TOTAL_PARSE_CALLS NUMBER N 92 0 0.0109 0 2972 TOTAL_EXECUTIONS NUMBER N 235 0 0.0043 0 834624 TOTAL_ROWS_PROCESSED NUMBER N 809 0 0.0012 0 26946123 TOTAL_FETCHES NUMBER N 313 0 0.0032 0 834624 TOTAL_BUFFER_GETS NUMBER N 3,016 0 0.0003 0 3355576809 TOTAL_DISK_READS NUMBER N 985 0 0.0010 0 28189240 TOTAL_DIRECT_WRITES NUMBER N 98 0 0.0102 0 751289 TOTAL_SORTS NUMBER N 106 0 0.0094 0 5283 TOTAL_CPU_TIME NUMBER N 94,401 0 0.0000 1337 12183936207 TOTAL_ELAPSED_TIME NUMBER N 115,203 0 0.0000 1337 139692482086 TOTAL_OPTIMIZER_COST NUMBER N 1,467 0 0.0007 0 369740902209315000 0000000
As mentioned, DBMS_STATS contains a method to decode the values and its pretty simple and clean:
http://structureddata.org/2007/10/16/how-to-display-high_valuelow_value-columns-from-user_tab_col_statistics/
Thanks Greg.
The linked page also includes creating a PL/SQL function to act as a front end to the DBMS_STATS functions, to make using them “cleaner”, so anyone planning to create such a function would do well to check it out.
Martin, i think, your formula for date datatype doesn’t reflect how it is stored, it yields correct results only per incident for some dates. The raw representation of date is:
1st byte: century + 100
2nd byte: year + 100
So, for year 2010, the first two bytes are 78,6e (hex) or 120,110 (decimal), that means 20 (120 – 100) century and 10 (110 – 100) year, or 2010, according your formula – 1780 + 120 + 110 = 2010 ( the same result). But for 1990 raw representation is 119,190 and your formula would calculate 1780 + 119 + 190 = 2089.
By timestamp first 7 bytes are the same as for date and the rest 4 bytes represent fractional seconds (it is covered in Tom Kyte’s Expert Oracle Database Architecture).
Best regards
Maxim
Thanks for that Maxim, I’ll work through that later and update the post.
Given the date logic was mine, I’ll own up to being the lazy one. It worked for dates in 2000 to 2099. I’ve added a PS to the blog post you linked to.
The improved (ie working) logic is:
rtrim(
ltrim(to_char(100*(to_number(substr(high_value,1,2),’XX’)-100)
+ (to_number(substr(high_value,3,2),’XX’)-100),’0000′))||’.’||
ltrim(to_char(to_number(substr(high_value,5,2),’XX’),’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,7,2),’XX’),’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,9,2),’XX’)-1,’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,11,2),’XX’)-1,’00’))||’.’||
ltrim(to_char(to_number(substr(high_value,13,2),’XX’)-1,’00’)))
Thanks for the update Gary, I’ll update the main post when I get some spare time.
Gary, Martin, sorry, my bad – i missed that in hurry ( as usual) and put more attention on the code, than on the text, but thank you for correction anyway.
Btw., you can get the rid of all that ltrim’s, if you use to_char(…,’fm00′) – in that case the blank reserved for sign (by positive numbers) will be stripped anyway.
Best regards
Maxim
Martin,
Good effort. But it fails when you have numeric data in a character column.
SQL> desc t1
Name Null? Type
—————————————– ——– —————–
SORTCODE VARCHAR2(6)
SQL> desc x
Name Null? Type
—————————————– ——– —————–
SORTCODE NUMBER
SQL> select min(sortcode), max(sortcode) from t1;
MIN(SO MAX(SO
—— ——
0 999999
SQL> select min(sortcode), max(sortcode) from x;
MIN(SORTCODE) MAX(SORTCODE)
————- ————-
0 999999
SQL> @get_column_low_hi
Enter value for tab_own: TEST
old 40: where owner like upper(‘&tab_own’)
new 40: where owner like upper(‘TEST’)
Enter value for tab_name: T1
old 41: and table_name like upper(nvl(‘&tab_name’,’WHOOPS’)||’%’)
new 41: and table_name like upper(nvl(‘T1′,’WHOOPS’)||’%’)
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 997256 0 1.0028E-06 100042 999869
SQL>
SQL> /
Enter value for tab_own: TEST
old 40: where owner like upper(‘&tab_own’)
new 40: where owner like upper(‘TEST’)
Enter value for tab_name: X
old 41: and table_name like upper(nvl(‘&tab_name’,’WHOOPS’)||’%’)
new 41: and table_name like upper(nvl(‘X’,’WHOOPS’)||’%’)
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE NUMBER N 1000727 0 9.9927E-07 2 999877
SQL>
Hi Asif,
It looks to me as though it is working – at least, the raw values are correctly being interpreted back into numbers and characters.
Try adding in the columns for the LOW_VALUE and HIGH_VALUE without modifiying them to the script. I think you will see that numerics are represented by a hex sting starting “C” (eg C102 = 1) whereas varchar2 values are represented by a hex number with twice the digits of the string it represents (unless you have multi-byte character sets).
I think what you are seeing is the impact of the sampling done when the table is analyzed. It will not necessarily get the real high and low values, depending on sample size, and the high and low values could well have changed since the last time the table was analyzed. This is the main reason why I want to be able to see the true High and Low values, so I can get a feel for if the values are significantly different to reality.
Does that make sense?
Hi Martin,
Yes, you are right. ESTIMATE_PERCENT is playing an important role here.
SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 1);
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000590 0 9.9941E-07 100030 999997
SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 10);
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000590 0 9.9941E-07 100030 999997
SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 50);
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000658 0 9.9934E-07 0 999998
SQL> exec dbms_stats.gather_table_stats(user, ‘T1’, estimate_percent => 100);
COLUMN_NAM DATA_TYPE M NUM_VALS NUM_NULLS DNSTY LOW_V HI_V
———- ———- – ———- ———- ———- ———- ———-
SORTCODE VARCHAR2 N 1000000 0 .000001 0 999999
Thanks for the wonderful post.
cheers
Thanks for the demo Asif, it saves me putting one up.
One of the main failings of doing a small sample size gather is that you can end up with inaccurate HIGH and LOW values and also, possibly more significantly, lower NUM_DISTINCT value. I have no proof for this, but I suspect something changed from V9 /V10.1 and V10.2 which makes the NUM_DISTINCT more vulnerable to being incorrect with small sample sizes as I have had recent issues on 10.2.0.3 with 0.5% sample sizes on very, very large tables (hundreds of millions of rows) getting very low NUM_DISTINCT value that I cannot remember being an issue with even 0.1% sample sizes against similar sized table on V9 and V10.1
Why is it a problem? Well, I think I will put up a whole post on this but in short the CBO guesses how many rows will match a value outside the min-max range by how far out the range it is. As an example, if the CBO sees a LOW_VALUE of 100 and a HIGH_VALUE of 1100, there are 1000 NUM_DISTINCT values and there are 5000 rows, it will reckon a value in the range will match 0.1% of the num rows, ie 5 rows. If it sees a value outside the range, let us say 50% outside the range at 1600, it will estimate 50% of that value, ie 3 values (rounded up – and I might be missing a tweak of 1 row as well). I know Mr Lewis covers this well in his book and I am sure there are a few good blog entries out there about it too. I’d want to do some tests and proofs to validate this before I do a proper blog though.
[…] Widlake has been busy decoding high_value and low_value for us. He writes, “The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. […]
[…] got a little time to enhance my col_stats code with the extra info people provided as comments to this original posting on it. I also now handle the translation of dates correctly and also […]
Thanks Martin, great article.
So good, that I republished it under my own name without giving you any credit (except where your name is embedded in the sql).
Let me know what you think of whats is now my work: http (colon)//sqlunv.com/BI/?p=140
{Just a little edit there to prevent any hit counts benefiting the site of a thief}
Mohamed
Thanks for that. I’m assuming you are not actually the real thief but are spoofing a couple of things to point solidly at the site in question (which does not make it at all clear who the owner is, but you can look up the registration)
This happens. You can see by the style of writing that this site is cobbled together from several sources, some of whom have flawed English language skills and others who are fluent and use colloquialisms.
Some of these thieves have no idea that stealing is wrong “it’s in the public domain already” – but taking other people’s work and putting it forward as your own is only announcing to the world you are not good enough to do your own work. Others don’t care as they have psychopathic tendencies. Others are just naive and stupid.
I occasionally challenge them but, meh, whatever you want “Mohamed”.
[…] You probably have found on the web scripts to decode RAW values from DBA_TAB_COL_STATISTICS’ HIGH_VALUE and LOW_VALUE. Here are two good examples from Jonathan Lewis and Martin Widlake. […]