Format of Date Histograms February 6, 2010
Posted by mwidlake in internals.Tags: data dictionary, histograms, statistics
trackback
Oracle annoyingly handles dates internally in several formats. Dates are stored in tables as seven bytes, each byte representing century, year-in-century, month, day, hour, minute and second, shifted by different amounts. For the high/low values seen in DBA_TAB_COLUMNS or DBA_TAB_COL_STATISTICS, it is stored as a RAW value, where a two-digit hex string represents the century, year-in-century,month etc as before – see this post on decoding high and low column values and check out the comments for corrections.
So what about histograms? You might want to know what is in the histogram for a date column in order to better understand the decisions made by the CBO. Below, I pull out the histogram for an example date column {I’ve trimmed the output a little to save space}:
select table_name ,column_name ,endpoint_value end_val ,endpoint_number rowcount from all_tab_histograms where table_name ='TEST_TABLE_X' and owner ='TEST_1' and column_name = 'PLACED_DT' order by endpoint_number TABLE_NAME COLUMN_NAME END_VAL ROWCOUNT --------------- ------------- ------------- --------------- TEST_TABLE_X PLACED_DT 2,452,258 0 TEST_TABLE_X PLACED_DT 2,454,334 1 TEST_TABLE_X PLACED_DT 2,454,647 2 TEST_TABLE_X PLACED_DT 2,454,737 3 TEST_TABLE_X PLACED_DT 2,454,820 4 TEST_TABLE_X PLACED_DT 2,454,867 5 TEST_TABLE_X PLACED_DT 2,454,929 6 TEST_TABLE_X PLACED_DT 2,454,981 7 TEST_TABLE_X PLACED_DT 2,455,006 8 TEST_TABLE_X PLACED_DT 2,455,024 9 TEST_TABLE_X PLACED_DT 2,455,039 10 TEST_TABLE_X PLACED_DT 2,455,050 11 ... TEST_TABLE_X PLACED_DT 2,455,205 42 TEST_TABLE_X PLACED_DT 2,455,207 43 TEST_TABLE_X PLACED_DT 2,455,209 44 TEST_TABLE_X PLACED_DT 2,455,211 45 TEST_TABLE_X PLACED_DT 2,455,213 46 TEST_TABLE_X PLACED_DT 2,455,215 47 TEST_TABLE_X PLACED_DT 2,455,216 48 TEST_TABLE_X PLACED_DT 2,455,218 49 TEST_TABLE_X PLACED_DT 2,455,220 50 TEST_TABLE_X PLACED_DT 2,455,221 51 TEST_TABLE_X PLACED_DT 2,455,222 52 TEST_TABLE_X PLACED_DT 2,455,222 53 TEST_TABLE_X PLACED_DT 2,455,222 54 TEST_TABLE_X PLACED_DT 2,455,222 55 TEST_TABLE_X PLACED_DT 2,455,223 56 TEST_TABLE_X PLACED_DT 2,455,223 57 TEST_TABLE_X PLACED_DT 2,455,223 58 TEST_TABLE_X PLACED_DT 2,455,223 59 ... TEST_TABLE_X PLACED_DT 2,455,223 69 TEST_TABLE_X PLACED_DT 2,455,223 70 TEST_TABLE_X PLACED_DT 2,455,223 71 TEST_TABLE_X PLACED_DT 2,455,223 72 TEST_TABLE_X PLACED_DT 2,455,223 73 TEST_TABLE_X PLACED_DT 2,455,224 74 TEST_TABLE_X PLACED_DT 2,455,226 75
Well, it looks like a seven digit number so it must be representing the date in a similar way to as described above, yes? No. The format is obviously something different {Oh come ON Oracle, some internal standards would be nice once in a while! 🙂 }
I pulled out the minimum and maximum values from the DBA_TAB_COLUMNS table and translated them:-
COLUMN_NAME NUM_DIST LOW_V HI_V ------------ ---------- ------------------- ------------------- PLACED_DT 375,428 2001-12-13 17:31:38 2010-01-28 23:51:38
So the above low values and high values will pretty much match the first and last values in the histograms, which are 2452258 and 2455226. Any guesses? Those values from the histogram are very close to each other, only 2968 different to cover around 9 years of values. 9 times 365…
The histogram is representing the dates in Julian format, ie number of days since 1st Jan 4712BC. As a quick proof of this:
select to_date('2452258','J'),to_date('2455226','J') from dual; TO_DATE('2452258' TO_DATE('2455226' ----------------- ----------------- 14-DEC-2001 00:00 29-JAN-2010 00:00
Well, what do you know. Very close to the 13th Dec 2001 and 28th Jan 2010
This of course makes sense, storing the date as an ascending numeric means it is simple to calculate the width of the range and how many values per day there are.
Imagine how complex it would be to do this if the date was stored in the bizarre way we humans deal with it – an ascending numeric for year, a cycling 12 digit number for month and a varying cyclic number for the day of the month. And that is ignoring other calendars in common use.
However, I’m looking at this method of representing the date and something bothers me. There is no allowance for the time portion. Maybe column histograms can’t cope with time? I feel a couple of tests coming on…
ADDITIONAL.
If you have seen the comments, you will know that Jonathan commented to ask if I might have truncated/been selective with the data in creating my test table as he is sure there is a fractional section representing the time portion.
Well, I did say I needed to check further so maybe I would have spotted my error on my own… 🙂
This is the script I use to pull out histogram data (and yes, I trimed the output from the script before copying it into this post, so it does not quite match the script):
-- chk_hist.sql -- Martin Widlake 7/4/4 -- pull out the histograms on a table set pause on pages 32 spool chk_hist.lst col owner form A8 col table_name form A15 col column_name form a20 col rowcount form 99,999,999,999 col end_val form 99,999,999,999 select owner ,table_name ,column_name ,endpoint_value end_val ,endpoint_number rowcount from all_tab_histograms where table_name like upper(nvl('&tab_name','WHOOPS')||'%') and owner like upper('&tab_own')||'%' and column_name like upper('&col_name')||'%' order by table_name,column_name,endpoint_number / spool off clear colu
Hmm wonder what happens if I change the “col end_val form 99,999,999,999” so that it would show fractions…
col end_val form 999,999,999.99999999
TABLE_NAME COLUMN_NAM END_VAL ROWCOUNT --------------- ---------- --------------------- ------------ TEST_TABLE_X PLACED_DT 2,452,257.73030093 0 TEST_TABLE_X PLACED_DT 2,454,333.76546296 1 TEST_TABLE_X PLACED_DT 2,454,647.32561343 2 TEST_TABLE_X PLACED_DT 2,454,737.25017361 3 TEST_TABLE_X PLACED_DT 2,454,820.02204861 4 TEST_TABLE_X PLACED_DT 2,454,866.98009259 5 TEST_TABLE_X PLACED_DT 2,454,928.66848380 6 TEST_TABLE_X PLACED_DT 2,454,980.94815972 7 TEST_TABLE_X PLACED_DT 2,455,005.68413194 8 TEST_TABLE_X PLACED_DT 2,455,023.67142361 9 TEST_TABLE_X PLACED_DT 2,455,039.03236111 10 TEST_TABLE_X PLACED_DT 2,455,050.39246528 11
Ahhhhh……
Thanks Jonathan 🙂
It is still a Julian date, but with the time as a fraction as he said.
[…] I first posted about how to translate endpoint_values. I then posted in the second post a week ago about how endpoint_value is limited to 7 or so characters for varchar2 columns. {Addition – this later post discusses the content of endpoinit_value for dates} […]
Martin,
Which version of Oracle ?
Are you sure that your “create table” didn’t simply restrict your example to date-only values in the column ? If you have date with time then the histogram value should show the time up as a decimal fraction of a day.
Hi Jonathan,
Version is 10.2.0.3 (I’m terrible for not saying which version). I know that the dates in question have a time portion, it is shown in the high and low values I pull out of DBA_TAB_COLUMNS.
I am wondering, though, if I have caught myself out with a sql*plus “col format” command. I’ll nip back and check…
Ahhh, indeed, my undoing is my script.
I’ll amend the post to highlight my error.
Thanks for the help Jonathan.
[…] Martin Widlake-Format of Date Histograms […]
This leads to a handy way of displaying the correct endpoint_actual_value for the bins:
select
endpoint_number,
endpoint_value + to_date(1, ‘J’) – 1 actual_endpoint_value
from
dba_histograms
where
table_name = ‘SOMETABLE’
and column_name = ‘SOMECOLUMN’
order by endpoint_number;
Indeed Mark, this would work fine if the histogram is for a date, and quite cunning it is – though I don’t really like the implicit data conversion introduced by you addition of the julian date of one minus one. I have a religious dislike of all and any implicit data conversion.
If you can find another way of doing it, I’d be interested 🙂
The issue being that the date format J converts only integers and there is not a date format conversion for “partial date as a fraction” so you can’t state “to_date(ENDPOINT_VALUE,’J’) or something like “to_date(ENDPOINT_VALUE,’J.XXXX’). If only Oracle held the “fractional value” as number of seconds it would be a simple conversion (‘J.SSSSS’) – which fits in with my whole point of the various ways oracle holds dates. {And yes, I confess, I had forgotten this about Julian dates and the ‘J’ picture format havng to be an integer}
So, you have to convert the date part and then add the remainder with something like:
TO_DATE(TRUNC(endpoint_value),’J’)+(ENDPOINT_VALUE-TRUNC(ENDPOINT_VALUE))
Maybe it is no better than your original suggestion Mark, but to my slightly retentive mind it seems to better define what is being done.
Of course, if the parent column of the histogram is not a date then the conversion is likely to fail, so you would need to wrap the conversion within a decode of the column type (via a join to dba_tab_columns) or write a little function in which you pass in the column data type and the endpoint_value and the correct conversion is done and passed back. I’ve got a script somewhere to create such a function but a quick 5-minute check failed to find it…