jump to navigation

Format of Date Histograms February 6, 2010

Posted by mwidlake in internals.
Tags: , ,
9 comments

Oracle annoyingly handles dates internally in several formats. Dates are stored in tables as seven bytes, each byte representing century, year-in-century, month, day, hour, minute and second, shifted by different amounts. For the high/low values seen in DBA_TAB_COLUMNS or DBA_TAB_COL_STATISTICS, it is stored as a RAW value, where a two-digit hex string represents the century, year-in-century,month etc as before – see this post on decoding high and low column values and check out the comments for corrections.

So what about histograms? You might want to know what is in the histogram for a date column in order to better understand the decisions made by the CBO. Below, I pull out the histogram for an example date column {I’ve trimmed the output a little to save space}:

select table_name
,column_name
,endpoint_value end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_TABLE_X'
and owner ='TEST_1'
and column_name = 'PLACED_DT'
order by endpoint_number

TABLE_NAME      COLUMN_NAME   END_VAL       ROWCOUNT
--------------- ------------- ------------- ---------------
TEST_TABLE_X     PLACED_DT   2,452,258         0
TEST_TABLE_X     PLACED_DT   2,454,334         1
TEST_TABLE_X     PLACED_DT   2,454,647         2
TEST_TABLE_X     PLACED_DT   2,454,737         3
TEST_TABLE_X     PLACED_DT   2,454,820         4
TEST_TABLE_X     PLACED_DT   2,454,867         5
TEST_TABLE_X     PLACED_DT   2,454,929         6
TEST_TABLE_X     PLACED_DT   2,454,981         7
TEST_TABLE_X     PLACED_DT   2,455,006         8
TEST_TABLE_X     PLACED_DT   2,455,024         9
TEST_TABLE_X     PLACED_DT   2,455,039         10
TEST_TABLE_X     PLACED_DT   2,455,050         11
...
TEST_TABLE_X     PLACED_DT   2,455,205         42
TEST_TABLE_X     PLACED_DT   2,455,207         43
TEST_TABLE_X     PLACED_DT   2,455,209         44
TEST_TABLE_X     PLACED_DT   2,455,211         45
TEST_TABLE_X     PLACED_DT   2,455,213         46
TEST_TABLE_X     PLACED_DT   2,455,215         47
TEST_TABLE_X     PLACED_DT   2,455,216         48
TEST_TABLE_X     PLACED_DT   2,455,218         49
TEST_TABLE_X     PLACED_DT   2,455,220         50
TEST_TABLE_X     PLACED_DT   2,455,221         51
TEST_TABLE_X     PLACED_DT   2,455,222         52
TEST_TABLE_X     PLACED_DT   2,455,222         53
TEST_TABLE_X     PLACED_DT   2,455,222         54
TEST_TABLE_X     PLACED_DT   2,455,222         55
TEST_TABLE_X     PLACED_DT   2,455,223         56
TEST_TABLE_X     PLACED_DT   2,455,223         57
TEST_TABLE_X     PLACED_DT   2,455,223         58
TEST_TABLE_X     PLACED_DT   2,455,223         59
...
TEST_TABLE_X     PLACED_DT   2,455,223         69
TEST_TABLE_X     PLACED_DT   2,455,223         70
TEST_TABLE_X     PLACED_DT   2,455,223         71
TEST_TABLE_X     PLACED_DT   2,455,223         72
TEST_TABLE_X     PLACED_DT   2,455,223         73
TEST_TABLE_X     PLACED_DT   2,455,224         74
TEST_TABLE_X     PLACED_DT   2,455,226         75

Well, it looks like a seven digit number so it must be representing the date in a similar way to as described above, yes? No. The format is obviously something different {Oh come ON Oracle, some internal standards would be nice once in a while! :-) }

I pulled out the minimum and maximum values from the DBA_TAB_COLUMNS table and translated them:-

COLUMN_NAME  NUM_DIST   LOW_V               HI_V 
------------ ---------- ------------------- ------------------- 
PLACED_DT    375,428    2001-12-13 17:31:38 2010-01-28 23:51:38 

So the above low values and high values will pretty much match the first and last values in the histograms, which are 2452258 and 2455226. Any guesses? Those values from the histogram are very close to each other, only 2968 different to cover around 9 years of values. 9 times 365…

The histogram is representing the dates in Julian format, ie number of days since 1st Jan 4712BC. As a quick proof of this:

select to_date('2452258','J'),to_date('2455226','J')
from dual;

TO_DATE('2452258' TO_DATE('2455226'
----------------- -----------------
14-DEC-2001 00:00 29-JAN-2010 00:00

Well, what do you know. Very close to the 13th Dec 2001 and 28th Jan 2010

This of course makes sense, storing the date as an ascending numeric means it is simple to calculate the width of the range and how many values per day there are.

Imagine how complex it would be to do this if the date was stored in the bizarre way we humans deal with it – an ascending numeric for year, a cycling 12 digit number for month and a varying cyclic number for the day of the month. And that is ignoring other calendars in common use.

However, I’m looking at this method of representing the date and something bothers me. There is no allowance for the time portion. Maybe column histograms can’t cope with time? I feel a couple of tests coming on…

ADDITIONAL.
If you have seen the comments, you will know that Jonathan commented to ask if I might have truncated/been selective with the data in creating my test table as he is sure there is a fractional section representing the time portion.

Well, I did say I needed to check further so maybe I would have spotted my error on my own… :-)
This is the script I use to pull out histogram data (and yes, I trimed the output from the script before copying it into this post, so it does not quite match the script):

-- chk_hist.sql
-- Martin Widlake 7/4/4
-- pull out the histograms on a table
set pause on pages 32
spool chk_hist.lst
col owner form A8
col table_name form A15
col column_name form a20
col rowcount form 99,999,999,999
col end_val form 99,999,999,999
select owner
,table_name
,column_name
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name like upper(nvl('&tab_name','WHOOPS')||'%')
and  owner like upper('&tab_own')||'%'
and  column_name like upper('&col_name')||'%'
order by table_name,column_name,endpoint_number
/
spool off
clear colu

Hmm wonder what happens if I change the “col end_val form 99,999,999,999″ so that it would show fractions…

col end_val form 999,999,999.99999999

TABLE_NAME      COLUMN_NAM               END_VAL     ROWCOUNT
--------------- ---------- --------------------- ------------
TEST_TABLE_X    PLACED_DT     2,452,257.73030093            0
TEST_TABLE_X    PLACED_DT     2,454,333.76546296            1
TEST_TABLE_X    PLACED_DT     2,454,647.32561343            2
TEST_TABLE_X    PLACED_DT     2,454,737.25017361            3
TEST_TABLE_X    PLACED_DT     2,454,820.02204861            4
TEST_TABLE_X    PLACED_DT     2,454,866.98009259            5
TEST_TABLE_X    PLACED_DT     2,454,928.66848380            6
TEST_TABLE_X    PLACED_DT     2,454,980.94815972            7
TEST_TABLE_X    PLACED_DT     2,455,005.68413194            8
TEST_TABLE_X    PLACED_DT     2,455,023.67142361            9
TEST_TABLE_X    PLACED_DT     2,455,039.03236111           10
TEST_TABLE_X    PLACED_DT     2,455,050.39246528           11

Ahhhhh……

Thanks Jonathan :-)
It is still a Julian date, but with the time as a fraction as he said.

Follow

Get every new post delivered to your Inbox.

Join 171 other followers