jump to navigation

Format of Date Histograms February 6, 2010

Posted by mwidlake in internals.
Tags: , ,
trackback

Oracle annoyingly handles dates internally in several formats. Dates are stored in tables as seven bytes, each byte representing century, year-in-century, month, day, hour, minute and second, shifted by different amounts. For the high/low values seen in DBA_TAB_COLUMNS or DBA_TAB_COL_STATISTICS, it is stored as a RAW value, where a two-digit hex string represents the century, year-in-century,month etc as before – see this post on decoding high and low column values and check out the comments for corrections.

So what about histograms? You might want to know what is in the histogram for a date column in order to better understand the decisions made by the CBO. Below, I pull out the histogram for an example date column {I’ve trimmed the output a little to save space}:

select table_name
,column_name
,endpoint_value end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_TABLE_X'
and owner ='TEST_1'
and column_name = 'PLACED_DT'
order by endpoint_number

TABLE_NAME      COLUMN_NAME   END_VAL       ROWCOUNT
--------------- ------------- ------------- ---------------
TEST_TABLE_X     PLACED_DT   2,452,258         0
TEST_TABLE_X     PLACED_DT   2,454,334         1
TEST_TABLE_X     PLACED_DT   2,454,647         2
TEST_TABLE_X     PLACED_DT   2,454,737         3
TEST_TABLE_X     PLACED_DT   2,454,820         4
TEST_TABLE_X     PLACED_DT   2,454,867         5
TEST_TABLE_X     PLACED_DT   2,454,929         6
TEST_TABLE_X     PLACED_DT   2,454,981         7
TEST_TABLE_X     PLACED_DT   2,455,006         8
TEST_TABLE_X     PLACED_DT   2,455,024         9
TEST_TABLE_X     PLACED_DT   2,455,039         10
TEST_TABLE_X     PLACED_DT   2,455,050         11
...
TEST_TABLE_X     PLACED_DT   2,455,205         42
TEST_TABLE_X     PLACED_DT   2,455,207         43
TEST_TABLE_X     PLACED_DT   2,455,209         44
TEST_TABLE_X     PLACED_DT   2,455,211         45
TEST_TABLE_X     PLACED_DT   2,455,213         46
TEST_TABLE_X     PLACED_DT   2,455,215         47
TEST_TABLE_X     PLACED_DT   2,455,216         48
TEST_TABLE_X     PLACED_DT   2,455,218         49
TEST_TABLE_X     PLACED_DT   2,455,220         50
TEST_TABLE_X     PLACED_DT   2,455,221         51
TEST_TABLE_X     PLACED_DT   2,455,222         52
TEST_TABLE_X     PLACED_DT   2,455,222         53
TEST_TABLE_X     PLACED_DT   2,455,222         54
TEST_TABLE_X     PLACED_DT   2,455,222         55
TEST_TABLE_X     PLACED_DT   2,455,223         56
TEST_TABLE_X     PLACED_DT   2,455,223         57
TEST_TABLE_X     PLACED_DT   2,455,223         58
TEST_TABLE_X     PLACED_DT   2,455,223         59
...
TEST_TABLE_X     PLACED_DT   2,455,223         69
TEST_TABLE_X     PLACED_DT   2,455,223         70
TEST_TABLE_X     PLACED_DT   2,455,223         71
TEST_TABLE_X     PLACED_DT   2,455,223         72
TEST_TABLE_X     PLACED_DT   2,455,223         73
TEST_TABLE_X     PLACED_DT   2,455,224         74
TEST_TABLE_X     PLACED_DT   2,455,226         75

Well, it looks like a seven digit number so it must be representing the date in a similar way to as described above, yes? No. The format is obviously something different {Oh come ON Oracle, some internal standards would be nice once in a while! :-) }

I pulled out the minimum and maximum values from the DBA_TAB_COLUMNS table and translated them:-

COLUMN_NAME  NUM_DIST   LOW_V               HI_V 
------------ ---------- ------------------- ------------------- 
PLACED_DT    375,428    2001-12-13 17:31:38 2010-01-28 23:51:38 

So the above low values and high values will pretty much match the first and last values in the histograms, which are 2452258 and 2455226. Any guesses? Those values from the histogram are very close to each other, only 2968 different to cover around 9 years of values. 9 times 365…

The histogram is representing the dates in Julian format, ie number of days since 1st Jan 4712BC. As a quick proof of this:

select to_date('2452258','J'),to_date('2455226','J')
from dual;

TO_DATE('2452258' TO_DATE('2455226'
----------------- -----------------
14-DEC-2001 00:00 29-JAN-2010 00:00

Well, what do you know. Very close to the 13th Dec 2001 and 28th Jan 2010

This of course makes sense, storing the date as an ascending numeric means it is simple to calculate the width of the range and how many values per day there are.

Imagine how complex it would be to do this if the date was stored in the bizarre way we humans deal with it – an ascending numeric for year, a cycling 12 digit number for month and a varying cyclic number for the day of the month. And that is ignoring other calendars in common use.

However, I’m looking at this method of representing the date and something bothers me. There is no allowance for the time portion. Maybe column histograms can’t cope with time? I feel a couple of tests coming on…

ADDITIONAL.
If you have seen the comments, you will know that Jonathan commented to ask if I might have truncated/been selective with the data in creating my test table as he is sure there is a fractional section representing the time portion.

Well, I did say I needed to check further so maybe I would have spotted my error on my own… :-)
This is the script I use to pull out histogram data (and yes, I trimed the output from the script before copying it into this post, so it does not quite match the script):

-- chk_hist.sql
-- Martin Widlake 7/4/4
-- pull out the histograms on a table
set pause on pages 32
spool chk_hist.lst
col owner form A8
col table_name form A15
col column_name form a20
col rowcount form 99,999,999,999
col end_val form 99,999,999,999
select owner
,table_name
,column_name
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name like upper(nvl('&tab_name','WHOOPS')||'%')
and  owner like upper('&tab_own')||'%'
and  column_name like upper('&col_name')||'%'
order by table_name,column_name,endpoint_number
/
spool off
clear colu

Hmm wonder what happens if I change the “col end_val form 99,999,999,999″ so that it would show fractions…

col end_val form 999,999,999.99999999

TABLE_NAME      COLUMN_NAM               END_VAL     ROWCOUNT
--------------- ---------- --------------------- ------------
TEST_TABLE_X    PLACED_DT     2,452,257.73030093            0
TEST_TABLE_X    PLACED_DT     2,454,333.76546296            1
TEST_TABLE_X    PLACED_DT     2,454,647.32561343            2
TEST_TABLE_X    PLACED_DT     2,454,737.25017361            3
TEST_TABLE_X    PLACED_DT     2,454,820.02204861            4
TEST_TABLE_X    PLACED_DT     2,454,866.98009259            5
TEST_TABLE_X    PLACED_DT     2,454,928.66848380            6
TEST_TABLE_X    PLACED_DT     2,454,980.94815972            7
TEST_TABLE_X    PLACED_DT     2,455,005.68413194            8
TEST_TABLE_X    PLACED_DT     2,455,023.67142361            9
TEST_TABLE_X    PLACED_DT     2,455,039.03236111           10
TEST_TABLE_X    PLACED_DT     2,455,050.39246528           11

Ahhhhh……

Thanks Jonathan :-)
It is still a Julian date, but with the time as a fraction as he said.

About these ads

Comments»

1. Decrypting Histogram Data #3 – is 7 or 15 Characters Significant? « Martin Widlake’s Yet Another Oracle Blog - February 6, 2010

[...] I first posted about how to translate endpoint_values. I then posted in the second post a week ago about how endpoint_value is limited to 7 or so characters for varchar2 columns. {Addition – this later post discusses the content of endpoinit_value for dates} [...]

2. Jonathan Lewis - February 8, 2010

Martin,
Which version of Oracle ?
Are you sure that your “create table” didn’t simply restrict your example to date-only values in the column ? If you have date with time then the histogram value should show the time up as a decimal fraction of a day.

mwidlake - February 8, 2010

Hi Jonathan,
Version is 10.2.0.3 (I’m terrible for not saying which version). I know that the dates in question have a time portion, it is shown in the high and low values I pull out of DBA_TAB_COLUMNS.

I am wondering, though, if I have caught myself out with a sql*plus “col format” command. I’ll nip back and check…

mwidlake - February 8, 2010

Ahhh, indeed, my undoing is my script.
I’ll amend the post to highlight my error.
Thanks for the help Jonathan.

3. Blogroll Report 05/02/2009 – 12/02/2010 « Coskan’s Approach to Oracle - March 18, 2010

[...] Martin Widlake-Format of Date Histograms [...]

4. Mark Steward - September 30, 2011

This leads to a handy way of displaying the correct endpoint_actual_value for the bins:

select
endpoint_number,
endpoint_value + to_date(1, ‘J’) – 1 actual_endpoint_value
from
dba_histograms
where
table_name = ‘SOMETABLE’
and column_name = ‘SOMECOLUMN’
order by endpoint_number;

mwidlake - September 30, 2011

Indeed Mark, this would work fine if the histogram is for a date, and quite cunning it is – though I don’t really like the implicit data conversion introduced by you addition of the julian date of one minus one. I have a religious dislike of all and any implicit data conversion.

Mark Steward - September 30, 2011

If you can find another way of doing it, I’d be interested :)

mwidlake - September 30, 2011

The issue being that the date format J converts only integers and there is not a date format conversion for “partial date as a fraction” so you can’t state “to_date(ENDPOINT_VALUE,’J’) or something like “to_date(ENDPOINT_VALUE,’J.XXXX’). If only Oracle held the “fractional value” as number of seconds it would be a simple conversion (‘J.SSSSS’) – which fits in with my whole point of the various ways oracle holds dates. {And yes, I confess, I had forgotten this about Julian dates and the ‘J’ picture format havng to be an integer}

So, you have to convert the date part and then add the remainder with something like:

TO_DATE(TRUNC(endpoint_value),’J’)+(ENDPOINT_VALUE-TRUNC(ENDPOINT_VALUE))

Maybe it is no better than your original suggestion Mark, but to my slightly retentive mind it seems to better define what is being done.

Of course, if the parent column of the histogram is not a date then the conversion is likely to fail, so you would need to wrap the conversion within a decode of the column type (via a join to dba_tab_columns) or write a little function in which you pass in the column data type and the endpoint_value and the correct conversion is done and passed back. I’ve got a script somewhere to create such a function but a quick 5-minute check failed to find it…


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 160 other followers

%d bloggers like this: