jump to navigation

Table High Water Mark and How Empty the Table Is April 30, 2012

Posted by mwidlake in performance, statistics.
Tags: , ,
trackback

On a recent post about reducing the high water mark in a table by shrinking it someone asked if you could tell how much “space” there was in a table that could be recovered by compressing it.

Yes. As pseudo-code:

select number_of_rows*average_row_length. That is how much data there is.
Divided by the block size, to give how many blocks-worth of data you have.
Divided this by a “overhead” fudge factor to allow for wasted space (see later) of about 20%
Compare to the number of blocks used by the table.

Here is some code to do it, showing some extra data and how we calculate the effective percentage of space used (pct_spc_used). It runs against an example table ‘HWM’

select uts.blocks                                     blks_used
      ,uts.avg_space
      ,uts.num_rows
      ,uts.avg_row_len
      ,uts.empty_blocks                               empty_blks
      ,usse.blocks                                    alloc_blks
      ,greatest(uts.blocks,1)/greatest(usse.blocks,1) pct_hwm
      ,uts.num_rows*uts.avg_row_len                   data_in_bytes
      ,(uts.num_rows*uts.avg_row_len)/8192            data_in_blks
      ,((uts.num_rows*uts.avg_row_len)/8192)*1.25     mod_data_in_blks
      ,(((uts.num_rows*uts.avg_row_len)/8192)*1.25)/usse.blocks pct_spc_used
from user_tab_statistics uts
    ,user_segments       usse
where uts.table_name='HWM'
and   uts.table_name=usse.segment_name
/
--
 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1221      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918

I am collecting the data from USER_TAB_STATISTICS and USER_SEGMENTS. For this code to work you must have reasonably good stats against the table.

I’d like to run through some of the columns I have selected:
First of all, columns AVG_SPACE and EMPTY_BLKS are not populated by dbms_stats.gather_table_stats.. They are populated by the deprecated ANALYZE command that you should NOT use to gather table stats since V10 came along. These columns are populated as I did an ANALYZE to get the data in there, as well as a dbms_stats.
Next, I collect BLOCKS_ALLOCATED from DBA_SEGMENTS {and for this demo I just ignored the potential for partitioned tables) and I compare this to the BLOCKS_USED to get the High Water Mark, as a percentage of the table. I do this as EMPTY_BLOCKS is set to zero if you have never used ANALYZE and, even if you did, unless you use this deprecated command all the time, the value will not change.
On the second line of output I calculate the DATA_IN_BYTES as a simple num_rows*avg_row_len, convert it into blocks {for simplicity I do not collect the block size, I know it is 8k}. I then apply my “Overhead” fudge factor. A block has a header, using around 100 bytes {I’ve not checked the exact figure for years}, pctfree can be varied but defaults to 10% and as only whole rows fit, then an average of half a row of space is empty in each “full” block. Thus I reduce the space available by 20-25%. In this case, 25% as my rows are large.
Finally, I compare this modified data volume to the used blocks to get the actual space

Below I run through creating some test data, looking at the stats and my calculated High Water Mark and pct_space_used and finally shrink my table to see if my guesstimate is a reasonable guesstimate:

populate table

drop table hwm purge;
prompt populate table
set feed on
create table hwm
(id   number(10)
,num1 number(2)
,vc1  varchar2(100)
,vc2  varchar2(100)
)
/
insert into hwm 
select rownum
,trunc(dbms_random.value(1,100))
,lpad('A',100,'A')
,lpad('B',100,'B')
from dual connect by level < 50001
/
50000 rows created.

exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'HWM')

--where is the HWM compared to total segment size

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630          0      50000         210          0       1664   .97957
     10500000   1281.73828       1602.17285   .962844262

NB AVG_SPC and EMPTY_BLKS are NULL. 
The high water mark is 1630 blocks out of 1664 in the segment
My calculated PCT_SPC_USED is 96%. That is probably close enough.
{remember, the last used block will be only partly used, accounting for a bit of the difference}


-- I will use ANALYZE to fill the missing columns
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918

Now those two columns are populated. 
Not the slightly different AVG_ROW_LEN even though dbms_stats used 100% (as the table is so small)
and ANALYZE was compute 


-- clear 90% of the data randomly

45461 rows deleted.

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150       4539         210         34       1664   .97957
       953190   116.356201       145.445251   .087407002

PCT_HWM is not altered of course but PCT_SPC_USED has dropped dramatically. 
The table is now only 8.7% used, according to my calculations (compared to
90% empty) 

The BLKS_USED does not change. The AVG_SPACE and EMPTY_BLOCKS are the same as I 
used dbms_stats to update the statistics and it DOES NOT ALTER the columns that it does not
populate. Thus you have no idea how recent those columns are if you use a mixture of commands.


-- clear some blocks completely by deleting a range

2181 rows deleted.


 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150       2358        210         34       1664   .97957
       495180   60.4467773       75.5584717   .045407735

Now the PCT_SPC_USED is down to 4.5%


-- has EMPTY_BLOCKS changed if I use ANALYZE?
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       7682       2358         213         34       1664   .97957
       502254   61.3103027       76.6378784   .046056417

As you can see, if I use ANALYZE AVG_SPACE alters. But EMPTY_BLOCKS does not, even though I cleared
a chunk of the table. So there are blocks that can be reused but not listed as empty.

I'll just take a quick side-step and show a quick "oddity" about dbms_stats
--deleting the stats (using dbms_stats)

exec dbms_stats.delete_table_stats(ownname=>user,tabname=>'HWM')

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
                                                              1664

dbms_stats.delete_table_statistics clears ALL statistics, even the ones it does not populate

--and now collect them via dbms_stats again

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630          0       2358         210          0       1664   .97957
       495180   60.4467773       75.5584717   .045407735


--now to shrink the table
alter table hwm enable row movement;
alter table hwm shrink space;


 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
        72          0       2358         210          0         80   .90000
       495180   60.4467773       75.5584717   .944480896

So I calculated that there was about 75 blocks of data in that table. having shrunk it, I was
a little bit out.

Having run through those examples we can see that the accuracy of the PCT_SPC_USED is down to the fudge factor employed but is probably close enough at 25%. After all, you are only likely to shrink a table that very clearly would benefit from it.

Comments»

1. Shrinking Tables to Aid Full Scans « Martin Widlake's Yet Another Oracle Blog - April 30, 2012

[…] Mark is higher than it needs to be. Prompted by one of the comments, I wrote a follow-up post on finding the High Water Mark and tables that consist mostly of empty space, which would be candidates for […]


Leave a comment