Table High Water Mark and How Empty the Table Is April 30, 2012
Posted by mwidlake in performance, statistics. Tags: data dictionary, performance, statistics
On a recent post about reducing the high water mark in a table by shrinking it, someone asked if you could tell how much “space” there was in a table that could be recovered by shrinking it.
Yes. As pseudo-code:
select number_of_rows*average_row_length. That is how much data there is.
Divide by the block size, to give how many blocks-worth of data you have.
Inflate this by an “overhead” fudge factor of about 20-25% to allow for wasted space (see later).
Compare to the number of blocks used by the table.
Here is some code to do it, showing some extra data and how we calculate the effective percentage of space used (pct_spc_used). It runs against an example table ‘HWM’
select uts.blocks                                     blks_used
      ,uts.avg_space
      ,uts.num_rows
      ,uts.avg_row_len
      ,uts.empty_blocks                               empty_blks
      ,usse.blocks                                    alloc_blks
      ,greatest(uts.blocks,1)/greatest(usse.blocks,1) pct_hwm
      ,uts.num_rows*uts.avg_row_len                   data_in_bytes
      ,(uts.num_rows*uts.avg_row_len)/8192            data_in_blks
      ,((uts.num_rows*uts.avg_row_len)/8192)*1.25     mod_data_in_blks
      ,(((uts.num_rows*uts.avg_row_len)/8192)*1.25)/usse.blocks pct_spc_used
from user_tab_statistics uts
    ,user_segments       usse
where uts.table_name = 'HWM'
and   uts.table_name = usse.segment_name
/

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1221      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918
I am collecting the data from USER_TAB_STATISTICS and USER_SEGMENTS. For this code to work you must have reasonably up-to-date statistics on the table.
I’d like to run through some of the columns I have selected:
First of all, columns AVG_SPACE and EMPTY_BLOCKS are not populated by dbms_stats.gather_table_stats. They are populated by the deprecated ANALYZE command, which you should NOT use to gather table stats since V10 came along. These columns hold values here because I ran an ANALYZE to get the data in there, as well as a dbms_stats gather.
Next, I collect the allocated blocks (ALLOC_BLKS) from USER_SEGMENTS {and for this demo I just ignore the potential for partitioned tables} and I compare this to BLKS_USED to get the High Water Mark as a percentage of the table. I do this as EMPTY_BLOCKS is set to zero if you have never used ANALYZE and, even if you did, unless you use this deprecated command all the time, the value will not change.
On the second line of output I calculate DATA_IN_BYTES as a simple num_rows*avg_row_len and convert it into blocks {for simplicity I do not collect the block size, I know it is 8k}. I then apply my “overhead” fudge factor. A block has a header, using around 100 bytes {I’ve not checked the exact figure for years}; PCTFREE can be varied but defaults to 10%; and as only whole rows fit, an average of half a row of space is empty in each “full” block. Thus I allow for 20-25% of wasted space. In this case 25%, as my rows are large.
Finally, I compare this modified data volume to the allocated blocks to get the effective percentage of space used, PCT_SPC_USED.
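As an aside, the hard-coded 8192 only holds for the default 8k block size. A sketch of how you could pick the block size up from the data dictionary instead {my own variation, not part of the demo, and it assumes a non-partitioned table so the segment maps to one tablespace}:

-- sketch: derive the block size from USER_TABLESPACES rather than hard-coding 8192
select uts.table_name
      ,ts.block_size
      ,(uts.num_rows*uts.avg_row_len)/ts.block_size        data_in_blks
      ,((uts.num_rows*uts.avg_row_len)/ts.block_size)*1.25 mod_data_in_blks
from user_tab_statistics uts
    ,user_segments       usse
    ,user_tablespaces    ts
where uts.table_name       = 'HWM'
and   uts.table_name       = usse.segment_name
and   usse.tablespace_name = ts.tablespace_name
/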
Below I run through creating some test data, looking at the stats and my calculated High Water Mark and PCT_SPC_USED, and finally shrinking my table to see if my guesstimate is reasonable:
-- populate table
drop table hwm purge;
prompt populate table
set feed on
create table hwm
(id   number(10)
,num1 number(2)
,vc1  varchar2(100)
,vc2  varchar2(100)
)
/
insert into hwm
select rownum
      ,trunc(dbms_random.value(1,100))
      ,lpad('A',100,'A')
      ,lpad('B',100,'B')
from dual
connect by level < 50001
/
50000 rows created.

exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'HWM')

-- where is the HWM compared to total segment size
 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630          0      50000         210          0       1664   .97957
     10500000   1281.73828       1602.17285   .962844262

NB AVG_SPACE and EMPTY_BLKS are not populated. The high water mark is 1630 blocks out of 1664 in the segment and my calculated PCT_SPC_USED is 96%. That is probably close enough {remember, the last used block will be only partly used, accounting for a bit of the difference}.

-- I will use ANALYZE to fill the missing columns
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150      50000         213         34       1664   .97957
     10650000   1300.04883       1625.06104    .97659918

Now those two columns are populated. Note the slightly different AVG_ROW_LEN, even though dbms_stats sampled 100% (as the table is so small) and the ANALYZE was a compute.

-- clear 90% of the data randomly
45461 rows deleted.
 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150       4539         210         34       1664   .97957
       953190   116.356201       145.445251   .087407002

PCT_HWM is not altered of course, but PCT_SPC_USED has dropped dramatically. The table is now only 8.7% used, according to my calculations (I deleted about 90% of the rows). The BLKS_USED does not change. The AVG_SPACE and EMPTY_BLKS are the same as before, as I used dbms_stats to update the statistics and it DOES NOT ALTER the columns that it does not populate. Thus you have no idea how recent those columns are if you use a mixture of commands.

-- clear some blocks completely by deleting a range
2181 rows deleted.

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       1150       2358         210         34       1664   .97957
       495180   60.4467773       75.5584717   .045407735

Now the PCT_SPC_USED is down to 4.5%.

-- has EMPTY_BLOCKS changed if I use ANALYZE?
analyze table hwm compute statistics;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630       7682       2358         213         34       1664   .97957
       502254   61.3103027       76.6378784   .046056417

As you can see, if I use ANALYZE, AVG_SPACE alters. But EMPTY_BLOCKS does not, even though I cleared a chunk of the table. So there are blocks that can be reused but they are not listed as empty.
I'll just take a quick side-step and show a quick "oddity" about dbms_stats.

-- deleting the stats (using dbms_stats)
exec dbms_stats.delete_table_stats(ownname=>user,tabname=>'HWM')

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
                                                  1664

dbms_stats.delete_table_stats clears ALL the statistics, even the ones it does not populate.

-- and now collect them via dbms_stats again
 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
      1630          0       2358         210          0       1664   .97957
       495180   60.4467773       75.5584717   .045407735

-- now to shrink the table
alter table hwm enable row movement;
alter table hwm shrink space;

 BLKS_USED  AVG_SPACE   NUM_ROWS AVG_ROW_LEN EMPTY_BLKS ALLOC_BLKS  PCT_HWM
---------- ---------- ---------- ----------- ---------- ---------- --------
DATA_IN_BYTES DATA_IN_BLKS MOD_DATA_IN_BLKS PCT_SPC_USED
------------- ------------ ---------------- ------------
        72          0       2358         210          0         80   .90000
       495180   60.4467773       75.5584717   .944480896

So I calculated that there were about 75 blocks of data in that table. Having shrunk it, the table uses 72 blocks out of 80 allocated, so I was only a little bit out.
Having run through those examples, we can see that the accuracy of PCT_SPC_USED comes down to the fudge factor employed, but at 25% it is probably close enough. After all, you are only likely to shrink a table that very clearly would benefit from it.
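The same calculation can of course be run across a whole schema to list shrink candidates. A sketch {my own extension of the query above, not from the demo; it assumes an 8k block size, fresh stats, and an arbitrary 50% threshold}:

-- sketch: list non-partitioned tables that look mostly empty
select uts.table_name
      ,usse.blocks                                              alloc_blks
      ,(((uts.num_rows*uts.avg_row_len)/8192)*1.25)/usse.blocks pct_spc_used
from user_tab_statistics uts
    ,user_segments       usse
where uts.table_name    = usse.segment_name
and   usse.segment_type = 'TABLE'
and   uts.num_rows is not null
and   usse.blocks > 0
and   (((uts.num_rows*uts.avg_row_len)/8192)*1.25)/usse.blocks < 0.5
order by usse.blocks desc
/

The biggest, emptiest segments come out first, which is where a shrink would buy you the most.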