
Learning stuff you know you don’t know July 17, 2009

Posted by mwidlake in Uncategorized.

A Friday, off-topic post 🙂

OK, how many people remember the much-maligned speech by Donald Rumsfeld about {and I paraphrase} “things we know, things we don’t know, things we know we don’t know and things we don’t know we don’t know”?

If you want, you can watch the dear chap himself saying it here. {I’m afraid this is a slightly maligning version, but not as bad as most I found}.

Leaving aside the fact that it was Mr Rumsfeld who said it, and that this accounted for some of the endless scorn poured upon the speech, I think it was a very, very valid observation by his speech writers. I always think that people laughing at the “idiocy” of the speech have missed something. The idea is nothing new; Henry David Thoreau, for example, said

“To know that we know what we know, and that we do not know what we do not know, that is true knowledge.”

{an excellent quote, for which I thank Wikipedia and Google – I knew I did not know any good quotes, but I knew they existed and I just did not know them}.

I think it is very valuable to realise you don’t know stuff in your chosen area of knowledge – to recognise that lack, and that there is thus more to learn. It also means that even people who already know an awful lot in one area probably have gaps you can help fill. So speak up when you think you know something, even to acknowledged experts. It gives you a really warm glow inside when you do teach someone you regard as an expert something new. And you know what? Most of those experts will be really appreciative of you even trying, let alone succeeding.

I could of course be saying all this to give my utterances about not knowing stuff a veneer of philosophical respectability, but I really do believe, and have done for many years now, that we should be able to say we do not know something and have it respected that we acknowledge that gap. I don’t take glory in my stupidity {for even my ego could not hold up under all that amount of glory 🙂} but I like to feel that if I can be honest about my stupidity I can thus be honest about my knowledge, and be quietly happy with what I know and what I can do.

OK, so none of this is specific to Oracle, but it is no less valid for being applicable to all knowledge. I guess I’m trying to say that I think it is OK {and should be publicly acceptable} to admit not knowing stuff, and it is equally OK and good for all of us to try and teach others, even if we feel individually that we maybe do not have that much to add.

What prompted this philosophical outpouring? I found out today that I don’t even know how to peel a banana. And probably most of you don’t either. But monkeys do.

This is the video.

Go on, watch the video, the boss is in a meeting. It makes you think….


Cost of full table scans #2 June 30, 2009

Posted by mwidlake in performance, Uncategorized.

This is the second post in a series on Full Table Scans. Sorry to anyone waiting for part two, I got distracted {I suspect very few people were actually waiting :-)}. I make up for it by making this post veeeery long.

Post One – I got several comments, mostly pre-empting what I planned to say next {I am not complaining as such, I like to get feedback and Randolf Geist in particular was being very helpful}, but I’m going to say what I wanted to say anyway, especially as I was being purposefully misleading in my first post…

I’m attempting to show three things in this series of posts.

  1. How table scans are costed by the CBO under 10g and 11g, as it is quite different from before.
  2. That simple test cases of what you knew once can fail to work and leave you confused and questioning your knowledge.
  3. How levels of “truth” help you understand more as you dig deeper.
  4. As a fourth, I might touch on how Oracle’s calculated cost does not always reflect how it actually executes SQL, but I need to work on that a little.

If you thought that db_file_multiblock_read_count is the number of blocks that Oracle reads in one go when scanning tables and indexes, and that this improves performance, that will help you understand what is going on inside your Oracle database.
If you think that increasing the value will make full table scans look more efficient to Oracle and thus it will favour them, you are again right and it may well help you tune a system.
However, you may know more – you may know that Oracle modifies the value of the MBRC to make it more realistic. As the MBRC gets larger, things like the size of the segment, hitting the end of individual extents and the physical time taken to read data off disk and chuck it across the network make the efficiency gains of these larger values less significant. Each layer of knowledge helps a little more. Then the technology underneath shifts, and you may not realise this as the older knowledge still kind of helps.

I’m not trying to be smart or “I know better” here; I’ve posted before that none of us knows this stuff until we are told, and I certainly did not know all of what is in these posts when I started preparing them 3 or 4 weeks ago. Some of what I plan to say I am still not sure of. But I’ll postpone the philosophy now for a later blog post, though as I have a rough version already written, here it is if you want.

Back to the technology and the costing of full table scans.

Here is my test script again.

It creates a couple of tables, the one of interest is TEST_BIG, holding 40,000 rows. TEST_BIG is 2304 blocks big but only 2267 blocks are occupied. As nothing has been deleted from the table, this is also the high water mark, ie the highest point in the table into which data has been inserted. Oracle keeps track of this high water mark. The table is in a locally managed, autoallocate tablespace with automatic segment space management. All the defaults.

This is my environment:-
—————————— ——————–
cpu_count 2
db_block_size 8192
db_file_multiblock_read_count 128
optimizer_mode ALL_ROWS
sga_target 0
sort_area_size 65536

Just to reiterate, I have not gathered system statistics – the system is “out of the box”.

From Oracle 10 onwards, there are some default system statistics set. You can see these by looking at the sys.aux_stats$ table:-

 select sname,pname,pval1,pval2
 from sys.aux_stats$
 order by sname,pname

SNAME           PNAME                PVAL1 PVAL2
SYSSTATS_INFO   DSTART                     10-15-2007 11:32
SYSSTATS_INFO   DSTOP                      10-15-2007 11:32
SYSSTATS_INFO   FLAGS                    1
SYSSTATS_MAIN   IOSEEKTIM               10

There are only 3 values set {on both version 11g and 10g},

  • CPUSPEEDNW – an estimate of the speed of the CPU. This value is different on different servers. Maybe. Both my old V10 Windows test boxes have a value of 484, even though one is a truly old laptop, yet the Linux servers I have access to all have different values, ranging from 713 to 1778.
  • IOSEEKTIME – This is the time for the IO subsystem to locate the start of an item. In both v10 and 11, this is 10 milliseconds
  • IOTFRSPEED – This is the volume of data that can be read from the io subsystem in 1 millisecond – 4096 bytes or 4k, again in V10 and 11

These default figures for IOSEEKTIME and IOTFRSPEED are fairly conservative, especially as they have not changed since V10, but then seek time has not really improved (especially on single discs).

In my experience, and looking around on the net, it seems that many sites on V10 upwards have not gathered system stats {BTW see the end of this post for some clarification on system statistics}. This seems to be due to one of several factors:

  • Some sites do not have a DBA or one who knows about system statistics {I’m not criticising, if you have been labelled a DBA and thrown the whole of the Oracle stack to look after, there is a lot to get your head around}.
  • Some DBAs think the system stats are gathered by the automatic stats job that {sort of} runs every night.
  • Changing to using system statistics would change execution plans and thus need full regression testing, some re-tuning, which there is no time or money for.
  • Hey, it works, why change things {this can also be viewed as “I am fundamentally scared of change” 🙂}.

I can understand all of the above, and as a result of them many of us benefit from understanding how the default system statistics work. I would suggest it is better to gather system statistics than not, as you are giving the Oracle CBO more accurate information about the actual speed of your CPU, your IO and the difference between single block reads and multi-block reads, but life is not perfect. {NB I’ve heard on the grapevine that you should not gather system stats if you run Oracle Apps 11i, but I am no expert on Oracle Apps!}

So what does Oracle do with the “out of the box” information? It calculates values for SREADTIM {the time to read a single block} and MREADTIM {the time taken to do a multi-block read}. If you think about how discs work, to read something the head of the disc must locate where the data is and physically move to it – this is the seek time – and then read the data.
So for a single block (8k) the time taken is

IOSEEKTIME+(size of one block/IOTFRSPEED).
ie 10+(8192 bytes/4096 bytes) = 12.

For a multi block read, then the disc head will seek once and then read db_file_multiblock_read_count number of blocks at once {just accept this for now, please}. In my case, 128*8k.

IOSEEKTIME+( (MBRC * Size of one block) / IOTFRSPEED)
ie 10+( (128*8192) / 4096)
= 10+(1048576/4096)
= 10+(1024k/4k) = 10+256 = 266
I don’t know about you but I can’t divide 1048576 by 4096 just like that, but 1024/4, yep, easier.

The CBO can only cost something if it can treat all parts in the same units, so it converts those multiblock read counts to equivalent single block read counts by dividing the time for the multi block read by the time for the single read.
266/12 = 22.1667.
Only the final cost is rounded up to the next whole number.

{Incidentally, it does this with the CPU cost as well.
Oracle makes a guess at how much CPU effort an activity will take and then divides it by the CPU speed and the time for a single block read – I’m being a little simplistic here, but it will do for now}.

So, a single block read has a cost of 1.
A multi block read of 128 blocks has a cost of 22.1667, yet reads 128 times as much data.
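The arithmetic above can be sketched in a few lines of Python {just my sketch using the default statistics quoted earlier, not anything Oracle actually runs}:

```python
# Default 10g/11g system statistics (from sys.aux_stats$)
IOSEEKTIM = 10     # ms to locate the start of a read
IOTFRSPEED = 4096  # bytes transferred per ms

block_size = 8192  # db_block_size
mbrc = 128         # db_file_multiblock_read_count

# Single block read: one seek plus the transfer of one block
sreadtim = IOSEEKTIM + block_size / IOTFRSPEED           # 10 + 2 = 12 ms

# Multi block read: one seek plus the transfer of MBRC blocks
mreadtim = IOSEEKTIM + (mbrc * block_size) / IOTFRSPEED  # 10 + 256 = 266 ms

# A multi block read expressed in units of single block reads
ratio = mreadtim / sreadtim
print(sreadtim, mreadtim, round(ratio, 4))  # 12.0 266.0 22.1667
```

Swap in your own block size or MBRC to see how the ratio moves: the bigger the MBRC, the cheaper each block looks, but with diminishing returns as the fixed seek time is spread ever thinner.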

In previous versions of Oracle this was all different. In 8i, single block reads and multi block reads were costed the same. In 9i without cpu costing {no cpu costing being the default, and I can’t remember seeing it changed on a live system} a “fudge factor” was introduced where values for MBRC were reduced down to better match reality. And in my first post I misleadingly referred to a Fudge Factor and gave a table. Anyone who was used to this fudge factor may have noticed my numbers did not match, but then who of us remembers that the fudge factor for 8 as the multi block read count is 6.19? 🙂

{That table is still useful, it shows an effective ratio for the MBRC, IF you have gathered no stats AND you have an 8k block size. OK, maybe it is not so useful…}

Anyway, as you can now see from the above formulae, Oracle on 10g and 11g now uses a fairly sensible piece of logic to decide on the relative costs of single and multi block reads. The CBO formula is basically saying you save the cost of multiple seek times when reading multiple blocks, but the time to transfer data is directly related to the volume of data read. This is probably reasonable on a system with a simple IO setup (like single disks or very basic RAID) but will break down with high-end storage which has read-ahead and SAN caches and the like.

OK, I have to address one more issue before I demonstrate that the formula is correct. And it is something that I did not know and confused the hell out of me when I decided to do a “simple test” of what I thought I knew, and prompted me to go on endlessly about how doing a simple test to prove what you have learned can go wrong and send you crazy with confusion. And thanks again to Randolf Geist for pointing me in the right direction.

At the end of the last post I had shown that my multiblock read count was 128 but acted like it was 8.
ie I got a cost of 617 for scanning TEST_BIG. When I manually set my MBRC to 8 I also got a cost of 617. When I manually set a value for MBRC of 128, it gave a cost of 395. Huh?!

Coskan alluded to this bit, but I missed it – I was ignorant about what he was saying.

On a newly created Oracle 11.1 test database, created as part of a simple install on Windows Vista, you have db_file_multiblock_read_count set to:-

show parameter db_file_multiblock_read
NAME                          VALUE
db_file_multiblock_read_count 128

But there is a hidden parameter:-
NAME                          VALUE  DESCRIPTION
_db_file_optimizer_read_count 8      multiblock read count for regular clients

{To see hidden parameters you need to be logged in as sys or use a trick like intermediate views – here is a script you can run as sys to see them. Warning, there are a LOT of them! I’ll do yet another post about hidden parameters sometime soon.}

How the hell are we supposed to know this???! Do Oracle do this on purpose to keep us awake at night worrying about it???

On Oracle 10, this hidden initialization parameter defaults to 8 and MBRC defaults to 8 {usually – I just recreated a new test database and both are set to 16 “out of the box”}. If you change MBRC, the hidden parameter alters too, to the same value, so you do not notice it so much.
On 11, MBRC is set to 128 and the hidden parameter is set to 8 initially. Once you change MBRC, the hidden _db_file_optimizer_read_count is also changed to match.

So, as my first paragraph on the first posting said, you go on to a test database to check something and you get confused as the very first “simple” test does not give you the results you expected.

OK, does the formula work?
In the below table I list:

  • The MBRC that I set by altering my session
  • The cost as reported by Explain Plan
  • The calculated MREADTIM = IOSEEKTIM + (MBRC*blocksize/IOTFRSPEED)
  • The calculated SREADTIM = IOSEEKTIM + (blocksize/IOTFRSPEED)
  • MREADTIM/SREADTIM {I’ll call it MT/ST}
  • Blocks in table/MBRC {I’ll call it blks/MBRC}
  • (blks/MBRC) * (MT/ST) to get a calculated cost in single block reads
MBRC  Cost  MRT  SRT  MT/ST   blks/MBRC  Calc cost
  2   1325   14   12   1.167   1133.5      1323
  3   1010   16   12   1.333    755.667    1008
  4    853   18   12   1.5      566.75      851
  6    695   22   12   1.833    377.833     693
  8    617   26   12   2.167    283.375     615
 10    569   30   12   2.5      226.7       567
 12    538   34   12   2.833    188.917     536
 16    499   42   12   3.5      141.688     496
 20    475   50   12   4.167    113.35      473
 24    459   58   12   4.833     94.458     457
 32    440   74   12   6.167     70.844     437
 33    438   76   12   6.333     68.697     436
 64    410  138   12  11.5       35.422     408
128    395  266   12  22.167     17.711     393

Yes 🙂
The calculated cost is always 2 less than the cost given by the CBO {well, 3 less for the MBRC 16 and 32 rows, but close enough}. I know that under Oracle 10, 1 is added to the cost for reading the table header. I am presuming that an additional 1 has been added for some reason under 11.
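As a cross-check, the table above can be reproduced with the same formula. This is my own sketch of the calculation {not Oracle's code}, using the 2267 occupied blocks of TEST_BIG and rounding the result up to a whole number:

```python
import math

# Default system statistics and my test environment
IOSEEKTIM = 10     # ms
IOTFRSPEED = 4096  # bytes per ms
BLOCK_SIZE = 8192  # db_block_size
BLOCKS = 2267      # occupied blocks in TEST_BIG (the high water mark)

SREADTIM = IOSEEKTIM + BLOCK_SIZE / IOTFRSPEED  # 12 ms

def scan_cost(mbrc):
    """Calculated cost of a full scan, in single-block-read units."""
    mreadtim = IOSEEKTIM + (mbrc * BLOCK_SIZE) / IOTFRSPEED
    reads = BLOCKS / mbrc                            # multi-block reads needed
    return math.ceil(reads * (mreadtim / SREADTIM))  # round up to a whole number

# Compare against some of the Explain Plan costs from the table
for mbrc, reported in [(4, 853), (64, 410), (128, 395)]:
    print(mbrc, scan_cost(mbrc), reported)  # calculated vs reported
```

For each of these rows the reported cost comes out 2 higher than the calculated figure.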

That will do for now.

Oh, as promised, some comments on system statistics

Oracle System statistics are NOT statistics collected on the tables and indexes owned by users SYS or SYSTEM. Those would be “Dictionary Statistics”. These can be collected via specific calls to DBMS_STATS.GATHER_DICTIONARY_STATS and {if you have not turned off the automatic gathering of object statistics} are gathered automatically during weekday nights and over the weekend by the automated job, which gathers table and index stats when it feels they are needed. I’ll be posting about that {probably at inordinate length} in a month or so.
System Statistics are statistics about how the underlying computer system works, specifically cpu speed and IO subsystem performance.
Even Oracle Support can get confused over this. You might like to check out this unfortunate experience on the topic. I have to say, I have much sympathy with Piet as I had very similar issues trying to discuss automated stats gathering with Oracle a couple of years ago. Having said that, I in the end got good support on the topic.

UNIX SIG May 2009 May 20, 2009

Posted by mwidlake in Uncategorized.

This is just a quick post today, I was going to continue on my theme of select count(*) but it’s been a long, tiring day. I’ll do it tomorrow.

I was off over to Wolverhampton to present at the Oracle user group UNIX SIG (special interest group) today. I felt there was a need for more introductory presentations, as they tend to be expert ones these days, people talking about the latest, coolest things they have found out. Great though such talks are, for many users they can be a little beyond where they are, and discussions I have had suggest that a lack of easily accessible talks can turn people off SIGs. So I offered an introductory talk on tuning, how to get going with it. The talk was graciously accepted by Patrick Hurley, currently running that SIG.

The day did not start well. OK, the night before did not start well, I was suffering from trying to put too much into the talk and I was up until 1am stripping things out. I finally got it into a state I felt I could present but it lacked a few diagrams I wanted. It would have to do.

I set off OK in the morning, just past 7am. However, I got stuck on the way: a crash on the A14 had blocked the road and I missed my chance to escape. Thus I was late. I managed to call ahead, but several presenters were having trouble today.

I arrived during the morning coffee break and I was up next, so it was a quick job to set up the laptop, grab a coffee and get into the right frame of mind.

The talk went OK I think, could have been a little clearer, a little more polished, but we will see what the feedback says. I was trying to squeeze what I usually spend at least a whole day (and preferably 2 or 3 days) running a course on into just under an hour.

After that, and on top of the 4 hours in the car, I was bushed. I hung around for lunch and had a couple of chats with fellow OUG people, including the deputy chair of the Management & Infrastructure SIG, Gordon Brown, which is good as we have our SIG in 2 weeks. However, I had seen Pete Finnigan’s talk at the Northern SIG, did not have much interest in the talk on Oracle Certification (sorry Joel, just not my bag) and so I did what I dislike other people doing, I left early.

A better journey back was had but then I had to get going on my latest assignment, some project management for an established client.

Hmm, this is all a bit “what I did today”. Not very interesting to other people I guess. Maybe I’ll edit it tomorrow.