
Testing Methodology – How to Test #2 January 26, 2010

Posted by mwidlake in performance, Testing.
Tags: ,
1 comment so far

<previous post …

Last post (see link above) I said something about how I like timing how long things take {usually SQL queries, but you can apply this to more complex situations} as it is simple. But I highlighted that it is an approach that misses a lot of information and can also catch you out if you do not appreciate that the first run of any test script is likely to be slow (due to parsing and data caching overheads, which are reduced for subsequent iterations).

I want to continue on that theme now: simple testing by just timing how long things take, and how to cope with variation.

Hands up all of you who have Oracle on a PC at home? Great, you can test what you want on a dedicated machine. No one else is running anything on your machine, no one else is using your storage, no one else is using your network (if you have a network at home) and no one is running the SQL statement from hell on your database.

BUT. You might have a download of the latest security patch set running in the background, your virus scanner may kick in (or you might just have Norton full-security-everything-running-on-your-cpu-all-the-time turned on). Your mileage may vary, especially on Windows but also on Linux.

Many of us are testing on a server at work, where developers are working, where the storage is shared, where the network is dealing with that lady behind you watching “Celebrity Strictly come roller-skate dancing impersonations” {well, she sat behind me for 4 months and she was in the Quality team, what does that tell you}. Bottom line, the performance of your server will vary, and your testing needs to cope with that.

How do you cope with a varying workload on the system you are running tests on? Well, you run the new and old versions of your test several times, in succession, interleaved. This is especially important in the very un-ideal situation of having to test on Live. Testing on “Live” is wrong, in the same way that spelling “initialisation” with a Z is wrong, but it is often forced upon us.

As an example, I want to check into the questionable suggestion someone made that selecting count(*) from dba_tables where owner=user is faster than selecting count(*) from user_tables.

So, I knock up a quick sql*plus script that runs the two alternatives in quick succession, interleaved.

--testa.sql
set timi on pause off
spool testa.lst
--
prompt run 1
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 2
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 3
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 4
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 5
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
spool off
-- eof

Just a couple of things to mention. I always spool out to a file with the same name as the script, with .lst at the end (which is the default file extension for spool commands, but I like to state it), and I like to use “prompt” to record in the output what is going on. Here it is obvious, sometimes it is not. Oh, and I like to put -- eof at the end of my files, so I can spot cut-and-paste errors. I know, I am very “command line” centric. I love pictures too, Doug. Finally, I should not run the two versions in the same order each time; I should swap them over after every two runs (not each run, as then you are running A,A,B,B,A,A,B,B. A more complex A,B,A,B,B,A,B,A is better). But the results look messy, so I didn’t.
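
If I had wanted to alternate the order, a variant along these lines would do it {call it testb.sql; it is just a sketch, not the script I actually ran, so the output further down is still from testa.sql above}:

--testb.sql - as testa.sql, but the order of the two statements swaps after every two runs
set timi on pause off
spool testb.lst
--
prompt run 1 - user_tables first
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 2 - user_tables first
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 3 - dba_tables first
select count(*) from dba_tables where owner=user;
select count(*) from user_tables;
--
prompt run 4 - dba_tables first
select count(*) from dba_tables where owner=user;
select count(*) from user_tables;
--
spool off
-- eof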

Here is the output (trimmed a little, for the sake of brevity):-

run 1

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.58

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.73

run 2

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 3

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 4

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 5

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.29

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.09

At this point I might pull the results into Excel, as it is so good at drawing pretty graphs, but I think we can interpret the results “by eye” in this case.

Run    User Tabs   DBA_Tabs
1        0.58          0.73
2        0.28          0.10
3        0.28          0.10
4        0.28          0.10
5        0.29          0.09

AVG      0.28          0.10

My average ignores the first run for each version of the code, as previously explained {parse overhead/warming caches}.

You can see that version B is considerably faster than A. In this case, there is no overlap between the two result sets, by which I mean every run of version B is faster than every run of version A. Also, there is little variation in the performance of each version.

What if you got a result set like:-

Run    User Tabs   DBA_Tabs
1        0.77          0.68
2        0.35          0.25
3        0.42          0.34
4        0.48          0.38
5        0.48          0.30
6        0.41          0.29
7        0.40          0.28

AVG      0.42          0.31

If you look at the figures, both A and B slow down and then start to speed up. Something happened to slow down the system in the middle of the test and it was recovering but not back to “normal” by the end. If we had done 7 runs of A followed by 7 runs of B, B would have been hard hit by the temporary system slow-down {I opened a WORD document :-) }.

Also, there is some overlap in the results for A and B. The slowest result for B, 0.38, is slower than the fastest result for A, 0.35. When the spread of times for test A and test B overlap like this, you need to do a few more tests to ensure the results are consistent. I did 7 but I should really have done 9 or maybe 15.
In any case, B is consistently 25% faster than A. It is Faster.

How about this result set:

Run    User Tabs   DBA_Tabs
1        0.87          0.71
2        0.41          0.42
3        0.47          0.43
4        0.37          0.42
5        0.39          0.34
6        0.51          0.38
7        0.42          0.40

AVG      0.43          0.40

The variation in the result sets is higher and in fact, if you only looked at the results for paired runs 2, 3 and 4, you would come to the conclusion that A was very slightly faster than B. In fact, B is slightly faster than A.

Or is it? There are only 6 tests {remember, ignore run 1}, the variation within the sets is high (.37 to .51 in A, .34 to .43 in B) and the overall difference is low. You could run a statistical analysis on this, a Student’s t-test I think, to see if the difference is significant. But unless I was looking at something utterly business critical at this point, where the smallest fraction of improvement was important, I would not bother. I would be far better off looking for solution C or doing something else completely. If you ever find yourself doing scientific statistical analysis to decide which of two options is faster, it is probably time to go home and consider what else you could be doing to help your employer more…
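
{If you did want to run that test without leaving the database, Oracle has a built-in aggregate for it, STATS_T_TEST_INDEP. A minimal sketch, assuming the timings had been loaded into a hypothetical table test_timings with columns version (‘A’ or ‘B’), run_no and elapsed_secs:}

-- Two-sided significance of the difference in mean elapsed time between A and B,
-- plus the t statistic itself. Ignore the first run of each version, as before.
select stats_t_test_indep(version, elapsed_secs, 'TWO_SIDED_SIG') as two_sided_sig,
       stats_t_test_indep(version, elapsed_secs, 'STATISTIC')     as t_statistic
from   test_timings
where  run_no > 1;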

Enough for tonight, and HOPEFULLY the WordPress pixies will not eat the end of this post (yesterday’s effort was better than tonight’s, I think…)

Testing Methodology – How To Test January 26, 2010

Posted by mwidlake in performance, Testing.
Tags: , ,
1 comment so far

<Previous Post…

On those rare but pleasant occasions when I find myself running a training course on performance, something I always want to press home is “How to Test”.

I believe you can learn everything in every manual on Oracle technology and internals and still be very poor at performance tuning. Similarly, I think you can be lacking an awful lot of knowledge and still be good at performance tuning. It comes down to how you test. Not just how to test, but how you as an individual design and run your tests.

I joke with whoever I am talking to that what you need most of all to test is a watch and someone calling out “start” and “stop”, i.e. you need to be able to time things. It is a bit of a throw-away statement, but actually most of us will do exactly this on a regular basis. We will be working on something, run it and it will come back in a few seconds. Then we will tweak it and run it again and mentally {or in my case, just very quietly under my breath} count. We even discuss it with people like that: “How long did it take that time?” “Ooh, a couple of seconds faster than last time.”

I like tuning by timing very, very, very much.

Firstly, it is simple: if a SQL query runs faster, it runs faster. If you tune by looking at the explain plan and you see a full table scan being replaced with a nested loop and an index lookup, is it faster? It depends. If you tune by looking at the buffer gets and other statistics from the SGA (or wherever), if the “buffer gets” go down, is it faster? That depends again. If the disk reads went up, even by a relatively small amount, maybe not. If the memory usage went through the roof because of a sort that spills down to disc, well, quite probably not. Obviously if buffer gets, disk reads and CPU usage all went down, you can be pretty certain you are having a positive impact in speeding up the statement. But timing the statement from start to finish gives you a nice, simple answer on whether it runs faster.

Secondly, it is simple. You do not have to look at plans, at runtime statistics, at deltas of session statistics, at trace files. All those things are good but you need more knowledge and experience to use them and time to set up, collect and study them. This is not to say you do not get a lot more out of tuning if you understand all of the stuff about trace files, explain plans etc, I hasten to add – but you do not need that to get going with performance tuning.

Thirdly, it is simple. Tell your manager’s manager that you use 47% fewer buffer gets, 12% fewer disk reads but 9% more CPU, and they will ask you what all that means. Tell them that version A runs in 2 minutes and 9 seconds and version B in 34 seconds, and they will understand. Better still, graph it and give them a PowerPoint…

Fourthly, it is simple. You can see that statement (a) is faster or slower than statement (b). Now you can look at the plan, at the statistics for the statement, at what wait events are occurring, and start tying up book knowledge with real-world results.

Of course, timing with a watch is very crude. You have a very capable piece of computing power underlying that database of yours, so let’s get it to do the timing.

As a simple start, in SQL*Plus use “set timing on” (it can be abbreviated to “set timi on”). Most GUI SQL tools will either default to telling you the elapsed time or have an obvious option to switch on the functionality.

{Oh, I’ll just mention that if you cannot get timing in SQL*Plus or wherever to work, you might want to check what the initialisation parameter “TIMED_STATISTICS” is set to. Some ancient memory is telling me that if it is set to FALSE, you may not be able to use TIMING in SQL*Plus, but another memory is telling me that stopped being true a while back, version 8 maybe. I tried setting TIMED_STATISTICS to false in my session on 10.2 but TIMING still worked, but then I left STATISTICS_LEVEL alone and that causes TIMED_STATISTICS to be set. It is so long ago that I worked on a system that had TIMED_STATISTICS set to false! Even a quick google did not throw up an immediate answer}.
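
{A quick way to check what your session sees is sketched below; both parameters show up in V$PARAMETER, or you can simply use SHOW PARAMETER in SQL*Plus:}

-- What are TIMED_STATISTICS and STATISTICS_LEVEL currently set to?
select name, value
from   v$parameter
where  name in ('timed_statistics', 'statistics_level');
--
-- or, more simply, in SQL*Plus:
-- show parameter timed_statistics
-- show parameter statistics_level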

DB10.2> select count(*) from sn
  2  /
any key> 

  COUNT(*)
----------
       116

1 row selected.

DB10.2> set timi on
DB10.2> select count(*) from sn
  2  /
any key> 

  COUNT(*)
----------
       116

1 row selected.

Elapsed: 00:00:01.09

There you go, the count of SN records, all 116, took 1.09 seconds.

Yes, I had “set pause on” and the timing includes how long it takes me to spot that the query has finished and press a key. We all do it. However, sometimes the need for user input is still missed {something I see quite often is, for example, not pressing the button in PL/SQL Developer to get ALL the rows for the SQL statement, rather than just the first screenful}. A key thing is to try and remove from testing all and any waiting for user response, as we humans are slow and erratic.

So, SET PAUSE OFF.

Now you have TIMING on and pause off. Time to run something and see how long it takes, then try and speed it up and run again:

DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988
  2  /

  COUNT(*)
----------
        29
Elapsed: 00:00:01.15

DB10.2> create index per_dob on person(dob);

Index created.

Elapsed: 00:00:01.31
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988;

  COUNT(*)
----------
        29
Elapsed: 00:00:00.14

There you go, I added an index and the query went down from 1.15 seconds to 0.14, that is 8 times faster. Timing is timing.
Well, no, and this is something you need to be careful of if you are new to tuning.

The second iteration is nearly always faster.

Why? Well, the first time you run a piece of SQL it has to be parsed for a start, and that takes some time. More importantly, the first time you select the data, it is going to be on disk. Oracle reads it in and puts it into memory. The second time you query the same data, it will be found in memory. That {nearly always} makes the next access to the data a lot faster. The index was not being used anyway, as I had a function on the column in the WHERE clause, and this stops the index from being used.
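
{As an aside, if I actually wanted an index to help this query I would have to make the predicate index-friendly. A minimal sketch, reusing the table and index from the example above; note that the original query also compares to the number 1988 rather than the string '1988', which adds an implicit conversion:}

-- Option 1: express the condition as a range on DOB itself, so the per_dob index can be used
select count(*) from person
where  dob >= to_date('01-JAN-1988','DD-MON-YYYY')
and    dob <  to_date('01-JAN-1989','DD-MON-YYYY');
--
-- Option 2: a function-based index matching the expression in the WHERE clause
create index per_dob_yr on person (to_char(dob,'YYYY'));
select count(*) from person where to_char(dob,'YYYY') = '1988';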

So having said I love testing by timing, you need to be cautious about one-off tests. Oh, and here below is proof that the index I created is making no real difference to the speed of the SQL query:

DB10.2> set autotrace on
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988;

  COUNT(*)
----------
        29
Elapsed: 00:00:00.12

Execution Plan
-----------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost 
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     9 |   335
|   1 |  SORT AGGREGATE    |        |     1 |     9 |      
|*  2 |   TABLE ACCESS FULL| PERSON |    36 |   324 |   335
-----------------------------------------------------------

Statistics
----------------------------------------------------------
         20  recursive calls
         19  db block gets
        727  consistent gets
          0  physical reads
       1992  redo size

DB10.2> drop index per_dob
  2  /

Index dropped.

Elapsed: 00:00:00.03
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988
  2  /

  COUNT(*)
----------
        29
Elapsed: 00:00:00.15

Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost
----------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     9 |  335
|   1 |  SORT AGGREGATE    |        |     1 |     9 |      
|*  2 |   TABLE ACCESS FULL| PERSON |    36 |   324 |  335
----------------------------------------------------------


Statistics
----------------------------------------------------------
        261  recursive calls
         19  db block gets
        841  consistent gets
          0  physical reads
       1992  redo size

We see 0.12 seconds go to 0.15 seconds, exactly the same explain plan for both statements, a small increase in consistent gets, and no physical reads by either statement (so the data is cached).
Why is the second statement a little slower, with a large increase in recursive calls and db block gets? Because I dropped the index and the statement had to be reparsed. Now, if you are new to tuning you would almost certainly not have appreciated what the recursive calls and db block gets were all about, and it could distract you from the question of “does it run faster?”. It is certainly good to know all about that, but when you are starting off, you want to keep things simple and learn in stages.

What the above demonstrates, I hope, is that the second thing you run will have an advantage and could probably run faster even though, in reality, it is using more resource. And we tend to run old code first and new code second. So swap them over, give the old code the advantage of being run second.

Do not test things once, as you can easily be caught out. Test each version several times. And ignore the first run of each version. This is not perfect advice (code in production may well be being parsed and gathering data from disk), but unless you can allow for this in your testing, I think it is generally better, for simple testing, to run each version 6 times. Of the six runs, ignore the first run of each and average the results of the other 5. Whichever one runs faster on average is, well, fastest. IF the difference is significant.
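
{If you want to script that pattern rather than eyeballing the SQL*Plus timings, a rough PL/SQL sketch along these lines will do the arithmetic for you. The statement being timed is just a placeholder; swap in whichever version you are testing:}

set serveroutput on
declare
  l_start   pls_integer;
  l_elapsed pls_integer;
  l_total   pls_integer := 0;
  l_count   pls_integer;
begin
  for i in 1 .. 6 loop
    l_start := dbms_utility.get_time;                 -- hundredths of a second
    select count(*) into l_count from user_tables;    -- the statement under test
    l_elapsed := dbms_utility.get_time - l_start;
    dbms_output.put_line('run '||i||': '||l_elapsed||' cs');
    if i > 1 then                                     -- ignore the first run
      l_total := l_total + l_elapsed;
    end if;
  end loop;
  dbms_output.put_line('average of runs 2 to 6: '||round(l_total/5,1)||' cs');
end;
/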

Oh. Where is the SQL AUDIT in all this? Well, ponder on why I am generating REDO for a simple select…

Testing Methodology – Getting Community Information January 22, 2010

Posted by mwidlake in Testing, Uncategorized.
Tags: ,
4 comments

Last post I tried to highlight that knowing specifically what you want to understand before you start testing is important, and that it is beneficial to start with the boring official documentation. This is so that you can filter what is Out There on the web, as well as get a good grounding in what is officially possible.

There is no doubt that you can get a lot more information on Oracle features from the community than you can from the manuals, and that it is often presented in a more easily understood and relevant way. But some of it is not right for your version or is just plain wrong. I can do no better than to refer to this recent posting by Jonathan Lewis on the topic. In this case, it is the persistence of old and now irrelevant information, perpetuated by the kind but flawed habit people have of pulling information together, but not testing it.

Treat everything on the net with caution if it has no proof. Even then be cautious :-) – I know I have “proved” a couple of things that turned out to be misunderstandings…

You can of course post to forums to ask for information and help, and people will often come to your aid. But you are often not really understanding the subject by doing that, you are just learning facts. It’s like learning the dates of the two World Wars and not understanding why the wars occurred. I would point you to this posting by Lisa Dobson from a couple of years back, which makes the point about what you stand to learn, and also says something about how to ask for help. If it helps you to do the “right thing”, think of it selfishly. You will get better and more specific help if you have already looked into your issue and can ask specific questions. Also, if you properly learn something rather than just pick up a couple of facts about it, you are less likely to look foolish when you repeat those facts in a context they are not valid in. And you will.

Sorry, side tracked a little into one of my pet annoyances. Back to topic.

I want to know about SQL AUDIT. So I google SQL AUDIT ORACLE. Lots of hits. Let’s start filtering. First off, anything by “experts-exchange” and the other pay-for forums I can forget. Sorry guys, but if I am going to spend money I’ll buy a manual. I will also ignore any stuff by sites that constantly suggest you go on training courses on boats with them, or that are “Team America” {everyone seen the film by Trey Parker? Puts the claim “We are the American Team” into a different light…}.

Now I go and look at the sites and I try making my search more intelligent, maybe adding in words like DBA_AUDIT_TRAIL and the like. I try to think what words and phrases someone explaining the topic would use which would not be common for other topics. That is why I often use database object names and commands in my searches.

In the case of SQL AUDIT, there are plenty of sites that generally pull together the basics from the manuals and tell you how to audit a few things and see the results and give some nice examples. It gets you further forward, but it’s mostly not very detailed. Just cookery instructions on what to do but not how it works. Thankfully, there is an excellent article by one of the experts in the field, Pete Finnigan.
Maybe I could have replaced this post by simply suggesting you just go to Pete’s web site, read what is there and follow the link from it to the ones he likes. It would work for the topic of SQL AUDIT. However, although it is always good to go to the sites of people you know and trust, it is always worth doing a google and going beyond the first page or two of results. Those first pages are mostly for a handful of very popular sites and sources. A new article or someone who is relatively unknown but has the answers you need may be further down the list, like pages 3 or 4. It is worth 5 mins just checking.

However, you are not likely to find everything you need. Certainly with SQL AUDIT you won’t find much about performance impact, other than bland and not-very-helpful generic warnings that “use of SQL AUDIT can carry a significant performance overhead”. How much overhead? For what sort of audit? And how do you know? In fact, when searching I found this, admittedly old, comment by Pete about there being relatively little “out there” about the performance impact of auditing. That made me smile, as that information was exactly what I was after.

*sigh*. The information I want does not seem to be out there. I need to do some testing.

That is for the next post (“at LAST”, I hear you all cry).

Testing Methodology – The Groundwork January 20, 2010

Posted by mwidlake in Perceptions, Testing, Uncategorized.
Tags: , ,
add a comment

<Previous Post… Next Post…>

I want to start learning about using SQL AUDIT, as I mentioned a couple of days ago.

First question. What do I want to know about SQL AUDIT? I might just have thought “well, I do not know much about this area and I think I should – so I want to know everything”. That is OK, but personally I find it helps to make up a specific aim. Otherwise you can just flounder about {well, I do}. In this case I have decided I want to know:

  1. The options for auditing who is selecting key tables.
  2. How to audit when those key tables get changed {the sort of commands items 1 and 2 are about are sketched just after this list}.
  3. The impact on performance of that audit, and whether it could be an issue.
  4. Following on from (3), how to control that performance impact.
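
For orientation only {I have not tested any of this yet, that is what the coming posts are for}, the sketch below shows the sort of commands items 1 and 2 are about. It assumes the AUDIT_TRAIL initialisation parameter is set to DB and uses a made-up table, SCOTT.KEY_TABLE:

-- Audit who selects from, and who changes, the (hypothetical) key table
audit select on scott.key_table by access;
audit insert, update, delete on scott.key_table by access;
--
-- What object auditing is currently switched on?
select * from dba_obj_audit_opts where owner = 'SCOTT';
--
-- What has been captured so far?
select username, obj_name, action_name, timestamp
from   dba_audit_trail
where  obj_name = 'KEY_TABLE';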

For any of you who have been or are code developers, you hopefully appreciate test-driven coding. That is, you decide up front what the new code must achieve and design tests to ensure the code does it. You do not write any code until you have at least thought of the tests and written them down in a document. {Ideally you have written the tests and built some test data before you start, but then in an ideal world you would get paid more, have more holidays and the newspapers would tell the truth rather than sensational rubbish, but there you go :-) }

I do not think that learning stuff/testing is much different from developing code, thus the list above. I now know what I want to understand.

What next? I’m going to go and check the Oracle Documentation for the feature. And I am very careful to check the documentation for the version I will use. This is 10.2 for me. I know, the Oracle documentation can be pretty dry, it can be rather unhelpful and it is not very good at examples. But you can expect it to be 90% accurate in what it tells you. You can also expect it to be not-very-forthcoming about the issues, gotchas and bits that don’t work. {I have this pet theory that the official documentation only mentions how things do not work once a feature has been out for a version and people have complained that the documentation should let on about the limitations}.

So, for SQL AUDIT I suggest you go and read:

  • Concepts Manual, chapter 20 Database Security. If I am not rushed I would read the whole chapter, I might realise that what I want to do is better done with some other tool (If I wanted to see who had changed records months down the line, I think I would pick up that database triggers were a better bet, for example).
  • SQL Reference, chapter 13, the section on AUDIT (no surprises there). I do not do much more than read through the SQL manual once though, as frankly I find it pretty useless for explaining stuff, but it puts into your head what the parts of the command are and gives pointers to other parts of the documentation. I’ll read the concepts manual with more attention. In this case, the manual will lead me to:
  • Database Security Guide chapter 8. Which is pretty good, actually.
  • My next source of information may not immediately spring to mind, but I find it very valuable: find out which data dictionary objects are involved in the feature. In this case, the previous sources would lead me to go to the Database Reference and check out:
  • DBA_AUDIT_TRAIL, DBA_OBJ_AUDIT_OPTS, DBA_PRIV_AUDIT_OPTS, DBA_STMT_AUDIT_OPTS. And of course, SYS.AUD$. {I actually queried DBA_OBJECTS for anything with the word “AUDIT” in it, checked out all the tables and also had a quick peek at the text for any views, which would have led me to SYS.AUD$ if I did not already know about it; the sort of query I mean is sketched just below}.
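
That DBA_OBJECTS trawl is nothing more sophisticated than this sketch:

-- Find data dictionary objects with AUDIT in their name
select owner, object_name, object_type
from   dba_objects
where  object_name like '%AUDIT%'
order  by owner, object_type, object_name;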

Why do I go and look at the data dictionary objects and the reference guide? After all, is it not nerdy enough to have put myself through reading the SQL Reference manual? {and let us be honest, it is rarely enjoyable reading the SQL Reference manual}. Because I want to know how it works, not just how to use it. Seeing the tables gives me a lot of information, and the description of the columns may well tell me a lot more. First thing: SYS.AUD$ only has one index, on the column SESSIONID# (I know, there is another column in the index, I need to get to a DB to check this). Any queries not via this index are going to scan the whole damned thing. Right off, I suspect this will be an issue.
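
{To check that index without relying on memory, something like this against DBA_IND_COLUMNS will do; again, just a sketch:}

-- What indexes exist on SYS.AUD$, and which columns are in them?
select index_name, column_name, column_position
from   dba_ind_columns
where  table_owner = 'SYS'
and    table_name  = 'AUD$'
order  by index_name, column_position;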

I will DESC the tables, see if there is already any information in them. In this case, there was not. A clean sheet.

Why did I not just go to Google (Bing, Yahoo, whatever) and search? Because I can trust the manuals to tell me stuff which is mostly true and is valid for my release. Not stuff about older versions, about later versions, urban myths or just outright fallacies. The manuals certainly will not tell me everything, far from it, but what they do say is solid stuff. With this reliable start I can now go to other sources and have some chance of filtering all the other stuff that is Out There. Once filtered, it is probably a lot more informative and easier to digest than the manuals.

I’ll ramble on about that next posting.

Testing Methodology January 19, 2010

Posted by mwidlake in Testing.
Tags:
3 comments

This is my 101st posting and {almost} my start for 2010. I was going to kick off 2010 on something I knew well, so I could look all smart – specifically, handling stats for big databases. But I have decided to kick off on something I know very poorly. Security. Namely, use of SQL AUDIT. Why? Why blog about something I know little about?

Because I want to blog about the process of finding out about stuff and testing that stuff. It’s actually going to be a thread about testing – the SQL AUDIT is just an excuse {and matches my work life, which is jolly important when time is limited and you need to keep the employer happy}. I started out my whole blog (before almost anyone paid any attention to my meanderings) on what you could find out just from doing a “select count(*)” from a table. You can find out a lot from simple beginnings.

How do you test something? Well, that question is actually a step or two down the line from “how do I understand something?”. Sorry, I was trained as a scientist at college and you get sold this line as a trainee scientist about Baconian Science, whereby you record all the data about something and then come to conclusions. It does not really work. You need an idea first and then you need to test that idea.

So, in the case of IT, you come across something (the initial idea) and you want to know how it works. It is usually a feature of the software new to you. With software (and the Oracle Database is really just a huge pile of software) there is a source of starter information and it is called the “documentation”. If I was a Baconian scientist I would just use the feature and record everything I could think of about it and record what I discovered. But this is software and some human (or humans) designed it and implemented it. And documented it.

So, first thing: don’t google it. Go and read the official manual on it. It is not going to tell you everything (it is, after all, written by people desperate to put it in the best light possible – and yet leave space for consultancy and training dollars to be earned) and remember that the manual can be wrong, but it is likely to be more accurate than 80% of what you come across in web searches. {If I was a tabloid newspaper I would go and ask a bunch of people who knew little about the subject what they thought and then publish an article on it, emphasising any shocking or outrageous parts. Remember that when you Google or Bing or Yahoo anything. Anything.}

Having read (and I would suggest you start with) the Concepts Manual on the feature and then any specific manual covering the feature, now do your web search and…
(a) look for sources you have grown to trust. Not popular sources necessarily, but sources where you have rarely found them wrong, or where they publish examples and worked proofs of their utterances.
(b) look for repeated claims about that feature and keep them in mind. They could be urban myths, they could be accurate. Just keep them in mind.

Now go and do some tests. Yourself. Otherwise, you just never know…

And this is the nub, I think, of testing. Get some basic information about how it should work, throw in your experience to date (e.g. this AUDIT stuff is putting data into an Oracle table, so it is inserting and updating rows in a table, it should react like data going into a normal table. What do I know already about inserts and updates to tables?) and ask yourself a few simple questions:-
– How do I think this should work?
– How could this go wrong?
– What scenarios can I think of for using this?
– What are the simplest examples I can think of for this process?
– How can I reduce tests down to trying only this feature or even one aspect of this feature?
– How could this go wrong?
Yep, I ask that last question twice and it is no oversight. I want to try and understand where this new feature might break. I need to ask the “ahh but” questions.

Testing is Not Just for Code. September 16, 2009

Posted by mwidlake in Architecture, VLDB.
Tags: , , ,
7 comments

Someone I am currently working with has a wonderful tag line in her emails:

Next time we want to release untested, why don’t we just release undeveloped?

Testing is not limited to testing code of course. I have recently posted about how a backup is not a backup until you have tested it with a practice recovery.  How you think the database will work by looking at the data dictionary is just a nice theory until you run some actual tests to see how the database responds, as I have been doing with Histograms lately. Sadly, you could even say an Oracle feature is not an Oracle feature until you have tested it.

In my experience, this is particularly true when you test the edges of Oracle, when you are working on VLDBs {Very Large DataBases}.

Last month Jonathan Lewis posted about a 2TB ASM disc size bug, where if you allocated a disc over 2TB to ASM, it would fill it up, wrap around and write over the beginning of the file. This week I heard from some past colleagues of mine that they hit this very same bug.
With these very same colleagues we hit a bug in 10.1 where you could not back up a tablespace over 8TB in size with RMAN {I can’t give you a bug number for it as we were working with HP/Oracle direct at the time and they “handled it internally”. But when I mentioned it to him, Jonathan found a similar one, bug 5448714, which stated a 4TB limit on backups. It could be the same bug}.

Yet another VLDB issue was that we wanted to move just under one thousand tablespaces from one database to another {again, 10.1}, using transportable tablespaces. We tried to use the utility for checking you are working on a consistent set of tablespaces, but it could not cope with that many. But to plug them into the new database you have to export the metadata, and we found a 4,000 character limit on the variable stating the tablespaces to transport. That’s 2.3 characters per tablespace, as you need commas to delimit them… Yes, you could manage if you renamed all tablespaces to AA, AB, AC… BA, BB, BC etc. If memory serves, the problem was with Data Pump export and we reverted to old-style export, which did not have the problem.

Another limit I’ve blogged on is that the automated stats job chokes on very large objects.

Some data dictionary views can become very slow if you have several tens of thousands of tables/extents/tablespaces/indexes.

I can appreciate the issues and problems Oracle has with testing their code base; it is vast, people use the software in odd ways and it has to run on many platforms. You might also feel I am being picky by saying Oracle breaks a little when you have 8TB tablespaces or a thousand tablespaces. But:

  • Oracle will say in big, glossy presentations, you can build Petabyte and Exabyte databases with Oracle {and have a product called Exadata, don’t forget}.
  • More and more customers are reaching these sizes as data continues to grow, for many sites faster than Moore’s law.
  • Some of these limits appear with databases well below a Petabyte (say a tiddly small 50TB one :-) ).

I’ve been running into these issues with VLDBs since Oracle 7 and they are often with pretty fundamental parts of the system, like creating and backing up tablespaces! I think it is a poor show that it is so obvious that Oracle has been weak in testing with VLDB-sized databases before release.

I wonder whether, with 11gR2, Oracle actually tested some petabyte data sizes to see if it all works? After all, as is often said, disk is cheap now, I’m sure they could knock one up quite quickly…
