
With Modern Storage the Oracle Buffer Cache is Not So Important. May 27, 2015

Posted by mwidlake in Architecture, Hardware, performance.
11 comments

With Oracle’s move towards engineered systems we all know that “more” is being done down at the storage layer and modern storage arrays have hundreds of spindles and massive caches. Does it really matter if data is kept in the Database Buffer Cache anymore?

Yes. Yes it does.

Time for a cool beer

With much larger data sets and the still-real issue of fewer disk spindles per GB of data, the Oracle database buffer cache is not just as important as it was; it is even more important.

I could give you some figures but let’s put this in a context most of us can easily understand.

You are sitting in the living room and you want a beer. You are the oracle database, the beer is the block you want. Going to the fridge in the kitchen to get your beer is like you going to the Buffer Cache to get your block.

It takes 5 seconds to get to the fridge, 2 seconds to pop it open with the always-to-hand bottle opener and 5 seconds to get back to your chair. 12 seconds in total. Ahhhhh, beer!!!!

But – what if there is no beer in the fridge? The block is not in the cache. So now you have to get your car keys, open the garage, get the car out and drive to the shop to get your beer. And then come back, pop the beer in the fridge for half an hour and now you can drink it. That is like going to storage to get your block. It is that much slower.

Well, it is that much slower only if you live a six-hour drive from your beer shop. Think taking the scenic route from New York to Washington DC.

The difference in speed really is that large. If your data happens to be in the memory cache in the storage array, that’s like the beer already being in a fridge – in that shop 6 hours away. Your storage is SSD-based? OK, you’ve moved house to Philadelphia, 2 hours closer.

Let’s go get beer from the shop

To back this up, some rough (and I mean really rough) figures. Access time to memory is measured in microseconds (“us” – millionths of a second) down to hundreds of nanoseconds (“ns” – billionths of a second). Somewhere around 500ns seems to be an acceptable figure. Access to disc storage is more like milliseconds (“ms” – thousandths of a second). Go check an AWR report or statspack or OEM or whatever you use, and you will see that db file scattered reads are anywhere from say 2 or 3ms up to the low teens, depending on what your storage and network are. For most sites that speed has hardly altered in years as, though hard discs get bigger, they have not got much faster – and often you end up with fewer spindles holding your data, as you get allocated space not spindles from storage (and the total sustainable speed of hard disc storage is limited to the total speed of all the spindles involved). Oh, the storage guys tell you that your data is spread over all those spindles? So is the data for every other system, so you have maximum contention.

However, memory speed has increased over that time, and so has CPU speed (though single-core CPU speed has really stopped improving now; the gains come more from having more CPUs and cores).

Even allowing for latching and pinning and messing around, accessing a block in memory is going to be at the very least 1,000 times faster than going to disc, maybe 10,000 times. Sticking to a conservative 2,000 times faster for memory than disc, that 12-second trip to the fridge equates to 24,000 seconds of driving. That’s 6.66 hours.
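
If you want to sanity-check that arithmetic yourself, here is a trivial sketch (the 500ns and 1ms figures are just the rough values quoted above, not measurements from any particular system):

-- rough ratio of disc access to memory access, and the beer-run equivalent
select round(0.001 / 0.0000005)      as disc_vs_memory_ratio   -- roughly 2,000
,      12 * 2000                     as fridge_trip_in_seconds -- 24,000
,      round((12 * 2000) / 3600, 2)  as fridge_trip_in_hours   -- about 6.67
from dual;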

This is why you want to avoid physical IO in your database if you possibly can. You want to maximise the use of the database buffer cache as much as you can, even with all the new Exadata-like tricks. If you can’t keep all your working data in memory, in the database buffer cache (or in-memory or use the results cache), then you will have to do that achingly slow physical IO and then the intelligence-at-the-hardware comes into its own, true Data Warehouse territory.

So the take-home message is – avoid physical IO, design your database and apps to keep as much as you can in the database buffer cache. That way your beer is always to hand.

Cheers.

Update. Kevin Fries commented to mention this wonderful little latency table. Thanks Kevin.

“Here’s something I’ve used before in a presentation. It’s from Brendan Gregg’s book – Systems Performance: Enterprise and the Cloud”

An Oracle Instance is Like An Upmarket Restaurant January 28, 2015

Posted by mwidlake in Architecture.
11 comments

I recently did an Introduction to Oracle presentation, describing how the oracle instance worked – technically, but from a very high level. In it I used the analogy of a restaurant, which I was quite happy with. I am now looking at converting that talk into a set of short articles and it struck me that the restaurant analogy is rather good!

Here is a slide from the talk:

Simple partial overview of an Oracle Instance

As a user of the oracle instance, you are the little, red blob at the bottom left. You (well, your process, be it SQL*Plus, SQL*Developer, a Java app or whatever) do nothing to the database directly. It is all done for you by the Oracle Server Process – and this is your waiter.

Now, the waiter may wait on many tables (Multi-threaded server) but this is a very posh restaurant, you get your own waiter.

You ask the waiter for food and the waiter goes off and asks the restaurant to provide it. There are many people working in the restaurant, most of them doing specific jobs, and they go off and do whatever they do. You, the customer, have no idea who they are or what they do and you don’t really care. You don’t see most of them. You just wait for your food (your SQL results) to turn up. And this is exactly how an Oracle Instance works. Lots of specific processes carry out their own tasks, but they are coordinated and they do the job without most of us having much of an idea what each bit does. Finally, some of the food is ready and the waiter delivers the starter to you – the server process brings you the first rows of data.

Let’s expand the analogy a bit, see how far we can take it.

When you arrive at the restaurant, the Maître d’ greets you and allocates you to your waiter. This is like the Listener process waiting for connection requests and allocating you a server process. The Listener Process listens on a particular port, which is the front door to the restaurant. When you log onto an oracle database your session is created, ie your table is laid. If someone has only just logged off the database their session might get partially cleared and re-used for you (you can see this as the SID may well get re-used), as creating a session is a large task for the database. If someone had just left the restaurant that table may get a quick brush down and the cutlery refreshed, but the table cloth, candle and silly flower in a vase stay. Completely stripping a table and re-laying it takes more time.

The restaurant occupies part of the building, just as the database occupies part of the server. Other things go on in the server; the restaurant is in a hotel.

The PMON process is the restaurant manager or Head of House maybe and SMON is the kitchen manager, keeping an eye on the processes/staff and parts of the restaurant they are responsible for. To be candid, I don’t really know what PMON and SMON do in detail and I have no real idea how you run a large kitchen.

There are lots of other processes, these are equivalent to the Sous-chef, Saucier, Commis chef, Plongeur (washes up, the ARC processes maybe?), Rôtisseur, Poissonnier, Pâtissier etc. They just do stuff, let’s not worry about the details, we just know there are lots of them making it all happen and we, the customer or end user, never see them.
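
If you want to meet the kitchen staff on your own instance, a quick sketch (assuming you have SELECT access to the V$ views):

-- list the background processes that are actually running and what they do
select name, description
from   v$bgprocess
where  paddr <> hextoraw('00')
order  by name;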

The PGA is the table area in the restaurant, where all the dishes are arranged and provided to each customer? That does not quite work as the waiter does not sit at our table and feed us.

The SGA is the kitchen, where the ingredients are gathered together and converted into the dishes – the data blocks are gathered in the block buffer cache and processed. The Block Buffer Cache is the tables and kitchen surfaces, where all the ingredients sit. The Library Cache is, yes, the recipes. They keep getting re-used as our kitchen only does certain recipes, it’s a database with a set of standard queries. It’s when some fool orders off-menu that it all goes to pot.

Food is kept in the larder and fridges – the tablespaces on disc. You do not prepare the dishes in the larder or fridge, let alone eat food out of them (well, some of the Oracle processes might nick the odd piece of cooked chicken or chocolate). Everything is brought into the kitchen {the SGA} and processed there, on the kitchen tables.

The orders for food are the requests for change – the redo deltas. Nothing is considered ordered until it is on that board in the kitchen, that is the vital information. All the orders are preserved (so you know what was ordered, you can do the accounts and you can re-stock). The archived redo. You don’t have to keep this information but if you don’t, it’s a lot harder to run the restaurant and you can’t find out what was ordered last night.

The SCN is the clock on the wall and all orders get the time they were placed on them, so people get their food prepared in order.
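
A couple of quick checks you can make on your own kitchen, as a sketch (again assuming access to the V$ views): what the clock on the wall currently says and whether the orders are being preserved.

-- the current SCN (the kitchen clock) and whether orders are kept (ARCHIVELOG mode)
select current_scn
,      scn_to_timestamp(current_scn) as scn_as_time
,      log_mode
from   v$database;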

When you alter the ingredients, eg grate some of the Parmesan cheese into a sauce, the rest of the cheese (which, being an ingredient, is in the SGA) is not put back into the fridge immediately, ie put back into storage. It will probably be used again soon. That’ll push it up the LRU list. Eventually someone will put it back, probably the Garçon de cuisine (the kitchen boy). A big restaurant will have more than one Garçon de cuisine, all with DBW1 to DBWn written on the back of their whites, and they take the ingredients back to the larder or fridge when they get around to it – or are ordered to do so by one of the chefs.

Can we pull in the idea of RAC? I think we can. We can think of it as a large hotel complex which will have several restaurants, or at least places to eat. They have their own kitchens but the food is all stored in the central store rooms of the hotel complex. I can’t think what would be an analogy of block pinging, as only a badly designed or run restaurant would, for example, have only one block of Parmesan cheese – oh, maybe it IS a lot like some of the RAC implementations I have seen :-)

What is the Sommelier (wine waiter) in all of this? Suggestions on a post card please.

Does anyone have any enhancements to my analogy?

How do you Explain Oracle in 50 Minutes? December 2, 2014

Posted by mwidlake in Architecture, conference, UKOUG.
10 comments

I’ve done a very “brave”* thing. I’ve put forward a talk to this year’s UKOUG Tech14 conference titled “How Oracle Works – in under 50 minutes”. Yes, I really was suggesting I could explain to people how the core of Oracle functions in that time. Not only that, but the talk is aimed at those new to Oracle technology. And it got accepted, so I have to present it. I can’t complain about that too much, I was on the paper selection committee…

* – “brave”, of course, means “stupid” in this context.

As a result I am now strapped to the chair in front of my desk, preparing an attempt to explain the overall structure of an Oracle instance, how data moves in out of storage, how ACID works and a few other things. Writing this blog is just avoidance behaviour on my part as I delay going back to it.

Is it possible? I’m convinced it is.

If you ignore all the additional bits, the things that not all sites use, such as Partitioning, RAC, Resource Manager, Materialized Views etc, etc, etc, then that removes a lot. And if not everyone uses it, then it is not core.
There is no need or intention on my part to talk about details of the core – for example, how the Cost Based aspect of the optimizer works, Oracle permissions or the steps needed for instance recovery. We all use those but the details are ignored by some people for their whole career {not usually people who I would deem competent, despite them holding down jobs as Oracle technicians, but they do}.

You are left with a relatively small set of things going on. Don’t get me wrong, it is still a lot of stuff to talk about and is almost certainly too much for someone to fully take in and digest in the time I have. I’m going to have to present this material as if I am possessed. But my intention is to describe a whole picture that makes sense and will allow people to understand the flow. Then, when they see presentations on aspects of it later in the conference, there is more chance it will stick. I find I need to be taught something 3 or 4 times. The first time simply opens my mind to the general idea, the second time I retain some of the details and the third or fourth time I start integrating it into what I already knew.

My challenge is to say enough so that it makes sense and *no more*. I have developed a very bad habit of trying to cram too much into a presentation and of course this is a real danger here. I’m trying to make it all visual. There will be slides of text, but they are more for if you want to download the talk after the conference. However, drawing pictures takes much, much, much longer than banging down a half dozen bullet points.

One glimmer in the dark is that there is a coffee break after my session. I can go right up to the wire and then take questions after I officially stop, if I am not wrestled to the ground and thrown out the room.

If anyone has any suggestions or comments about what I should or should not include, I’d love to hear them.

This is all part of my intention to provide more conference content for those new to Oracle. As such, this “overview” talk is at the start of the first day of the main conference, 10am Monday. I have to thank my fellow content organisers for allowing me to stick it in where I wanted it. If you are coming to the conference and don’t know much Oracle yet – then I am amazed you read my blog (or any other blog other than maybe AskTom). But if you have colleagues or friends coming who are still relatively new to the tech, tell them to look out for my talk. I really hope it will help them get that initial understanding.

I had hoped to create a fully fledged thread of intro talks running through all of Monday and Tuesday, but I brought the idea up too late. We really needed to promote the idea at the call for papers and then maybe source a couple of talks. However, using the talks that were accepted we did manage to get a good stab at a flow of intro talks through Monday. I would suggest:

  • 08:50 – Welcome and Introduction
    • Get there in time for the intro if you can, as if you are newish to the tech you are probably newish to a conference
  • 09:00 RMAN the basics, by Michael Abbey.
    • If you are a DBA type, backup/recovery is your number one concern.
  • 10:00 – How Oracle Works in 50 Minutes
    • I think I have said enough!
  • 11:30 – All about Joins by Tony Hasler
    • Top presenter, always good content
  • 12:30 – Lunch. Go and talk to people, lots of people, find some people you might like to talk with again. *don’t stalk anyone*
  • 13:20 – Go to the Oracle Keynote.
    • Personally, I hate whole-audience keynotes, I am sick of being told every year how “there has never been a better time to invest in oracle technology” – but this one is short and after it there is a panel discussion by technical experts.
  • 14:30 is a bit tricky. Tim Hall on Analytical Functions is maybe a bit advanced, but Tim is a brilliant teacher and it is an intro to the subject. Failing that, I’d suggest the Oracle Enterprise Manager round table hosted by Dev Nayak as Database-centric oracle people should know OEM.
  • 16:00 – Again a bit tricky for someone new but I’d plump for The role of Privileges and Roles in Oracle 12C by Carl Dudley. He lectures (lectured?) in database technology and knows his stuff, but this is a New Feature talk…
  • 17:00 – Tuning by Explain Plan by Arian Stijf
    • This is a step-by-step guide to understanding the most common tool used for performance tuning
  • 17:50 onwards – go to the exhibition drinks, the community drinks and just make friends. One of the best things to come out of conferences is meeting people and swapping stories.

I better get back to drawing pictures. Each one takes me a day and I need about 8 of them. Whoops!

Friday Philosophy – The Importance of Context November 23, 2012

Posted by mwidlake in Friday Philosophy, Perceptions.
1 comment so far

A couple of weeks ago I was making my way through the office. As I came towards the end of the large, open-plan room I became aware that there was someone following behind me so, on passing through the door, I held it briefly for the person behind me {there was nowhere else they could be going}, turned left and through the next door – and again held it and this time looked behind me to see if the person was still going the same way as I. The lady behind gave me the strangest look.

The strange look was reasonable – the door I’d just held for her was the one into the gentleman’s bathroom. *sigh*

I was doing the correct thing, I was attempting to be helpful to a fellow person, I was in fact being very polite. But because I had utterly failed to consider the context, there is now a lady who works on the same floor as I who considers me, at best, as strange. At worst she thinks I am very strange – and more than a little creepy. I fear the latter given her reaction when she saw me in the kitchen area recently and turned around. {By the way, if anyone can think of a good way I can clear this up I’d appreciate it. After all, I can’t exactly go up to her and say “sorry about holding the door to the gents for you the other day, I did not realise you were a woman”}.

My point is that you can do what you believe is the right thing but, because you are not thinking of the context or are unaware of the full situation, you end up giving utterly the wrong impression. I had a work situation like this a while back.

Without going into too much detail, I was working with a client on a data warehouse project. The Oracle database bulk-processed large quantities of data, did classic big-data queries and was sitting on some fairly expensive hardware with dedicated storage and the intention of implementing Dataguard. One of the issues they had was with a subsidiary part of the system that created a very large number of small transactions, lots of updates. High volume OLTP on a DW setup. It was hammering the storage and eating up all the available IO. The data for this subsidiary system was transient, no need to protect it.
I realised that the hardware was not correct for this subsidiary system and it needed no archived redo. Archiving redo is an all or nothing situation for an Oracle instance. So, happy that I had worked out what to do, I proposed {with a smile} moving the subsidiary system to its own database on its own hardware.
When I said this to the client, their response was a stony look and the comment “We’ve just spent a fortune on this platform……”. Having dug my hole I proceeded to jump right in there “It’s OK, what I am proposing is only about 5, 7 thousand pounds of kit – nothing compared to what you spent already!”. The client now got very, very annoyed indeed.

You see, the context is that they had been sold a system that was very expensive – it was to do a demanding job. They had been getting poor performance with the system and that is partly why I was there. They also did not really understand the technical nuances well (at least, not the chaps I was talking to) and they did not appreciate why I said what I did. From their perspective, this smiling loon was suggesting that a system costing 2-3% of what they had spent on their data warehouse platform was going to be able to do the processing that the expensive system could not. Either they had spent waaay too much, this new “expert” was an idiot or else I was lying to them. And they did not like any of those options.

Looking back it is clear I should have been more aware of how they would receive what I said. I’ve done this before {several times}, bounded into a situation like a wide-eyed puppy and gone “Look! We can just do that!” without considering things like upsetting the guy who had suggested the original solution, or making the on-site expert look stupid or blowing away a salesman’s pitch. Or that I have missed a glaring and valid reason why they can’t “just do that”.

I suspect that a few people would say “no, you just tell them the way it is and if they don’t like it or you upset someone then tough”. Well, maybe, but not if you want to be there to help fix the next problem. Also, I know I am not great at appreciating the context sometimes. That is part of why I will never run a company or be a senior manager, I lack the skills to judge the impact of what I propose or say sometimes, in my rush to be helpful. I am slowly learning to hold back on ideas though and to run things past friends or colleagues with more “whole picture” skills first. I might be rubbish at it but I can learn I am rubbish at it.

In the case of the situation above, the expensive system was correct for what they wanted to do – and maybe not quite expensive enough. I was suggesting a slightly unusual fix for a specific problem and I should have been more laboured in explaining the problem and more circumspect in leading them to the solution. I should have taken more time.

I should have checked who was following me and where I was going before I held the door open.

DBMS_APPLICATION_INFO for Instrumentation September 3, 2012

Posted by mwidlake in development, Instrumentation, performance.
24 comments

I just wanted to put up a post about DBMS_APPLICATION_INFO. This is a fantastic little built-in PL/SQL package that Oracle has provided since Oracle 8 to allow you to instrument your code, i.e. record what it is doing. I’m a big fan of DBMS_APPLICATION_INFO and have used it several times to help identify where in an application time is being spent and how that pattern of time has altered.

Some PL/SQL developers use it and some don’t. It seems to me that its use comes down to where you work, as most PL/SQL developers are aware of it – but not everyone uses it (a friend of mine made the comment recently that “all good PL/SQL developers use it“. I can understand his point but don’t 100% agree).

It is incredibly easy to use. You use the procedures SET_MODULE(module_name,action_name), SET_ACTION(action_name) and SET_CLIENT_INFO(client_info) to set the values of the corresponding columns in V$SESSION for your current session. This is done in memory, there is no inserting or updating of any table rows, so it is incredibly quick and light-weight. The below shows a subset of V$SESSION including the columns that get set:

desc v$session
Name Null? Type
----------------------------------------- -------- ----------------
SADDR RAW(8)
SID NUMBER
SERIAL# NUMBER
AUDSID NUMBER
PADDR RAW(8)
USER# NUMBER
USERNAME VARCHAR2(30)
COMMAND NUMBER
...
MODULE VARCHAR2(64)
MODULE_HASH NUMBER
ACTION VARCHAR2(64)
ACTION_HASH NUMBER
CLIENT_INFO VARCHAR2(64)
...

Just something odd to note. MODULE, ACTION and CLIENT_INFO are shown as VARCHAR2(64) but if you check the documentation you will see that:

MODULE is limited to 48 bytes
ACTION is limited to 32 bytes
CLIENT_INFO can be set to the whole 64 bytes

I have no idea why MODULE and ACTION are limited in this way but it might be historic, backward compatibility with prior versions.
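
As a small sketch of that truncation (not part of the original demo, and the values are made up): anything over the documented limits is silently cut back rather than raising an error.

-- MODULE is cut back to 48 bytes and ACTION to 32; CLIENT_INFO keeps all 64
EXEC DBMS_APPLICATION_INFO.SET_MODULE(MODULE_NAME=>RPAD('LONG_MODULE',60,'X'),ACTION_NAME=>RPAD('LONG_ACTION',60,'Y'))
select lengthb(module) mod_bytes, lengthb(action) act_bytes
from   v$session
where  sid = sys_context('userenv','sid');
-- expect MOD_BYTES = 48 and ACT_BYTES = 32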

As a quick example of their use:

test_11_2> --demo_dai.sql
test_11_2> -- quick demo of dbms_application_info
test_11_2> COL sid form 9999
test_11_2> col username form a12
test_11_2> col module form a18
test_11_2> col action form a22
test_11_2> col client_info form a15
test_11_2> -- set module and action
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_MODULE(MODULE_NAME=>'SALES_LOAD',ACTION_NAME=>'VALIDATE_ALL_PRICES')
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ ------------------ ---------------------- ---------------
650 ERIC SALES_LOAD VALIDATE_ALL_PRICES
test_11_2> --
test_11_2> -- Update the action within a module
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_ACTION(ACTION_NAME=>'Update all Prices')
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ ------------------ ---------------------- ---------------
650 ERIC SALES_LOAD Update all Prices

test_11_2> -- clear them
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_MODULE(null,null)
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ ------------------ ---------------------- ---------------
650 ERIC

test_11_2> -- you can set just an action
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_action(ACTION_NAME=>'orphan process')
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ ------------------ ---------------------- ---------------
650 ERIC orphan process

test_11_2> -- set them to something sensible
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_MODULE(MODULE_NAME=>'RETURNS_LOAD',ACTION_NAME=>'MATCH_TO_SALE')
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ ------------------ ---------------------- ---------------
650 ERIC RETURNS_LOAD MATCH_TO_SALE

test_11_2> -- client_info can be longer
test_11_2> EXEC DBMS_APPLICATION_INFO.SET_client_info(client_info=>'entered this module at '||sysdate)
test_11_2> select sid,username,module,action,client_info from v$session where sid=650;

SID USERNAME MODULE ACTION CLIENT_INFO
----- ------------ --------------- --------------- ----------------------------------------
650 ERIC RETURNS_LOAD MATCH_TO_SALE entered this module at 03-SEP-2012 13:07

The intention is that you set the MODULE as you enter a functional chunk of your application (be it a PL/SQL package or a chunk of JAVA, whatever {but be aware of the stateless nature of systems using middle tiers and pooled connections}) and then update the ACTION as you progress. You can set an action with no module, as demonstrated above, but I would recommend against it. On that topic I would make the following recommendations:

  • Set the MODULE at the start, eg as you enter a PL/SQL package’s main module
  • Set the MODULE before you do all the validation – you may wish to see if that validation is as quick as you think.
  • Use SET_ACTION to update the action prudently, ie when the activity of the code alters, but not all the time. It will become clearer with use, but what you are after is to be able to judge the relative run durations of parts of the application. Too much detail can mask the overall picture. If you can alter code you can always add more instrumentation.
  • Clear the MODULE and ACTION explicitly as you exit a package. The activity is lightweight and you do not want any code that lacks this instrumentation to inherit values from calling code.
  • Do not set an ACTION without having set the MODULE earlier in the code.
  • Keep the text of these values short and specific.
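
Pulling those recommendations together, here is a minimal sketch of how a package or stand-alone procedure might be instrumented. The procedure name and the work it does are made up purely for illustration:

create or replace procedure load_sales (p_batch_id in number) is
begin
  -- set MODULE (and an initial ACTION) as we enter this functional chunk
  dbms_application_info.set_module(module_name => 'SALES_LOAD'
                                  ,action_name => 'VALIDATE_INPUT');
  null;  -- the validation work would go here

  -- update the ACTION as the activity of the code changes
  dbms_application_info.set_action(action_name => 'MERGE_PRICES');
  null;  -- the merge work would go here

  -- clear MODULE and ACTION explicitly on the way out so that later,
  -- uninstrumented code does not inherit these values
  dbms_application_info.set_module(null,null);
exception
  when others then
    dbms_application_info.set_module(null,null);
    raise;
end load_sales;
/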

You do not have to look at V$SESSION to see these values, which is good as you may lack the privs needed to do so. Since Oracle 9 there have also been procedures to check what the values are set to. e.g:

test_11_2> BEGIN
2 DECLARE
3 V_MODULE VARCHAR2 (64);
4 V_ACTION VARCHAR2 (64);
5 begin
6 dbms_application_info.READ_MODULE(V_MODULE,V_ACTION);
7 DBMS_OUTPUT.PUT_LINE('Module IS '||V_MODULE||' Action IS '||V_ACTION);
8 END;
9 END;
10 /

Module IS RETURNS_LOAD Action IS MATCH_TO_SALE
test_11_2>

So as you can see, these values can be set, they can be accessed via the data dictionary or the package itself and, if you believe my wild claims, they are very lightweight to use. However, these values really come into their own with eg OEM’s performance screens. I’ll leave examples of that to a second post. Here I just wanted to cover how easy it is to set and get the information.

That does lead me onto what I think is the main reason that DBMS_APPLICATION_INFO is not as widely used as it could {and should} be. Developers often lack access to eg OEM and so do not see the full benefit of using it. Also, developers tend to be {note the get-out clause} more concerned with making the code work and less with monitoring it after it is released. DBA-types tend to be more concerned with monitoring it. Thus the DBA types would want it in the code and developers would not want to spend the time coding it in. We are all one big team though, so some negotiation and beer-buying (or management enforcement) might well resolve that one.

Will the Single Box System make a Comeback? December 8, 2011

Posted by mwidlake in Architecture, future, Hardware.
15 comments

For about 12 months now I’ve been saying to people(*) that I think the single box server is going to make a comeback and nearly all businesses won’t need the awful complexity that comes with the current clustered/exadata/RAC/SAN solutions.

Now, this blog post is more a line-in-the-sand and not a well researched or even thought out white paper – so forgive me the obvious mistakes that everyone makes when they make a first draft of their argument and before they check their basic facts, it’s the principle that I want to lay down.

I think we should be able to build incredibly powerful machines based on PC-type components, machines capable of satisfying the database server requirements of anything but the most demanding or unusual business systems. And possibly even them. Heck, I’ve helped build a few pretty serious systems where the CPU, memory and inter-box communication is PC-like already. If you take the storage component out of needing to be centralised (and thus shared), I think a major change is just over the horizon.

At one of his talks at the UKOUG conference this year, Julian Dyke showed a few tables of CPU performance, based on a very simple PL/SQL loop test he has been using for a couple of years now. The current winner is 8 seconds by a… Core i7 2600K. ie a PC chip and one that is popular with gamers. It has 4 cores and runs two threads per core, at 3.4GHz and can boost a single core to 3.8 GHz. These modern chips are very powerful. However, chips are no longer getting faster so much as wider – more cores. More ability to do lots of the same thing at the same speed.

Memory prices continue to tumble, especially with smart devices and SSD demands pushing up the production of memory of all types. Memory has fairly low energy demands so you can shove a lot of it in one box.

Another bit of key hardware for gamers is the graphics card – if you buy a top-of-the-range graphics card for a PC that is a couple of years old, the graphics card probably has more pure compute grunt than your CPU, and a massive amount of data is pushed to and fro across the PCIe interface. I was saying something about this to some friends a couple of days ago but James Morle brought it back to mind when he tweeted about this attempt at a standard for using PCI-e for SSD. A PCI-e 16X interface has a theoretical throughput of 4000MB per second – each way. This compares to 600MB for SATA III, which is enough for a modern SSD. A single modern SSD. {What I am not aware of is the latency for PCI-e, but I’d be surprised if it was not pretty low}. I think you can see where I am going here.

Gamers and image editors have probably been most responsible for pushing along this increase in performance and intra-system communication.

SSD storage is being produced in packages with a form factor and interface to enable an easy swap into the place of spinning rust, with for example a SATA3 interface and 3.5inch hard disk chassis shape. There is no reason that SSD (or other memory-based) storage cannot be manufactured in all sorts of different form factors, there is no physical constraint of having to house a spinning disc. Density per dollar of course keeps heading towards the basement. TB units will soon be here but maybe we need cheap 256GB units more than anything. So, storage is going to be compact and able to be in form factors like long, thin slabs or even odd shapes.

So when will we start to see cheap machines something like this: Four sockets for 8/16/32 core CPUs, 128GB main memory (which will soon be pretty standard for servers), memory-based storage units that clip to the external housing (to provide all the heat loss they require) that combine many chips to give 1Gb IO rates, interfaced via the PCIe 16X or 32X interface. You don’t need a HBA, your storage is internal. You will have multipath 10GbE going in and out of the box to allow for normal network connectivity and backup, plus remote access of local files if need be.

That should be enough CPU, memory and IO capacity for most business systems {though some quote from the 1960s about how many companies could possibly need a computer springs to mind}. You don’t need shared storage for this, in fact I am of the opinion that shared storage is a royal pain in the behind as you are constantly having to deal with the complexity of shared access and maximising contention, on the flimsy excuse of “sweating your assets”. And paying for the benefit of that overly complex, shared, contended solution.

You don’t need a cluster as you have all the cpu, working memory and storage you need in a 1U server. “What about resilience, what if you have a failure?”. Well, I am swapping back my opinion on RAC to where I was in 2002 – it is so damned complex it causes way more outage than it saves. Especially when it comes to upgrades. Talking to my fellow DBA-types, the pain of migration and the number of bugs that come and go from version to version, mix of CRS, RDBMS and ASM versions, that is taking up massive amounts of their time. Dataguard is way simpler and I am willing to bet that for 99.9% of businesses other IT factors cause costly system outages an order of magnitude more times than the difference between what a good MAA dataguard solution can provide you compared to a good stretched RAC one can.

I think we are already almost at the point where most “big” systems that use SAN or similar storage don’t need to be big. If you need hundreds of systems, you can virtualize them onto a small number of “everything local” boxes.

A reason I can see it not happening is cost. The solution would just be too cheap; hardware suppliers will resist it because, hell, how can you charge hundreds of thousands of USD for what is in effect a PC on steroids? But desktop games machines will soon have everything 99% of business systems need except component redundancy and, if your backups are on fast SSD and you use a way simpler Active/Passive/MAA Dataguard-type configuration (or the equivalent for your RDBMS technology) rather than RAC and clustering, you don’t need that total redundancy. Dual power supply and a spare chunk of solid-state you can swap in for a failed RAID 10 element is enough.

Lack of Index and Constraint Comments November 24, 2011

Posted by mwidlake in Architecture, database design, development.
10 comments

Something I’ve just reminded myself of is that under Oracle you cannot add a comment on an index or a constraint. You can only add comments on tables, views, materialized views, columns of those object types and a couple of esoteric things like Operators, Editions and Indextypes.

Here is an example of adding comments to tables and columns:

set pause off feed off
drop table mdw purge;
create table mdw(id number,vc1 varchar2(10));
comment on table mdw is 'Martin Widlake''s simple test table';
comment on column mdw.id is 'simple numeric PK sourced from sequence mdw_seq';
comment on column mdw.vc1 is 'allow some random text up to 10 characters';
--
desc user_tab_comments

 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------------------------------
 TABLE_NAME                                            NOT NULL VARCHAR2(30)
 TABLE_TYPE                                                     VARCHAR2(11)
 COMMENTS                                                       VARCHAR2(4000)

--
select * from dba_tab_comments where table_name='MDW'
/
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            TABLE
Martin Widlake's simple test table

select * from dba_col_comments where table_name='MDW'
order by column_name
/
OWNER                          TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ --------------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            ID
simple numeric PK sourced from sequence mdw_seq
MDW                            MDW                            VC1
allow some random text up to 10 characters
-- now to add a big comment so need to use the '-' line continuation character in sqlplus
--
comment on table mdw is 'this is my standard test table.-
 As you can see it is a simple table and has only two columns.-
 It will be populated with 42 rows as that is the solution to everything.'
/
select * from dba_tab_comments where table_name='MDW'
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            TABLE
this is my standard test table.  As you can see it is a simple table and has only two columns.  It w
ill be populated with 42 rows as that is the solution to everything.
--
/

Adding comments on tables, views and columns seems to have dropped out of fashion over the years but I think it is still a very useful feature of oracle and I still do add them (though I am getting a little slack about it myself over the last 3 or 4 years, which I must stop).

Comments are great, you can put 4000 characters of information into the database about each table, view and column. This can be a brief description of the object, a full explanation of what a column is to hold or even a list of typical entries for a column or table.

But you can’t add a comment on indexes or constraints. Why would I want to? Well, constraints and indexes should only be there for a reason and the reason is not always obvious from either the names of the columns or the name of the constraint or index, especially where you have a naming standard that forces you to name indexes and constraints after the columns they reference.

When you design a database, do a schema diagram or an ERD, you label your relationships between entities/tables. It tells you exactly what the relationship is. You might create an index to support a specific method of access or particular business function. You might alter the index in a way not immediately obvious to the casual observer, such as to allow queries that use the index to avoid having to visit the table. All of those things will, of course, be fully documented in the maintained project documentation in the central repository, available and used by all…

If I was able to add comments to constraints and indexes within the database then they would be there. You move the system from one platform to the other, they are there. If for any wildly unlikely reason the central documentation lets you down, the information is always there in the database and easy to check. You may not be able to track down the original design documents but you have the database in front of you, so comments in that will persist and be very easy to find.

Lacking the ability to add comments on indexes and constraints, I have to put them at the table level, which I always feel is a kludge. I might actually raise an enhancement request for this, but as Oracle 12 is already nailed down, it will have to wait until Oracle 14. (A little bird told me Larry said there would be no Oracle 13…).
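
For what it is worth, the table-level kludge looks something like this (the index name and its stated purpose are invented for illustration; note the '-' line-continuation character again):

-- no COMMENT ON INDEX or COMMENT ON CONSTRAINT exists, so record the intent at table level
comment on table mdw is 'Martin Widlake''s simple test table.-
 Index mdw_vc1_idx exists only to support the overnight load lookup by VC1;-
 PK on ID is sourced from sequence mdw_seq.'
/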

IOT Part 6(B) – OLTP Inserts into an IOT November 10, 2011

Posted by mwidlake in development, performance, Testing.
14 comments

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
<…………IOT6(A) – Bulk Insert slowed down

IOT 6 part A covered the sort of insert work used to create an IOT or push a large number of rows into it via an “insert into…select from” type format. ie Bulk Load.
In this, part B, I am going to test inserting a small number of rows (10,000) into a table already holding a million rows. Something more like an hourly load or the inserts that would occur over a working day – a more OLTP-like test.

I’m going to do the tests against four things. A “non-insert” test to ensure my data creation is not taking a lot of resource {the Null test}, then against my three versions of test table – a normal heap table, my IOT and my Partitioned IOT. I run each test on each segment type three times, the minimum number needed to help identify variation that could invalidate the test.

You will probably be surprised by the results….

All three tables have the same column structure, the heap has indexes on the primary key and a unique constraint, the IOT and partitioned IOT are organized on a modified version of the Primary key and have the second index for the unique key. Below is the creation statement for the partitioned IOT (the IOT is the same without the partition clause). You can get full creation details from IOT part 2:

create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(1000) not null
,vc_2      varchar2(1000)
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id) 
--  using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110901 values less than (to_date('01-09-2011','DD-MM-YYYY')) 
  tablespace data_01
,partition rm20111001 values less than (to_date('01-10-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20111101 values less than (to_date('01-11-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)

The three tests are:

  • Insert 10,000 records for random accounts, in time order for the same day. This would be like processing data for or during a “day”
  • Generating the 10,000 records in order of account, to better match the IOT structure, again for a single day. For any given account 0,1,2 or 3 records are created, to be a little more “real life”. This would be like loading a pre-processed set of data.
  • All 10,000 records are created for the one account, in datetime order, as an extreme example of inserting into an IOT data that is ordered to perfectly match the order of the IOT.

The code is written ‘badly’, ie a single record is prepared and inserted into the table at a time, within a loop of 10000 iterations. This is to avoid the bulk-processing that may benefit the heap table and also to better match the reality of an OLTP system. I do however only commit after all records are created for any given test, which is maybe a mistake but that’s what I did.

Now, before I go any further, does anyone want to guess by how much inserting into the heap table will be faster than inserting into the IOT (partitioned or otherwise)? Also, whether the partitioned IOT will be faster or slower than the IOT and what, if any, impact there will be by the ordering of data?

To allow you to think about that and not see the results by accident, I’ll stick in one of the insert harnesses here so you can see how I am doing it. Skip over if you just want the results

--
execute s_snap.my_snap(' finished non-insert test1')
--
-- Transaction_heap random data test
declare
v_num         number :=10000; -- number of accounts
v_offset      number :=-4;
v_str1        varchar2(60);
v_str2        varchar2(60);
v_tran_type   number(2);
v_tran_id     number(10);
v_acco_type   number(2);
v_acco_id     number(10);
v_cre_date    date;
v_vc_1        varchar2(1000);
v_vc_2        varchar2(1000);
v_date_1      date :=sysdate;
v_num_1       number(2) :=10;
v_num_2       number(2) :=15;
begin
v_str1:=dbms_random.string('U',60);
v_str2:=dbms_random.string('U',60);
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..10000 loop
    v_tran_type :=mod(1,3)+1;
    v_tran_id   := 1000000+i;
    v_acco_type := 5+(trunc(dbms_random.value(1,3))*5); -- 10 or 15
    v_acco_id   :=trunc(dbms_random.value(1,v_num/2)); -- 1 to 5000
    v_cre_date  :=trunc(sysdate-v_offset)+(i/(60*60*24));
    v_vc_1      :=substr(v_str1,1,51+mod(i,10));
    v_vc_2      :=substr(v_str2,1,46+mod(i,15));
    insert into transaction_heap
      (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,vc_2,date_1,num_1,num_2)
    values
      (v_tran_type,v_tran_id,v_ACCO_type,v_ACCO_id,v_cre_date
      ,V_vc_1,v_vc_2,v_date_1,v_num_1,v_num_2);
end loop;
commit;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
--
execute s_snap.my_snap(' finished th insert test1')
--

Here are the results of three runs for all tests:

Elapsed time in seconds:

                    random insert             ordered-by-account       perfect IOT order match
Segment type    Test 1    2    3            Test 1    2    3           Test 1    2     3    
------------    ------ ---- ----            ------ ---- ----            ----- ---- ----
No insert          2.1  0.1  0.1               0.1  0.1  0.1              0.1  0.0  0.1   

                                    AVG                         AVG                        AVG
                                   ----                        ----                       ----
Heap              52.3 20.5 11.4   28.1       33.4 13.8 10.7   19.3       9.1 13.0  9.8   10.6
IOT              137.1 83.7 26.4   82.4      110.8 63.6 39.0   71.2       6.4  6.7  4.0    5.8
Partn'd IOT       33.2  9.5  8.7   17.1       17.1 10.4 14.3   13.9       8.1  5.3  4.5    6.0
                                   ----                        ----                       ----

Allow me to summarise that.

First, the preparation of the data and cycling around the loop, the no-insert tests, are negligible. {The first no_insert test is taking 2 seconds due to the parsing and collecting dictionary information overhead of the first thing done}. We can ignore the overhead of the test harness.

Much more significantly:

  • Each iteration of each test, things get faster {with one odd exception}
  • For Random and semi-ordered inserts, inserting into the IOT is slower than the Heap by a factor of 300% to 400% (3 times to 4 times slower) – what we all probably expected.
  • Inserting into the IOT is Faster than the heap when the insert order matches the structure of the IOT in the perfect_IOT test. Not just the same but 40% faster
  • The partitioned IOT is as Fast or Faster than the Heap table, never mind the IOT. Overall the partitioned IOT takes 50% TO 75% OF THE TIME OF THE HEAP to be populated.

Hands up everyone expecting this? NB My hand would be down if I had not done similar tests before.

Why do we see these results?

I’ll just mention that the reason all the tests get quicker with each iteration is that the computer is doing the same thing over and over again and gets used to it, the relevant circuits in the chip are “oiling in”. Of course not, how silly – but it seems to be how non-IT people think of it :-) Basically, more and more of each segment is getting cached and that reduces physical IO and speeds things up. You see this a lot in testing and you need to be constantly vigilant about it. You might think I should really run the tests until they reach a steady state, but that is a very false situation. On a real system you don’t tend to get the same thing run over and over and over again with nothing else running.
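
If you want to watch that caching effect during your own runs, here is a rough sketch (it assumes the test tables from the earlier posts and access to the V$/DBA views; for the IOTs the cached segment is the primary key index, so those names are included too, TIP_PK being taken from the creation statement above and the rest being guesses):

-- count how many blocks of each test segment are currently in the buffer cache
select o.object_name, count(*) as blocks_cached
from   v$bh b
       join dba_objects o on o.data_object_id = b.objd
where  o.object_name like 'TRANSACTION%'
   or  o.object_name like 'TIP%'
group  by o.object_name
order  by o.object_name;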

Much more interestingly, why is the partitioned IOT winning? Well, for the Heap table data is being poured into the leading edge of the table, which takes a little processing and some blocks to be created. However, there are also two indexes being maintained, the Primary Key and that Unique key. For each insert oracle has to find the right spot in each index (root block ->branch block->leaf block->check for row) and insert the new record, occasionally splitting a full leaf block.

For the IOT and the Partitioned IOT there is no table to maintain. In effect there are just two indexes to maintain. For the normal IOT this benefit is outweighed by the cost of working with a less space-efficient Primary Key index. As there is the whole of the row in each index entry there are fewer index entries possible per leaf block, and so more blocks need to be read in from disc and more splits are needed. Thus the inserts into the IOT are slower.
But for the partitioned IOT not only is there no table to maintain but the table (really the Primary Key index) is partitioned – on date. We are creating data for a single day at a time. All the relevant blocks will be in the one partition (and it’s the latest and smallest partition) and quickly all cached in memory and processed in memory. It’s the efficient use of the cache again. For both the IOT and the Heap, the primary key segment being processed is larger and the inserts are spread across the whole index.

Ordering data to match the index order (ordered_by_account tests) helps for all three segment types – heap, IOT and partitioned IOT. We have two indexes and the data is being ordered for both in effect, as one is on the columns we are ordering the data creation for and the other, the unique key, is generated by an ascending sequence. The gain is about 25 to 30% and is due to working through the indexes in order as opposed to skipping about. If the indexes were significantly larger than the buffer cache, or on a highly active system where the turnover in the cache was very high (indicated by a low buffer cache hit ratio :-) ), then the benefit would be higher.
The perfect ordering test (perfect_iot_order_match) is a special case. Not only is the data being processed in order of the indexes but only one part of the index is being updated – for a single account. In this situation the maintaining of the index is so efficient that the time saved not processing the heap table means both the IOT and IOT_P win out over the heap. So in this special case, inserts into IOTs are faster than inserts into a heap table.

And finally a quick summary:

  • Inserting records into an IOT carries about a 200-400% overhead, depending on the number of indexes
  • Ordering the data to match the IOT gives a 25-30% advantage, but it similarly aids update of the primary key on the heap
  • Partitioning the IOT such that a smaller chunk of the index is processed can be highly beneficial. I don’t cover it specifically in the above but you can probably appreciate that partitioning on a heap can be beneficial to processing indexes on the heap also.
  • If you can organise to create your IOT such that data is created only in one or a few distinct points in the structure (you might have to tweak your data load too of course), insert performance might match or even beat that for a heap table with the same indexes
  • Using methods to maximise efficient use of your buffer cache and reduce IO is a very powerful performance tuning aid, not just for selects

BTW I am conscious of the fact that the above may surprise many and you might wonder if my data or tests are flawed. I have stripped out most of the code from this post as it was just becoming massive and confusing, but contact me and I will happily supply all the example scripts and data.

IOT P6(a) Update November 8, 2011

Posted by mwidlake in Architecture, development, performance, Testing.
3 comments

In my last post, IOT part 6, inserts and updates slowed down, I made the point that IOT insert performance on a relatively small Oracle system was very slow, much slower than on a larger system I had used for professional testing. A major contributing factor was that the insert was working on the whole of the IOT as data was created. The block buffer cache was not large enough to hold the whole working set (in this case the whole IOT) once it grew beyond a certain size. Once it no longer fitted in memory, Oracle had to push blocks out of the cache and then read them back in next time they were needed, resulting in escalating physical IO.

I’ve just done another test which backs up this claim. I altered my test database so that the block buffer cache was larger, 232MB compared to 100MB in my first tests. The full IOT is around 200MB.
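
You can check those sizes for yourself with something like the below (a sketch only; the segment names are guesses based on the creation statement in part B, and remember that for an IOT the segment is the primary key index, eg TIP_PK, not the table itself):

-- segment sizes for the test objects, to compare against the buffer cache size
select segment_name, segment_type, round(sum(bytes)/1024/1024) as size_mb
from   user_segments
where  segment_name like 'TRANSACTION%' or segment_name like 'TIP%'
group  by segment_name, segment_type
order  by segment_name;

select round(current_size/1024/1024) as buffer_cache_mb
from   v$sga_dynamic_components
where  component = 'DEFAULT buffer cache';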

Bottom line, the creation of the IOT was greatly sped up (almost by a factor of 4) and the physical IO dropped significantly, by a factor of 20. As a result, the creation of the IOT was almost as fast as the partitioned IOT. It also shows that the true overhead on insert of using an IOT is more like a factor of 2 to 4 as opposed to 6 to 8.

You can see some of the details below. Just to help you understand them, it is worth noting that I had added one new, larger column to the test tables (to help future tests) so the final segments were a little larger (the IOT now being 210MB as opposed to 180MB in the first tests) and there was a little more block splitting.

                        Time in Seconds
Object type           Run with       Run with
                     100MB cache    232MB cache
------------------  ------------    -----------   
Normal Heap table          171.9          119.4   
IOT table                1,483.8          451.4     
Partitioned IOT            341.1          422.6 

-- First reading 100MB cache
-- second reading 232MB cache 
STAT_NAME                            Heap    	IOT	      IOT P
-------------------------------- ---------- -----------  ----------
CPU used by this session            5,716         7,222       6,241
                                    5,498         5,967       6,207

DB time                            17,311       148,866      34,120
                                   11,991        45,459      42,320

branch node splits                     25            76          65
                                       25            82         107

leaf node 90-10 splits                752         1,463       1,466
                                      774         1,465       1,465

leaf node splits                    8,127        24,870      28,841
                                    8,162        30,175      40,678

session logical reads           6,065,365     6,422,071   6,430,281
                                6,150,371     6,544,295   6,709,679

physical read IO requests             123        81,458       3,068
                                      36          4,012       1,959

physical read bytes             2,097,152   668,491,776  25,133,056
                                1,400,832    34,037,760  16,048,128

user I/O wait time                    454       139,585      22,253
                                       39        34,510      19,293

The heap table creation was faster with more memory available. I’m not really sure why, the cpu effort was about the same as before and though there was some reduction in physical IO with the larger cache, I suspect it might be more to do with both the DB and the machine having been recently restarted.

All three tests are doing a little more “work” in the second run due to that extra column and thus slightly fewer rows fitting in each block (more branch node and leaf node splits), but this just highlights even more how much the IOT performance has improved, which correlates with a massive drop in physical IO for the IOT creation. If you check the session logical reads they are increased by a very small, consistent amount. Physical read IO requests have dropped significantly and, in the case of the IOT, plummeted.

I believe the 90-10 leaf node splits are consistent between runs as they will be down to maintaining the secondary index on ACCO_TYPE and ACCO_ID, which are populated in ascending order as the data is created (derived from rownum) – when new entries always go in at the high end of an index, Oracle splits the rightmost leaf block 90-10 rather than 50-50.

What this second test really shows is just how significant it is to make efficient use of the database buffer cache. Accessing data efficiently via good indexes or tricks like IOTs and hash tables is important, but it also really helps to consider how data is going to be used, pushed out of the cache and then re-read. A general principle for batch-type work seems to me to be that if you can process the data in chunks that fit in memory, rather than churning through the whole working set at once, there are real benefits to be gained. Of course, partitioning can really help with this.
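
{As an illustration only – the table and column names below are made up and are not from my test scripts – the “process it in chunks” idea for a batch load into an IOT might look something like this, stepping through the source data in ranges of the IOT’s leading key so that each pass only touches a slice of the index small enough to stay in the buffer cache:}

-- sketch only: load in key-ordered chunks rather than one monolithic insert
begin
  for r in (select distinct trunc(key_col/10000) as chunk   -- 10,000-wide key ranges
            from   staging_data
            order  by 1)
  loop
    insert into target_iot
    select *
    from   staging_data
    where  key_col >= r.chunk * 10000
    and    key_col <  (r.chunk + 1) * 10000;
    commit;
  end loop;
end;
/

{Each chunk’s index blocks stay hot in the cache while that chunk is being processed instead of being aged out and re-read, and a range-partitioned target gives you much the same locality without the procedural code.}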

{If anyone is wondering why, for the heap table, the number of physical IO requests has dropped by 70% but the actual number of bytes read has dropped by only 30%, I’m going to point the finger at some multi-block reads going on, either in recursive code or, more likely, in my code that actually gathers those stats! That would also help explain the drop in user IO wait time for the heap run.}
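
{To put rough numbers on that, assuming the default 8KB block size (which is not shown in the figures above): 2,097,152 bytes over 123 requests is about 17KB, or roughly two blocks, per read request in the first run, whereas 1,400,832 bytes over 36 requests is about 38KB, or nearly five blocks, per request in the second, which is consistent with a bigger share of the (much reduced) IO being multi-block reads.}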

Just for completeness, here is a quick check of my SGA components for the latest tests, to show I am using the cache size I claim. All of this is on Oracle 11.1 Enterprise Edition, on a tired old Windows laptop. {NB the new laptop arrived today – you have no idea how hard it has been to keep doing this blog and not play with the new toy!!!} If anyone wants the test scripts in full, send me a quick email and I’ll provide them:

-- sga_info.sql
-- Martin Widlake /08
-- summary
set pages 32
set pause on
col bytes form 999,999,999,999,999 head byts___g___m___k___b
spool sga_info.lst
select * 
from v$sgainfo
order by name
/
spool off
clear col
--
NAME                             byts___g___m___k___b RES
-------------------------------- -------------------- ---
Buffer Cache Size                         243,269,632 Yes
Fixed SGA Size                              1,374,892 No
Free SGA Memory Available                           0
Granule Size                                4,194,304 No
Java Pool Size                              4,194,304 Yes
Large Pool Size                             4,194,304 Yes
Maximum SGA Size                          401,743,872 No
Redo Buffers                                6,103,040 No
Shared IO Pool Size                                 0 Yes
Shared Pool Size                          142,606,336 Yes
Startup overhead in Shared Pool            50,331,648 No
Streams Pool Size                                   0 Yes

Friday Philosophy – The Dying Art of Database Design? September 9, 2011

Posted by mwidlake in Architecture, development, Friday Philosophy, rant.
Tags: , , ,
35 comments

How many people under the age of {Martin checks his age and takes a decade or so off} ohh, mid 30’s do any database design these days? You know, ask the business community what they want the system to do, how the information flows through their business, what information they need to report on. And then construct a logical model of that information? Judging by some of the comments I’ve had on my blog in the last couple of years and also the meandering diatribes of bitter, vitriolic complaints uttered by fellow old(er) hacks in the pub in the evening, it seems to be becoming a very uncommon practice – and thus a rare and possibly dying skill.

{update – this topic has obviously been eating at my soul for many years. Andrew Clark and I had a discussion about it in 2008 and he posted a really good article on it and many, many good comments followed}

Everything seems to have turned into “Ready, Fire, Aim”. I.e. you get the guys doing the work in a room, come up with some rough idea of what you want to develop (like, look at the system you are replacing), start knocking together the application and then {on more enlightened projects} ask the users what they think. The key points are that development kicks off before you really know what you need to produce, there is no clear idea of how the stored data will be structured and you steer the ongoing development towards the final, undefined, target. I keep coming across applications where the screen layouts for the end users seem to almost be the design document and then someone comes up with the database – as the database is just this bucket to chuck the data into and scrape it out of again.

The functionality is the important thing, “we can get ‘someone’ to make the database run faster if and when we have a problem”.

Maybe I should not complain as sometimes I am that ‘someone’ making the database run faster. But I am complaining – I’m mad as hell and I ain’t gonna take it anymore! Oh, OK, in reality I’m mildly peeved and I’m going to let off steam about it. But it’s just wrong, it’s wasting people’s time and it results in poorer systems.

Now, if you have to develop a simple system with a couple of screens and a handful of reports, it might be a waste of time doing formal design. You and Dave can whack it together in a week or two, Chi will make the screens nice, it will be used by a handful of happy people and the job is done. It’s like building a wall around a flower bed. Go to the local builders’ merchants, get a pallet of bricks, some cement and sand (Ready), dig a bit of a trench where you want to start (Aim) and put the wall up, extending it as you see fit (Fire). This approach won’t work when you decide to build an office block, and only a fool from the school of stupid would attempt it that way.

You see, as far as I am concerned, most IT systems are all about managing data. Think about it. You want to get your initial information (like the products you sell), present it to the users (those customers), get the new (orders) data, pass it to the next business process (warehouse team) and then mine the data for extra knowledge (sales patterns). It’s a hospital system? You want information about the patients, the staff, the beds and departments, tests that need doing, results, diagnoses, 15,000 reports for the regulators… It’s all moving data. Yes, a well-designed front end is important (sometimes very important) but the data is everything. If the database can’t represent the data you need, you are going to have to patch an alteration in. If you can’t get the data in quickly enough or out quickly enough, your screens and reports are not going to be any use. If you can’t link the data together as needed you may well not be able to DO your reports and screens. If the data is wrong (loses integrity) you will make mistakes. Faster CPUs are not going to help either; data at some point has to flow onto and off disks. Those slow spinning chunks of rust. CPUs have got faster and faster, rust-busting has not. So data flow is even more important than it was.

Also, once you have built your application on top of an inadequate database design, you not only have to redesign it, you have to:

  • do some quick, hacky fixes to get by for now
  • migrate the existing data
  • transform some of it (do some data duplication or splitting maybe)
  • alter the application to cope
  • schedule all of the above to be done together
  • tie it in with the ongoing development of the system as, hey, if you are not going to take time to design, you are not going to take time to assess things before promising phase 2.

I’m utterly convinced, and experience backs this up, that when you take X weeks up front doing the database design, you save 5*X weeks later on in trying to rework the system, applying emergency hacks and having meetings about what went wrong. I know this is an idea out of the 80s, guys, but database design worked.

*sigh* I’m off to the pub for a pint and to reminisce about the good-old-days.
