jump to navigation

Database Sizing – How much Disk do I need? (The Easy Way) November 11, 2010

Posted by mwidlake in Architecture, development, VLDB.
Tags: , , , ,
trackback

How much Disk do I need for my new Oracle database? Answer:-

  • 8-10 times the volume of raw data for an OLTP system
  • 2-4 times the raw data volume for a Data Warehouse.
  • The bigger the database, the nearer you will be to the lower multiplication factors.

{Disclaimer. This is of course just my opinion, based on some experience. If you use the above figures for a real project and get the total disc space you need wrong, don’t blame me. If you do and it is right, then of course you now owe me a beer.}

Many of us have probably had to calculate the expected size a database before, but the actual database is only one component of all the things you need to run the Oracle component of your system. You need to size the other components too – Archived redo logs, backup staging area, dataload staging area, external files, the operating system, swap space, the oracle binaries {which generally gets bigger every year but shrink in comparison to the average size of an Oracle DB} etc…

In a similar way to my thoughts on how much database space you need for a person, I also used to check out the total disk space every database I created and those that I came across took up. {A friend emailed me after my earlier posting to ask if I had an obsession about size. I think the answer must be “yes”}.

First of all, you need to know how much “raw data” you have. By this I mean what will become the table data. Back in the early 90’s this could be the total size of the flat files the old system was using, even the size of the data as it was in spreadsheets. An Oracle export file of the system gives a pretty good idea of the raw data volume too. Lacking all these then you need to roughly size your raw data. Do a calculation of “number_of_rows*sum_of_columns” for your biggest 10 tables (I might blog more on this later). Don’t be tempted to overestimate, my multipliers allow for the padding.

Let us say you have done this and it is 60GB of raw data for an OLTP system. Let the storage guys know you will probably want about 500GB of space. They will then mentally put it down as “of no consequence” as if you have dedicated storage guys you probably have many terabytes of storage. {Oh, I should mention that I am not considering redundancy at all but space that is provided. The amount of actual spinning disk is down to the level and type of RAID you storage guys make you use. That is a whole other discussion}.

If you come up with 5TB of raw data for a DW system then you need around 12-15TB of disk storage.

If you come up with more than a Terabyte or so of raw data for an OLTP system or 10 to 20 Terabytes for a DW, when you give you figures to the storage guys/procurement people then they may well go pale and say something like “you have got to be kidding!”. This is part of why the multiplication factor for Data Warehouses and larger systems in general is less, as you are forced to be more careful about the space you allocate and how you use it.

The overhead of total disk space over Raw data reduces as the database gets bigger for a number of reasons:

  • The size of the Oracle binaries and the OS does not change as the database gets bigger.
  • The size of swap space does not increase in line wiht the database as, generally speaking, if you increase the database size from 100GB to 1TB you do not have the luxury of increasing the system memory of your server. It probably doubles.
  • Very large databases tend to have something making them big, like images or embedded documents, which are not indexed. Thus the ratio of table segments to index segments increases.
  • If you have a very large database you start removing indexes (often those that support constraints) to aid performance of data load and management, again improving the ratio of table segments to index segments.
  • Backups become partial or incremental to reduce the size and duration of the backup.
  • As mentioned before, the sheer size of system is such that you just take more care over cleaning up work areas, paring down the archived redo log areas (those files to compress well) and other areas.
  • If things get extreme or you have been doing this for donkeys years {note to none-UK people, this means many, many years} you start altering PCTFREE and checking over extent sizes.

My best ever ratio of database size to raw data was around 1.6 and it took an awful lot of effort and planning to get there. And an IT manager who made me very, very aware of how much the storage was costing him (it is not the disks, it’s all the other stuff).

I should also just mention that the amount of disk you need is only one consideration. If you want your database to perform well you need to consider the number of spindles. After all, you can create a very large database indeed using a single 2TB disc – but any actual IO will perform terribly.

About these ads

Comments»

1. PdV - November 12, 2010

In jest:

For OLTP I had a long reasongin (indexes, rollback, temp), endingin
Pi-Square, or approx factor of 10.

Funny that your best effort is 1.6, which approaches the “golden ratio” of 1.618…

mwidlake - November 12, 2010

“which approaches the “golden ratio” of 1.618″
Maybe it was a golden database architecture :-)

Thanks for the comment Piet and worry not about the typos – I manage enough with a normal keyboard and a spell checker available to me.

2. PdV - November 12, 2010

Bloody hell, internet access is bad enough here, via termservers and stuff. I made it worse with typos.
Apologies.

3. Log Buffer #207, A Carnival of the Vanities for DBAs | The Pythian Blog - November 26, 2010

[...] much disk do I need for my new Oracle database? This is the question,one of my favorite blogger Martin Wildake asked and answered in his blog post. He is also an Oakie now, and even secretly doesn’t think [...]

4. Senadj - November 28, 2010

Perhaps you could also add reserved space for log, trace, dump files. In case of errors in OLTP system you could get a series of large core dumps (i’ve seen dumps of more than a Gb each).
Be careful where you put these files – if they are all on one mount point one problematic instance could fill up space and you would have problems with all instances.

mwidlake - November 28, 2010

Hi Senadj,

Thanks for that, especially the advice on log files/dump files filling up filesystems – that is something I’ve seen a number of times over the years and it is good you highlight it.
There is a whole discussion thread possible on what you put where for an oracle DB and sizing the individual components. Anyone interested would do well to start by googling Optimal Flexible Architecture or OFA Oracle. I still owe Cary Millsap a good dozen pints for that.

However, those log files would all come under the multipliers I mention. The number and size of log files keeps going up, but so does everything. I remember the size of the oracle software binaries going down only once, I think it was 9 to 10 beta.

5. State of Data Last Week – Dec 11 « Dr Data's Blog - December 10, 2010

[...] – How much storage do I need for my data? ROT (Rule of thumb) is 8-10x Raw data for OLTP; 2-4x Raw Data for [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 166 other followers

%d bloggers like this: