
How Big is a Person? November 5, 2010

Posted by mwidlake in Architecture.

How big are you in the digital world?

By this, I mean how much space do you (as in, a random person) take up in a database? If it is a reasonably well-designed OLTP-type database, a person takes up 4K. OK, around 4K.

If your database is holding information about people and something about them, then you will have about 4K of combined table and index data per person. So if your database holds 100,000 customers, then your database is between 200MB and 800MB, but probably close to 400MB. There are a couple of situations I know of where I am very wrong, but I’ll come to that.
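The arithmetic behind those figures is simple enough to sketch. This is only an illustration: the function name `estimate_db_size_mb` is made up for the example, the 2K and 8K bounds are simply what the 200MB to 800MB range implies for 100,000 people, and decimal units (1MB = 1000K) are assumed to match the round numbers above.

```python
def estimate_db_size_mb(people, kb_per_person=4.0):
    """Rough database size in MB for a person-based system.

    Assumes decimal units (1 MB = 1000 KB) and the ~4K-per-person
    rule of thumb; both are approximations, not measurements.
    """
    return people * kb_per_person / 1000


customers = 100_000
low = estimate_db_size_mb(customers, 2.0)      # lean system
typical = estimate_db_size_mb(customers)       # the 4K rule of thumb
high = estimate_db_size_mb(customers, 8.0)     # data-hungry system

print(f"{low:.0f}MB to {high:.0f}MB, probably close to {typical:.0f}MB")
# prints: 200MB to 800MB, probably close to 400MB
```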

How do I know this? It is an accident of the projects and places I have worked at for 20 years and the fact that I became strangely curious about this. My first job was with the NHS and back then disk was very, very expensive. So knowing how much you needed was important. Back then, it was pretty much 1.5K per patient. This covered personal details (names, addresses, personal characteristics), GP information, stays at hospitals, visits to outpatient clinics etc. It also included the “reference” data, i.e. the information about consultants, wards and departments, lookups etc. If you included the module for lab tests it went up to just over 2K. You can probably tell that doing this sizing was a job I handled. This was not Oracle, this was a database called MUMPS and we were pretty efficient in how we held that data.

When I moved to work on Oracle-based hospital systems, partly because I had done the data sizing in my previous job and partly because I was junior and lacked any real talent, I got the job of doing the table sizings again, and a laborious job it was too. I did it very conscientiously, getting average lengths for columns, taking into account the length bytes, row overhead, block overhead, indexes etc etc etc. When we had built the database I added up the size of all the tables and indexes, divided by the number of patients and… it was 2K. This was when I got curious. Had I wasted my time doing the detailed sizings?

Another role and once again I get the database sizing job, only this time I wrote a little app for it. This company did utilities systems, water, gas, electricity. My app took into account everything I could think of in respect of data sizing, from the fact that the last extent would on average be 50% empty to the tablespace header. It was great. And pointless. Sum up all the tables and indexes on one of the live systems and divide by the number of customers and it came out at 2-3K per customer. Across a lot of systems. It had gone up a little, due to more data being held in your average computer system.

I’ve worked on a few more person-based systems since and for years I could not help myself, I would check the size of the data compared to the number of people. The size of the database is remarkably consistent. It is slowly going up because we hold more and more data, mostly because it is easier to suck up now as all the feeds are electronic and there is no real cost in taking in that data and holding it. Going back to the hospital systems example, back in 1990 it used to be that you would hold the fact a lab test had been requested and the key results information – like the various cell counts for a blood test. This was because sometimes you had to manually enter the results. Now the test results come off another computer and you get everything.

I said there were exceptions. There are three main ones:

  • You are holding a very large number of transaction records for the person. Telephony systems are one of the worst examples of this. Banking, credit cards and other utility systems match the 4K rule.
  • You hold images or other “unstructured” chunks of data for people. In hospital systems this would cover x-rays, ultrasound scans etc. But if you drop them out of the equation (and this is easy as they often are held in separate sub-systems) it remains a few K per person. CVs push it up as they are often in that wonderfully bloaty Word format.
  • You are holding mostly pointers to another system, in which case it can be a lot less than 4K per person. I had to size a system recently and I arrogantly said “4K per person”. It turned out to be less than 1K, but then this system turned out to actually hold most person data in one key data store and “my” system only held transaction information. I bet that data store was about 4K per person.

I have to confess that I have not done this little trick of adding up the size of all the tables and indexes and dividing by the number of people so often over the last couple of years, but the last few times I checked it was still 3-4K – though a couple of times I had to ignore a table or two holding unstructured data.
{The massive explosion in the size of databases is at least partly down to holding pictures – scanned forms, photos of products, etc, but when it comes down to the core part of the app for handling people, it seems to have stayed at 4K. The other two main aspects driving up database size seem to me to be the move from regional companies and IT systems to national and international ones, and the fact that people collect and keep all and every piece of information, be it any good for anything or not}.
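For anyone who does want to try the trick, here is a minimal sketch of the sum-and-divide check, including the step of ignoring a table or two of unstructured data. Everything in it is hypothetical: the function `bytes_per_person`, the segment names and the sizes are invented for illustration. On Oracle the real figures would typically come from a dictionary view such as DBA_SEGMENTS, but how you collect them is up to you.

```python
def bytes_per_person(segment_bytes, people, exclude=()):
    """Total size of all tables and indexes divided by the number of people.

    segment_bytes: mapping of segment (table or index) name to size in bytes
    people:        number of person records in the system
    exclude:       names of segments to ignore, e.g. image or document stores
    """
    if people <= 0:
        raise ValueError("people must be positive")
    total = sum(size for name, size in segment_bytes.items()
                if name not in exclude)
    return total / people


# Invented figures for a made-up system with 100,000 customers.
segments = {
    "PERSON": 120_000_000,
    "PERSON_PK": 20_000_000,
    "ADDRESS": 150_000_000,
    "ORDERS": 100_000_000,
    "SCANNED_FORMS": 5_000_000_000,  # unstructured data, left out below
}

per_person = bytes_per_person(segments, 100_000, exclude={"SCANNED_FORMS"})
print(f"{per_person / 1024:.1f}K per person")
# prints: 3.8K per person
```

Leaving `exclude` empty in this made-up example gives a figure of over 50K per person, which is the kind of distortion the image exception above describes.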

I’d love to know if your person-based systems come out at around 4K per person but I doubt if many of you would be curious enough to check – I think my affliction is a rare one.

Comments»

1. David Morris - November 6, 2010

My DB (started in 1995) is of military casualties, 6000 entries, and is a tiny 3MB in size, so I’m looking at only 512 bytes per record. I’m not storing any address information, just name, rank, number and some text, so with an address I’d agree that 2k is a good rule of thumb.

2. Tweets that mention How Big is a Person? « Martin Widlake's Yet Another Oracle Blog -- Topsy.com - November 7, 2010

[…] This post was mentioned on Twitter by John Piwowar, John Piwowar. John Piwowar said: Interesting read on Martin Widlake's blog: "How Big is a Person?" http://bit.ly/aJ6mAb […]

3. Graham Oakes - November 11, 2010

Hi Martin,

I think another reason that dbs are growing is laziness derived from cheaper disk. Back in the day (!) when disk was expensive, archiving routines were built into the original designs for applications because we simply couldn’t afford not to. As disk has become cheaper development teams are more interested in extra functionality and any thought of creating a process for cleaning up or archiving unwanted data has fallen to the bottom of the list. I’ve lost count of the number of bloated databases I’ve come across in the last few years that are full of old data that just aren’t needed anymore.

mwidlake - November 11, 2010

So true Graham, so true. Archiving seems to be almost unheard of in modern systems and, yes, people hold data “just because they can”. It’s like photography. When you had to keep changing physical film and then get the films processed, I think people thought about the pictures they were taking. Now people take photos of everything, all the time, often with hardly a glance at what it is they are photographing. I wonder how many of those pictures get looked at more than once…

4. Database Sizing – How much Disk do I need? (The Easy Way) « Martin Widlake's Yet Another Oracle Blog - November 11, 2010

[…] a similar way to my thoughts on how much database space you need for a person, I also used to check out the total disk space every database I created and those that I came […]

