Saturday Philosophy – The unbelievably small world of VLDBs June 12, 2010

Posted by mwidlake in VLDB.

Yesterday I posted about the potential for an Oracle in Science community within the UK Oracle user group {and wider for that matter; there is, after all, a world Oracle Life Science community, but it is currently less vibrant than it was, sadly}.

My friend and occasional drinking partner Peter Scott replied to say he felt there was “a place for a SIG for stonking great databases” {now wouldn’t SGDB be a better TLA than VLDB? 🙂 }.

Well, I would agree but for one small detail: an apparent lack of anyone willing to be part of the community.

When I was building a very considerable VLDB {and I’m sorry I keep going on about it, I’ll try and stop soon} back in the early to mid 2000s I seemed to be working in a vacuum of information, let alone prior experience. Yes, there was stuff in the Oracle manuals about how big things could theoretically be made and some vague advice on some aspects of it, but an absolute lack of any visible Oracle customers with anything even approaching the sizes I was contemplating. 2TB was about the limit and I was already way beyond that. Was this because I really was pushing the boundaries of database size? Well, I have since found out that whilst I was up there just behind the leading edge, there were several databases much, much bigger than mine, and others already envisioned that might hit the Petabyte level, not just the Terabyte level.

The thing is, no one would speak about them. At all.

We were left to do it all pretty much from scratch, and it would not have been possible if I had not spent years building up experience with VLDBs as the size that defines a VLDB increased, plus of course cracking support from the other DBAs and Systems Admins around me. And to be fair, Oracle Corp helped us a lot with our efforts to build these massive databases. Interestingly, one Oracle Consultant would regularly tell me that our systems really were not so unusually big and there were plenty larger. He usually said this when I asked, exasperated as something else failed to scale, whether Oracle had ever tested things at this level :-). But despite my constantly asking to meet with these people with massive systems, so we could exchange war stories and share advice, and despite being promised such contacts by Oracle, they never materialized except for CERN – who we already talked to as a fellow scientific organisation – and Amazon, who it turns out did things in a very different way to us {but it was really good to talk to them and find out how they did do their big databases, thanks guys}. Both were at the same scale as us or just behind.

This is because most of the people with massive Oracle databases will not talk about them, as they are either run by the largest financial organisations, are to do with defence, or are in some other way just not talked about. In his comment Peter refers to a prior client with an OLTP-type system that is now around the PB scale. I would be pretty sure Peter can’t say who the client is or give any details about how the system was designed.

So although I think there is a real need for a “stonking great databases” forum, I think there is a real problem in getting a user community of such people/organisations together. And if you did, none of the members would be allowed to say much about how they achieved it, so all you could do would be to sit around and brag about who has the biggest. There is an Oracle community about such things, called the Terabyte Club, but last I knew it was invite-only, and when I managed to get invited it turned out that mine was the biggest by a considerable margin, so I was still not meeting these elusive groups with 500TB databases. Maybe there is an Oracle-supported über database society, but as I never signed the Official Secrets Act I might not have been eligible to play.

If I am wrong and anyone does form such a user group (or is in one!) I would love to be a member and I would strive to present and help.

I’ll finish with what appears to be a contradiction to what I have just written. There already is a UKOUG User Group that deals with large systems and I chair it – the Management and Infrastructure SIG. {sorry, the info on the web page could do with some updating}. Part of what we cover is VLDBs. But we also cover Very Many DataBases (companies with thousands of instances) and Very Complex DataBases, plus how you go about the technical and management aspects of working in a massive IT Infrastructure. It might be that we could dedicate a meeting to VLDBs and see how it goes, but I know that whilst many who come along are dealing with databases of a few TB, no one is dealing with databases of hundreds of TB or a PB. Either that or they are keeping quiet about it, which takes us back to my main point. The MI SIG is probably the closest to a VLDB SIG we have in Europe though, and is a great bunch of people, so if you have a VLDB and want to meet some fellow sufferers, we have our next meeting on 23rd September in the Oracle City office.



1. Pete Scott - June 12, 2010

So true – it is as if people are ashamed to stand up and say “my name is Mike and I have a large database”

I remember putting in a 4.5TB system at the turn of the century – go live was Dec 1999 – I felt quite proud of that but the customer would not own up to having it, not even to owning a barn full of Sun kit… 4.5TB of 18GB disks needs a little space. I guess the argument was that they got “competitive advantage through analysing their data” and telling people how to do it was silly.

And now people might admit they have the data, but not on what it is running. For example I have had a talk accepted for Oracle Open World this September on the lessons learned building a retail data warehouse on Exadata, the bit I have to leave out is who the customer is!

This of course is a big problem for those of us who speak about databases – we can talk if we are not specific, but then that almost becomes woolly marketing hype. Working in a consultancy lets me have more scope on what I can say, but not free rein. And people at meetings love to hear stories from users, and people like me…

Two other points – that PB-scale DB is strictly speaking not OLTP either. And SGDB, or VLDB for that matter, are not TLAs, they are FLAs …. there’s inflation for you 🙂

mwidlake - June 13, 2010

Hi Pete,

OK, OK, caught on the TLA/FLA point. I guess many people pretty much interpret TLA as “Acronym” now, but it does not make it correct. At least FLA can cover four and five letter acronyms.

It is interesting that you mention the size of the disks used back in 1999 and that they were 18GB. One of the things that allows people to build Terabyte, and now Petabyte, datastores is the acreage of modern disks. 1TB discs are becoming relatively common. But as the seek times and transfer rates of disks have not improved at anything like the rate of the acreage, disk storage is slowing down in relation to the data volume, and that is a major issue with VLDBs. {I thought I had blogged about that but I can’t find it; I have presented on it, though, so I can see a quick translation to a blog coming up}.
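To put rough numbers on the capacity-versus-speed point above, here is a minimal Python sketch. The 18GB and 1TB capacities come from the discussion; the transfer rates and IOPS figures are hypothetical round numbers chosen only for illustration, not measurements of any real drive.

```python
# Illustrative sketch: disk capacity grows much faster than disk speed.
# Capacities echo the comments (18GB in 1999, 1TB in 2010); the transfer
# rates and IOPS below are HYPOTHETICAL round numbers, not measured values.

def full_scan_seconds(capacity_gb: float, transfer_mb_per_s: float) -> float:
    """Seconds to read the whole disk sequentially (1 GB taken as 1000 MB)."""
    return capacity_gb * 1000 / transfer_mb_per_s

def iops_per_gb(iops: float, capacity_gb: float) -> float:
    """Random I/Os per second available for each GB stored on the disk."""
    return iops / capacity_gb

# Hypothetical drive profiles: (capacity GB, sequential MB/s, random IOPS)
OLD_DISK = (18, 25, 100)
NEW_DISK = (1000, 100, 150)

if __name__ == "__main__":
    for name, (cap, mbps, iops) in [("18GB disk", OLD_DISK), ("1TB disk", NEW_DISK)]:
        print(f"{name}: full scan {full_scan_seconds(cap, mbps) / 60:.0f} min, "
              f"{iops_per_gb(iops, cap):.2f} IOPS per GB")
```

With these assumed figures, capacity grows about 55-fold while sequential throughput grows only 4-fold, so reading a whole disk takes roughly 14 times longer (12 minutes versus about 167), and the random I/O available per GB stored falls from about 5.56 to 0.15 IOPS.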

On the topic of having material to speak about. Yes, being in a consultancy, especially one with a good reputation and strong staff like yours, helps get access to more systems, but audiences I think are slightly sceptical about presentations by consultancies, as they fear it will just be a sell for the consultancy. Less so than presentations by Oracle but not as “clean” as end users. Who are not allowed by their company to present….

2. Noons - June 13, 2010

More likely the lack of info on these is due to the folks involved being flat out trying to overcome the bugs they uncover at every turn. But just ignore that and call me a “dinosaur”, after all it appears to work for Oracle…

And in many cases, VLDB ends up being a collective exercise with a group of people or a consultancy company involved and understandably they don’t want that knowledge capital to be shared.

I prefer to refer to any database that nudges the limits of the hardware involved as a VLDB: the problems become very much the same, bugs aside.

mwidlake - June 13, 2010

That’s a good point about working flat out when you are effectively doing testing work on these extremes, Noons 🙂

And yes, there is intellectual capital and a benefit in being able to cope with massive data volumes, but given your first point about all the time and effort lost finding where VLDBs break Oracle {and, I guess, other RDBMSs} at the edges, making contacts with other such users and exchanging info on the mechanics of living with the data volumes would seem to make sense to me.

I kind of expected to find a lot of people interested in hearing about VLDB techniques and few people willing to present! Not an absolute absence of any willingness to engage at all.

Finally, I like your definition of VLDB as anything that nudges the limits of the hardware. It’s one of the factors I came up with when trying to define what a VLDB is.

Noons - June 13, 2010

I forgot to include the software “limits” as well: as important as the hardware ones, IME.
Not just the database software: have had many nasty surprises with the OS itself, as well as storage architectures and devices.

I think the unwillingness to engage is due to the change in nature of the dba and specialist job.
For the last 10 years, Oracle and others have been dumbing those jobs down, to the point where we now have most DBAs and “architects” so unaware of the inherent limits of their systems that all of this is just noise to them.
Very few are even aware of what is possible, let alone how to do it or how to get there.

I was told recently by one of the above that the next upgrade to our architecture has to be Exadata.
We just reached around half the sustained I/O throughput of an entry level Exadata, in hardware that costs an order of magnitude less.

The argument goes like this: “Why bother? If Oracle marketing says the next step is Exadata, who are we to contradict them?”
That’s what I mean by dumbing down the job.

And the show goes on…

mwidlake - June 13, 2010

That’s a very good point about the software and the other infrastructure. I think it is getting harder and harder not to specialise as there is so much more you need to know about each area. Things simply are not getting simpler. How many DBAs know everything about Audit, Security, Backup & Recovery, ASM, other storage, SQL*Net, Parallel, RAC… So to learn about Linux, Networks, storage etc as well is increasingly difficult. And yet, knowing at least the basics about those areas can reap dividends. Question is, when do you sleep?

On the marketing, Oracle {as a company}, like any other vendor, is there to sell their stuff. Nothing much else. Companies don’t care if you need it or not, they just want to sell it. If selling it means identifying the customer’s needs and addressing them then a commercial company will do that – if and only if the long-term payback is more than just marketing to the hilt. Have you noticed that mobile phone companies now concentrate on selling on the basis of “It’s better than the last model, look how sexy it is and how many extra features it’s got”? Only very rarely is there any mention of its ability to make and receive calls. I don’t want to have a 4MP camera on my phone. I have a camera, it works fine. That diddy little lens on the phone is not good enough to give decent clarity at 4MP detail, and it lets in so little light for that many pixels that in anything less than bright sunlight the image will have so much noise it will have to be heavily processed. Just give me a phone that makes calls and maybe stick a 1MP camera with a half-decent lens on it for snaps. And I do not need it to be able to reproduce a philharmonic orchestra on its external speaker. It is tiny and tinny and not suited to it. “ring ring” will do, it’s a PHONE.

I’m as grumpy as you, Noons!

Noons - June 13, 2010

LOL! My old 6610 doesn’t need to be rebooted every second day, like the new E72 from work does…
Coming back to VLDB and the lack of interest:

The amount of learning needed nowadays to cover the entire gamut of Oracle is staggering. I doubt that anyone can be an expert on the whole lot, hence my reluctance in accepting the certification exercises as having any validity: it’s just too wide a field, all one can do is know in detail one area at a time. Simply impossible to keep up with the lot.

But for VLDB, the range is much narrower and more specialised. That makes it possible for one to concentrate enough “grey matter” firepower to at least be able to understand the problems involved, if not reason out all the solutions. And once one knows what the problems are, it’s then feasible to effectively engage specialist knowledge to resolve each hurdle.

What I find scary is a management (and yes, also a DBA) attitude that “none of this is worth pursuing, we just engage consultants whenever we need to think”. A sure-fire way to blow budgets and end up buying what one doesn’t need!

Good luck taking this up with the OUG. I gave up on the Australian version of it and their insistence on boring me with Fusion and other J2EE marketing nonsense, and decided to join the Meetup initiative from Pythian: at least there we can decide what to discuss, instead of having endless presentations from commercial consultancies.

Not perfect, but a lot better than what I saw at the OUG for years.

mwidlake - June 13, 2010

User groups and VLDB.
Something which is nice about the UK OUG is the special interest groups. You go to the DBMS SIG and you will not get three presentations on Fusion and two on Java, you get 5 presentations on database stuff. And maybe one out of left field, but then you want the occasional talk about other things.

The Management and Infrastructure SIG is quite a small one, but we strive to keep the talks relevant and limit the sales pitches. Again, a pitch or two is OK, if it is in a very relevant field, as some people may well want the product being pitched.

But I think a SIG dedicated to the very top end of VLDB just would not succeed, not unless a couple of the more significant players were to join in. I can’t see the MOD doing that, for a start…

When I came back to being technical about 4 years back I realised I could not re-skill across the board, so I have not tried to become a production DBA again. I do Performance as I already knew something and I like it, though it is the most common “specialisation”, and VLDB, as I had a lot of experience again and it is a rarer skill.

It’s a pity you are on the other side of the globe, Noons, this would make an excellent Pub topic 🙂

3. oakesgr - June 14, 2010

Hi Martin,

The lack of people willing to speak about their oracle experiences / solutions (VLDB or otherwise) is a particular gripe of mine. Unfortunately it’s just not in their interest to do so. Certain industries and organisations are so paranoid about their ‘market leading technology’ (which almost definitely isn’t anything like market leading!) that they refuse to let their employees divulge any information whatsoever.

It’s disappointing as you can almost guarantee that whatever problem you find you’re up against, someone out there has already encountered and resolved it.

As for sales pitches at SIGs – as long as they’re in the minority I’m happy to put up with them. Certainly at your last MI SIG I felt that the balance was about right. The Exadata one was a sales pitch in all but name, but it helps to have a break from continuous heavy thinking anyway.
