
Friday Philosophy – Who Comes Looking? September 18, 2009

Posted by mwidlake in Blogging.
8 comments

I’ve been running this blog for a few months now and I find it interesting to see how people come to it. A handful of people come to it as I tell them I have a blog page, but most people come across it by either:

  • Links from other blogs or web pages.
  • Search engines.

WordPress gives me stats on these for Today and Yesterday, and I can check back on the referrers and searches for any given day, going back several months. Most blog sites provide the same features, but I thought I would run through them for those who do not have a blog.

I can tell when I have been mentioned on someone else’s blog, as I usually see a spike in my hits and their web page is at or near the top of the list of referrers. Interestingly, I will sometimes see a burst of hits from an old reference on someone else’s blog or webpage. I think this happens when a third person has referenced the page or person which then referenced me.

Another interesting facet is the impact on my hits if an Oracle Name mentions me. My busiest day occurred when Richard Foote mentioned a posting I did on “Unhelpful Helpful People” and a couple of other well-known Oracle Names also picked up on the thread. It’s a bit like a small-time-actor getting into a scene with a Hollywood Star :-).

The most interesting, though, are the search engine hits.

My favorite search term to lead to my blog so far is “martin widlake unhelpful people”. I really hope that was someone looking for the post I mention above, as opposed to anything else…

As time goes by, the search engine hits are generating a larger and larger slice of my traffic (and the personal mentions less and less :-) ). This is going to be partly due to me putting more content on the Blog to be found but also, as I get more hits and links, search engines will give me more prominence. It becomes self-feeding. Search engines find me as I have been visited before, so I get visited again and Search engines see that I have been visited even more and move me up the list…

{This is, of course, how Burleson gets so much traffic, he always references back to himself and his web sites and appears to have several sites that all cross-reference between them, priming the search engine pump (or absolutely flooding it, I suspect)}.

Some of the most common searches that find me are on obscure items I have blogged about. They may not be of such general interest {such as when I blogged about errors with gathering system statistics {{and more to follow on that topic}} } but I guess when someone hits the same issue or topic, I am one of a very few places that has mentioned it. I get a steady trickle of hits for “c_obj#_intcol#” since I blogged about it often being the biggest object in the SYSTEM tablespace. So perhaps to increase my search engine hits I should not blog about mainstream issues but rather really obscure, odd stuff that almost no one is interested in!

Some days I will get several hits by people searching on “Martin Widlake”. I wonder why they are searching on me specifically. Occasionally, it has been just before I am called about a job. Usually not though {so maybe it was about a job - but then they found my blog and decided against it…}.

Some searches that get to my blog are just odd. Yesterday one search that found me was “how to put fingers on keyboard”. Why? I have no idea why a search on that would land on my blog. Maybe I should try it!

Oh, and I suddenly have a favorite search that found me, hot in today, just as I am blogging about the very topic:

“it’s a crock of cr4p and it stinks”

Now what is that about? Why search on it and why find me?

*sigh*

What is a VLDB? September 18, 2009

Posted by mwidlake in Architecture, VLDB.
10 comments

In a post just a couple of days ago on testing, I complained that VLDBs break at the edges. Coskan posted a comment asking that I blog on what I consider to be a VLDB and thus what a VLDB DBA is, and I am more than happy to oblige, especially as Coskan not only comments a lot but provides a really good summary of blogs on his own blog. {In fact, I need to add Coskan to my blog roll, something I have been meaning to do for ages}.

Hopefully, this will link to the comments section of that post as Piet deVisser added a wonderful comment answering the question for me. Go and read, I suspect it is better than my attempt here!

VLDB stands for Very Large DataBase. It is not an acronym I like as it sounds suspiciously like a sexually transmitted disease, but maybe that is just a problem with my mind. The term ULDB appeared for a while but seems to have failed to gain traction. U stands for “Ultra” of course.

So what is a Very Large DataBase?

A VLDB is a database whose very size gives you, as a DBA or database architect, extra work.

Maybe a simpler rule that you can apply is “you can’t back up the database in 24 hours using standard tools”. You can chuck more tape drives and IO channels at a DB but you will quickly hit a limit where your infrastructure or budget can’t cope.
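To make the 24-hour rule concrete, here is a rough sketch of the arithmetic. The function names and the 100MB/s sustained throughput figure are assumptions picked purely for illustration, not measurements from any real system; plug in your own numbers.

```python
def backup_hours(db_size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream the whole database at a sustained rate."""
    size_mb = db_size_tb * 1024 * 1024  # TB -> MB
    return size_mb / throughput_mb_per_s / 3600


def fits_in_window(db_size_tb: float, throughput_mb_per_s: float,
                   window_hours: float = 24.0) -> bool:
    """True if a full backup completes inside the backup window."""
    return backup_hours(db_size_tb, throughput_mb_per_s) <= window_hours


# Hypothetical figures: 10TB of used space, 100MB/s sustained to the backup target.
hours = backup_hours(10, 100)
print(f"Full backup takes about {hours:.1f} hours")
print("Fits in 24 hours:", fits_in_window(10, 100))
```

With these made-up figures the full backup overruns the 24-hour window, which is the point at which you start looking at the less standard options.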

Another, and one I pinch outrageously from Piet is “you can’t afford to duplicate the database for test/QA purposes”. That leads to a whole raft of issues.

I put forward another definition of a VLDB in a comment on the original blog. “Any database whose size makes 8 out of 10 DBAs comment ‘that is a big database’.” That definition takes into account whether a database is generally beyond the experience of most DBAs/Designers. Why do I think that is significant? Because it means most DBAs/Designers will not have worked with a database that size and thus dealt with the associated problems. The database engine may {or may NOT, as I have complained about} cope with the database size, but you need staff to design it and look after it who know how to do so.

The definitive size of a VLDB, of course, goes up year by year. A few weeks ago I found a document I have mentioned in presentations a couple of times, an internal White Paper by Oracle Corp on what a VLDB is, written around 1994. Next time I am at home I’ll scan it. If I remember correctly, at that time 30GB and beyond on a VMS or Mainframe system was considered a VLDB and, in Unix (the new kid on the block back then), 20GB was the threshold.

Right now, as of September 2009, I would judge any database over 10TB of used space is a VLDB. In 12 months, that will be 20TB. In another 12 months, 40 or maybe 50TB.

“Moore’s Law” traditionally states that compute power doubles every 18 months, but I have just suggested that the VLDB limit doubles every 12 months. I say that as, over the last 10 years, I have worked on several systems, systems most DBAs would consider as “challengingly large”, which double in 12 months or less. Data is exploding. More and more of us are dealing with VLDBs.
This “doubling in 12 months” was not the case (I think) back in 1995; it started in 2000 or so. Before then, I would say database size was doubling in line with Moore’s law or more slowly, but that is only my opinion.

What changed? People swapped from thinking you could store only what you really needed to thinking you could store “everything”. Which is “everything” your Moore’s-law expanding CPUs can process PLUS all the summary and metadata you extract from that data.

I could be wrong in my figures though. If you took size as the determining factor and doubled 20GB every 18 months from 1994, you would now class a VLDB, in 2009, as approx 20TB.
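The two growth rates are easy to check with a few lines of Python. This is just a sketch of the arithmetic in this post; the function name is mine, and the starting sizes and doubling periods are the figures quoted above.

```python
def projected_size(start_size: float, years: float, doubling_months: float) -> float:
    """Size after `years` if it doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return start_size * 2 ** doublings


# 20GB in 1994, doubling every 18 months (Moore's law pace) until 2009.
moore_2009_gb = projected_size(20, 2009 - 1994, 18)

# 10TB in 2009, doubling every 12 months for another two years.
vldb_2011_tb = projected_size(10, 2, 12)

print(f"Moore's-law threshold in 2009: {moore_2009_gb / 1024:.0f}TB")
print(f"12-month-doubling threshold in 2011: {vldb_2011_tb:.0f}TB")
```

Fifteen years at 18 months per doubling is ten doublings, so 20GB becomes roughly 20TB, while the 12-month rule takes 10TB to 40TB in two years.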

What main issues do you face with a VLDB?

  • Backing up the database. With a VLDB, a daily backup of everything via RMAN or Hot Backup is simply not possible, as you can’t run the backup in 24 hours. You need to: back up less often; back up only part of the DB; use hardware such as mirror splitting or deltas; or use some other trick like, say, never backing it up but having 3 standbys. I’ve seen it done.
  • Performance. You need to consider radical changes such as removing RI or designing around full table scans and ignoring the block buffer cache for the largest tables.
  • The number or size of objects starts causing bits of Oracle to break or work less efficiently (so many tables it takes 2 minutes to select them all, or you hit an unexpected limit like the 2TB disk size in ASM, because you need to use bigger disc sizes as otherwise you need more discs than you can easily manage).
  • Maintenance tasks become a challenge in their own right. This could be stats gathering, it could be adding columns to a table, it could be recreating global indexes, all of which now take more time than you can schedule in the maintenance windows {so part of the definition of a VLDB could be down to how active a database is and how small your maintenance windows are – 1TB could be a VLDB if you can never spend more than an hour doing anything!}
  • GUIs are no use to you. Listing all the tablespaces in your database with OEM is a pain in the proverbial when you have 962 tablespaces. You can’t keep track of all of them, visually.
  • You can’t properly test or prototype as you cannot afford to create a full sized test system.

I’d like to pick up that last point. With a VLDB, you often end up doing things on the live system that you have been unable to test or prove because you simply lack a test system that is even within an order of magnitude of the size of your live system. RAC is a particular issue: it seems many sites are happy to have the live system as a RAC system but not the test or development systems. When you raise the issue, the response is often “well, there is not that much difference between RAC and non-RAC systems is there?”. You are not allowed to get violent with the client, or even deeply sarcastic. Yes, there is a huge difference.

A VLDB DBA is someone who has had to consider the above for more than a few months, or on more than one system. Or who simply cries when you mention Oracle breaking when it exceeds size limits.

How do you know when you are dealing with a ULDB? When you can find no one else who will speak publicly about a database bigger than yours. When I was working with the Sanger Institute on my pet “it is really quite huge” database I would often have these frustrating conversations with Oracle Corp:

“X is giving me trouble as the database is just so large”

“It is not that large, we have customers with bigger databases”

“Can you introduce me, so we can talk about these issues?”

“Errr, no, they don’t like to talk about their systems”.

Thanks. Thanks a bunch.

Remember, there is always someone with a bigger DB than you. But they probably won’t talk about it.

 

Enough for tonight….
