jump to navigation

Friday Philosophy – What ever happened to System Design? September 14, 2009

Posted by mwidlake in Architecture, performance.
Tags: ,
8 comments

{OK, so it is not Friday, I’ve got a lot on, OK?! Shesh…}

I’m a Contractor/Consultant. As a Contractor I am expected to arrive On Site and within a week be making progress.  As a Consultant the week becomes a day.

That’s fine, it’s the territory. However, there is a question I always ask and I know other people ask it when they arrive on site:-

“Can I please have a look at the database design”.

We normally mean an Entertity Relationship diagram or at least a schema design by this. The usual answer these days?

” We keep meaning to do one, but it never seems to quite get done – here are some misleading and poor notes on a few of the the schemas, if you get an ER diagram together, we would love to see it”.

Why does no one do an ER diagram these days and why is there never a picture of the tables and relationships that you can look at for a system? Despite the fact that anyone new to the system hopes one will exist? What Happened to the skill of logical design?!

When did the ethos of having an overall schema layout just go out of the window??? And how in hell do You (yes, You, the system owner) expect to have a clue about the overall system without one?

Oh, and what is that I hear? You want me to improve the performance of your system without having a description of said system!? You have no clue of the overall database design but you want it to run faster?!? I better set the hidden “_RUN_DATABASE_FASTER” parameter to TRUE immediately then.

Is this just me or WHAT?

Your Backups Are Probably Too Simple September 10, 2009

Posted by mwidlake in Architecture.
Tags:
5 comments

Following on from my post a few days back on When a Backup is Not a Backup I want to pick up on the topic of Simplicity, something Piet de Visser, Neil Chandler and I regularly discuss.

It’s very good to ask yourself the question “how to keep oracle backups simple“. A simple backup may not be the quickest to run or the fastest to recover from but it is a lot more likely to work. Which is the one critical defining feature of a backup!

However, the backing up of the Oracle database itself is not what I have in mind, after all any DBA who is beyond the trainee stage should know how to do it and there are many options with Oracle to achieve a good, simple backup (Cold backup, full export, manual hot backups, simple RMAN )

{As an aside, something that really bugs me about Oracle at the moment is their utter focus on RMAN in the backup documentation –  for example, the 10gR2 “Backup and Recovery Basics Manual” is almost dedicated to RMAN. It should not be, is not called the “Recovery Manager Basics Manual”. If you go to the Advanced guide it is all pretty much RMAN until you get to chapter 17, where you eventually find a reasonable set of information on creating a simple, user-controlled hot backup. Or use Google on “End backup Oracle” and there is endless information and help. Such a simple backup method should be made more of by Oracle Corp, I believe}

My concern is that the Oracle Database is very often only one component of the whole system.  Note  the “the.”

First up – do you use Database Links to read data between the database you are considering and others? More to the point, do you do two-phase commits between your database of concern and other oracle databases? If so, you may well need to keep the data in-synch between them. {Don’t forget – Your applications may independently update multiple database without using Database Links}. How exactly do you manage that? Have a policy that if database (a) has to be recovered, you also do database (b), both to a point-in-time?

Second, and probably most common, do you have files external to the database that are referenced, say as BFILES or via the application? If you do a recovery, do you need to make sure those files are in-synch with the database? You may well have the situation where files on the server/file store are backed up at one time and the database at another. Loss of the whole server will need a full recovery of both backups, which were taken at different times.

Three, are there other dependencies, such as password authentication, account creation, audit maintenance?

Four, If your system crashes and was half way through a batch process, how do you cleanly resume the batch or clear it down and start again (having made sure the consideration of external files above has been resolved)?

 I’m sure you get the idea now. Think about it.

Test recoveries are good at identifying the above issues, but only if the test recovery includes testing the application(s) that use the database {as opposed to the usual “select sysdate from dual – yep, it works” test}.

The above considerations are part of why I am a fan of keeping all the things to do with a business application in the Oracle database, even if it might not be the ideal from other perspectives (eg images make the database a lot larger and may not be processed as fast as they can be as external files).

All of the above are probably the first steps towards creating your Disaster Recovery Plan. “Disaster Recovery” is a whole topic in its own right. Actually, it’s a whole business and one people have very lucrative careers in. But if you are the DBA and have the backups nailed, then Disaster Recovery (what to do if the building burns down) is probably next on your list of things to worry about.

When a Backup is Not a Backup September 8, 2009

Posted by mwidlake in Architecture.
Tags: ,
13 comments

I had a call from an old client today who had found themselves in a bit of a pickle. They were facing what I’d judge is the third most common “avoidable disaster” for Oracle databases. And probably the most devastating.

They had a damaged database, just one of those things. Media failure had corrupted a couple of datafiles. This is notthe disaster, most DBAs who have been around for a few years have experienced this. Disks are mechanical devices, they go wrong, controllers develop faults, people accidentally pull the plugs out of switches.

The disaster was that they had never tested their backup. Guess what? It turned out that their backup was not be a valid backup. It was just a bunch of data on tape {or disk or wherever they had put the backup}.

A backup is not a backup until you have proven you can recover from it.

If your backup has not been tested, and by that I mean used to recreate the working system which you then test, then it is just a bunch of data and a hopeful wish. You will not know if it works until you are forced to resort to it, and that means (a) you already have a broken system and an impact on your business (b) time is going to be short and pressure is high and (c) if you are the DBA, you could be about to fail in your job in a most visible and spectacular way.

Oddly enough, only two or three weeks ago I had another client in exactly the same position. Database is corrupted, the backup had not been tested, the backup turned out to be a waste of storage. In fact, in this case, I think the backup had not been taken for a considerable period of time as a standby database was being relied on as the backup. Which would have been fine if the Standby had been tested as fulfilling the purpose of “working backup”.

The standby had not, as far as I could deduce, ever been opened and tested as a working and complete database since the last major upgrade to the host system. When tested for real, It proved not to be a working and complete database. It was an expensive “hopeful wish”.

The client from a few weeks ago finally got their system back, but it took a lot of effort for the DBAs and the developers to sort it out and they were a tad lucky. The jury is out on the client who called me today.

I can’t think of anything at all that the DBA function does that is more important than ensuring backups are taken and work. {Maybe an argument could be made that creating the system is more important as nothing can be done about that, but then you could argue that the most important thing you do in the morning is get out of bed}. Admittedly, setting up, running and testing backups is not a very exciting job. In fact it often seems to be a task passed on to a less experienced member of the team {just like creating databases in the first place}. But most of us are not paid to have fun, we are paid to do a job.

I’ll just make a handful more comments before giving up for today.

  • The database backup nearly always needs to be part of a more comprehensive backup solution. You need a way to recreate the oracle install, either a backup of the binaries and auxiliary files (sql*net, initialization files, password files etc) or at least installation media and all patches so you can recreate the installation. You need a method to recover or recreate your application and also your monitoring. You might need to be able to recreate your batch control. O/S? Have you covered all components of your system, can you delete any given file off your system and fix it?
  • Despite the potential complexity resulting from the previous point, you should keep your backup as simple as you can. For example, if you only have a handful of small databases, you might just need old-style hot backups via a shell script, rather than RMAN. Only get complex {like backing up from a standby} if you have a very, very compelling business need to do so.
  • Testing the backup once is a whole world better then never testing it. However, regular, repeated recovery tests not only allow the DBA/Sys Admin teams to become very comfortable with the recovery process and help ensure it is swift and painless, but by trying different scenarios, you may well discover issues that come under the first two points.

I’ve not even touched on the whole nightmare of Disaster Recovery :-)

Follow

Get every new post delivered to your Inbox.

Join 160 other followers