jump to navigation

Friday Philosophy – The Dying Art of Database Design? September 9, 2011

Posted by mwidlake in Architecture, development, Friday Philosophy, rant.
Tags: , , ,
trackback

How many people under the age of {Martin checks his age and takes a decade or so off} ohh, mid 30’s does any database design these days? You know, asks the business community what they want the system to do, how the information flows through their business, what information they need to report on. And then construct a logical model of that information? Judging by some of the comments I’ve had on my blog in the last couple of years and also the meandering diatribes of bitter, vitriolic complaints uttered by fellow old(er) hacks in the pub in the evening, it seems to be coming a very uncommon practice – and thus a rare and possibly dying skill.

{update – this topic has obviously been eating at my soul for many years. Andrew Clark and I had a discussion about it in 2008 and he posted a really good article on it and many, many good comments followed}

Everything seems to have turned into “Ready, Fire, Aim”. Ie, you get the guys doing the work in a room, develop some rough idea of what you want to develop (like, look at the system you are replacing), start knocking together the application and then {on more enlightened projects} ask the users what they think. The key points are the that development kicks off before you really know what you need to produce, there is no clear idea of how the stored data will be structured and you steer the ongoing development towards the final, undefined, target. I keep coming across applications where the screen layouts for the end users seem to almost be the design document and then someone comes up with the database – as the database is just this bucket to chuck the data into and scrape it out of again.

The functionality is the important thing, “we can get ‘someone’ to make the database run faster if and when we have a problem”.

Maybe I should not complain as sometimes I am that ‘someone’ making the database run faster. But I am complaining – I’m mad as hell and I ain’t gonna take it anymore! Oh, OK, in reality I’m mildly peeved and I’m going to let off steam about it. But it’s just wrong, it’s wasting people’s time and it results in poorer systems.

Now, if you have to develop a simple system with a couple of screens and a handful of reports, it might be a waste of time doing formal design. You and Dave can whack it together in a week or two, Chi will make the screens nice, it will be used by a handful of happy people and the job is done. It’s like building a wall around a flower bed. Go to the local builders merchants, get a pallet of bricks, some cement and sand (Ready), dig a bit of a trench where you want to start(Aim) and put the wall up, extending it as you see fit (Fire). This approach won’t work when you decide to build an office block and only a fool from the school of stupid would attempt it that way.

You see, as far as I am concerned, most IT systems are all about managing data. Think about it. You want to get your initial information (like the products you sell), present it to the users (those customers), get the new (orders) data, pass it to the next business process (warehouse team) and then mine the data for extra knowledge (sales patterns). It’s a hospital system? You want information about the patients, the staff, the beds and departments, tests that need doing, results, diagnoses, 15,000 reports for the regulators… It’s all moving data. Yes, a well design front end is important (sometimes very important) but the data is everything. If the database can’t represent the data you need, you are going to have to patch an alteration in. If you can’t get the data in quick enough or out quick enough, your screens and reports are not going to be any use. If you can’t link the data together as needed you may well not be able to DO your reports and screens. If the data is wrong (loses integrity) you will make mistakes. Faster CPUS are not going to help either, data at some point has to flow onto and off disks. Those slow spinning chunks of rust. CPUS have got faster and faster, rust-busting has not. So data flow is even more important than it was.

Also, once you have built your application on top of an inadequate database design, you not only have to redesign it, you have to:

  • do some quick, hacky  fixes to get by for now
  • migrate the existing data
  • transform some of it (do some data duplication or splitting maybe)
  • alter the application to cope
  • schedule all of the above to be done together
  • tie it in with the ongoing development of the system as hey, if you are not going to take time to design you are not going to take time to assess things before promising phase 2.

I’m utterly convinced, and experience backs this up, that when you take X weeks up front doing the database design, you save 5*X weeks later on in trying to rework the system, applying emergency hacks and having meetings about what went wrong. I know this is an idea out of the 80’s guys, but database design worked.

*sigh* I’m off to the pub for a pint and to reminisce about the good-old-days.

Comments»

1. Niall Litchfield - September 9, 2011

off to the pub at 10:30am, you must be bitter and twisted! Or just published the next day, or some other real reason.

Anyway.I think an awful lot of it occurred when the job title designer morphed into the job title Architect and holders of that position tended to mentally move from specifying how business problems should be solved, to specifying which technologies should be used to solve business problems. There’s more in me on that, so maybe that’s a blog of its own.

mwidlake - September 9, 2011

OK Niall, I did not go to the pub yet…And you were much missed on Tuesday.

I’d be interested to see that Blog on the Architect title. I use it sometimes myself but in many ways I don’t like doing so – but it probably best matches in the minds of potential employers and agents what I do. I design Oracle systems. I design them with the aim of supporting what the business needs that system to do and in such a way that it is still working in 3 years and can be easily modified. I try to take into account the hardware but I expect there to be experts on storage and network who can deal with that as I think no one person can know enough (and keep up-to-date on) all of the stack well enough to do it all just. Well, I can’t!

Dom Brooks - September 9, 2011

So development is more short-term and agile but as has to be the planning, financing and resource allocation – easy to ramp up, easy to ramp down.

Can we justify a dedicated database specialist within the team? Not yet. Nope, still no. Oh well.

Can one person attend five different sprint stand-ups all scheduled at the same time? No. Ok, so we’ll get five different people all of whom know a little bit about the database that should cover it.

Many of the development teams that I have known that did have a database specialist (e.g. in a ratio of up to 15:1 java developer:database specialist) no longer work that way any longer.

But it’s also not specifically about the database. It’s about the big picture. The data and the database deal with the big picture. And it is the big picture being neglected that is often the issue.

2. John Brady - September 9, 2011

Martin,

I couldn’t agree more, which is why I blogged about Why data modelling is important 2 years ago now. And as I said then, for any serious application “Its ALL about the Data”. And how it will affect scalability of an application later on.

Some of this is due to the Agile movement, which seems to meet your description of skipping any major design phase and just getting on with the development as soon as possible. Hence, that’s why I hate Agile Development.

I think there is also a major disconnect between modelling data attributes of real world entities (people, products) and how data is structured while it is being processed within a program. They are different, but seem to be mistaken as being the same as each other by many people. Hence we have Java developers (to pick an example) that create their own Object / Class Model for how their new application will process the data, and either forget that this is different to the permanent data or simply assume it will magically be stored on disk in an efficient manner. And later on someone else has to deal with the mismatch between the Object Model and what the proper Relational Model should be – often termed the Impedance Mismatch.

John Brady

PS. I hope the links to my blog entries come through – I have used raw HTML to put them into this comment. Not sure if these get filtered out or not.

mwidlake - September 9, 2011

Thanks John, and the links are fine – in both they work and that I like the contents.

I was not going to touch on Agile methodologies but you, Tim, Martin (and several of the comments and Andy’s article that I now link to) mention Agile.
Maybe there is a link between the attitude behind lack-of-design and also {AB}use of Agile. I can see some real benefits of Agile, a big part of me wants to like Agile and I think it is in some ways just a rehash of parts of earlier design methodologies (Rapid Application Development anyone? SWAT teams anyone?) but I think a lot of us have been exposed to really very, very poor use of Agile. It seems to be too often be an excuse for lack of patience or a method for management to “cut through the chaff and get the job done”. I’d like to cut through the chaff of meetings about why the development failed (yet again) and just do the job right. My gut feeling is Agile is right for handling enhancements or developing front-end applications, you need to focus on that job and get it done swiftyl. It only, ONLY works if it is done on some good foundations.

David Harper - September 9, 2011

John, re “I hate Agile Development””, I had very similar experiences in an “Agile” team that I was drafted into, three years ago. They were a bunch of Ruby-on-Rails fans, fresh out of college, who had been given the task of building a mission-critical web application with a database back-end. They were way too cool to do anything like database design. Or listen to the users.

They thought that they were the dog’s bollocks, and unfortunately, so did one or two key people in senior management, so the fact that the users hated the application, and it wasn’t scaling to handle real-world demand, and the team didn’t know how to fix it, were overlooked. Indeed, the team doubled in size, yet the application became more and more loathed by users. There were even a couple of “unfortunate incidents” when they managed to trash their production database, and it was left to the DBA team to rescue them from the wrath of their users.

The whole sorry mess is now being examined by an external consultant, to see whether it should be replaced by a commercial product. As for me, I left the team at the start of this year and became a DBA 🙂

3. Tim Hall - September 9, 2011

We are all agile* now. No need for design or any other form of forethought… 🙂

One of the bad things the internet has done is allowed people to adopt the “chuck it out quick and we’ll fix it later” attitude. When deploying a new application meant installing it on 600 PCs, you can bet people were invested in getting it done right first time.

* There is a place for agile development, provided it uses a real agile methodology. Unfortunately, most people seam to equate agile with no methodology.

Cheers

Tim…

Mark D Powell - September 9, 2011

“chuck it out quick and we’ll fix it later” Our developers have been using this approach for more than 20 years. In fact as I think about it I am not so sure that this approach does not go back to the dawn of IT.

1acosta - September 12, 2011

Reminds me of “there’s nothing more permanent than a temporary solution”. Been there one too many times!

Gerald

4. Martin Berger (@martinberx) - September 9, 2011

If it gives you any satisfaction, even I AM in my mid 30´s I also have a big problem with that implementation of ‘AGILE’.
It’s just not right to mix up every step in building *anything*, and ‘later’ is just an abbreviation for ‘I have no clue and hope to handle the hot potato to someone else in the future’.

5. Martin Bach - September 9, 2011

Hi Martin!

Having been to the pub that remarkable evening too I thought I’d add a comment too.

I am in my early thirties (and yes, that’s rubbing it in) and had a two semester course on database design back at Uni, including how to write ER diagrams and preserve relational integrity even involving triggers! OK, some may not be relevant anymore but the main problem is as you say, the lack of attention to the database design. Worse, there are too many people relying on an object-relational mapper to develop the data model (shudder).

Equally, you will almost certainly hear the counter-argument that we all are dinosaurs on the way to extinction, and agile is the way to develop! Sometimes I feel I’m too cynical for my age…

Martin B.

6. Mathew Butler - September 9, 2011

I’ve seen data modelling happening on recent projects, and even on some that could be considered “Agile”. I don’t think this happens nearly enough though.

Notice I said data modelling and not database design – on the “Agile” projects the physical stuff is thought out, but in a more iterative fashion. An “Agile” delivery is by nature more iterative in terms of delivering functionality, and this is top to bottom including the database. This takes some getting used to.

My experience of “Agile” is that even with some “real” “Agile” methodologies, teams are still working out the details of what is important and leads to a successful delivery. What I like about Agile is the regular cycle of looking back to identify what works/doesn’t work and the imperatives to change within each cycle to improve the construction/delivery process – I think that this is the value in any development methodology. And yes, this can take place regardless of your approach to construction and delivery.

This “Agile” stuff *still* leaves me feeling uncomfortable, though – I don’t think the “Agile” projects I refer to have quite yet got the balance of up front thought and design (across all tiers, including the database) right, just yet.

I’ll stop there as I’m at risk of going off topic (if I have gone too far down that route already, apologies)

Thanks for posting this, Martin, you’ve motivated me to comment on a blog – something I haven’t done for some time now.

7. Marco Gralike - September 9, 2011

Even if its a guy like me, nuts about implementing XML in the database (aka “new 2.0 stuff”), guess what, I noticed that I spend 3/4 of my time regarding XML Schema and the underlying physical design consequences / alterations needed, to make it work.

Afterwards, the other 1/4 part of the time is just simple. Agile probably works, but I think only, and only if, when all topics are addressed by knowledgable people which also have the discipline to act upon it.

Toon Koppelaars - September 9, 2011

> when all topics are addressed by knowledgable people which also have the discipline to act upon it.

Touchee…

8. Dom Brooks - September 9, 2011

I like the idea of a two tier database infrastructure.

If everything database related – design, code, etc – has been reviewed (ideally by your database experts on the project as you go, i.e. as part of the iterative agile process), approved and all database policies adhered to, etc, then this way please, welcome to the platinium infrastructure – you get our full support, etc.

Sorry? What’s that? You’ve got to go live next week, there’s no time to change anything let alone review anything? This way sir/madam to our plastic bit bucket service. Yes you were aware that you needed to involve the database people, remember we sent that memo, had that meeting and you signed that bit of paper. Excellent. Call us when you’re ready to do things properly.

9. Graham - September 9, 2011

I have a theory that Agile isn’t really to blame. My thought is that the quality of the people involved will have a much higher bearing on the quality of the end product rather than the development methodology used. A poor team will create a poor end product…and vice versa.

Now having said all that, I don’t think that Agile is particularly well suited to the logical or physical design steps.

10. David Harper - September 9, 2011

Let’s not forget that all the cool kids use Ruby on Rails to build their database-backed web applications these days. Rails developers will tell you that they don’t need fusty, boring things like foreign key constraints or unique indexes in their database schemas. No sirree, because Rails takes care of all that for you, and it never makes a mistake.

I’m not making this shit up, as Martin knows. For a couple of deeply frustrating years, I had to work with people who actually believed that stuff, and who tried to make me believe it too. Fortunately, I managed to escape to the relatively sane and serene world of database administration. It was either that or Prozac, and I figured that becoming a DBA was a better career move.

mwidlake - September 9, 2011

But you WERE tempted by the prozac, weren’t you David? 🙂 {only kidding}
Mind you, those had been a pretty bad couple of years for you. Quite a few good people were chewed up by that team and spat out damaged…
My last “Friday Philosophy” was on the topic of bad teams and the root being the bad manager/management environment. I think you were in a classic example of that.

11. jgarry - September 9, 2011

As many people (Tom Kyte springs to mind) have pointed out, front ends come and go, but the data is forever. Of course that is an oversimplification, but in the hive mind of hot development topics that concept has been turned on it’s head – social media data is entirely transient. Even being able to point to a relevant 2 year old blog is a rarity, which to me is just more proof that 98% of everything is crap.

This just brings up the point of data life cycles – as db designers or old-timey dba’s, we only consider fairly long-lived and important data and its relationships “data.” But if you think of it with a life cycle, we see we are mentally skipping all those things like transient and temporary data, and perhaps a lot of streamed and dynamically aggregated data – we may not materialize it unless we “have to for performance.”

So if someone tries to twitterize some important data, there isn’t the infrastucture to life cycle it. The data’s there, then it is gone. What’s missing is a smooth transition on the continuum of data life. What infrastructure there is is also transient, if it isn’t all bright and shiny and new, it gets forgotten.

How will my refrigerator survive?

mwidlake - September 10, 2011

“we see we are mentally skipping all those things like transient and temporary data, and perhaps a lot of streamed and dynamically aggregated data ”

I can’t agree Joel {which is unusual, I usually do agree with you}. I’d say that a data lifecycle person should and would be aware of transient data, especially if it is a key part of the application. I know one person who works on an airport bagage handling system and, for them, the data as it is in the database today is pretty much all that is important. So speed and daily reliability is paramount, backup and recovery is very secondary (who needs to know what bags were lost last week 🙂 ). Share dealing and gambling system are similar, but some of that data needs to be persisted of course. A database design exercise would capture this, especially if you bothered with CRUD analysis and DLM.

Part of why I have not bothered {yet} with twitter is as it seems designed for truly unimportant transient data, ie stuff of the moment or just random blahh – “Yay, just watched a DVD of Not the Nine O’clock News and Pamela Stevenson is great!” {Note the use of full words, I am just not cut out for twitter}. My mates might want to join in on a quick reminisce but I would not want to keep that for 2 years! So, as you say, what happens to tweats you really would like to hold on to?

As for infrastructure, yes it moves on, but often with old data ported across. I wonder if one of my Astronomer friends would like to comment on how small and how old some of the ?Viking? datasets are and how they are still being analyzed.

jgarry - September 16, 2011

Speaking of old astronomical data

We perhaps disagree less than you think, if you haven’t spent much time in the social media. Truly unimportant transient data, a term I can grasp. Tut tut data. It’s what happens when you have developers without a data lifecycle person. The other half of the point I intended was those of us who do think in olde design terms don’t consider that as data. We depersonalize the enemy and kill him.

I first misread your tweats misspelling. Decades ago, my sig was “It ain’t the teat, it’s the tumidity.” I was having severe swelling of my knees at the time, though few online knew that and would assume other swelling.

mwidlake - September 17, 2011

Hey, that Lunar Orbiter Image recovery Project you link to is a cracker!

Maybe we are closer in opinion than I thought, and yes I have not joined in with the Social Media world very much. It’s a time sink and I don’t seem to ever have much time.

David Harper - September 10, 2011

Apropos “long-lived data” …

Back when I was paid to do astronomy rather than look after databases, one of my colleagues worked with Professor Richard Stephenson, an astronomer at Durham University. They were interested in tracking the historical change in the rotation rate of the Earth, which has been slowing down ever since the Moon was formed and began to drift away from the Earth, gradually stealing its rotational angular momentum in the process.

They were able to make use of 2400-year-old observations of solar eclipses made in ancient Babylon. The Babylonian astronomers recorded their observations on clay tablets, which are still readable today.

I wonder whether today’s astronomical observations will still be readable in the year 4400?

12. Dominic Delmolino - September 10, 2011

I notice that people only start paying attention to design when it becomes clear that the infrastructure can’t run their hugely inefficient O(n^10) algorithm. Eh? What’s that? People don’t know how to design or predict the scalability of algorithms anymore? Shocker!

Before the advent “massive horizontal scalability at web scale”, folks bumped up against single-server hardware limits, so that they couldn’t paper over their code with ever faster hardware — much to the chagrin of the hardware and software license sales people who were always happy to sell the bigger box, and the higher license cost “per cpu, per core, per Ghz”. In some respects that brought on a mini-Golden Age of people starting to spend more time on designs that could actually scale and run efficiently.

Now though all you hear about is that the way to performance happiness is by “horizontal scalability” — lazy speak for “just add more hardware” and “that database can’t scale because it won’t run across 1,000 nodes”. Your database isn’t fast enough? Just shard it. Need to have the data on every node, just duplicate it. Storage is cheap, cpu are cheap, memory is somewhat cheap, etc.

I’ve had limited success in demonstrating that writing simpler code, writing code that reduces cycles and spins, actually saves money through needing less floor space, less cooling, less power, etc. Often times I can also demonstrate that by putting together the right table design, and the right database API (procedures that directly implement business / mission functions) I can actually deploy capability faster than the Ruby / Java folks who are attempting to marshal and wrangle data in their code…

David Harper - September 10, 2011

“Your database isn’t fast enough? Just shard it.”

Or use MongoDB, because MongoDB is web-scale:

http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale

(Thanks to Martin for sharing the link in the first place!)

13. Tyler Muth - September 10, 2011

Why do you think we’re selling so many Exadata’s? 😉 I spend most of my time doing Exadata POVs these days and at least have of the systems we test would be substantially faster with some DB Dev basics. Nobody thinks in “sets” anymore, way too much row-by-row.

Compounding the problem is the explosion of data volumes and drive capacity, but only minimal increases in drive performance. Interesting stat from seagate presentation I’ve been meaning to blog about: from 1978 to 2008 drive capacity increased 92,000x but throughput only increased by 32x (numbers might be off slightly, but close).

Then you have the fact that the storage admins and DBAs rarely talk and are often at odds with each other. I see data warehouses with 2x 2Gb FC HBAs. People arguing over 2.9 vs 3.1 GHz chips, speed of RAM, but then “we’ll just throw it on the SAN”.

Now all I hear about is nosql this and hadoop that. They don’t change the laws of physics in any way. If your query needs to traverse 600 GB of data in 3 minutes, you better have the hardware to support that or redesign your data model.

mwidlake - September 10, 2011

Breath, Tyler, Breath… Pull in the air, relax, calm…
And rest.

I’ve said it many times and others have said it better, disk storage is SLOWING DOWN in real terms when compare to volume. You can now place a 2TB database on one physical disk, it will perform like a dog. Even spread over 5 disks, it will still perfom like a dog compared to the same system spread over 30 150GB disks (allowing for evil raid). Can you buy 200GB spinning-rust disks any more?

I’ve not yet worked on an Exadata system (It’s like Financial Gerbil, – no one wants to hire you until you have experience of Financial Gerbil – the fact that you know more than most about how a database at that volume works, it matters not unless you know the language of Financial Gerbil…) but something I know. If your database is “big” and was “designed” with no consideration for how you would pull the data out and you ask tom-fool questions of the database like “give me all these records filtered by something I never dreamed of filtering on” well, if exadata can filter on that at the storage unit level and push only the flitered result set up the stack, you are on to a winner. If you added indexes and physical grouping to match that query, you would not need that expensive Exadata solution.

I’m not knocking Exadata per se, I think it is a very smart technical solution, but to my mind, to a large extent, it is fixing the “error” that the customer had no idea whatsoever how they want to query their data. Just chuck all that data into some time-based partitioned tables and let’s query whatever on whatever flavours our toast that morning. NO! Think about what you need to know and develop a system to provide what you need to know. Don’t know what you need to know? Get a new job.

14. Friday Philosophy – Why doesn’t Agile work? « Martin Widlake's Yet Another Oracle Blog - September 16, 2011

[…] Agile seems to have borrowed extensively from} to great success. However, after my post last week on database design, many of the comments were quite negative about Agile – and I had not even mentioned it in my […]

15. Uwe Hesse - September 18, 2011

Great post, Martin, really to the point! Whenever I encounter Developers that emphasize their “Database agnostic” aka “Generic” approach, I have difficulties to keep being polite and not telling them too drastically that this is the view of an ignorant who will not leverage the benefits of any Database product. Instead, they get (generic) a bad performance 🙂

But the application was developed fast at least.

16. In Defense of Agile Development (and Their Ilk) « Martin Widlake's Yet Another Oracle Blog - September 21, 2011

[…] jettisons documentation, design and proper testing. This whole thread sprung from my lamenting the demise of database design and several of the comments highlighted that the introduction of Agile seemed to equate, at least […]

17. State of Data #67 « Dr Data's Blog - September 25, 2011

[…] #DBMS – Is Database Design a dying art or a dead art already (interesting comments too) […]

18. Ian Carney - October 11, 2011

I completely agree with Martin – efficient applications require effective database design. As Dominic says it is the best way to deliver performance, and also data integrity with minimum hardware and development costs.

We had the arguments about set vs row at a time processing back in the 80s with COBOL programmers. Java just seems to an re-started these discussions.

You can’t rewrite the laws of physics, Jim

Grumpy Ian

19. mwidlake - October 11, 2011

Thanks Ian,

Set versus row processing and programming – Good programmers get it, bad or under-trained programmers don’t. I guess with each new language you get a new set of the latter sort of programmers.

And you’re no more grumpy than me 🙂

20. Database Design - October 17, 2011

Great post! Awesome perspective on database design.

21. carrière by cocamegadose - Pearltrees - December 11, 2011

[…] Friday Philosophy – The Dying Art of Database Design? « Martin Widlake's Yet Another Oracle Blog do some quick, hacky fixes to get by for now migrate the existing data Also, once you have built your application on top of an inadequate database design, you not only have to redesign it, you have to: transform some of it (do some data duplication or splitting maybe) You see, as far as I am concerned, most IT systems are all about managing data. Think about it. […]


Leave a reply to Martin Berger (@martinberx) Cancel reply