jump to navigation

With Modern Storage the Oracle Buffer Cache is Not So Important. May 27, 2015

Posted by mwidlake in Architecture, Hardware, performance.
Tags: , , , ,

With Oracle’s move towards engineered systems we all know that “more” is being done down at the storage layer and modern storage arrays have hundreds of spindles and massive caches. Does it really matter if data is kept in the Database Buffer Cache anymore?

Yes. Yes it does.

Time for a cool beer

Time for a cool beer

With much larger data sets and the still-real issue of less disk spindles per GB of data, the Oracle database buffer cache is not so important as it was. It is even more important.

I could give you some figures but let’s put this in a context most of us can easily understand.

You are sitting in the living room and you want a beer. You are the oracle database, the beer is the block you want. Going to the fridge in the kitchen to get your beer is like you going to the Buffer Cache to get your block.

It takes 5 seconds to get to the fridge, 2 seconds to pop it open with the always-to-hand bottle opener and 5 seconds to get back to your chair. 12 seconds in total. Ahhhhh, beer!!!!

But – what if there is no beer in the fridge? The block is not in the cache. So now you have to get your car keys, open the garage, get the car out and drive to the shop to get your beer. And then come back, pop the beer in the fridge for half an hour and now you can drink it. That is like going to storage to get your block. It is that much slower.

It is only that much slower if you live 6 hours drive from your beer shop. Think taking the scenic route from New York to Washington DC.

The difference in speed really is that large. If your data happens to be in the memory cache in the storage array, that’s like the beer already being in a fridge – in that shop 6 hours away. Your storage is SSD-based? OK, you’ve moved house to Philadelphia, 2 hours closer.

Let's go get beer from the shop

Let’s go get beer from the shop

To back this up, some rough (and I mean really rough) figures. Access time to memory is measured in Microseconds (“us” – millionths of a second) to hundreds of Nanoseconds (“ns” – billionths of a second). Somewhere around 500ns seems to be an acceptable figure. Access to disc storage is more like Milliseconds (“ms” – thousandths of a second). Go check an AWR report or statspack or OEM or whatever you use, you will see that db file scattered reads are anywhere from low teens to say 2 or 3 ms, depending on what your storage and network is. For most sites, that speed has hardly altered in years as, though hard discs get bigger, they have not got much faster – and often you end up with fewer spindles holding your data as you get allocated space not spindles from storage (and the total sustainable speed of hard disc storage is limited to the total speed of all the spindles involved). Oh, the storage guys tell you that your data is spread over all those spindles? So is the data for every system then, you have maximum contention.

However, memory speed has increased over that time, and so has CPU speed (though CPU speed has really stopped improving now, it is more down to More CPUs).

Even allowing for latching and pinning and messing around, accessing a block in memory is going to be at the very least 1,000 times faster than going to disc, maybe 10,000 times. Sticking to a conservative 2,000 times faster for memory than disc , that 12 seconds trip to the fridge equates to 24,000 seconds driving. That’s 6.66 hours.

This is why you want to avoid physical IO in your database if you possibly can. You want to maximise the use of the database buffer cache as much as you can, even with all the new Exadata-like tricks. If you can’t keep all your working data in memory, in the database buffer cache (or in-memory or use the results cache) then you will have to do that achingly slow physical IO and then the intelligence-at-the-hardware comes into it’s own, true Data Warehouse territory.

So the take-home message is – avoid physical IO, design your database and apps to keep as much as you can in the database buffer cache. That way your beer is always to hand.


Fixing my iPhone with my Backside May 18, 2015

Posted by mwidlake in Hardware, off-topic, Perceptions, Private Life.
Tags: , ,

Working with Oracle often involves fixing things – not because of the Oracle software (well, occasionally it is) but because of how it is used or the tech around it. Sometimes the answer is obvious, sometime you can find help on the web and sometimes you just have to sit on the issue for a while. Very, very occasionally, quite literally.

Dreaded "out of battery" icon

Dreaded “out of battery” icon

Last week I was in the English Lake District, a wonderful place to walk the hills & valleys and relax your mind, even as you exhaust your body. I may have been on holiday but I did need to try and keep in touch though – and it was proving difficult. No phone reception at the cottage, the internet was a bit slow and pretty random, my brother’s laptop died – and then my iPhone gave up the ghost. Up on the hills, midday, it powers off and any attempt to use it just shows the “feed me” screen. Oh well, annoying but not fatal.

However, I get back to base, plug it in…and it won’t start. I still get the “battery out of charge” image. I leave it an hour, still the same image. Reset does not help, it is an ex-iPhone, it has ceased to be.

My iPhone is version 5 model, black as opposed to the white one shown (picture stolen from “digitaltrends.com” and trimmed), not that the colour matters! I’ve started having issues with the phone’s battery not lasting so well (as, I think, has everyone with the damned thing) and especially with it’s opinion of how much charge is left being inaccurate. As soon as it gets down to 50% it very rapidly drops to under 20%, starts giving me the warnings about low battery and then hits 1%. And sometimes stays at 1% for a good hour or two, even with me taking the odd picture. And then it will shut off. If it is already down at below 20% and I do something demanding like take a picture with flash or use the torch feature, it just switches off and will only give me the “out of charge” image. But before now, it has charge up fine and, oddly enough, when I put it on to charge it immediately shows say 40-50% charge and may run for a few hours again.

So it seemed the battery was dying and had finally expired. I’m annoyed, with the unreliable internet that phone was the only verbal way to keep in touch with my wife, and for that contact I had to leave the cottage and go up the road 200 meters (thankfully, in the direction of a nice pub).

But then I got thinking about my iPhone and it’s symptoms. It’s opinion of it’s charge would drop off quickly, sudden drain had a tendency to kill it and I had noticed it lasting less well if it was cold (one night a couple of months ago it went from 75% to dead in 10, 15 mins when I was in a freezing cold car with, ironically, a dead battery). I strongly suspect the phone detects it’s level of charge by monitoring the amperage drop, or possibly the voltage drop, as the charge is used. And older rechargeable batteries tend to drop in amperage. And so do cold batteries {oddly, fully charged batteries can have a slightly higher voltage as the internal resistance is less, I understand}.

Perhaps my battery is just not kicking out enough amperage for the phone to be able to either run on it or “believe” it can run on it. The damn thing has been charging for 2 or 3 hours now and still is dead. So, let’s warm it up. Nothing too extreme, no putting it in the oven or on top of a radiator. Body temperature should do – We used to do this on scout camps with torches that were almost exhausted. So I took it out of it’s case (I have a stupid, bulky case) so that it’s metal case is uncovered and I, yep, sat on it. And drank some wine and talked balls with my brother.

15 minutes later, it fires up and recons it is 70% charged. Huzzah, it is not dead!

Since then I have kept it out it’s case, well charged and, frankly, warm. If I am out it is in my trouser pocket against my thigh. I know I need to do something more long-term but right now it’s working.

I tend to solve a lot of my IT/Oracle issues like this. I think about the system before the critical issue, was there anything already going awry, what other symptoms were there and can I think of a logical or scientific cause for such a pattern. I can’t remember any other case where chemisty has been the answer to a technology issue I’m having (though a few where physics did, like the limits on IO speed or network latency that simply cannot be exceeded), but maybe others have?

Friday Philosophy – Tosh Talked About Technology February 17, 2012

Posted by mwidlake in Friday Philosophy, future, Hardware, rant.
Tags: , ,

Sometimes I can become slightly annoyed by the silly way the media puts out total tosh and twaddle(*) that over-states the impact or drawbacks about technology (and science ( and especially medicine (and pretty much anything the media decides to talk about)))). Occasionally I get very vexed indeed.

My attention was drawn to some such thing about SSDs (solid State Discs) via a tweet by Gwen Shapira yesterday {I make no statement about her opinion in this in any way, I’m just thanking her for the tweet}. According to Computerworld

SSDs have a ‘bleak’ future, researchers say

So are SSDs somehow going to stop working or no longer be useful? No, absolutely not. Are SSDs not actually going to be more and more significant in computing over the next decade or so? No, they are and will continue to have a massive impact. What this is, is a case of a stupidly exaggerated title over not a lot. {I’m ignoring the fact that SSDs can’t have any sort of emotional future as they are not sentient and cannot perceive – the title should be something like “the future usefulness of SSDs looks bleak”}.

What the article is talking about is a reasonable little paper about how if NAND-based SSDS continue to use smaller die sizes, errors could increase and access times increase. That is, if the same technology is used in the same way and manufacturers continue to shrink die sizes. It’s something the memory technologists need to know about and perhaps find fixes for. Nothing more, nothing less.

The key argument is that by 2024 we will be using something like 6.4nm dies and at that size, the physics of it all means everything becomes a little more flaky. After all, Silicon atoms are around 0.28nm wide (most atoms of things solid at room temperature are between 0.2nm and 0.5nm wide), at that size we are building structures with things only an order of magnitude or so smaller. We have all heard of quantum effects and tunneling, which means that at such scales and below odd things can happen. So error correction becomes more significant.

But taking a reality check, is this really an issue:

  • I look at my now 4-year-old 8GB micro-USB stick (90nm die?) and it is 2*12*30mm, including packaging. The 1 TB disc on my desk next to it is 24*98*145mm. I can get 470 of those chips in the same space as the disc, so that’s 3.8TB based on now-old technology.
  • Even if the NAND materials stay the same and the SSD layout stays the same and the packaging design stays the same, we can expect about 10-50 times the current density before we hit any problems
  • The alternative of spinning platers of metal oxides is pretty much a stagnant technology now, the seek time and per-spindle data transfer rate is hardly changing. We’ve even exceeded the interface bottleneck that was kind-of hiding the non-progress of spinning disk technology

The future of SSD technology is not bleak. There are some interesting challenges ahead, but things are certainly going to continue to improve in SSD technology between now and when I hang up my keyboard. I’m particularly interested to see how the technologists can improve write times and overall throughput to something closer to SDRAM speeds.

I’m willing to lay bets that a major change is going to be in form factor, for both processing chips and memory-based storage. We don’t need smaller dies, we need lower power consumption and a way to stack the silicon slices and package them (for processing chips we also need a way to make thousands of connections between the silicon slices too). What might also work is simply wider chips, though that scales less well. What we see as chips on a circuit board is mostly the plastic wrapper. If part of that plastic wrapper was either a porous honeycomb air could move through or a heat-conducting strip, the current technology used for SSD storage could be stacked on top of each other into blocks of storage, rather then the in-effect 2D sheets we have at present.

What could really be a cause of technical issues? The bl00dy journalists and marketing. Look at digital cameras. Do you really need 12, 16 mega-pixels in your compact point-and-shoot camera? No, you don’t, you really don’t, as the optics on the thing are probably not up to the level of clarity those megapixels can theoretically give you, the lens is almost certainly not clean any more and, most significantly, the chip is using smaller and smaller areas to collect photons (the sensor is not getting bigger with more mega-pixels you know – though the sensor size is larger in proper digital SLRs which is a large part of why they are better). This less-photons-per-pixel means less sensitivity and more artefacts. What we really need is maybe staying with 8MP and more light sensitivity. But the mega-pixel count is what is used to market the camera at you and I. As a result, most people go for the higher figures and buy something technically worse, so we are all sold something worse. No one really makes domestic-market cameras where the mega-pixel count stays enough and the rest of the camera improves.

And don’t forget. IT procurement managers are just like us idiots buying compact cameras.

(*) For any readers where UK English is not a first language, “twaddle” and “tosh” both mean statements or arguments that are silly, wrong, pointless or just asinine. oh, Asinine means talk like an ass :-) {and I mean the four-legged animal, not one’s bottom, Mr Brooks}

Will the Single Box System make a Comeback? December 8, 2011

Posted by mwidlake in Architecture, future, Hardware.
Tags: , ,

For about 12 months now I’ve been saying to people(*) that I think the single box server is going to make a comeback and nearly all businesses won’t need the awful complexity that comes with the current clustered/exadata/RAC/SAN solutions.

Now, this blog post is more a line-in-the-sand and not a well researched or even thought out white paper – so forgive me the obvious mistakes that everyone makes when they make a first draft of their argument and before they check their basic facts, it’s the principle that I want to lay down.

I think we should be able to build incredible powerful machines based on PC-type components, machines capable of satisfying the database server requirements of anything but the most demanding or unusual business systems. And possibly even them. Heck, I’ve helped build a few pretty serious systems where the CPU, memory and inter-box communication is PC-like already. If you take the storage component out of needing to be centralise (and this shared), I think that is a major change is just over the horizon.

At one of his talks at the UKOUG conference this year, Julian Dyke showed a few tables of CPU performance, based on a very simple PL/SQL loop test he has been using for a couple of years now. The current winner is 8 seconds by a… Core i7 2600K. ie a PC chip and one that is popular with gamers. It has 4 cores and runs two threads per core, at 3.4GHz and can boost a single core to 3.8 GHz. These modern chips are very powerful. However, chips are no longer getting faster so much as wider – more cores. More ability to do lots of the same thing at the same speed.

Memory prices continue to tumble, especially with smart devices and SSD demands pushing up the production of memory of all types. Memory has fairly low energy demands so you can shove a lot of it in one box.

Another bit of key hardware for gamers is the graphics card – if you buy a top-of-the-range graphics card for a PC that is a couple of years old, the graphics card probably has more pure compute grunt than your CPU and a massive amount of data is pushed too and fro across the PCIe interface. I was saying something about this to some friends a couple of days ago but James Morle brought it back to mind when he tweeted about this attempt at a standard about using PCI-e for SSD. A PCI-e 16X interface has a theoretical throughput of 4000MB per second – each way. This compares to 600MB for SATA III, which is enough for a modern SSD. A single modern SSD. {what I am not aware of is the latency for PCI-e but I’d be surprised if it was not pretty low}. I think you can see where I am going here.

Gamers and image editors have probably been most responsible for pushing along this increase in performance and intra-system communication.

SSD storage is being produced in packages with a form factor and interface to enable an easy swap into the place of spinning rust, with for example a SATA3 interface and 3.5inch hard disk chassis shape. There is no reason that SSD (or other memory-based) storage cannot be manufactured in all sorts of different form factors, there is no physical constraint of having to house a spinning disc. Density per dollar of course keeps heading towards the basement. TB units will soon be here but maybe we need cheap 256GB units more than anything. So, storage is going to be compact and able to be in form factors like long, thin slabs or even odd shapes.

So when will we start to see cheap machines something like this: Four sockets for 8/16/32 core CPUs, 128GB main memory (which will soon be pretty standard for servers), memory-based storage units that clip to the external housing (to provide all the heat loss they require) that combine many chips to give 1Gb IO rates, interfaced via the PCIe 16X or 32X interface. You don’t need a HBA, your storage is internal. You will have multipath 10GbE going in and out of the box to allow for normal network connectivity and backup, plus remote access of local files if need be.

That should be enough CPU, memory and IO capacity for most business systems {though some quote from the 1960’s about how many companies could possible need a computer spring to mind}. You don’t need shared storage for this, in fact I am of the opinion that shared storage is a royal pain in the behind as you are constantly having to deal with the complexity of shared access and maximising contention on the flimsy excuse of “sweating your assets”. And paying for the benefit of that overly complex, shared, contended solution.

You don’t need a cluster as you have all the cpu, working memory and storage you need in a 1U server. “What about resilience, what if you have a failure?”. Well, I am swapping back my opinion on RAC to where I was in 2002 – it is so damned complex it causes way more outage than it saves. Especially when it comes to upgrades. Talking to my fellow DBA-types, the pain of migration and the number of bugs that come and go from version to version, mix of CRS, RDBMS and ASM versions, that is taking up massive amounts of their time. Dataguard is way simpler and I am willing to bet that for 99.9% of businesses other IT factors cause costly system outages an order of magnitude more times than the difference between what a good MAA dataguard solution can provide you compared to a good stretched RAC one can.

I think we are already almost at the point where most “big” systems that use SAN or similar storage don’t need to be big. If you need hundreds of systems, you can virtualize them onto a small number of “everything local”

A reason I can see it not happening is cost. The solution would just be too cheap, hardware suppliers will resist it because, hell, how can you charge hundreds of thousands of USD for what is in effect a PC on steroids? But desktop games machines will soon have everything 99% of business systems need except component redundancy and, if your backups are on fast SSD and you a way simpler Active/Passive/MAA dataguard type configuration (or the equivalent for your RDBMS technology) rather than RAC and clustering, you don’t need that total redundancy. Dual power supply and a spare chunk of solid-state you can swap in for a failed raid 10 element is enough.


Get every new post delivered to your Inbox.

Join 185 other followers