
Why oh Why Do We Still Not Have a Fast Bulk “SQL*Unloader” Facility? December 1, 2016

Posted by mwidlake in Architecture, database design, performance.

Way back in 2004 I was working at the UK side of the Human Genome project. We were creating a massive store of DNA sequences in an Oracle database (this was one of two worldwide stores of this information, available for free & open use by anyone {* see note!}). The database was, for back then, enormous at 5-6TB. And we knew it would approximately double every 12 months (and it did: it was 28TB when I had to migrate it to Oracle 10 in 2006, over 40TB 6 months later, and it grew to half a petabyte before it was moved to another organisation). And we were contemplating storing similarly massive volumes in Oracle – Protein, RNA and other sequence stores, and huge numbers of cytological images (sorry, microscope slides).

I did my little bit to work out how we all tick

Significant chunks of this data were sometimes processed in specialist ways using Massively Parallel Processing farms (hundreds or thousands of compute nodes so we could do “2 years of compute” in a day or even a few hours). Oracle was never going to be able to support this within the database.

But Oracle had no fast, bulk data unloading offering, a “SQL*Unloader”. Other RDBMSs did. And that gave me an issue: I had to argue why we should hold this data in Oracle, where it was hard to yank out chunks of it, rather than keep it in, say, a simple file store or MySQL.

That year I was invited to Oracle Open World and I was given a slot with a senior VP in the RDBMS/OCFS area (as that is what we were using – I know! But we were). I had been warned by someone that he could be a little “robust”. In fact my friend called him “Larry’s Rottweiler”. However my chat with the SVP went fine. Until he asked me for the one thing I would really want and I said “well, it’s been promised for the last 2 or 3 versions of Oracle, but it still does not exist in V10 – I’d like a SQL*Unloader so we can rapidly extract bulk data from Oracle into a structured file format”. He went very cold. I continued: “We need to do really intensive processing of the data with ‘C’ and if it is locked into the database we can’t do that. It’s hard for me to justify using Oracle as our primary data store if the data is now stuck there…”

I honestly thought he would bite!

He went ballistic. He was furious, he was raising his voice, leaning forward, I honestly thought he might hit me. I can’t remember the whole rant but one line stuck: “We will NEVER help you get your data out of Oracle!”. I ended the discussion as soon as I could – which obviously suited him fine also.

And here we are over 10 years later and no SQL*Unloader has appeared. And, as far as I can see, there is no longer even a suggestion that “one day” one will exist. You can obviously get data out of Oracle using SQL*Plus, SQLcl or PL/SQL’s UTL_FILE, but you have to write your own code, and making it parallel or fast is not that simple. There are some commercial tools out there to do it. But fundamentally we do not have a simple, robust & fast data unloader tool within the Oracle toolset.
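
To give a flavour of what “write your own code” means here, below is a minimal sketch of the sort of DIY unload you end up with: driving SQLcl from the shell (assuming it is on the path as sql) to spool a table out as CSV. The connection details, table name and output path are all illustrative, and to make it fast you would still have to run several of these yourself in parallel, each covering a different partition or key range.

# A minimal DIY bulk unload: drive SQLcl from the shell and spool one table
# out as CSV. Connection string, table name and output path are illustrative.
sql -S scott/tiger@//dbhost:1521/ORCLPDB <<'EOF'
set sqlformat csv
set feedback off
spool /data/extracts/my_big_table.csv
select * from my_big_table;
spool off
exit
EOF

Even then you are hand-rolling the parallelism and restartability that a proper SQL*Unloader would give you for free.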

But I remain mystified as to why Oracle Corp will not provide a tool to do this. I can see the argument that doing so would help you migrate your Oracle Data set to a different platform and thus move away from Oracle, but I think that is a dumb argument. If you are willing to dump your Oracle environment for something else you are already not happy – and making life difficult is only going to make you less happy and almost certain to never come back! It’s incredibly inconvenient to extract all your data at present but compared to the cost to your organisation of changing a central part of your infrastructure, it’s just a bloody annoyance. It’s like domestic service providers (telephone, internet, TV) that make it easy for you to sign up but a nightmare to leave. Thus guaranteeing that once you leave (cos you will) you will certainly NOT be going back to them in a hurry!

So for the sake of what I see as a misplaced protectionist stance they close the door to rapidly extracting data from Oracle databases for processing in other ways. I’ve come across other situations like this, before and since, but the Human Genome issue threw it into sharp relief for me. The end result I’ve seen a few times (and effectively what we had at the Sanger) is that the data gets stored at least twice – once in Oracle and then somewhere else where it is easy to access. Now that’s a daft and bad place to be: multiple copies, and at least one of them lacking referential integrity. Thanks for forcing that one on us, Oracle.

Something that is changing is Cloud. Oracle wants us all to move our data and systems into the sky. Their whole marketing message at the moment is nothing but cloud-cloud-cloud and they give the impression that cloud is in fact everything, every solution. So how do you get 10TB of data into the cloud? It’s not so hard really. You can trickle it in over time (after all, networks are constantly getting faster), but for bulk data this is a poor solution. Or you can ship your data physically – you can’t beat the bandwidth of a transit van. I’ve heard Oracle people at OOW and other conferences saying this is easy-peasy to arrange (as it should be; I’ve done it for clients doing migrations a couple of times).

But how are you going to get it out again? How well do you think a company that has never provided a SQL*Unloader is going to help you get bulk data back out of your Oracle cloud-based databases into on-premises systems? And how well are they going to help you shift it from one of their systems to another cloud system? One they have not sold you? Sometimes people talk about business relationships being like a marriage. Well, once you have gone cloud, you might have ensured any divorce is going to be really messy!

Update – see the comment by Noons on the fact that VLDBs and Data Warehouses are often not as static as you think, so there can be a constant need to move a lot of data into and out of such very large data stores. Cloud is not currently going to support this well. And if you have a data life-cycle management policy, say archiving off data more than 3 years old, where are you archiving it to and how will you do this once it is in the cloud?

* Please note, this was not human DNA sequence information gathered for medical reasons! It was mostly sequence information for other organisms (chimps, flies, potatoes, rice, viruses, bacteria – all sorts) generated by the scientific and, in some cases, biopharma community. There was human DNA sequence in there, but by consent and anonymised – you can’t have a human genome sequence without involving some humans! We did not hold your Dodgy Uncle’s DNA test that was taken for that court case…


Comments»

1. jarneil - December 1, 2016

That is a very astute observation. Yesterday I was extremely impressed with Amazon “Snowmobile”: https://aws.amazon.com/snowmobile/ for moving bulk data to the cloud.

It makes it sound so easy to get huge quantities of data in (and it probably is) but think of what you said about moving to another provider. I read the page about snowmobile quite carefully and it sounds like a one way deal to me.

This applies to any cloud vendor: if you move a large enough quantity of data to them, are you effectively locking yourself in?

mwidlake - December 1, 2016

That does look impressive – and now I know what truck to hijack to try and get someone to pay to have their system back 🙂

I looked at their “Snowball” offering too : https://aws.amazon.com/snowball/
I first thought it was the size of an external hard disc enclosure and wondered how it could hold so much – but at “under 50 pounds” it’s a PC case full of storage, ain’t it!

Just one thought. When I’ve physically moved storage I’ve always sent at least 2 copies, in case of delay or disaster. So would you need 2 trucks or just accept it’s REALLY well looked after (and does not get stuck in snow for a week…)

2. sydoracle1 - December 2, 2016

We’re using SQOOP to move data off Oracle RDBMS onto Hadoop (in our case an Oracle Big Data Appliance, but that’s not significant except that it is all connected on Infiniband pipes).

It’s possible that the Oracle attitude will change if they get enough BDA customers wanting to move data between Oracle products.
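
For anyone who has not met it, here is a minimal sketch of the kind of Sqoop import being described, pulling an Oracle table into HDFS as delimited files across several parallel mappers. The JDBC connect string, credentials, table, split column and target directory are all illustrative.

# Pull one Oracle table into HDFS as delimited files using 8 parallel mappers.
# Connection details, object names and paths are illustrative.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
  --username SCOTT -P \
  --table MY_BIG_TABLE \
  --split-by ID \
  --num-mappers 8 \
  --target-dir /user/hive/warehouse/my_big_table

The --split-by column is what lets Sqoop carve the table into one range per mapper, which is where the speed comes from.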

3. Noons - December 2, 2016

The interesting thing we found about the “cloud everywhere” thing is that moving the data in and out of it costs a motza. Which in the case of very large data repositories – your DNA case is a good example – is a problem at the start and later on if one wants to move again.
But there is an aspect of this that few are talking about: data is not stale! It needs to be updated often to stay relevant!
Eg: Datawarehouses. A lot of folks think they are mostly read-only. Really? How do they expect that data to be there and stay relevant? Spontaneous generation? Quantum soup filtering? Of course not.
Now, try to constantly pump large volumes in and out of a cloud installation and then come back to me with the “cloud is cheaper” nonsense! We costed it and it turns out to most definitely not be cheaper!
Small “inconvenient” details that few talk about.
Just like offloading data from an Oracle db.
Thanks for bringing this subject up, most useful.

mwidlake - December 2, 2016

Thank you for highlighting a really good point Noons.

4. Andrey Chervonets (@AChervonets) - December 9, 2016

>> And if you have a data life-cycle management policy, say archiving off data more than 3 years old, where are you archiving it to and how will you do this once it is in the cloud?

I am not sure about the Cloud services, but on premises – Data Pump and even legacy exp – can do this work.
Fast?! It depends on project requirements.
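
For what it’s worth, here is a minimal sketch of the sort of Data Pump export that covers that archiving case: exporting only rows older than three years, in parallel, to a directory object. The schema, table, date column and directory names are all illustrative, and note that the result is Oracle’s proprietary dump format rather than the open, structured files the post is wishing for.

# Archive rows more than 3 years old from one table with Data Pump export.
# Schema, table, column and directory names are illustrative; expdp will
# prompt for the password.
expdp archiver@ORCLPDB \
  tables=APP.ORDERS \
  query='APP.ORDERS:"WHERE order_date < ADD_MONTHS(SYSDATE, -36)"' \
  directory=ARCHIVE_DIR \
  dumpfile=orders_archive_%U.dmp \
  parallel=4 \
  logfile=orders_archive.log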

Why do we still have no Unloader? Oracle needs to sell GoldenGate, etc. 🙂

Oracle does not have some other things too. For example, segment space usage statistics. We had to invent CoMinder SpaceStat to track how our tables/indexes are growing.
There may be other things customers needed, but were afraid to request. Who knows?!

