jump to navigation

Friday Philosophy – Are Good IT People Just Lucky Starters? April 22, 2022

Posted by mwidlake in database design, Friday Philosophy.
Tags: ,
4 comments

There is a tradition in IT that older members of the community complain that “the youngsters of today” don’t really understand how to design systems, develop code, and test applications properly. It was like this in the late 80’s when I started. It was a common theme of discussion at the end of the 90’s. The 2000’s seemed to me to be endless carping about development frameworks that simplified things to the lowest common (poorly performing) denominators. And it seems to be a big part of what us oldies do on the likes of twitter or at conferences now – or at least did when we had proper conferences pre-plague.

One aspect of IT certainly has changed over the decades and that has been a shift in the layer(s) where most professional IT people need to concentrate to get the best out of a system. Back in the late 80’s the oldsters were saying you had to understand registers, memory pointers, the physical constraints of the hardware to get decent performance, and use a low-level programming language like C or even assembler. As time has gone on the move has been up the technology stack, leaving the lower levels like memory management and how interpreters work as either solved or looked after by a tiny set of people.

By the end of the 90’s it was all Middleware and application servers took most systems and thus development off the database servers or mainframes. software architecture and frameworks were supposed to remove the need to worry about *how* the data was got, and more modern languages meant you did not have to worry about memory maintenance, freeing resources etc.

One thing that has not changed (IMHO) but is now often overlooked is the physical placement of data, data structures, and how the storage works. I can often vastly improve the performance of applications or specific pieces of SQL by doing what we used to do in the 1990’s – storing data in a manner that makes accessing it require less IO, putting related data together. Some of us old database types moan endlessly about places not doing proper database design anymore.

But sometimes I wonder if the actual areas of focus moaned about by the old guard are not that significant. The thing they (and now I and many of my friends) also complain about is actually more significant – and that is our attitude towards the process of designing and building systems.

I was not trained in IT at college, I studied biology. But I was trained when I came into the industry, in SSADM, database design, structured testing methodologies, even logical problem solving. And I was lucky to work with people, mostly those bitter old people, who spent as much time teaching me why you do things in a certain way and not just the syntax of a language or which framework to use or which source control repository was flavour of the year.

Looking back over my whole career I’ve encountered a huge number of very average or even poor developers/DBAs/analysts/system designers. And yet some people from every generation are much better than their compatriots, even though they have similar backgrounds and experience. And often these very good people are not apparently any more generally intelligent than the not good ones. Thinking of those good people, I believe a common trait across them is that they listened to the old people complaining and, though they ignored a lot of it, they learnt to ask why things work the way they do and to try and understand the overall technical architecture of computer systems – and not just their little bit. They learnt to consider the whole system, even if some of it was only understood at a simplistic level.

I’m not sure that this curiosity about how things work and the need to look at the wider picture is taught at college (and these days most people coming into IT have come from a college course that either focuses on computing or is a main component of it). If it was taught, wouldn’t it be more common?

I think we learn how to solve problems and design systems from those around us who have done it for real, many times. And that’s those moaning old buggers. But it’s not understanding a layer of technology that is important but actually the total opposite – understanding the wider picture.

I think most people who are good at IT, no matter what their age, were blessed by early exposure to talented (but miserable, complaining) people who simply did things in a sensible, holistic way and asked “why?” a lot. We were lucky.

If you think this whole article is just a plea to listen to me when I complain about the youth of today, you may have a point. But, if you do listen, I might (probably by accident) teach you something that helps your career for years to come. It’s not so much how smart you are but your attitude which makes what you work on a success.

Index Organized Tables – the Basics. July 18, 2011

Posted by mwidlake in development, internals, performance.
Tags: , , , ,
36 comments

..>IOT2 – Examples and proofs
….>IOT3 – Greatly reducing IO with IOTs
……>IOT4 – Boosting Buffer Cache Efficiency
……..>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

I think Index Organized Tables(IOTs) are a much under-used and yet very useful feature of Oracle. Over the next few postings I’m going to cover some aspect of Index Organised Tables, both good and not-so-good. I am going to cover some benefits of IOTs that I think many people are unaware of. In this first post I am just going to run through the basics of IOTs.

The idea behind an IOT is simple. You hold all the data for the table in the ordered structure of an index. Why would you want to do that? Let us consider a very common requirement, accessing a row in a “large” table via a known, unique key.

Traditionally you have a heap table holding the data you want to access and a standard index to support access to that table. See the first diagram below. The 4-layer triangle represents the index, with a root block, two levels of branch blocks and then the leaf blocks at the “bottom”. The blue rectangle represents the table with the squares being individual rows. Of course, in a large table there would be thousands or millions of “squares”, this is just a simple diagram to show the idea.

When you issue a SQL statement to select the row via the indexed column(s) then oracle will read the root block (1), find the relevent block in the first level of branch blocks (2), then the relevant block in the second level of branch blocks (3) and finally (as far as the index is concerned) the relevant Leaf Block for the unique key. The leaf block holds the indexed column(s) and also the rowid. The rowid is the fastest way to look up a record, it states the file, block and row offset for the row. This allows oracle to go straight to the block and get the row. That is read number (5).
The number of branch blocks {and thus the number of blocks that need to be read to find a row} will vary depending on how much data is indexed, the number and size of the columns in the index, how efficiently the space has been used in the blocks and one or two other factors. In my experience most indexes for tables with thousands or millions of rows have one, two or three levels of branch blocks.

The second diagram shows a representation of the Index Organized Table. The table has in effect disappeared as a distinct object and the information has been moved into the leaf blocks of the index {part of me feels Index Organized Tables should really be called Table Organized Indexes or Table Containing Indexes as that would better indicate what is physically done}:

So with the IOT oracle reads the root block (1), the two branch level blocks (2 and 3) and finally the leaf block (4). The leaf block does not hold the rowid but rather the rest of the columns for the table {this can be changed, a more advanced feature allows you to store some or all the extra columns in an overflow segment}. Thus to access the same data, Oracle has to read only 4 blocks, not 5. Using an IOT saves one block read per unique lookup.

This saving of block reads is probably the main feature that IOTs are known for, but there are others which I will cover in later posts. Two things I will mention now is that, firstly, the use of IOTs is potentially saving disc space. An index is in effect duplication of data held in the table. When you create an index no new information is created but space is used up holding some of the table information in a structure suitable for fast lookup. Secondly, the index and table have to be maintained whenever a change is made to the columns that are indexed. IOTs reduce this maintenance overhead as there is only one thing to maintain.

Now for some drawbacks.

  • The IOT has to be indexed on the primary key. There is no option to create an IOT based on other indexes. As such you have to either be accessing the table via the primary key to get the benefit – or you have to be a little cunning.
  • The index is going to be larger than it was and very often larger than the original table. This can slow down range scans or full scans of the index and a “full table scan” will now be a full index scan on this large object, so that can also negatively impact performance. However, if a range scan would then have resulted in access to the table to get extra columns, the IOT gives a similar benefit in reducing IO to that for single row lookups.
  • I just want to highlight that you now have no rowid for the rows.
  • Secondary indexes are supported but will potentially be less efficient due to this lack of rowid.

So, a brief summary is that Index Organised Tables effectively move the table data into the Primary Key index, reduce the number of block lookups needed to select one row, can save some disc space. But you can only organize the table via the Primary Key and it can make full or partial table scans and lookups via other indexes slower.

There are several more important benefits to IOTs {in my opinion} which I will come to over the next week or two.