
Exclusion of Unioned SQL in Views – Followup August 20, 2012

Posted by mwidlake in database design, performance, SQL.

Last week I put up a post about how Oracle can filter out sections of a union view.

Within the comments I put up another example where the CBO did not filter out all but one branch of the unioned view, despite my replicating the exact WHERE clause of one of the unioned statements. Jonathan Lewis posted a followup to say “Oracle can be very touchy about how it plays this game” and made a prediction of how the CBO would handle a slightly different scenario.

This was the extra scenario and I include brief details on creating the unioned view too. NB all on Oracle 11.2.0.2. {non-Exadata :-) }

  1  create table mdw_source
  2  tablespace users
  3  as select * from dba_objects

  1  create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where ms.object_name like 'A%'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where ms.object_name like 'B%'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where ms.object_name like 'C%'

1 select count(created)
2 from mdw_v1 mv
3 where mv.object_name like 'C%'

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |   824 | 61800 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     2 |    68 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'A%' AND "MS"."OBJECT_NAME" LIKE 'C%')
   5 - filter("MS"."OBJECT_NAME" LIKE 'B%' AND "MS"."OBJECT_NAME" LIKE 'C%')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

So, as you can see, despite my final select having the WHERE clause "where mv.object_name like 'C%'", and despite the WHERE clauses of the unioned view all being similar, with one of them being "ms.object_name like 'C%'", the CBO fails to realise it can exclude all but one of the unioned SQL statements – even though the generated filter predicates are mutually exclusive:

4 - filter("MS"."OBJECT_NAME" LIKE 'A%' AND "MS"."OBJECT_NAME" LIKE 'C%')
5 - filter("MS"."OBJECT_NAME" LIKE 'B%' AND "MS"."OBJECT_NAME" LIKE 'C%')

I have to confess that, in light of recent experience, I did wonder if these filter predicates would prevent the other two SQL statements from being executed at run time, but the number of consistent gets confirms that they do not: 2,417 compared to around 805 when the query exclusion does occur. You can check back to the prior post to verify that.

So, let’s do the tests Jonathan suggested: altering the view to use equality predicates. I have recreated the view as such:

TEST_DB> create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4    ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where substr(ms.object_name,1,1) ='A'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where substr(ms.object_name,1,1) ='B'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where substr(ms.object_name,1,1) ='C'

View created.

TEST_DB> set autotrace on
TEST_DB> --
TEST_DB> -- below will do a select of a non-indexed column across the view
TEST_DB> select count(created) from mdw_v1

COUNT(CREATED)
--------------
          1288

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |     9 |   552   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |     9 |            |          |
|   2 |   VIEW               | MDW_V1     |  1677 | 15093 |   552   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via the src_type
TEST_DB> select count(created)
  2  from mdw_v1
  3  where src_type='3'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    12 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    12 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 |  6732 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via matching the where clause
TEST_DB> select count(created)
  2  from mdw_v1
  3  where substr(object_name,1,1) ='C'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    75 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    75 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 | 42075 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but maybe logical exclusion can be managed
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name ='C'

COUNT(CREATED)
--------------
             0

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |     3 |   225 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but logically it is the same.
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name like 'C%'

COUNT(CREATED)
--------------
           129

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |    24 |  1800 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

Above, I create the view, do a quick select count(created) to cause a full scan of all three sections of the unioned view, then run selects with various WHERE predicates to see which of the unioned SQL statements are excluded.

The results are very interesting. The filtering on src_type still works fine, and now the WHERE predicate "where substr(object_name,1,1) ='C'" allows the CBO to exclude sections of the union: the "NULL IS NOT NULL" filters appear and the consistent gets drop to 807. The WHERE predicate "where object_name ='C'", which at first glance you might think to be logically the same but is not (the view predicates check only the first character, whereas this checks the whole object_name), does not do any exclusion, though I thought it might. There are no object_names of just 'C', but logically, as a human, you can see that such an object name could never be found by the first two unioned SQL statements in any case.

The last test is the most interesting. The WHERE clause of the SQL select over the view is "where object_name like 'C%'", which is logically the same as one of the WHERE clauses in the unioned view, "where substr(object_name,1,1) ='C'". Yet no exclusion occurs and again we see 2,417 consistent gets.

It would seem the CBO Pixies either have not coded for that case or, as Jonathan says, Oracle can be "touchy" about this feature. Bottom line: if you are expecting this sort of unioned-view SQL exclusion to occur, TEST!
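
{Not from the original tests, but as a minimal sketch of the sort of check I mean: before relying on this branch elimination, look for the tell-tale "NULL IS NOT NULL" filters in the plan of the exact statement you will run, and compare the consistent gets. Something along these lines, assuming the test objects above:}

-- Not the original test code: a quick way to confirm branch elimination
-- for the statement you actually intend to run.
explain plan for
select count(created) from mdw_v1 where src_type = '3';

select * from table(dbms_xplan.display(null,null,'BASIC +PREDICATE'));

-- Branches that will be skipped show a FILTER step with
-- filter(NULL IS NOT NULL); branches without it will still be scanned,
-- so also compare consistent gets (SET AUTOTRACE ON) before relying on it.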

Dropped Tables, Hiding Extents and Slow DBA_FREE_SPACE Queries February 16, 2012

Posted by mwidlake in database design, internals, performance.

My last post was on slow dictionary queries caused by having many, many thousands of extents in your database. This post is about a special case of this problem, which I encountered recently. It was on an Exadata box – Exadata is not particularly relevant to the issue, but I’m curious to see if mentioning Exadata will make this post more popular :-)

I was querying the used and free space on a very small database on the client’s X2-2 1/4 rack Exadata machine. The query was really slow, taking about 30 seconds. This is a FAST box, what is going on?

I quickly realised that the problem was specific to one tablespace:

>@SPC_SUM
Enter the tablespace (or leave null)> DATA_01

TS_NAME              ORD      SUM_BLKS        SUM_K  MAX_CHNK_K NUM_CHNK
-------------------- ----- ----------- ------------ ----------- --------
DATA_01              alloc     262,144    2,097,152   2,097,152        1
                     free       63,128      505,024     504,384       11
2 rows selected.
Elapsed: 00:00:00.21

> @SPC_SUM
Enter the tablespace (or leave null)> USERS

TS_NAME              ORD      SUM_BLKS        SUM_K  MAX_CHNK_K NUM_CHNK
-------------------- ----- ----------- ------------ ----------- --------
USERS                alloc     748,320    5,986,560   5,372,160        2
                     free      127,904    1,023,232       6,144    3,058
2 rows selected.

Elapsed: 00:00:26.05

We can see it takes 0.21 seconds for tablespace DATA_01, 26.05 seconds for the USERS. Yes, USERS is larger but not 150 times larger. What is going on? Well, as the title of this post suggests, it is down to dropping tables. The below shows my route to that conclusion.
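
{The SPC_SUM script itself is not shown in this post. As a rough sketch only, it boils down to something like the query below, summarising DBA_DATA_FILES and DBA_FREE_SPACE for a tablespace supplied as a SQL*Plus substitution variable:}

-- Approximate stand-in for SPC_SUM: allocated and free space summary
-- for one tablespace, sizes in blocks and KB.
select ts_name, ord, sum(blks) sum_blks, sum(kb) sum_k
      ,max(kb) max_chnk_k, count(*) num_chnk
from (select tablespace_name ts_name, 'alloc' ord, blocks blks, bytes/1024 kb
      from   dba_data_files
      union all
      select tablespace_name, 'free', blocks, bytes/1024
      from   dba_free_space)
where ts_name = upper('&tablespace')
group by ts_name, ord
order by ts_name, ord;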

What I could see was that for the USERS tablespace the maximum chunk of free space was relatively small compared to the sum of free space – 6MB out of 1GB, pretty much – and that there were a lot of individual pieces of free space: 3,058 of them. This tablespace was shattered into a lot of bits. So, what sort of extent size management do we have? How big are the tables and indexes in this tablespace? {NB I already knew we had no partitions, so I did not have to worry about those.}

select tablespace_name,initial_extent,next_extent
      ,extent_management,min_extlen,allocation_type
from dba_tablespaces where tablespace_name ='USERS'

TABLESPACE_NAME                INITIAL_EXTENT NEXT_EXTENT EXTENT_MAN MIN_EXTLEN ALLOCATIO
------------------------------ -------------- ----------- ---------- ---------- ---------
USERS                                   65536             LOCAL           65536 SYSTEM

select tablespace_name,blocks,count(*) from dba_extents
where tablespace_name = 'USERS'
group by tablespace_name,blocks
having count(*) >1
order by blocks desc,tablespace_name

TABLESPACE_NAME                    BLOCKS   COUNT(*)
------------------------------ ---------- ----------
USERS                                2560          2
USERS                                2048          3
USERS                                1536          7
USERS                                1408          5
USERS                                1280          2
USERS                                1248          2
USERS                                1152          2
USERS                                1024        229
USERS                                 896         15
USERS                                 768         21
USERS                                 736          3
USERS                                 720          3
USERS                                 704          2
USERS                                 672          2
USERS                                 640         25
USERS                                 624          2
USERS                                 576          2
USERS                                 512        117
USERS                                 400          2
USERS                                 384         34
USERS                                 360          2
USERS                                 312          2
USERS                                 288          4
USERS                                 256         49
USERS                                 248          2
USERS                                 240          2
USERS                                 192          5
USERS                                 160          4
USERS                                 128       1165
USERS                                   8       1788

30 rows selected.

So we have system-controlled extent allocation; this should be fine. I did a quick check of the table contents – 300 or so tables and 200 or so indexes. And, as you can see from the check of extents above, very few larger extents and a lot of small ones. {As a little aside, note the relatively large number of 8-block, 128-block, 256-block and 1024-block extents; these are the sizes that LMTs with system-managed extents tend to use unless a similar-sized chunk is found to be available, in which case it is used.}

I did some other checks on indexes and segment sizes and it all boiled down to one thing. There were not a lot of objects in this tablespace, what was there was small, and… the total of all those table and index blocks was way less than the used space in the tablespace.

That is the major indicator of what is going on here: the difference between the space used in the tablespace and the total size of the visible tables and indexes.

The last link in the chain is the recycle bin.

select owner,ts_name,count(*) from dba_recyclebin group by owner,ts_name

OWNER                          TS_NAME                          COUNT(*)
------------------------------ ------------------------------ ----------
USER1                          USERS                              542356
USER1                                                                  2
WEGWEGWEG                      USERS                                  97
KKKUKUYLLX                     USERS                                 149
USOVFPKEKS                     USERS                                   3
....
ERHJTRTTTURT                   USERS                                   4

11 rows selected.

That’s 542,356 objects in the recyclebin for one user, in the one tablespace. My problem tablespace. The penny dropped; something I had already noticed and been a little uncomfortable about fell into place.

The client is using Oracle Data Integrator (ODI) to pull data together and put it into the database. The process they are using basically creates a table, loads some data into it, pushes the data into the target tables and then drops the table. The drop does not do a “DROP TABLE … PURGE;”. This is done many, many times per load cycle, which is intended to run several times a day.
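
{For illustration only (this is not the client's actual ODI code), the difference comes down to whether the drop bypasses the recyclebin:}

-- Hypothetical staging cycle. A plain DROP renames the segment to a
-- BIN$ name and keeps its extents; DROP ... PURGE releases them at once.
create table stg_load_tmp as select * from some_source_table where 1=0;
-- ... load data, push it into the target tables ...
drop table stg_load_tmp;          -- goes to the recyclebin
-- drop table stg_load_tmp purge; -- releases the space immediately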

Something you should always keep in mind with the recyclebin is that the tables, indexes and their extents that go into the recycle bin do not get reported in several of the key data dictionary views. I did mention this in a blog post way back but maybe a whole post on it is called for.
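
{One place the space does show up is DBA_RECYCLEBIN itself. A minimal sketch of checking it, assuming the usual 8KB block size for the MB conversion:}

-- How much space per tablespace is tied up in dropped objects?
-- SPACE is in database blocks.
select ts_name, count(*) objects, sum(space) blocks
      ,round(sum(space)*8/1024) approx_mb    -- assumes 8KB blocks
from   dba_recyclebin
where  ts_name is not null
group by ts_name
order by sum(space) desc;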

So, the tablespace was being shattered by the constant creating and dropping of small tables. It was hidden from easy view due to how the recyclebin is (not) exposed in data dictionary views.

It is not good practice to constantly create and drop lots of tables. As well as the admittedly rather odd impact that this posting is all about, there is a lot of internal work involved for the Oracle database in creating and dropping tables. It is an overhead best avoided. The client had good reasons for this approach but now they are going to look at the alternatives.

I’ve not actually proven in this posting that all those dropped tables are the actual cause of the slow performance in querying the free space in that tablespace {though the corroborative evidence is very strong}. Neither have I said how we cleared up the mess. I’ll cover both of those in the next post on this mini-thread.

Oracle Nostalgia December 15, 2011

Posted by mwidlake in database design, development.

When preparing the material for my “Oracle Lego – an introduction to Database Design” presentation for the UKOUG last week, I was looking back at my notes from a course on the topic from “a few years back”. There were a few bits which made me smile.

Oracle’s [SQL] implementation conforms to ANSI standard, although referential integrity will not be enforced until version 7

Any other old geezers having flashbacks? I am so glad my first major Oracle development project swapped to using a Beta of V7 very early, so we had the integrity turned on during most of development. I had to help a few projects go from V6 to V7 and turn on the RI – it was usually very, very painful. Or impossible. I always think back to those nightmare experiences when some bright spark suggests turning off referential integrity for “ease of development” or “performance” reasons. There are good performance reasons for altering how you implement RI but, as I said during my presentation on database design, I have never, ever, ever seen a system with RI turned off that did not have damaged data.

Oracle’s optimiser is rule-based. Designing efficient queries involves taking advantage of the optimiser behaviour


You can tell this course was run in the UK due to the lack of ‘z’ in ‘optimiser’ :-). How many of us can now make a stab at the seven or eight significant rules from the 15 (16, 17 as versions advanced) in the list? Several rules were to do with Clusters so you did not care. Let’s think, what were the main things to keep in mind…

  • most significant table last in the FROM clause and order upwards in the order you wanted to visit {most significant being the one you felt you could most efficiently do the first filter against}
  •  WHERE clauses ordered downwards in the order you wanted them to be applied.
  • Order of preference to identify a row was something like ROWID, primary key, unique key, full non-unique key, partial unique key, partial non-unique key, full index scan, full table scan.
  • Disable index access by adding 0 to numeric columns and concatenating null to varchars {a sketch of this trick follows below}.

I’ve not checked back in the manuals (I have a set of the V7 on my laptop) so I’m probably wrong.
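
{For anyone who never had the pleasure, here is a minimal sketch of that last trick from the RBO days, against a made-up table:}

-- RBO-era trick: stop the rule based optimiser from using an index by
-- wrapping the indexed column in a do-nothing expression.
select *
from   orders o                   -- hypothetical table
where  o.order_id + 0 = 12345     -- +0 suppresses the numeric index
and    o.status || '' = 'OPEN';   -- ||'' suppresses the varchar2 index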

Storage….selecting suitable values for storage parameters … will improve the final performance of the database

Considering the “suitable values for storage parameters” was perhaps my first real conscious step into being a performance/design guy {I was lucky to be on a project where designing for the RBO and matching those rules was just part of being any developer}, but the calculating of rows-per-block, initial and next extent, pctincrease (not always zero you know), initrans/maxtrans, segment to tablespace size… I learnt all about that and had spreadsheets for it all.

Now of course, all of the above about storage (and RBO) has pretty much disappeared. Oracle has made some of the contents of my brain redundant.

But some things have not changed at all in 18 years:

Users can be relied upon to know what they do NOT want, not what they want, which {unfortunately} is the premise from which analysis starts

I think the above is the fundamental issue from which all iterative design methodologies spring. ie do not believe what the user says they want, show them something and fix it. It is probably human nature that we are not well able to express what we want but have no problem pointing out something is not at all what we want :-). Add in all the issues in respect of forgetting about the exceptions, assumed knowledge, incompatible vocabularies (the words your users say to you are as confusing as the techno-babble you fire back at them) and all analysis is fundamentally flawed.

Do some analysis – but then prototype like crazy. With real users.

Lack of Index and Constraint Comments November 24, 2011

Posted by mwidlake in Architecture, database design, development.

Something I’ve just reminded myself of is that under Oracle you cannot add a comment on an index or a constraint. You can only add comments on tables, views, materialized views, columns of those object types and a couple of esoteric things like Operators, Editions and Indextypes.

Here is an example of adding comments to tables and columns:

set pause off feed off
drop table mdw purge;
create table mdw(id number,vc1 varchar2(10));
comment on table mdw is 'Martin Widlake''s simple test table';
comment on column mdw.id is 'simple numeric PK sourced from sequence mdw_seq';
comment on column mdw.vc1 is 'allow some random text up to 10 characters';
--
desc user_tab_comments

 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------------------------------
 TABLE_NAME                                            NOT NULL VARCHAR2(30)
 TABLE_TYPE                                                     VARCHAR2(11)
 COMMENTS                                                       VARCHAR2(4000)

--
select * from dba_tab_comments where table_name='MDW'
/
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            TABLE
Martin Widlake's simple test table

select * from dba_col_comments where table_name='MDW'
order by column_name
/
OWNER                          TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ --------------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            ID
simple numeric PK sourced from sequence mdw_seq
MDW                            MDW                            VC1
allow some random text up to 10 characters
-- now to add a big comment so need to use the '-' line continuation character in sqlplus
--
comment on table mdw is 'this is my standard test table.-
 As you can see it is a simple table and has only two columns.-
 It will be populated with 42 rows as that is the solution to everything.'
/
select * from dba_tab_comments where table_name='MDW'
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
COMMENTS
----------------------------------------------------------------------------------------------------
MDW                            MDW                            TABLE
this is my standard test table.  As you can see it is a simple table and has only two columns.  It w
ill be populated with 42 rows as that is the solution to everything.
--
/

Adding comments on tables, views and columns seems to have dropped out of fashion over the years but I think it is still a very useful feature of Oracle and I still do add them (though I am getting a little slack about it myself over the last 3 or 4 years, which I must stop).

Comments are great, you can put 4000 characters of information into the database about each table, view and column. This can be a brief description of the object, a full explanation of what a column is to hold or even a list of typical entries for a column or table.

But you can’t add a comment on indexes or constraints. Why would I want to? Well, constraints and indexes should only be there for a reason and the reason is not always obvious from either the names of the columns or the name of the constraint or index, especially where you have a naming standard that forces you to name indexes and constraints after the columns they reference.

When you design a database, do a schema diagram or an ERD, you label your relationships between entities/tables. It tells you exactly what the relationship is. You might create an index to support a specific method of access or particular business function. You might alter the index in a way not immediately obvious to the casual observer, such as to allow queries that use the index to avoid having to visit the table. All of those things will, of course, be fully documented in the maintained project documentation in the central repository, available and used by all…

If I was able to add comments to constraints and indexes within the database then they would be there. You move the system from one platform to another and they are there. If for any wildly unlikely reason the central documentation lets you down, the information is always there in the database and easy to check. You may not be able to track down the original design documents but you have the database in front of you, so comments in that will persist and be very easy to find.

Lacking the ability to add comments on indexes and constraints, I have to put them at the table level, which I always feel is a kludge. I might actually raise an enhancement request for this, but as Oracle 12 is already nailed down, it will have to wait until Oracle 14. (A little bird told me Larry said there would be no Oracle 13…).
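
{A minimal sketch of that kludge, using a hypothetical ORDERS table and made-up index and constraint names – fold the rationale into the table comment, since COMMENT ON INDEX or COMMENT ON CONSTRAINT does not exist:}

-- Hypothetical example of recording index/constraint intent at table level.
comment on table orders is
'Order header. Index ORDE_CUST_DT_I (cust_id, order_date, status) covers the daily dispatch report - do not drop STATUS. FK ORDE_CUST_FK is deliberately not indexed; deletes from CUSTOMERS are not allowed.';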

What Have I Let Myself in For! – UKOUG this year November 16, 2011

Posted by mwidlake in development, Meeting notes, UKOUG.

One of my favourite Oracle happenings of the year is fast approaching, the UK Oracle User Group technical conference {see/click on the link on the right margin}. I’ve blogged before ( like here, last year) why I think it is so good.

I try and present at the conference each year, and I go whether I am presenting or not.

However, this year I think I might have got myself into trouble. I put forward 3 talks, expecting one or possibly two to get through. One on Index Organized Tables, one on IT disasters and one as an introduction to database design – I’ve moaned about it being a dying art so I figured I should get off my backside and do something positive about it. Each talk is in a different stream.

Well, the IOT talk was accepted, the Disasters talk was rejected and the Database Design intro was put on the reserve list. I was happy with that. I did three talks the first year I presented and promised myself never to be that stupid again {I spent most of the conference in the Speaker’s lounge or my hotel putting the talks together and tweaking slides}.

What I was not expecting was for the OakTable to ask me to do the IOT talk on the OakTable Sunday. Yikes! {The OakTable Sunday is a great opportunity to see a set of presentations by people who really know their stuff in a smaller setting – You really want to get along to it if you can}. However I had two reasons not to do it:

  1. I would have to miss one of the other OakTable talks.
  2. That thing I said about people presenting who really know their stuff.

I was told that (1) was not a problem as the talks would be repeated in the main conference so I would have an opportunity to see  the one I missed and (2) stop being so British and do it. {In fact, one friend on the OakTable told me off after the last conference for my criticism of my own presentation that year – “yes it was poor for you but normally you do a good job, so keep doing it”}. Of course I said yes.

Then it struck me, I was presenting twice now. Once on Sunday and repeating on Wednesday in hall 5 {I’ll probably not simply repeat the contents, at the OakTable Sunday I’ll assume a little more knowledge by the audience and dig a bit deeper technically, in the main conference I’ll cover off the basics more, for those utterly new to IOTs}. At least it was only one set of slides to prepare.

A few days later I get a mail from the UKOUG office. A gap had appeared in the Development stream, would I be willing to do my “Oracle Lego – an introduction to database design” talk – but beef it up a little? Yes, sure. What do you mean by beef it up? The dev stream guys wanted something that went into more detail, was more about some of the more challenging systems I’ve worked on. So we exchanged a few emails and it quickly became apparent that some wanted the intro talk I had originally proposed, to get people going with database design. Others felt there would be more audience for a more in-depth talk, so could I span both? I had to say no. I remember attending my Oracle database design course in 1993. It was 5 days long. If my memory serves there was also a second course a couple of weeks later that covered more advanced design for 3 days! I can talk fast but not 8 days fast. They were effectively asking for two quite different presentations, an intro and then a review of more challenging examples. “OK” they said, “do Oracle Lego – but if another gap comes up, could you do the intermediate talk?”. Err, OK… So I wrote a quick synopsis for “Oracle Meccano” {Meccano is a toy construction kit made up of miniature girders, plates, bolts and stuff you can make proper things out of. If you liked Lego you would love Meccano as you got older}.

Since then I have been slightly anxious about getting an email from the UKOUG about a gap in the development stream for the conference…

This week I have started preparing the presentations for real {which so far has resulted in me breaking my server, finding a load of notes on blogs I was going to write and then doing this post} so I contacted the ladies in charge of the agenda and asked if I was now off the hook for the Oracle Meccano talk? “Yes, no more gaps, it is not on the agenda”. Phew. “But could you put it together in case of last minute cancellations?”. *sigh*. OK.

So I will, but I’m not signing up to do any Session Chairing, which I was about to. If you see me at the conference and I look a little crazed, it’s because I got a mail from the UKOUG just before the event about a sudden gap…

At least there is no chance I will be asked to do the Disasters talk at short notice; I saw the scores the paper reviewers gave it :-).

IOT Part 6(B) – OLTP Inserts into an IOT November 10, 2011

Posted by mwidlake in development, performance, Testing.

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
<…………IOT6(A) – Bulk Insert slowed down

IOT 6 part A covered the sort of insert work used to create an IOT or push a large number of rows into it via an “insert into…select from” type format, i.e. a bulk load.
In this, part B, I am going to test inserting a small number of rows (10,000) into a table already holding a million rows. Something more like an hourly load or the inserts that would occur over a working day – a more OLTP-like test.

I’m going to do the tests against four things. A “non-insert” test to ensure my data creation is not taking a lot of resource {the Null test}, then against my three versions of test table – a normal heap table, my IOT and my Partitioned IOT. I run each test on each segment type three times, the minimum number needed to help identify variation that could invalidate the test.

You will probably be surprised by the results….

All three tables have the same column structure, the heap has indexes on the primary key and a unique constraint, the IOT and partitioned IOT are organized on a modified version of the Primary key and have the second index for the unique key. Below is the creation statement for the partitioned IOT (the IOT is the same without the partition clause). You can get full creation details from IOT part 2:

create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(1000) not null
,vc_2      varchar2(1000)
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id) 
--  using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110901 values less than (to_date('01-09-2011','DD-MM-YYYY')) 
  tablespace data_01
,partition rm20111001 values less than (to_date('01-10-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20111101 values less than (to_date('01-11-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)

The three tests are:

  • Insert 10,000 records for random accounts, in time order for the same day. This would be like processing data for or during a “day”
  • Generating the 10,000 records in order of account, to better match the IOT structure, again for a single day. For any given account 0,1,2 or 3 records are created, to be a little more “real life”. This would be like loading a pre-processed set of data.
  • All 10,000 records are created for the one account, in datetime order, as an extreme example of inserting into an IOT data that is ordered to perfectly match the order of the IOT.

The code is written ‘badly’, ie a single record is prepared and inserted into the table at a time, within a loop of 10000 iterations. This is to avoid the bulk-processing that may benefit the heap table and also to better match the reality of an OLTP system. I do however only commit after all records are created for any given test, which is maybe a mistake but that’s what I did.

Now, before I go any further, does anyone want to guess by how much inserting into the heap table will be faster than inserting into the IOT (partitioned or otherwise)? Also, whether the partitioned IOT will be faster or slower than the IOT and what, if any, impact there will be by the ordering of data?

To allow you to think about that and not see the results by accident, I’ll stick in one of the insert harnesses here so you can see how I am doing it. Skip over if you just want the results

--
execute s_snap.my_snap(' finished non-insert test1')
--
-- Transaction_heap random data test
declare
v_num         number :=10000; -- number of accounts
v_offset      number :=-4;
v_str1        varchar2(60);
v_str2        varchar2(60);
v_tran_type   number(2);
v_tran_id     number(10);
v_acco_type   number(2);
v_acco_id     number(10);
v_cre_date    date;
v_vc_1        varchar2(1000);
v_vc_2        varchar2(1000);
v_date_1      date :=sysdate;
v_num_1       number(2) :=10;
v_num_2       number(2) :=15;
begin
v_str1:=dbms_random.string('U',60);
v_str2:=dbms_random.string('U',60);
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..10000 loop
    v_tran_type :=mod(1,3)+1;
    v_tran_id   := 1000000+i;
    v_acco_type := 5+(trunc(dbms_random.value(1,3))*5); -- 10 or 15
    v_acco_id   :=trunc(dbms_random.value(1,v_num/2)); -- 1 to 5000
    v_cre_date  :=trunc(sysdate-v_offset)+(i/(60*60*24));
    v_vc_1      :=substr(v_str1,1,51+mod(i,10));
    v_vc_2      :=substr(v_str2,1,46+mod(i,15));
    insert into transaction_heap
      (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,vc_2,date_1,num_1,num_2)
    values
      (v_tran_type,v_tran_id,v_ACCO_type,v_ACCO_id,v_cre_date
      ,V_vc_1,v_vc_2,v_date_1,v_num_1,v_num_2);
end loop;
commit;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
--
execute s_snap.my_snap(' finished th insert test1')
--
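
{The s_snap.my_snap calls above are from my own little statistics-snapshot package, which I have not included in the post. As a rough sketch of the idea only, using an assumed table name, it stores labelled copies of the session statistics so that two snapshots can be diffed:}

-- Sketch of a labelled session-statistics snapshot (not the real package).
create table my_stat_snap
 (snap_label varchar2(60)
 ,stat_name  varchar2(64)
 ,stat_value number);

insert into my_stat_snap
select ' finished th insert test1', sn.name, ms.value
from   v$mystat ms, v$statname sn
where  sn.statistic# = ms.statistic#;
commit;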

Here are the results of three runs for all tests:

Elapsed time in seconds:

                    random insert             ordered-by-account       perfect IOT order match
Segment type    Test 1    2    3            Test 1    2    3           Test 1    2     3    
------------    ------ ---- ----            ------ ---- ----            ----- ---- ----
No insert          2.1  0.1  0.1               0.1  0.1  0.1              0.1  0.0  0.1   

                                    AVG                         AVG                        AVG
                                   ----                        ----                       ----
Heap              52.3 20.5 11.4   28.1       33.4 13.8 10.7   19.3       9.1 13.0  9.8   10.6
IOT              137.1 83.7 26.4   82.4      110.8 63.6 39.0   71.2       6.4  6.7  4.0    5.8
Partn'd IOT       33.2  9.5  8.7   17.1       17.1 10.4 14.3   13.9       8.1  5.3  4.5    6.0
                                   ----                        ----                       ----

Allow me to summarise that.

First, the preparation of the data and cycling around the loop, the no-insert tests, are negligible. {The first no-insert test takes 2 seconds due to the overhead of parsing and collecting dictionary information the first time anything is done}. We can ignore the overhead of the test harness.

Much more significantly:

  • Each iteration of each test, things get faster {with one odd exception}
  • For random and semi-ordered inserts, inserting into the IOT is slower than inserting into the Heap by a factor of 3 to 4 (300% to 400% of the time) – what we all probably expected.
  • Inserting into the IOT is Faster than the heap when the insert order matches the structure of the IOT in the perfect_IOT test. Not just the same but 40% faster
  • The partitioned IOT is as Fast or Faster than the Heap table, never mind the IOT. Overall the partitioned IOT takes 50% TO 75% OF THE TIME OF THE HEAP to be populated.

Hands up everyone expecting this? NB My hand would be down if I had not done similar tests before.

Why do we see these results?

I’ll just mention that the reason all the tests get quicker with each iteration is that the computer is doing the same thing over and over again and gets used to it, the relevant circuits in the chip are “oiling in”. Of course not, how silly – but it seems to be how non-IT people think of it :-) Basically, more and more of each segment is getting cached, which reduces physical IO and speeds things up. You see this a lot in testing and you need to be constantly vigilant about it. You might think I should really run the tests until they reach a steady state, but that would be a very false situation. On a real system you don’t tend to get the same thing run over and over and over again with nothing else running.

Much more interestingly, why is the partitioned IOT winning? Well, for the Heap table data is being poured into the leading edge of the table, which takes a little processing and requires some blocks to be created. However, there are also two indexes being maintained, the Primary Key and that Unique key. For each insert Oracle has to find the right spot in each index (root block -> branch block -> leaf block -> check for row) and insert the new record, occasionally splitting a full leaf block.

For the IOT and the Partitioned IOT there is no table to maintain. In effect there are just two indexes to maintain. For the normal IOT this benefit is outweighed by the cost of working with a less space-efficient Primary Key index. As the whole of the row is held in each index entry, fewer index entries fit per leaf block, so more blocks need to be read in from disc and more splits are needed. Thus the inserts into the IOT are slower.
But for the partitioned IOT not only is there no table to maintain but the table (really the Primary Key) is partitioned – on date. We are creating data for a single day at a time. All the relevant blocks will be in the one partition (and it’s the latest and smallest partition) and quickly all cached in memory and processed in memory. It’s the efficient use of the cache again. For both the IOT and the Heap, the primary key segment being processed is larger and the inserts are spread across the whole index.

Ordering data to match the index order (ordered_by_account tests) helps for all three segment types – heap, IOT and partitioned IOT. We have two indexes and the data is in effect being ordered for both, as one is on the columns we are ordering the data creation by and the other, the unique key, is generated by an ascending sequence. The gain is about 25 to 30% and is due to working through the indexes in order as opposed to skipping about. If the indexes were significantly larger than the buffer cache, or this was a highly active system where the turnover in the cache was very high (indicated by a low buffer cache hit ratio :-) ), then the benefit would be higher.
The perfect ordering test (perfect_iot_order_match) is a special case. Not only is the data being processed in order of the indexes but only one part of the index is being updated – for a single account. In this situation the maintaining of the index is so efficient that the time saved not processing the heap table means both the IOT and IOT_P win out over the heap. So in this special case, inserts into IOTs are faster than inserts into a heap table.

And finally a quick summary:

  • Inserting records into an IOT carries about a 200-400% overhead, depending on the number of indexes
  • Ordering the data to match the IOT gives a 25-30% advantage, but it similarly aids update of the primary key on the heap
  • Partitioning the IOT such that a smaller chunk of the index is processed can be highly beneficial. I don’t cover it specifically in the above but you can probably appreciate that partitioning on a heap can be beneficial to processing indexes on the heap also.
  • If you can organise to create your IOT such that data is created only in one or a few distinct points in the structure (you might have to tweak your data load too, of course), insert performance might match or even beat that for a heap table with the same indexes
  • Using methods to maximise efficient use of your buffer cache and reduce IO is a very powerful performance tuning aid, not just for select

BTW I am conscious of the fact that the above may surprise many and you might wonder if my data or tests are flawed. I have stripped out most of the code from this post as it was just becoming massive and confusing, but contact me and I will happily supply all the example scripts and data.

IOT P6(a) Update November 8, 2011

Posted by mwidlake in Architecture, development, performance, Testing.

In my last post, IOT part 6, inserts and updates slowed down, I made the point that IOT insert performance on a relatively small Oracle system was very slow, much slower than on a larger system I had used for professional testing. A major contributing factor was that the insert was working on the whole of the IOT as data was created. The block buffer cache was not large enough to hold the whole working set (in this case the whole IOT) once it grew beyond a certain size. Once it no longer fitted in memory, Oracle had to push blocks out of the cache and then read them back in next time they were needed, resulting in escalating physical IO.

I’ve just done another test which backs up this claim. I altered my test database so that the block buffer cache was larger, 232MB compared to 100MB in my first tests. The full IOT is around 200MB.
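
{For reference, the resize itself is a one-liner, assuming manually sized SGA components; if you use sga_target or memory_target you would adjust those instead:}

-- Grow the default buffer cache; needs spare room under sga_max_size.
alter system set db_cache_size = 232M scope=both;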

Bottom line, the creation of the IOT was greatly sped up (almost by a factor of 4) and the physical IO dropped significantly, by a factor of 20. As a result, the creation of the IOT was almost as fast as the partitioned IOT. It also shows that the true overhead on insert of using an IOT is more like a factor of 2 to 4 as opposed to 6 to 8.

You can see some of the details below. Just to help you understand them, it is worth noting that I had added one new, larger column to the test tables (to help future tests) so the final segments were a little larger (the IOT now being 210MB as opposed to 180MB in the first tests) and there was a little more block splitting.

                        Time in Seconds
Object type           Run with       Run with
                     100MB cache    232MB cache
------------------  ------------    -----------   
Normal Heap table          171.9          119.4   
IOT table                1,483.8          451.4     
Partitioned IOT            341.1          422.6 

-- First reading 100MB cache
-- second reading 232MB cache 
STAT_NAME                            Heap    	IOT	      IOT P
-------------------------------- ---------- -----------  ----------
CPU used by this session            5,716         7,222       6,241
                                    5,498         5,967       6,207

DB time                            17,311       148,866      34,120
                                   11,991        45,459      42,320

branch node splits                     25            76          65
                                       25            82         107

leaf node 90-10 splits                752         1,463       1,466
                                      774         1,465       1,465

leaf node splits                    8,127        24,870      28,841
                                    8,162        30,175      40,678

session logical reads           6,065,365     6,422,071   6,430,281
                                6,150,371     6,544,295   6,709,679

physical read IO requests             123        81,458       3,068
                                      36          4,012       1,959

physical read bytes             2,097,152   668,491,776  25,133,056
                                1,400,832    34,037,760  16,048,128

user I/O wait time                    454       139,585      22,253
                                       39        34,510      19,293

The heap table creation was faster with more memory available. I’m not really sure why; the CPU effort was about the same as before and, though there was some reduction in physical IO with the larger cache, I suspect it might be more to do with both the DB and the machine having been recently restarted.

All three tests are doing a little more “work” in the second run due to that extra column and thus slightly fewer rows fitting in each block (more branch node and leaf node splits), but this just highlights even more how much the IOT performance has improved, which correlates with a massive drop in physical IO for the IOT creation. If you check the session logical reads they are increased by a very small, consistent amount. Physical read IO requests have dropped significantly and, in the case of the IOT, plummeted.

I believe the 90:10 leaf node splits are consistent as that will be the maintaining of the secondary index on ACCO_TYPE and ACCO_ID, which are populated in order as the data is created (derived from rownum).

What this second test really shows is that the efficiency with which you are able to make use of the database cache is incredibly significant. Efficiently accessing data via good indexes or tricks like IOTs and hash tables is important but it really helps to also try and consider how data is going to be recycled within the cache or used, pushed out and then reused. A general principle for batch-type work seems to me to be that if you can process it in chunks that can sit in memory, rather than the whole working set, there are benefits to be gained. Of course, partitioning can really help with this.

{If anyone is wondering why, for the heap table, the number of physical IO requests has dropped by 70% but the actual number of bytes has dropped by only 30%, I’m going to point the finger to some multi-block read scan going on, either in recursive code or, more likely, my code that actually gathers those stats! That would also help explain the drop in user IO wait time for the heap run.}

Just for completeness, here is a quick check of my SGA components for the latest tests, just to show I am using the cache size I claim. All of this is on Oracle 11.1 enterprise edition, on a tired old Windows laptop. {NB new laptop arrived today – you have no idea how hard it has been to keep doing this blog and not play with the new toy!!!}. If anyone wants the test scripts in full, send me a quick email and I’ll provide them:

-- sga_info.sql
-- Martin Widlake /08
-- summary
set pages 32
set pause on
col bytes form 999,999,999,999,999 head byts___g___m___k___b
spool sga_info.lst
select * 
from v$sgainfo
order by name
/
spool off
clear col
--
NAME                             byts___g___m___k___b RES
-------------------------------- -------------------- ---
Buffer Cache Size                         243,269,632 Yes
Fixed SGA Size                              1,374,892 No
Free SGA Memory Available                           0
Granule Size                                4,194,304 No
Java Pool Size                              4,194,304 Yes
Large Pool Size                             4,194,304 Yes
Maximum SGA Size                          401,743,872 No
Redo Buffers                                6,103,040 No
Shared IO Pool Size                                 0 Yes
Shared Pool Size                          142,606,336 Yes
Startup overhead in Shared Pool            50,331,648 No
Streams Pool Size                                   0 Yes

Friday Philosophy – The One Absolute Requirement for System Success October 14, 2011

Posted by mwidlake in development, Friday Philosophy, Perceptions.

Alternative title “The lady from Patient Admin – she says YEEESSSS!!!!!!”

What must you always achieve for an IT system to be a success?

  • Bug free? Never happens.
  • Within budget/time frame? That would be nice.
  • Includes critical business functionality? Please define critical.
  • Secure? Well, it’s important for many systems but then it is often lacking (even when it is important).
  • That it is to specification? Well we all know that’s wrong.

There is only one thing that an IT system must always achieve to be a success.

User Acceptance.

For an individual system other considerations may well be very important, but the user acceptance is, I think, non-negotiable.

The user must get enough out of using the system for it to be worth their while, otherwise at best they will resent using it and at worst… Well, at worst they will use it but will put in any old rubbish to fulfill the dictate that it be used. You would be better off if they did not use the system at all. Here are a couple of examples from my working past.

In the first one, I was involved in extending a hospital management system so that it kept track of the expected departure times for patients, allowing a prediction of when beds would become available and calculation of expected occupancy rates. Yes, this was a while ago (maybe 1990) and on a system that was old then. The information was needed by someone with the title “bed nurse” {or something similar} so that they could better prepare for bringing patients in and keep a higher bed usage ratio. Was this to make the hospital more efficient? No, it was to satisfy a politically demanded report to the NHS executive. Oh, the overall intention was to increase efficiency but the report soon became more important than the idea. So, we added columns in tables and fields on screens and prompts for the ward staff to fill in the information. And they didn’t. The nurses were busy, they were pretty demoralized due to having recently been used by the government as a way to control public sector pay and they had nursing duties to do. They were not going to waste a couple of minutes trying to check when Mrs Jenkins was going to be sent home when Mrs Leonard needed a bed pan. The nursing staff were given a hospital-wide telling off: this information had to be entered. They put in the data – but guessed wildly. The design was fine, the report was logically accurate, only the correct staff could run it, but No User Acceptance and thus a failure.

So I added something else. It was a very crude screen that showed a "diagram" of the ward – down the left and right side of a VT220 screen you saw little oblong boxes, each with a bed number, the patient's name, a consultant's initials, a medical speciality code and the arrival and departure datetime. It combined information we already held with the new information we wanted, and it was quite basic, limited and slow to draw. But it was useful to the ward staff. They could find any patient, they knew who to call if there was an emergency {not the actual consultant of course, but their secretary}, they could check when someone was leaving and see what time someone was expected. From anywhere there was a terminal, not just the entrance to the ward, they could see all this information. They used it. They put in the expected departure time {a sobering thought: the expected departure might not be a living one} and the bed nurse could plan and the report could be run.

Second example, different hospital. We were putting together a system to schedule outpatient clinics. We knew what we were doing, it’s pretty simple. You have some people (a consultant and probably a senior house officer), a period for the clinic (3 or 4 hours) and a set of people to see, say 40.  Give some flexibility in slot lengths (some people need 5 minutes, some 15) and allow the patients to be booked in. Check for and stop double booking. We did not go and ask the patient admin staff, we knocked up the design and the screens and asked them to test. After all, I was very experienced now, I’d been doing these systems for 3 years… They very quickly came back to us and said it was rubbish. Oh dear.
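
As an aside, the "check for and stop double booking" rule we had built in amounts to little more than a uniqueness rule on the slot. A minimal sketch of the idea, with purely hypothetical table and column names (this is not the actual hospital schema):

-- hypothetical tables, invented purely for illustration
create table clinic_session (
  session_id     number        primary key
 ,consultant     varchar2(30)  not null
 ,session_start  date          not null
);

create table clinic_booking (
  session_id     number        not null references clinic_session
 ,slot_time      date          not null
 ,patient_id     number        not null
 ,constraint clb_one_patient_per_slot unique (session_id, slot_time)
);

Perfectly sensible on paper – and, as it turned out, exactly the rule the users could not live with.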

We went and saw them. I think it was a couple of us programmers, our development manager, the hospital liaison for the project and the patient admin staff. "What's the problem?" There were a few, but the main one was that you could not double book a slot. Why would you want to? Do two patients really want to be consulted at the same time with the same doctor?
“Err, maybe, it might happen, can we just be able to double book?” OK, we could maybe alter things to allow two patients to be seen at the same time… The patient admin staff are not looking happy. The hospital liaison is looking confused – “You can’t do that! Patient confidentiality can’t be broken!” he says. It got worse. “We need to book all the patients into the first slot, with the consultant, so the letters go out to them saying come to see Mr Winders at 1pm”. The admin staff are now looking very shifty.

If any of you have worked in the health service you are probably way ahead of me. The admin staff needed to book all the patients in at this first slot so that they would all turn up, the consultant would see the two or three he was interested in – and then go and play golf. The SHO would then plough through the rest of the patients for the following three or four hours. If you have ever had to turn up at the start of a consultancy session and sat there for three hours, now you know why. You see, back then, the consultant was only a very small step away from deity level (and I leave it to you to decide if it was a step up or down). What they said went and if they wanted to go and play golf or store 200 medical records in the boot of their car or refuse to speak to “that stupid idiot in renal medicine” then you worked around it. {I’m assured that things are a lot better now, but I’d love to know how it really is}.

We had designed a sensible system; the users needed a non-sensible {to our mind} system. Even the NHS liaison chap had never appreciated exactly how much the consultants abused the system – he thought they just booked the people s/he wanted at the start of the session, but no. The consultant decided on the day who was interesting and as a result every patient had to be there at the start.

I count myself lucky that I learnt from direct experience so soon in my working life that (a) you have to deliver what the user will accept and (b) the only way to know what they want is to show them the system and talk with them.

{For those of you who do not understand the alternative title at the top, it is all about an old Del Monte fruit juice advert which became a bit of a catchphrase at the time}

{And are you happy now Dom? :-) }

In Defense of Agile Development (and Their Ilk) September 21, 2011

Posted by mwidlake in development, Management.
Tags: , , , ,
10 comments

In my previous post I asked the question “why doesn’t Agile work?”. I’m not sure the nuance of the question came over correctly.

I'd just like to highlight that the question I asked was "Why does Agile not work?". It was not "Why is Agile rubbish?". I've said a few times in the past couple of weeks that I like the ideology of Agile and I am (and have been for years and years) a strong proponent of prototyping, cyclic development, test-driven design and many other things that are part of the Agile or XP methodologies.

That distinction in the title is a really important distinction and one I’d hoped I’d made clear in my post. Looking back at my post though, I think it is clear I failed :-(. I highlighted reasons why I think Agile does not work and in my head I was thinking “if we avoid these, Agile could work” – but when you write something down it does not matter what is in your head if it does not reach the paper.

I'm actually frustrated that in the last few years I have not seen Agile really succeed, and that this seems to be the normal situation, judging by the response you get when the topic of Agile comes up with fellow technicians and by the comments on my own blog.

However, on that post about Agile two people whose opinions I deeply respect came back at me to say "Agile does work!". Cary Millsap, who many of you will have heard of as the "Method R" guy and the person behind Optimal Flexible Architecture. And Mike Cox, who most of you won't have heard of, but Mike taught me a lot about sensible development back in the 90s. He's one of the best developers I have ever had the pleasure of working with and I know he has had great success with Agile and RED. I'm not sure if they read my post as "Agile is Rubbish" or if they are, like me, simply frustrated that it can work but so often does not.

So I've been thinking about this a lot this weekend, helped by Cary's paper on the topic that he mentioned in his comment. I'd highly recommend downloading it as it is an excellent description not only of why Agile can help but also of how, and of some of the pitfalls {I'd started my own post on that, but go read Cary's}. I should add, you can see Cary present his case for Agile at the UKOUG conference this year.

So where does this bring me to? Well, I think "Is Agile good or bad?" has become almost an "IT religion" topic: people love it or loathe it based on what they have seen of the methodology in real life. No, that's wrong – it is based on what they have seen that has been labelled with that methodology in real life. Or worse, it is based on the anecdotal opinion of those around them. The thing is, if you look at what XP is supposed to consist of, or what Agile programming is supposed to consist of, most of us would agree that a great deal of it makes sense in many situations. I'd disagree with some of the details in Cary's paper but overall I'm in strong agreement. Sadly, what Agile and XP are supposed to be is not well matched by what you see on the ground in most cases. So even if these methodologies are right for the situation, what has been implemented is not the methodology but probably more a slap-dash process that simply jettisons documentation, design and proper testing. This whole thread sprang from my lamenting the demise of database design, and several of the comments highlighted that the introduction of Agile seemed to equate, at least in part, with the demise of design. As Mike and Cary say, and as I think anyone who has successfully utilized Agile would say, design is an integral part of Agile and XP methodology.

Agile can and does work. But many things can and do work, such as taking regular exercise to keep healthy or regularly maintaining your house to keep it weathertight. Like Agile, both take effort but the overall benefit is greater than the cost. And like Agile, do it wrong and you can make things worse. If your window frames are starting to rot and you just slap a new layer of top-coat on them all you will do is seal in the damp and rot and hide the problem – until the glass falls out. Going for a regular 5 mile run is good for you – but not if you are 10 stone (60KG) overweight and have not run in years. A 5 mile run is also not a good idea if you want to be a long-jumper. Right training (methodology) for the right aim. Also, just like keeping healthy, house maintenance or anything that takes effort but works, proponents tend towards extremism – probably as a reaction to the constant {perceived} pig-headedness of critics or the failure of people to just do what now seems so sensible to them {think reformed smokers}. I’ll have to buy Cary and Mike pints to make up for that jibe now, and promise them it was not aimed at them personally…

Sadly, the reality is, Agile does not work 90% of the time it is tried. So, does that mean Agile is actually rubbish? Or at least, not fit for purpose, because many companies are not able to use it? Companies are there to achieve something and the IT systems are part of achieving that something. If Agile cannot aid that IT department then Agile is the wrong way for that department and company.

*sigh* I’ve gone on and on about this and still not got to my own main point, which is this.

  • Can we identify reasons for Agile and XP failing?
  • Having identified the reasons, can we fix them in simple ways?
  • Can we create some simple guidelines as to when a project should be more Agile and when it should be more up-front design?

I’d love to know people’s opinions on those three points above.

Friday Philosophy – The Dying Art of Database Design? September 9, 2011

Posted by mwidlake in Architecture, development, Friday Philosophy, rant.
Tags: , , ,
35 comments

How many people under the age of {Martin checks his age and takes a decade or so off} ohh, mid 30's do any database design these days? You know, ask the business community what they want the system to do, how the information flows through their business, what information they need to report on, and then construct a logical model of that information? Judging by some of the comments I've had on my blog in the last couple of years, and also the meandering diatribes of bitter, vitriolic complaints uttered by fellow old(er) hacks in the pub in the evening, it seems to be becoming a very uncommon practice – and thus a rare and possibly dying skill.

{update – this topic has obviously been eating at my soul for many years. Andrew Clark and I had a discussion about it in 2008 and he posted a really good article on it and many, many good comments followed}

Everything seems to have turned into "Ready, Fire, Aim". I.e., you get the guys doing the work in a room, develop some rough idea of what you want to develop (like, look at the system you are replacing), start knocking together the application and then {on more enlightened projects} ask the users what they think. The key points are that development kicks off before you really know what you need to produce, there is no clear idea of how the stored data will be structured, and you steer the ongoing development towards the final, undefined, target. I keep coming across applications where the screen layouts for the end users seem almost to be the design document and then someone comes up with the database – as the database is just this bucket to chuck the data into and scrape it out of again.

The functionality is the important thing, “we can get ‘someone’ to make the database run faster if and when we have a problem”.

Maybe I should not complain as sometimes I am that ‘someone’ making the database run faster. But I am complaining – I’m mad as hell and I ain’t gonna take it anymore! Oh, OK, in reality I’m mildly peeved and I’m going to let off steam about it. But it’s just wrong, it’s wasting people’s time and it results in poorer systems.

Now, if you have to develop a simple system with a couple of screens and a handful of reports, it might be a waste of time doing formal design. You and Dave can whack it together in a week or two, Chi will make the screens nice, it will be used by a handful of happy people and the job is done. It's like building a wall around a flower bed. Go to the local builders' merchant, get a pallet of bricks, some cement and sand (Ready), dig a bit of a trench where you want to start (Aim) and put the wall up, extending it as you see fit (Fire). This approach won't work when you decide to build an office block, and only a fool from the school of stupid would attempt it that way.

You see, as far as I am concerned, most IT systems are all about managing data. Think about it. You want to get your initial information (like the products you sell), present it to the users (those customers), get the new (orders) data, pass it to the next business process (warehouse team) and then mine the data for extra knowledge (sales patterns). It's a hospital system? You want information about the patients, the staff, the beds and departments, tests that need doing, results, diagnoses, 15,000 reports for the regulators… It's all moving data. Yes, a well-designed front end is important (sometimes very important) but the data is everything. If the database can't represent the data you need, you are going to have to patch an alteration in. If you can't get the data in quickly enough or out quickly enough, your screens and reports are not going to be any use. If you can't link the data together as needed you may well not be able to DO your reports and screens. If the data is wrong (loses integrity) you will make mistakes. Faster CPUs are not going to help either; data at some point has to flow onto and off disks. Those slow spinning chunks of rust. CPUs have got faster and faster, rust-busting has not. So data flow is even more important than it was.

Also, once you have built your application on top of an inadequate database design, you not only have to redesign it, you have to:

  • do some quick, hacky fixes to get by for now
  • migrate the existing data
  • transform some of it (do some data duplication or splitting maybe)
  • alter the application to cope
  • schedule all of the above to be done together
  • tie it in with the ongoing development of the system as hey, if you are not going to take time to design you are not going to take time to assess things before promising phase 2.

I'm utterly convinced, and experience backs this up, that when you take X weeks up front doing the database design, you save 5*X weeks later on in trying to rework the system, applying emergency hacks and having meetings about what went wrong. I know this is an idea out of the 80s, guys, but database design worked.

*sigh* I’m off to the pub for a pint and to reminisce about the good-old-days.
