jump to navigation

Really Simple Naming Standard To Simplify Everything November 29, 2022

Posted by mwidlake in Architecture, database design, development, PL/SQL, SQL.
Tags: , , ,
5 comments

Many years ago I came up with a way of naming indexes, variables, table aliases, and several other things that is incredibly simple and, for me at least, Just Works. This might not sound like a big deal but it’s one of the things that has really helped me create readable & easily understandable code, consistent naming for objects, and database designs that make a lot of sense just by looking at them.

Not only that, but this naming convention also saves me a lot of time & mental effort and allows me to apply most of my limited brain capacity on making my SQL & PL/SQL code work, and not on stuff that ultimately should not matter – like what to call a variable.

The Starting Point

We generally name our tables after the objects or data they hold, so they are called PERSON, ORDER, ORDER LINE, ADDRESS, INPATIENT_STAY, and for intersection entities they are called things like PERSON_NAME, EMPLOYEE_ROLE. Anyone looking at a database like this will get the general idea of what ii in each table and how some tables relate to each other. That’s all good.

Naming columns is generally easy too. FIRST_NAME, LAST_NAME, HOUSE_NUMBER, HOUSE_NAME, ORDER_DATE but some columns are more generic like ID or UNIQUE_ID for a system generated primary key. START_DATE & END_DATE, CREATED_DATE, STATUS all generally makes sense within the table. A lot of places have some simple standards for column names, such as name DATE columns so that you indicate if it has a time element (ADMITTED_DATETIME), or that if it is an inherited key from a parent table, that is indicated in the name (CUSTOMER_ID, CUST_ID, CTSMR_ID… hmm, maybe this is where issues start occurring).

Table and column names are generally defined once, mostly when the application is first designed (please tell me you at least design you schema at the start) and then groups of new ones are introduced when new functionality is added.

But then you need to come up with thinks like index names or what your variable names will be in code and often there are no standards for that. This can also be true of those columns inherited from a the other end of a foreign key. So try this:

Naming Secondary Objects

You use a short (general 4 letter) acronym or alias based on the leading letters of the words in the parent object name.

If name of the parent object is one word, like PERSON then the alias is the first four letters – PERS

If the name of the parent object is two words like PESON_NAME then the alias is the first two letters of the two words – PENA

If the name of the parent object has more than two words like PATIENT_CASENOTES_FRONTSHEET then the alias is the first letter of each word – PCF. A long table like REGIONAL_SALES_SUMMARY_BY_MONTH would be RSSBM (these are true acronyms)

Apply this naming convention takes no thought and no decision making in 99% of cases. And, especially once you have been looking at the system for a couple of hours, the aliases make total sense:

You get the idea.

You might have wanted to use ORD as the alias for ORDER, as it reads better or you are used to it being reduced to ORD, but stick to the rules. As soon as you break the rules you start losing the benefit as you have to remember the exceptions and mistakes are likely to happen “further down” the naming hierarchy. Over the years I have been using this method it is when I am less firm about applying them rigorously that I get less benefit. But see the end of this post for what you do when you are forced to make exceptions.

Using The Alias

The first thing you do with the alias is use it in column names. I don’t like having a column called simply “ID” for a system generated unique identifier. It means you have loads of columns in the data dictionary called ID and it makes looking up information on the column a bit more clunky. In many systems I’ve seen a real mix of using full_table_name_id, whitespaceremoved_id, part_of_nme_id. Not any more, now you can have a rule that it is always tablealias_id. Similarly the primary key created on a table is tablealias_pk.

Foreign keys? It’s tablealiasfrom_tablaliasto_fk so the foreign key from the PERSON table to their current ADDRESS is PERS_ADDR_FK. If things are a little more complex such as having links from a CUSTOMER table to a business’s head office address and delivery address you might amend the relevant column names to include a brief descriptor of the address type, and then use the alias rules for that descriptor too, as I show below:

The name of the sequence for generating a unique identifier for the CATALOGUE table? CATA_SEQ { I don’t know why people insist on putting SEQ or similar in the name of sequences, it just seems almost universal…}

I also use this naming convention for my indexes. What do you think the below index is on?

PERS_SENA_FINA_DOB

Yes, SECOND_NAME, FIRST_NAME, DOB on the person table. If I see that index name in an explain plan I know immediately what columns are involved on what table. Now, if I was being lazy that index would be SN_FN_DOB and in some ways it is easier to read – but once you start breaking the standard then you have to make a decision when adding indexes if you stick to the rule or not, you get more situations where you might not be sure which column SN really is for (OK, maybe not in this case).

Simplification Of Code

Where this all really comes into it’s own is naming variables and table aliases in code. I strongly believe any production-quality SQL should always use table alias and the use of A,B,C etc for them should be punishable with chilli juice in the left eye. With the naming method I describe here you no longer have to think about variable names at all and it makes the code incredibly readable.

I did not have to think about the table alias when I wrote the below. I did not have to check what I had set them to when I used them in the code. Anyone looking at the code can see that the table aliases mean something and it makes it clear what each column is, what table it is from, even if the actual column name is a little ambiguous. With alias of A, B, C or T2134, T3341 etc you endlessly have to keep check which table (or subquery) is being referenced.

(please don’t critique the code or tell me about bugs or *anything*, it’s just a stupid example I knocked up in a text editor to make the point, OK?)

I use the same alias method for naming variables in PL/SQL too, as then I don’t need to think much about the names of many of them. The below is a sample of some PL/SQL code, using a lot of variables. I have a standard batch processing architecture using the tables PROCESS_MASTER, PROCESS_RUN and PROCESS_LOG. What to call the package level record variables derived from them? I prefix all package variables “PV” and record variable are suffixed with “REC” (not RECO, it’s not derived from an object name is my excuse) so they are PV_PRRU_REC & PV_PRMA_REC, no thought needed. And once you spot the other three table names referenced in the code, the other variables make sense. For as long as you look at the code you’ll know what TONA, RONA and ROTY are referring to:

Pesky Issues

The most common drawback of the “four letter alias” (as I inaccurately think of it) is alias clashes. This tends to happen most with table names that are two words long. CONSULTATION_DEPENDENCIES and COUNTRY_DETAILS would both have an alias of CODE. So you pick one and make the alias first letter of the first work, three letter of the second, so COUNTRY_DETAILS gets an alias of CDET. In my experience of using this naming convention for, oh, 25 years I’ve only had a few clashes and the 1:3 change in the letter pattern has worked.

The other drawback is getting an alias that is not, let us say, acceptable in polite company. AREA_SERVICES was an early one. SOFTWARE_HARDWARE_INTEGRATION_TESTS was another, but I strongly suspect that was an aimed-for alias by someone who was not happy in their role. Again, the swapping to a 1:3 letter derivation cures the problem.

There are usually so few exceptions that you are forced to introduce that everyone involved on the system gets to know them very easily.

When I was working on a very, very large project for an energy company I got them to start using this naming convention and people did get really quite worked up about the potential for alias clashes so they introduced a reference table that you populated with all the table names and it would generate the alias and, if it clashed with an existing one it would swap to 1:3 then 3:1 then raise an error for manual intervention. I always felt they had taken things a little too far.

SUMMARY

Generate a short alias for all objects by taking the first letter(s) of the object name.

Use that in column, constraint, index, and sequence names.

Now uses the alias in your code for table aliases and variables.

Greatly reduce both the time you spend coming up with names for these things, and the potential for code confusion.

Spend that time and brain effort on making the actual code do what it is supposed to do and go home early.

Advertisement