When natural keys act unnaturally

Many web application frameworks provide auto-incrementing integer columns as the default primary key (PK) for database tables. This is fine for many use cases, but it’s important to remember that this column should, by default, be treated as a technical key and that there are many situations when this default PK column should either be changed, or retained as a surrogate key but not reused and exposed to users as a business key (a.k.a. natural key). This is one of the rookie mistakes I made when first building an app that an organization actually became dependent on.

At first glance, it’s not really a big issue, and perhaps the use of the simple-to-understand auto-incrementing integers as business keys should be okay for many small-business use cases. After all, it reflects people’s natural way of numbering things. Give someone the simple task of keeping track of documents or queues or whatever entity, and they will most likely, instinctively, just use natural (cardinal) numbers, whether they’re just writing it on papers or cards or even if they’re using Excel; it’ll be the same. It takes an overthinker, and maybe certified system architects, to even think about using complicated multi-segment serial numbers at once.

But, as with many design decisions, the issues and disadvantages crop up sooner or later after implementation. With the use of auto-incrementing integers as user-facing keys in applications, there are many, but perhaps the most interesting and surprising one I encountered was the inadvertent assumption by users that these integer PKs increment predictably, in that if I, say, create 1,000 records in our brand new application, the 1,000^th record should have ID #1000. It was a surprising problem because I had the same assumption too—but I was wrong.

It took a user report for me to discover that database engines actually do not guarantee that auto-incrementing sequences are gapless for performance and concurrency reasons. It means that the next PK after record #999 can only be expected to be larger than 999, but will not necessarily be 1000. This is a problem for users because they have come to rely on the (reasonable) assumption that the generation of new record numbers are gapless, so that by just looking at the largest, latest record ID (i.e., PK) they can quickly tell how many records we have of that entity in the system. Unfortunately, this is one point where the technological concept of PKs deviates from the misleadingly similar lay concept of serial numbers. Database engines have to deal with concurrency and performance considerations. People just expect the next number after 41 to be 42, not 74. Natural keys should not act unnaturally.

In the application I built for our company, we just learned to live with the auto-increment integer PKs reused as business keys, for both technical and organizational reasons, the technical reasons being the headaches associated with database (a.k.a. model) changes, which in a model-view-controller framework often necessitate adjustments in the other layers, and especially so for such a critical column or attribute as a user-exposed PK. But the technical changes required weren’t even as daunting as the organizational adjustments that we had to make (re-orienting and re-training around a new business key)—and people are harder to change than software.

If I could go back in time, however, I would’ve probably leaned into a design that we did eventually implement for the application. We at least were able to limit the use of the auto-incrementing integers internally, to within our organization; when at one point external parties needed to interface with our internal records, I developed a new reference code column that was a randomly-generated string of characters (for example, V3XBND4R). In other words, I created a new business key, separate from the established technical key, which would have been merely a surrogate key had I not exposed it to users in the initial design. This is what I recognize now to be the general best-practice design, and this is what I plan to implement from the start in similar future projects, minus the exposed technical key.

(Of course, there are still trade-offs with this design. With a business key that is not an auto-incrementing integer key, users who have to count records will need to have some sort of reporting mechanism implemented earlier in the course of the software’s evolution.)

The use of a random string as a business key has an additional benefit: it solves the usability issue of simple integers being confusing and prone to typos. The concept of numerical PKs being unique, unambiguous references to records is obvious, elegant and sufficient for me as a software developer, but in reusing these integers as business keys, I forgot that while computers are comfortable working with numbers, humans are not so much, especially when it’s something they have to do repetitively all day. In areas of the application where users had to enter just the record’s ID number to perform an action, they would occasionally make mistakes because the app can only validate if it’s an existing ID/PK, not if it’s the actual ID that they intended to enter. This was an issue particularly because the IDs were incrementing integers: it was easy for example to mistype “998” as “989”, and the app will still accept “989” because that already exists in the database.

Eventually, instead of just the ID number (PK) itself, I made the application display another text attribute next to the ID number everywhere to offer users a way to validate if they’re working with the correct record, a kind of check digit (but not exactly). It’s easier to spot the difference between “998 Juliet” and “989 Romeo” than just “998” and “989”. This change actually just reflected the practice that the organization had already developed around the record IDs outside of the application, in spreadsheets and paper forms—an instance of offline practices informing better digital designs.

With a random string as a business key, even without the additional text attribute, erroneous inputs are reduced because small deviations (say, V3XBND4T instead of V3XBND4R) will be rejected by the application because the mistyped key is (very highly likely) non-existent.

Just to be clear, if there is a lesson to be taken from all this, it’s not that humans are fallible, easily misguided beings; it’s that, as application developers, we often need the humbling but important reminder that we’re ultimately designing for humans, not computers.