Why You Should Graduate from Using Auto-Increment ID for Database Record?

Moving on from serial id when inserting a new database record.

Database schema.

In any database, we always need an identifier to differentiate one record from one another, commonly known as Primary Key. Now let’s say we want to create a table to record user information. It’s very common for us the create a table like:

create table user(
id serial primary key,
full_name varchar
);

Most of us used to create the primary key serial. Rememberserial is implemented through a sequence, which acts a bit like a counter. Each time we insert new data to the current table, the next sequence value will be incremented. For instance, if the current table has 5 rows inside and the last row has id 5, the next data we insert to the table will create a record with id 6.

So what’s the problem with using the serial id for the primary key?

Unwanted Information Disclosure

Let's say you create a page to view a user profile, and you create a path in the format /users/user_id where user_id is basically refers to the primary key on the user table. If you want to count the number of rows in a given table, all you have to do is to have the system generate a new entity and inspect its id. Imagine you go to Facebook, create a new account. You can roughly know the total user records inside Facebook.

Easily Iterated Entity

Let’s say the previous endpoint is open to the public. Even the most noob attacker now can harvest our table information just by looping and hitting the endpoint starting from endpoint 1 to N. It becomes very easy to scrape all your entities.

A Bottleneck for Horizontal Scalability

Sooner or later our table gonna grow so big, and we want it to still serve the user as fast as possible. At first, the choice is obvious to vertically increase the database size, from 2 core 4 GB to 4 core 8 GB, etc. Sooner or later we will hit the limit of how big our database can grow. Then we learn about horizontal scaling where we can have several master database to split the incoming load.

Now imagine we have 5 master databases that handle the insertion of a new record, and we set the table definition just like before using serial int as the primary key. Naturally, we do not want database A and database B to have the same id value that references a different user. Record with id 1 should belong to a user across the service. It’s an important requirement because we commonly have other tables that used foreign key or references to the user id. Imagine we have user A with id 1 on database A, and user B with id 1 on database B. Then we have an address table with id 1 and user id 1. Does it belong to user A or user B? Hence, we need to keep a unique id across all of our databases.

In order to keep the user id unique, we need a way to store the actual last sequence value before inserting the data into any database. Let’s say we do it the most naive way, we ask all the database to return its last sequence value and take the biggest one and increment it by one. As you can see this is introducing a bottleneck because you need to run the query to all databases just to find the next increment id.

I have seen a service that in order to fix this problem they keep sequence in another database like Redis that is still fast enough to give out the information. I am against using another database such asRedis because it’s pretty weird to have a different table just to count the next sequence value. In the worst-case scenario when using a database like Redis is when theRedis is restarted, we simply lost track of the sequence. Moreover, we added too many choreography just to do a simple thing like inserting a new record. As you can see although this may work up to a certain level, it’s clearly an inappropriate solution.

So what’s the solution?

Instead of using serial id use uuid as the primary key. UUID stands for Universally Unique IDentifier. UUID is guaranteed to be Universally Unique, which makes it good candidates for the primary key.

Remember that the UUID value is randomly generated. This property means:

  • It doesn’t expose information like how many records is currently in our table.
  • It cannot be easily iterated because it’s random.
  • It doesn’t introduce a bottleneck in our system for horizontal scalability because we do not need to find out what’s the next sequence value. We can just generate a new UUID and insert it into any of the databases.

Hence, it solves all the problem with using serial id.

What’s the drawback of using UUID?

Basically, we don’t have a guaranteed uniqueness. However, the probability of a collision is rather small. Keep in mind that in the extremely unlikely case of colliding UUIDs, it will be caught by the DB thanks to the primary key constraint. In a worst-case scenario, we can just try to reinsert the new data again with a newly generated UUID.

For the curious soul out there, you can read the probability of collision in detail here.

Voilà, and we are done. Now, you guys should have a better understanding of why we should graduate from using auto-increment or serial id and use UUID instead.

Software Engineer | INTJ | Choleric | The Questioner (CD) | Creator & Advisor

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store