Exposing sequential IDs is bad! Here is how to avoid it.

Featured

When working on LOGaritmical, I initially had my primary keys defined as UUIDs. I took this approach for two reasons: security and to avoid collisions even when there are many rows. My initial reasoning was that I will probably need to store each log line in a separate entry and considering that one log can have a few thousands of lines, there was a small risk of overflowing the Integer. Was my reasoning correct? Probably not.

Article originally posted on my personal blog: How not to expose your primary keys

Furthermore, I stumbled upon an interesting article about using UUIDs as primary keys. There were some really great points about performance and query optimization. To keep it short, It is best to use numerical values as the primary key. So, I started to change my implementation. But then, another problem came into my head: security.

Why is exposing the PK bad?

By their nature, Auto Incremented primary keys grow by one for each new entry. This is great for avoiding collision, but it means that they are easily guessable. If you somehow find out that a user with the ID 27 exists, it most probably means that there are at least 26 other users with IDs 1 to 26. An attacker can try to exploit this by sending requests with different IDs. Furthermore, since the Administrator is usually the first user registered, it can easily guess the ID and try to get access.

There were also many reports, even on high-profile sites, where a full or partial dump of the contents could be done by simply incrementing the ID. This is how Parler data was exposed, for example. Other such attacks are rather easy to find, so exposing the internal ID is bad practice. You can still use an auto-incremented primary key internal and use it for all foreign keys as well, but whenever it is sent externally, an alternative must be found.

The Play Session Cookie

You may think that as long as your URLs do not contain the primary key, everything is ok. However, here is another scenario: cookies. Whenever a user is logged in, a session must be saved so that the user can remain logged in and can properly navigate the restricted pages. Play Framework does this by storing a Play Session Cookie that looks something like this:

Play Framework Session Coockie

This can easily be found using the browser’s inspector. At first sight, it may seem a random string of characters, however it is much more than that. It is actually a BASE64 encoded JSON string with some additional, non-human-readable data, at the end. Decoding the string is easy and, even though changing it in the cookie may not be possible, it can still provide valuable information to an attacker if the PK is stored.

Solution: How to avoid exposing the PK

You use an UUID! But wait, didn’t I just say that using UUIDs is bad? You still use a numerical, auto-incremented, value for the PK and FK, but attach to it, in a separate column, a UUID. Whenever you are dealing internally, in the same application/system with the data, the numerical primary key is used.

@Entity
@Table(name = "users")
public class UserDO {
    @Id
    @Column(name = "id", nullable = false)
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Integer id;

    @Column(name = "uuid", columnDefinition = "VARCHAR(36)")
    @Type(type = "uuid-char")
    private UUID uuid;

    @Column
    private String username;

    @Column
    private String email;

    @Column
    private String passwordHash;

    @Column
    private String passwordSalt;
}

However, when the data needs to be sent externally, you provide the UUID. Now the attacker can’t guess the UUID (or at least he will have a really hard time in doing so) and the internal workings are hidden. You store the user UUID in the session and use that one when performing security checks. You provide the UUID to the article/user/what-ever in the URL and use that one when searching for the right item. Once you find it, internally, you can start referencing it by the PK, just be careful not to expose it outside.

Any additional data that is referenced using FK can still be easily retrieved by the DB and Hibernate since the numerical value is being used. In short, you get the best of both worlds by using a numerical PK internally and an UUID externally. Initial search for an item my be a bit slower, but I believe it is negligible.

Article originally posted on my personal blog: How not to expose your primary keys

H2
H3
H4
3 columns
2 columns
1 column
Join the conversation now
Ecency