DEV Community


Designing APIs for humans: Object IDs

Paul Asjes on August 30, 2022

Choosing your ID type

Regardless of what type of business you run, you very likely require a database to store important data like custo...
Joost Helberg

Enumeration attacks aren't a concern when using row-level security. The prefixing is nice, though, and adds more than just human readability. UUIDs are problematic, however, as they don't index very well.
Thanks for suggesting prefixing; I may investigate it for future use. There is a lot to say about it.

Yordis Prieto

Hey, thank you so much for these insights. I am wondering about two things.

  1. Do you save the IDs as strings in your databases, following the format [object type]_[object id], or do you only save the ID part and add the prefixes at the application layer?

  2. Is there any reason why you didn't follow the URN format? Without being dogmatic, I mean just a simple format such as [object type]:[object id] rather than [object type]_[object id].

Paul Asjes
  1. We do store the IDs as strings including the prefix in the database. This helps immensely when we're doing things that don't include the application layer, like data analysis.

  2. As TJ Mazeika mentioned, copying and pasting is easier with underscores than with colons :)
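Because the prefix is stored as part of the string, tools that bypass the application layer can still split and validate IDs on their own. A minimal sketch, assuming a hypothetical format of lowercase letters (optionally joined by underscores, as in a made-up multi-word prefix like `sub_sched`) followed by a final underscore and an alphanumeric random part; `parse_id` and `ParsedId` are illustrative names, not a Stripe API:

```python
import re
from typing import NamedTuple, Optional


class ParsedId(NamedTuple):
    prefix: str
    random_part: str


# Assumed (hypothetical) format: lowercase prefix, which may itself contain
# underscores, then one final underscore, then an alphanumeric random part.
_PREFIX = re.compile(r"[a-z]+(?:_[a-z]+)*")
_RANDOM_PART = re.compile(r"[A-Za-z0-9]+")


def parse_id(object_id: str, expected_prefix: Optional[str] = None) -> ParsedId:
    """Split a prefixed ID on its last underscore and validate both halves."""
    prefix, sep, random_part = object_id.rpartition("_")
    if not sep or not _PREFIX.fullmatch(prefix) or not _RANDOM_PART.fullmatch(random_part):
        raise ValueError(f"malformed ID: {object_id!r}")
    if expected_prefix is not None and prefix != expected_prefix:
        # Catches e.g. a charge ID passed where a customer ID was expected.
        raise ValueError(f"expected a {expected_prefix}_ ID, got {object_id!r}")
    return ParsedId(prefix, random_part)


print(parse_id("cus_MNlbRsTWfvcJ01", expected_prefix="cus"))
```

Splitting on the last underscore keeps multi-word prefixes intact, since the random part is assumed never to contain one.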

Yordis Prieto

Hey @paulasjes, I'm coming back to this after a while. After reading the whole article and the comments, I read dev.to/paulasjes/comment/28ecl, and I am a bit confused now.

You said in that comment: "We don't use the ID as the primary key, as mentioned before we do some sharding magic with the exposed ID so the internal ID is a little different."

But your comment here says: "We do store the IDs as strings, including the prefix in the database."

The two statements seem to conflict with each other; I am wondering which one is correct.

I appreciate any help you can provide.

Yordis Prieto

Do you do any optimization when using those IDs as primary keys?

TJ Mazeika

Regarding 2, I'm going to assume it's because the latter is easier to copy and paste. Try double-clicking cus:123 vs. cus_123.
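The double-click difference can be approximated with regex word characters. This is a rough model, assuming the UI treats `[A-Za-z0-9_]` as word characters and anything else (such as `:` or `-`) as a word boundary; real behaviour varies between browsers, editors and operating systems, and `double_click_selection` is a hypothetical helper:

```python
import re


def double_click_selection(text: str, click_index: int) -> str:
    """Approximate what a double-click at click_index selects.

    Assumption: regex word characters ([A-Za-z0-9_]) belong to a word and
    everything else is a boundary. Real UIs differ in the details.
    """
    for match in re.finditer(r"\w+", text):
        if match.start() <= click_index < match.end():
            return match.group()
    return ""


for candidate in ("cus:123", "cus_123", "pi-123", "pi_123"):
    # Simulate double-clicking on the first character of each ID.
    print(candidate, "->", double_click_selection(candidate, 0))
```

Under that assumption the loop prints `cus:123 -> cus`, `cus_123 -> cus_123`, `pi-123 -> pi` and `pi_123 -> pi_123`: only the underscore-separated IDs are selected whole.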

fillon

Very good article. This should make support easier when all your tables use UUIDs and you're trying to debug.

On the implementation side, how do you store the IDs in a table (as the primary key)?

Do you store the prefix_ in the table, or just the random part, handling the prefix outside the database?

Paul Asjes

It of course depends on your implementation and how you organise your database, but I'd just use the full ID, including the prefix, as the primary key. You could split it into multiple columns, but that only introduces potential failure states where you accidentally use the randomised part without the prefix.
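Using the full prefixed ID as the primary key can be sketched with SQLite from the standard library. This assumes a hypothetical `customers` table and a hypothetical `new_id` generator (14 random base62 characters, not Stripe's actual generator); the `CHECK` constraint guards against the failure state described above, where a bare random part gets stored:

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def new_id(prefix: str, length: int = 14) -> str:
    """Hypothetical generator: prefix + '_' + `length` random base62 characters."""
    return prefix + "_" + "".join(secrets.choice(ALPHABET) for _ in range(length))


def open_customer_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The full prefixed ID is the primary key. The CHECK constraint rejects a
    # bare random part (or another object's ID) stored by mistake.
    conn.execute(
        "CREATE TABLE customers ("
        " id TEXT PRIMARY KEY CHECK (substr(id, 1, 4) = 'cus_'),"
        " email TEXT NOT NULL)"
    )
    return conn


def insert_customer(conn: sqlite3.Connection, email: str) -> str:
    customer_id = new_id("cus")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO customers (id, email) VALUES (?, ?)", (customer_id, email)
        )
    return customer_id


conn = open_customer_store()
customer_id = insert_customer(conn, "jane@example.com")
try:
    with conn:
        # Storing only the random part violates the CHECK constraint.
        conn.execute(
            "INSERT INTO customers (id, email) VALUES (?, ?)",
            (customer_id[4:], "jane@example.com"),
        )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

`substr` is used instead of `LIKE 'cus_%'` because `_` is a single-character wildcard in `LIKE`.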

David Mair Spiess

Very interesting, thank you for the insights!
How do you decide how long a generated ID should be?
I noticed that some Stripe IDs are longer than others.

For example:
customers: cus_MNlbRsTWfvcJ01
payments: ch_3AhqJiJdgChykuGw0S2YVeil

Does this mean you estimate the probability of a collision separately for each resource?
Did you ever need to increase the ID length for a specific resource after some time?
How do you store this ID efficiently in your database? Do you use it as the primary key, or do you have a separate internal unique identifier?

Paul Asjes

Excellent questions! There is some additional magic that goes into the generated part of the ID. Long story short, we use part of the ID for database sharding. Some resource IDs are indeed longer than others; this is mainly to avoid collisions for resources that we expect to have a lot more of.

We have increased the length of IDs in the past. One example that comes to mind is API keys, which we changed to be up to 255 characters in length.

We don't use the ID as the primary key; as mentioned before, we do some sharding magic with the exposed ID, so the internal ID is a little different.
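The point about giving high-volume resources longer IDs can be quantified with the birthday bound. A minimal sketch, assuming every character of the random part is drawn uniformly and independently from a 62-symbol alphabet; Stripe's real IDs embed sharding data, so this only approximates their scheme, and both function names are hypothetical:

```python
import math


def collision_probability(num_ids: int, length: int, alphabet_size: int = 62) -> float:
    """Approximate P(at least one collision) among num_ids random IDs.

    Birthday approximation p ~= 1 - exp(-n(n-1) / 2N), where
    N = alphabet_size ** length is the number of possible random parts.
    Assumes uniformly random, independent characters.
    """
    space = alphabet_size ** length
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


def min_length(num_ids: int, max_probability: float, alphabet_size: int = 62) -> int:
    """Smallest random-part length that keeps the collision risk under budget."""
    length = 1
    while collision_probability(num_ids, length, alphabet_size) > max_probability:
        length += 1
    return length


# A resource expected to reach a billion objects needs a longer random part
# than one expected to reach a million, for the same risk budget.
print(min_length(10**6, 1e-9), min_length(10**9, 1e-9))
```

With a one-in-a-billion budget this prints `12 15`: growing the expected object count by 1,000x costs three more base62 characters.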

Michael Fecher

Very good question; I was asking myself the same thing when I read the article.

Let's try the tag to notify some moderators from Stripe and get their attention. :D

stripe

Dávid Szabó

I'd be really interested in this; unfortunately, as far as I can see, Paul didn't answer your questions. @paulasjes, I'm really hoping you have a few minutes to answer them. Thank you!

OmegaRogue

Snowflake IDs are similar in that they don't collide, but they are completely numerical and their generation method involves the Unix timestamp, meaning that by sorting them in ascending order you still get the same benefits you get from using sequential integers.

Michael Fecher

Another question regarding the "exposure" of those IDs in REST APIs.
Officially, the underscore isn't supported in URLs (the same goes for the colon).
Why did you choose it anyway?
I'd rather go for a dash than an underscore.

Paul Asjes

Ease of use, mainly, specifically for copying and pasting. Try double-clicking on "pi-123" and "pi_123" to see the difference.
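On the URL question raised above: RFC 3986 (section 2.3) lists `-`, `.`, `_` and `~` among the unreserved characters, so neither an underscore nor a dash needs escaping in a URL. Python's `urllib.parse.quote` follows that list; this sketch shows how each separator survives when an ID is placed in a path (the `/v1/objects/` path and the `object_path` helper are hypothetical):

```python
from urllib.parse import quote


def object_path(object_id: str) -> str:
    # Hypothetical endpoint path. safe="" means every reserved character,
    # including "/", is percent-encoded; unreserved characters such as "_"
    # and "-" are always left as-is.
    return "/v1/objects/" + quote(object_id, safe="")


for object_id in ("pi_123", "pi-123", "cus:123"):
    print(object_path(object_id))
```

This prints `/v1/objects/pi_123`, `/v1/objects/pi-123` and `/v1/objects/cus%3A123`. A colon is in fact permitted inside a path segment by RFC 3986, but `quote` encodes it unless it is listed in `safe`.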

Ngonidzashe Mangudya

Interesting 👌 What's the best way to choose the prefix itself?

Paul Asjes

Here's my recommendation:

  1. Plan your prefixes. Have an internal style guide for how to name objects. If you don't, you end up with inconsistent schemes. For example, if you had a bank account object, you could use:

    ba_

    or

    bankacct_

    Either is fine as long as you're consistent with all your objects.

  2. Remember your audience. Whether the object is public or internal-only, the intended audience is still an engineer. Your prefix should be obvious to anyone, even if they don't have the necessary context. We made this mistake with PaymentIntents and SetupIntents:

    pi_

    and

    seti_

    (notice how they aren't consistent)

    If we could go back and redo those, we'd name them payint_ and setint_ respectively. Slightly longer prefixes make them much easier to understand. You might have heard of PaymentIntents but not connect the dots with pi_, whereas you likely will with payint_.
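One way to make such a style guide enforceable is a single registry that every new object type must pass through, so a clash like the si_ one between Subscription Items and SetupIntents mentioned elsewhere in this thread fails loudly. A minimal sketch; `PrefixRegistry` and its rules (lowercase ASCII letters only, one prefix per type, no reuse) are hypothetical:

```python
import re
from dataclasses import dataclass, field
from typing import Dict

_VALID_PREFIX = re.compile(r"[a-z]+")


@dataclass
class PrefixRegistry:
    """Hypothetical registry mapping each ID prefix to its one object type."""

    by_prefix: Dict[str, str] = field(default_factory=dict)

    def register(self, object_type: str, prefix: str) -> None:
        if not _VALID_PREFIX.fullmatch(prefix):
            raise ValueError(f"prefix must be lowercase ASCII letters: {prefix!r}")
        owner = self.by_prefix.get(prefix)
        if owner == object_type:
            return  # already registered; nothing to do
        if owner is not None:
            raise ValueError(f"{prefix}_ is already used by {owner}")
        if object_type in self.by_prefix.values():
            raise ValueError(f"{object_type} already has a prefix")
        self.by_prefix[prefix] = object_type


registry = PrefixRegistry()
registry.register("SubscriptionItem", "si")
try:
    registry.register("SetupIntent", "si")
except ValueError as exc:
    print(exc)  # si_ is already used by SubscriptionItem
registry.register("SetupIntent", "setint")
```

Running the registry in CI or at startup turns the style guide from a convention into a check.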

Ngonidzashe Mangudya

Thank you.

Juan Esteban Garcia

Paul, thanks for sharing this article; it really made me think a lot about the way our existing API is designed. I have one question for you, and I'd appreciate your insights: how would you go about implementing this ID format in an existing API with hundreds of users? Any guidance would be highly appreciated.

THANK YOU.

Paul Asjes

That's definitely tricky, but I'd just bite the bullet and start by using prefixes for all new objects. The downside is that you'd have a world with a mixture of both ID formats, but hopefully the prefixed IDs would become dominant over time.

You could run a migration to add prefixes to older IDs, but you'd have to make sure that users of your API can still use the old IDs without a prefix to ensure backwards compatibility.
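The backwards-compatibility requirement can be handled at the API boundary by normalizing incoming IDs. A minimal sketch, assuming the migration simply prepended `cus_` to every legacy customer ID and that legacy IDs were plain alphanumeric strings; `normalize_customer_id` is a hypothetical helper:

```python
CUSTOMER_PREFIX = "cus_"  # hypothetical prefix added by the migration


def normalize_customer_id(raw_id: str) -> str:
    """Return the prefixed form of a customer ID, accepting legacy bare IDs."""
    if raw_id.startswith(CUSTOMER_PREFIX):
        body = raw_id[len(CUSTOMER_PREFIX):]
    else:
        body = raw_id  # legacy client still sending the unprefixed ID
    if not (body.isascii() and body.isalnum()):
        # Rejects empty IDs and IDs for other object types, e.g. "ch_123".
        raise ValueError(f"not a customer ID: {raw_id!r}")
    return CUSTOMER_PREFIX + body
```

Downstream code then only ever sees `cus_...` IDs, while older integrations that still send `123` keep working.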

Juan Esteban Garcia

This is helpful, Paul. Thank you so much.

chris damour

Exposing an ID directly outside your app domain is fine for JSON-RPC.

RESTful practice is to always expose “IDs” as URLs; then any client can fetch that resource and know what actions can be taken on it. When you change your HTTP services, you 301 the old href to the new one and the client updates all its references. It works beautifully, and it's so simple to make that leap. The primary reason clients want an ID is to plug it into some spot in another service's URL, but if you respond with that service's href in your initial response, they never need to do that plugging to begin with.

Then, if you're really worried about replay attacks or enumeration, your hypermedia controls' hrefs can use temporary URLs that only work for a window of time from a specific client (e.g., you append a signed JWT to them, with a client ID/IP present). The href is opaque to the honest client, and a nefarious client can't hack the HTTP request by just changing one part of it. And if the honest client takes too long, you 302 them to an auth challenge. Trust but validate with zero trust! But none of these things are options in JSON-RPC, which is what you seem to be working with... too bad.
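The idea of short-lived, client-bound hrefs can be sketched with the standard library. Note the swap: instead of appending a signed JWT, this sketch appends an expiry timestamp and a plain HMAC-SHA256 signature computed over the path, a client identifier and the expiry; the secret, the query parameter names (`expires`, `sig`) and the function names are all hypothetical:

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def _signature(path: str, client_id: str, expires: int) -> str:
    # HMAC (standing in for a signed JWT) binds path, client and expiry together.
    message = f"{path}|{client_id}|{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_href(path: str, client_id: str, ttl_seconds: int = 300,
              now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, client_id, expires)})
    return f"{path}?{query}"


def verify_href(href: str, client_id: str, now: Optional[float] = None) -> bool:
    parts = urlsplit(href)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False  # expired: the server would redirect to an auth challenge
    expected = _signature(parts.path, client_id, expires)
    # Constant-time comparison; bytes avoid TypeError on non-ASCII input.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

Changing any part of the href (the ID in the path, the expiry) or presenting it from a different client invalidates the signature, which is what makes enumeration by URL tweaking pointless.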

Mario DeSousa

Great article Paul! These are great insights on the benefits of the IDs with a prefix.
I was wondering whether you've ever faced challenges where the business team decides to rename a certain object, for example changing "customer" to "client". In that case, IDs that start with "cust_" would lose their meaning and possibly cause confusion, wouldn't they?

Has Stripe ever faced this? How was it resolved? Did you just change the IDs going forward and leave legacy IDs as they are?

Also, have you ever faced issues with prefix changes breaking your developers' code? For example, if their code expects an ID with "cust_" and suddenly starts receiving an ID with "cli_", has this been a problem?

Paul Asjes

It's tricky for sure. As a general rule once an object is named we don't ever rename it. We certainly would never start returning unexpected IDs without some sort of initial outreach to users. In your example, instead of changing "customer" to "client" we'd probably have those as two separate resources initially and deprecate the older one over time, whilst keeping the new resource backwards compatible with the older IDs. That way we don't ever unintentionally break anyone's integration.

Naming things is hard and we get it wrong sometimes too. One recent example I can think of is that we used pi_ for PaymentIntents. Later on we introduced SetupIntents, but couldn't use the prefix si_ as that was already being used for Subscription Items. We ended up in a world where we use pi_ and seti_, which is confusing as conceptually those are two similar objects.

We learned that when choosing prefixes it's better to lean on the more verbose side, both to be clear and to avoid future naming collisions. If we could go back and redo it, I think we'd probably end up going with payint_ and setint_ for PaymentIntents and SetupIntents respectively.
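The rename strategy described above, where a new resource stays backwards compatible with the older IDs, might look like this on the lookup side. A minimal sketch, assuming a hypothetical rename of customers (`cus_`) to clients (`cli_`) in which existing objects keep their original IDs; `retrieve_client` and the in-memory store are illustrative only:

```python
from typing import Dict

# Hypothetical rename: new objects get "cli_"; objects created before the
# rename keep their "cus_" IDs and must still resolve.
CLIENT_PREFIXES = ("cli_", "cus_")


def retrieve_client(object_id: str, store: Dict[str, dict]) -> dict:
    """Look up a client by ID, accepting both the new and the legacy prefix."""
    if not object_id.startswith(CLIENT_PREFIXES):
        raise ValueError(f"not a client ID: {object_id!r}")
    try:
        return store[object_id]
    except KeyError:
        raise LookupError(f"no such client: {object_id!r}") from None


store = {"cus_old123": {"email": "a@example.com"}, "cli_new456": {"email": "b@example.com"}}
print(retrieve_client("cus_old123", store)["email"])  # legacy ID still resolves
```

Because old IDs are never rewritten, integrations that stored `cus_` IDs keep working, and only newly created objects carry the new prefix.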

timothyokooboh

Awesome article! Thank you for the insights.

What about combining prefixes with UUIDs?
That is, either [resource_type]_[uuid] or, using the URN syntax, [resource_type]:[uuid].

Paul Asjes

You could certainly combine prefixes with UUIDs; we don't at Stripe because we also use database sharding. The IDs we generate have a shard key baked into them for faster lookup.

As for URNs, we opted to use underscores rather than colons because they make copying and pasting easier (try double-clicking on cus:123 versus cus_123).
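The thread doesn't describe how Stripe encodes the shard key, so purely as an illustration, here is one hypothetical layout in which the first two base62 characters of the random part name one of 3,844 shards, letting a router pick the right database straight from the ID without a lookup table. This is not Stripe's actual scheme, and all names are made up:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # base62: 0-9, a-z, A-Z
BASE = len(ALPHABET)
NUM_SHARDS = BASE * BASE  # two characters address 3,844 shards


def new_sharded_id(prefix: str, shard: int, random_length: int = 12) -> str:
    """Build prefix_<2-char shard key><random chars> (hypothetical layout)."""
    if not 0 <= shard < NUM_SHARDS:
        raise ValueError("shard out of range")
    shard_key = ALPHABET[shard // BASE] + ALPHABET[shard % BASE]
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(random_length))
    return f"{prefix}_{shard_key}{random_part}"


def shard_for(object_id: str) -> int:
    """Recover the shard from an ID without touching any database."""
    body = object_id.rpartition("_")[2]
    if len(body) < 2:
        raise ValueError(f"ID too short to carry a shard key: {object_id!r}")
    return ALPHABET.index(body[0]) * BASE + ALPHABET.index(body[1])


customer_id = new_sharded_id("cus", shard=1234)
print(customer_id, "-> shard", shard_for(customer_id))
```

The trade-off of this layout is that the shard key spends two characters that would otherwise be random, slightly reducing collision resistance, and it fixes the maximum shard count up front.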

timothyokooboh

Awesome! Thanks for the reply.

Abhik Banerjee

This is a very interesting read indeed! I learned something new. Hopefully we'll incorporate it into our company's next project.