Alex Rudenko

Posted on Dec 27, 2017 • Originally published at 60devs.com on Dec 27, 2017

Making Better HTTP APIs

#webdev #http #testing

For the last couple of years, I have been working with various HTTP APIs. Quite often those APIs were not public and were only made available to partner companies. I have also seen APIs developed by fellow developers and participated in the development of several APIs myself. Quite often these APIs have a design flaw that makes it harder to build reliable integrations on top of them.

The problem I’m talking about is duplicate resource creation on errors. It matters most when resource creation is bound to critical real-world operations such as payments.

Let’s take PayPal’s Create Payment API as an example: when you create a new payment resource (by issuing a POST request to /v1/payments/payment), PayPal charges the user immediately. If the transaction is successful, you get back the status code 201 Created together with the payment id. That means that if you encounter network issues while sending this request, there is no easy way to tell whether the payment was successful, because you never received the payment id. What’s worse, if you retry automatically on network errors, you will sooner or later charge a user twice.

Of course, this is a known problem with the API, and PayPal offers a solution to it. See How to avoid duplicate payments? You can use the header PayPal-Request-Id (not documented on the Payments API page though), or you can misuse the invoice number to de-duplicate the requests. But does the solution have to be so complicated? Both ways are not user-friendly: the consumer needs to have a reliable mechanism for generating request IDs so that duplicate requests have the same request ID; in the second case, what if you need to support multiple payments per invoice? There might be a more elegant solution.

Solving duplicate resource creation with POST/PUT resource creation

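You can easily avoid the problem if the POST request does nothing more than create a database entry and generate the resource id. The flow is like this: the client sends a POST request and receives the id of a new, empty resource; the client then sends a PUT request to that id with the actual details, and only this request triggers the real-world operation.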

With this flow, it’s easy to retry requests when network failures happen. If you retry the POST request, it just results in a duplicate empty resource. If you retry the PUT request, you are safe because a PUT request is idempotent.
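The flow can be sketched in a few lines of Python. This is only an in-memory illustration, not a real payment API: the names PaymentApi, post_payment, put_payment, with_retry and lossy are all made up for the example, and a lost response is simulated by raising ConnectionError after the server has already done its work.

```python
import itertools


class PaymentApi:
    """Hypothetical in-memory stand-in for the server side of the flow."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.payments = {}   # payment id -> details (None while still empty)
        self.charges = []    # ids that were actually charged

    def post_payment(self):
        """POST: only creates an empty resource and returns its id."""
        payment_id = next(self._ids)
        self.payments[payment_id] = None
        return payment_id

    def put_payment(self, payment_id, details):
        """PUT: fills in the resource and charges at most once per id."""
        if payment_id not in self.payments:
            raise KeyError(payment_id)
        if self.payments[payment_id] is None:
            self.payments[payment_id] = details
            self.charges.append(payment_id)
        return self.payments[payment_id]


def with_retry(call, attempts=3):
    """Retry on ConnectionError, like a client with automatic retries."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise


def lossy(call):
    """Simulate a network failure: the first response is lost after processing."""
    state = {"first": True}

    def wrapper():
        result = call()
        if state["first"]:
            state["first"] = False
            raise ConnectionError("response lost")
        return result

    return wrapper


api = PaymentApi()
# The retried POST leaves one empty duplicate behind, which is harmless.
payment_id = with_retry(lossy(api.post_payment))
# The retried PUT hits the same id, so the charge happens only once.
with_retry(lossy(lambda: api.put_payment(payment_id, {"amount": 10})))
print(len(api.payments), len(api.charges))  # prints "2 1"
```

Two resources exist after the run (one of them empty), but only one charge was made, which is the trade-off the flow accepts.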

I find the pattern of POST/PUT creation to be more elegant despite the fact that it requires two requests to create a resource entirely. You may not like this approach, but my point is that you should support some way to de-duplicate POST requests if they result in significant real-world consequences. If you provide no such mechanism, your API is hardly suitable for stable and reliable integration.

Thanks for reading and hope it was helpful. Follow me on Twitter and share the problems you encountered with APIs and how you solved them.

Top comments (17)

Javier Ferrer • Jan 2 '18

I've worked with a similar approach; the difference was that we did not rely on the server to generate the identifiers. I mean, it's the client who actually generates the identifier (a UUID) and sends the server all the resource details (including the identifier).

This way we avoid always having to make the first POST request and, just as important, we don't have to return anything such as the assigned identifier in the response to that POST call. This last concern is derived from CQRS concepts, where commands don't return information.

Thanks for sharing!
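A rough sketch of the client-generated identifier idea from the comment above, with a single PUT and no preceding POST. The Store class and the status codes it returns are assumptions made for illustration; only uuid.uuid4() and uuid.UUID() come from the Python standard library.

```python
import uuid


class Store:
    """Hypothetical server that accepts client-chosen UUIDs via PUT only."""

    def __init__(self):
        self.resources = {}

    def put(self, resource_id, details):
        # Reject identifiers that are not well-formed UUID strings.
        try:
            key = uuid.UUID(resource_id)
        except ValueError:
            return 400
        if key in self.resources:
            # Same id again: an identical retry is fine,
            # a different payload under the same id is a conflict.
            return 200 if self.resources[key] == details else 409
        self.resources[key] = details
        return 201


store = Store()
new_id = str(uuid.uuid4())   # generated once by the client, reused on retry
first = store.put(new_id, {"amount": 10})      # 201: created
retry = store.put(new_id, {"amount": 10})      # 200: duplicate, nothing new
bad = store.put("not-a-uuid", {"amount": 10})  # 400: malformed identifier
```

The retry is safe only as long as the client keeps the same identifier across attempts; generating a fresh UUID for every attempt would bring the duplicate problem back.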

Alex Rudenko • Jan 2 '18

Thanks for the comment. Getting away with a PUT request only would be great but I am sure it's not always desirable to let clients decide on identifiers. Nevertheless, if possible, it's definitely better!

Javier Ferrer • Jan 2 '18

Just curious, because maybe you've faced different scenarios which would justify that assertion. In which cases would you find it undesirable to let the client decide on the resource identifiers?

Thx!

Alex Rudenko • Jan 2 '18

Mostly for these three reasons:

when IDs need to have a special format, for example sequential, prefixed, or following some other rules. This can be desired for different purposes, mostly for easier handling by humans (a UUID is less readable and quite long)
to avoid conflicts between different clients (if it's not a UUID)
to simplify the implementation of clients which I cannot control (their ID generation could be wrong and may be working only by accident)

Javier Ferrer • Jan 11 '18

My point of view:

IDs need to have a special format: In that case I would not use such a "human readable identifier" as the actual resource identifier. I mean, we could have the resource identifier represented by a UUID, plus an additional field with that format.
Avoid conflicts: As you said, this wouldn't be the case when using UUIDs.
Clients which I cannot control: The proposed solution makes the clients perform 2 different requests (POST + PUT), so it would be strange for a client to be able to do that but not be able to generate a UUID and make one PUT request.
It may be working only by accident: If the client generates an invalid UUID, the server will respond with an error such as invalid request, and if the identifier is not generated randomly, the server will fail when trying to save it. That leaves no room for this scenario, does it?

Thanks for sharing!

Alex Rudenko • Jan 11 '18 • Edited

Thanks for the comment! I agree that what you suggest is a good solution and can work well for many use cases. But 1) you cannot always choose a UUID as the id, for various reasons, and an additional generated id may not help (if you can, use a UUID though); 2) UUID generation on clients you don't control can break at any time in production if done wrong, and then you will have frustrated users who always get an error. So what you say is: give the clients a chance to mess up. What I say is: give the clients no chance to mess up, at the expense of two HTTP requests. In any given case you can decide whether to optimize away the second request and compress the logic into one. But if the operation is important, I would advocate paying the price of two requests in exchange for full control over IDs on the backend and making sure nothing can be messed up (by anyone except you as the backend developer).

Javier Ferrer • Jan 11 '18

👍👍👍

Thanks for the reply!

rhymes • Dec 28 '17

I like your approach/idea but I didn't understand how you apply it to the "duplicate payment problem"

If I understood correctly you say that we should issue a POST to get the final resource id/url, and then PUT the request BUT if it's a payment transaction with a network error, doesn't it result in the same problem?

CLIENT: POST /payments/new
SERVER: HERE YOU ARE THE ID
CLIENT: PUT /payments/123

Let's assume now there's a network error and the client doesn't know if its PUT went through or not, so it decides to resubmit it, thinking that PUT, being idempotent, is safe.

Aren't we still left with 2 payments?

Let me explain: HTTP says PUT should be idempotent (so repeating it should have no additional side effects and give the same result). This is fine for updating a record or an image or whatever, but a payment is not a side-effect-free operation.

So if I try two times to PUT the payment and the server both times issues a payment to the third party, I'm still left with double the amount less in my account.

Alex Rudenko • Dec 28 '17 • Edited

In the case of a PUT operation, the server can easily check whether a payment has already been sent for this particular payment id. When the server gets a new PUT request, it checks if the payment has been sent to a 3rd party and never sends it twice (this can be ensured internally via locks and/or transactions). For POST requests it's not that easy because it's hard to know the intent of the caller (whether it's a duplicate request or a new payment). Therefore, you need to come up with less elegant solutions such as adding a RequestId header, and by doing so you shift the responsibility to the caller.
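As a minimal sketch of the "never sends it twice" check, assuming a single process and an in-memory dict instead of a database: PaymentProcessor and fake_provider are hypothetical names, and the one global threading.Lock stands in for whatever row lock or transaction a real backend would use.

```python
import threading


class PaymentProcessor:
    """Hypothetical handler for PUT requests on a payment id."""

    def __init__(self, send_to_provider):
        self._send = send_to_provider   # stand-in for the call to the 3rd party
        self._lock = threading.Lock()
        self._results = {}              # payment id -> provider result

    def put(self, payment_id, details):
        # The lock makes check-and-send atomic, so concurrent retries
        # for the same id cannot both reach the provider.
        with self._lock:
            if payment_id not in self._results:
                self._results[payment_id] = self._send(payment_id, details)
            return self._results[payment_id]


sent = []


def fake_provider(payment_id, details):
    """Records every call that would have charged the user."""
    sent.append(payment_id)
    return {"id": payment_id, "status": "charged"}


processor = PaymentProcessor(fake_provider)
# Five concurrent PUTs for the same payment, e.g. from aggressive retries.
threads = [
    threading.Thread(target=processor.put, args=(123, {"amount": 10}))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sent)  # prints "[123]"
```

Every caller gets the stored result back, so a retrying client cannot tell (and does not need to know) whether its request was the one that triggered the charge.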

rhymes • Dec 28 '17

Oh now I got it, thanks! You use the id on the server instead of a generic "request id" on the client to shift responsibility.

I thought you were trying to do that still keeping the responsibility on the client, that's why I didn't understand.

Thanks! :-)

Alex Rudenko • Dec 28 '17

Have you ever dealt with the problem of duplicate requests yourself? If so, how?

rhymes • Dec 28 '17 • Edited

Yep, but using request ids, not really handy.

I see that the Stripe API uses "idempotency keys", which allow the client to tell the server "Hey, this POST request is idempotent, treat it as such"

To perform an idempotent request, provide an additional Idempotency-Key: header to the request.

How you create unique keys is up to you, but we suggest using V4 UUIDs or another appropriately random string. We'll always send back the same response for requests made with the same key, and keys can't be reused with different request parameters. Keys expire after 24 hours.

See stripe.com/docs/api#idempotent_req... and masteringmodernpayments.com/blog/i...
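The behaviour described in that quote (same response for the same key, no reuse with different parameters, expiry after 24 hours) could be sketched on the server side roughly as follows. This is not Stripe's implementation; IdempotentEndpoint, charge and the injected clock are assumptions made purely for illustration.

```python
import time
import uuid

KEY_TTL_SECONDS = 24 * 60 * 60   # keys expire after 24 hours, per the quote


class IdempotentEndpoint:
    """Hypothetical POST handler that de-duplicates by idempotency key."""

    def __init__(self, handler, clock=time.time):
        self._handler = handler
        self._clock = clock
        self._seen = {}   # key -> (stored_at, params, response)

    def post(self, idempotency_key, params):
        now = self._clock()
        entry = self._seen.get(idempotency_key)
        if entry is not None and now - entry[0] < KEY_TTL_SECONDS:
            _, stored_params, response = entry
            if stored_params != params:
                raise ValueError("key reused with different parameters")
            return response   # replay the stored response, do no new work
        response = self._handler(params)
        self._seen[idempotency_key] = (now, params, response)
        return response


charges = []


def charge(params):
    """Stand-in for the real-world operation behind the POST."""
    charges.append(params)
    return {"charge": len(charges), "amount": params["amount"]}


endpoint = IdempotentEndpoint(charge)
key = str(uuid.uuid4())   # the client picks the key and reuses it on retry
a = endpoint.post(key, {"amount": 10})
b = endpoint.post(key, {"amount": 10})
print(a == b, len(charges))  # prints "True 1"
```

Note that the responsibility stays with the client here: a retry that forgets to send the same key is treated as a brand-new request.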

Patrick Schönfeld • Dec 29 '17

„Both ways are not user-friendly: the consumer needs to have a reliable mechanism for generating request IDs“

Given that we are talking about payments and those are usually bound to a transaction like an order, there should already be such an identifier, e.g an order ID.

Although your idea looks interesting and nice, this doesn’t seem like the perfect use case for it.

Additionally having some sort of request ID included in the request could make troubleshooting easier if something goes wrong.

Alex Rudenko • Dec 29 '17

Given that we are talking about payments and those are usually bound to a transaction like an order, there should already be such an identifier, e.g an order ID.

Normally, yes. Even more, if your implementation is thorough enough, you can save all outgoing requests with their request IDs in a database related to the, say, customer record. But is it the best possible way?

Additionally having some sort of request ID included in the request could make troubleshooting easier if something goes wrong.

This is nice in any case. But is there really a need to use the request ID to turn non-idempotent POST requests into idempotent PUT-like requests? I would be really happy if all APIs in the world used at least request IDs to de-duplicate the POST requests. But it's not the best way because those requests IDs in the headers usually are optional and often overlooked. Heck, even PayPal's API doc does not mention it in the section about payment creation. I would prefer APIs to be designed in such a way that it's impossible to double-process things in the first place.

Although your idea looks interesting and nice, this doesn’t seem like the perfect use case for it.

Do you know a better example from your experience to illustrate the problem?

Patrick Schönfeld • Dec 29 '17

But is it the best possible way?

There is no universal answer to that question. Everything has its drawbacks.

But, depending on what you are optimizing for: Yes, it can be the best solution.

In this case there are drawbacks with your solution: it requires a second roundtrip for something that is a single operation, actually.
If you are aiming for a lot of requests this might not be what you want.

Second, one could say that it is not very RESTful, since you are basically introducing state. What if network errors lead to a lot of first requests to happen but not the second one? Then you have records you need to cleanup.

But it's not the best way because those requests IDs in the headers usually are optional and often overlooked.

Now that is a circular argument.

If sending a request ID is required for your use case, then you can use that field and be done with it, isn’t it so?

Alex Rudenko • Dec 29 '17

it requires a second roundtrip

This is true. I mention this in the post.

for something that is a single operation, actually

I would argue that one can view it as two operations: 1) getting an id for a new resource and 2) creation of the resource using PUT.

it is not very RESTful, since you are basically introducing state

The Request ID also introduces state because the second request with the same ID is rejected. Also, it changes the usual way POST requests work.

Then you have records you need to cleanup.

Indeed, you can clean them up later. It's easier than to cancel a duplicate payment though.

If sending a request ID is required for your use case, then you can use that field and be done with it, isn’t it so?

I would, and I would advocate for providing it, at least. Unfortunately, in practice, it's not always available because it's not a standard way of doing things. I would prefer to have a solution which eliminates duplication by design.

Alex Rudenko • Dec 28 '17

Btw great website with patterns: restalk-patterns.org/index.html
