DEV Community

dev.to API - returning a 404 permanently once it has returned one.

InHuOfficial

Just a quick one: has anyone ever played with the "unofficial" parts of the dev.to API?

I am trying to enumerate new users (I know, naughty, but I have good intentions!) to put together a page that automatically welcomes new users to the site.

The page works fine but gathering the users is proving to be problematic.

My current approach is to start at the latest userID, then request the next userID and check whether it exists using the end-point https://dev.to/api/users/{userID}.

The problem is that once that end-point returns a 404 error (because the user does not exist yet), it always returns a 404 error to my script, even after the user has been created and the same URL returns a 200 status in the browser.

I thought it might be rate limiting or something similar, but the script works fine if I skip any userIDs I have tried previously (and if it is rate limiting, a 404 error is misleading).

I have tried sending the headers cache-control: no-cache and pragma: no-cache just in case, but this still happens.

I am using cURL in PHP.

I am currently checking for a new user every 30 seconds, but even if I change this to 10 minutes I get the same problem.

Final question: is there an API end-point that actually lists new users? That would save me polling like this.

Discussion (6)

Daniel Uber

Just to be clear with what you're seeing (restating your question):

  • You're using the dev.to/api/users/ endpoint and adding one to the user id until you hit a 404 (and remembering that as the "first unseen user id").

  • You're coming back a few minutes later and starting from first unseen user id and incrementing, until you get to the next 404.

  • You're still seeing a 404 response on the ID that was not seen previously, but you're seeing requests for user ids larger than the missing one responding with a 200 response (and user information) during your subsequent checks.

One possible cause for this is that users can be deleted - if you remove your account after registering, the ID number doesn't get recycled or added back to the available pool of IDs, it's gone forever. If your forward search hits on a deleted user, that would give a 404, even though it might not be the most recently created user id, and would give a 404 response every time you requested it.

github.com/forem/forem/blob/9ff7f8... I don't see anything in the API code that offers more than "user by name, user by id, my own user" lookups, so there shouldn't be undocumented or hard-to-find endpoints that provide the information you're looking for. Peeking at the user model, I didn't notice a default_scope that would limit this, so a 404 for a lookup by integer id should indicate that there is in fact no such row in the database. Whether that is because the id has been used and retired, or has not been used yet, is not something the API answers.

The takeaway is that some user id numbers will be permanent 404s, and the first 404 you encounter might not be an unassigned ID. You might want to adjust the script to keep trying until you see 5 sequential ids giving 404s, or to continue scanning forward unless the last 200 you saw was for a user less than 6 hours old, since signups are pretty frequent.
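The "keep going until several ids in a row give 404s" idea can be sketched roughly as below. This is only an illustration in Python, not the script from the post (which uses cURL in PHP). The names `find_latest_user_id`, `status_for`, and `max_consecutive_misses` are hypothetical, and `status_for` stands in for whatever function actually makes the HTTP request and returns the status code.

```python
from typing import Callable, Optional


def find_latest_user_id(
    start_id: int,
    status_for: Callable[[int], int],
    max_consecutive_misses: int = 5,
) -> Optional[int]:
    """Scan upward from start_id and return the highest id that answered 200.

    The scan stops only after `max_consecutive_misses` non-200 responses in
    a row, so an isolated 404 (for example a deleted account) does not end
    the search early. Returns None if no scanned id answered 200.
    """
    latest_seen = None
    misses = 0
    user_id = start_id
    while misses < max_consecutive_misses:
        if status_for(user_id) == 200:
            latest_seen = user_id
            misses = 0  # a hit resets the run of misses
        else:
            misses += 1
        user_id += 1
    return latest_seen


# Usage with a fake lookup: id 102 plays the role of a deleted account.
existing_ids = {100, 101, 103, 104}


def fake_status(user_id: int) -> int:
    return 200 if user_id in existing_ids else 404


latest = find_latest_user_id(100, fake_status)
```

Passing the lookup in as a function keeps the scanning logic separate from the HTTP client, which makes it easy to try different thresholds against recorded data.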

InHuOfficial Author

Thank you for the response ❤️, but I obviously wasn’t clear in my explanation.

I go through until I reach a couple of 404s and I assume at that moment in time that the last 200 found is the current maximum user ID (the last ID in the database).

If I then check the next ID (the one that returned a 404) 10 minutes later, when it does exist, I still get a 404 not found when using cURL.

However, I can navigate to the same endpoint in the browser and get a 200 and the user information.

It is as if, once I hit a 404, it gets cached for subsequent requests when using cURL.

Hopefully that makes more sense?!

Daniel Uber

Yes, and I can replicate that as well now.

I fetched the id one greater than the current maximum and received a 404. Once that user existed, I was able to view it through the API endpoint in the browser and from another HTTP client on the command line, but I still get a 404 in the API client I made the first request with.

If I change user agents the issue goes away. I suspect a caching layer (I see varnish in the response headers?) is capturing enough "unique" information about your request to re-serve the last copy.
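One way to check whether a given response came from the cache rather than the application is to look at the X-Cache response header. The sketch below is in Python and assumes that a "HIT" entry in X-Cache means a cache layer served the response; the function name `served_from_cache` is hypothetical, and the header values in the usage lines are the ones that appear in this thread.

```python
def served_from_cache(headers: dict) -> bool:
    """Return True if any cache layer listed in X-Cache reported a HIT.

    Assumption: X-Cache is a comma-separated list with one HIT or MISS
    entry per cache layer the response passed through.
    """
    # Header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    x_cache = normalised.get("x-cache", "")
    return any(part.strip().upper() == "HIT" for part in x_cache.split(","))


# Usage with the two header values seen in this thread.
stale = served_from_cache({"X-Cache": "MISS, HIT"})
fresh = served_from_cache({"X-Cache": "MISS, MISS"})
```

Logging this flag alongside each 404 would show whether the script is seeing a real "no such user" answer or a replayed one.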

I was able to work around this by adding a ?name=x query to the request (user id is made up in this example):

GET https://dev.to/api/users/123123
=> 404
// Via: 1.1 vegur, 1.1 varnish, 1.1 varnish
// X-Served-By: cache-den8243-DEN, cache-pwk4962-PWK
// X-Cache: MISS, HIT

GET https://dev.to/api/users/123123?name=x
=> 200 with json body of user 123123
// Via: 1.1 vegur, 1.1 varnish, 1.1 varnish
// X-Served-By: cache-den8243-DEN, cache-pwk4962-PWK
// X-Cache: MISS, MISS
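A rough sketch of building such a URL follows, in Python rather than the PHP used in the original script. The `name` parameter is the one from the example above. Giving it a fresh random value on every request is an assumption on my part, as is the helper name `cache_busting_url`, and nothing here guarantees the cache will treat the request as new.

```python
import uuid
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://dev.to/api/users"  # endpoint named in the original post


def cache_busting_url(user_id: int, token: Optional[str] = None) -> str:
    """Build the user lookup URL with a throwaway `name` query parameter.

    If no token is given, a random one is generated so that repeated
    requests for the same id do not share an identical URL (assumption:
    the cache keys on the full URL, query string included).
    """
    if token is None:
        token = uuid.uuid4().hex
    return f"{API_BASE}/{user_id}?{urlencode({'name': token})}"


# Usage: the fixed token reproduces the URL from the example above.
fixed = cache_busting_url(123123, "x")
first_random = cache_busting_url(123123)
second_random = cache_busting_url(123123)
```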
InHuOfficial Author

Interesting. I did try adding a random parameter to the end of the URL and it didn't work, but I will revisit that idea, send random user agent strings as well, and see if that works!

Thanks so much for looking at it (and I am glad it was not just me!) I will let you know if and when it works! ❤️

InHuOfficial Author

P.S. Congrats on joining the dev.to / forem team!

Daniel Uber

Thanks! It's great to be here.

Since this looks like an unwanted api behavior, I copied it over to forem's github tracker (github.com/forem/forem/issues/13293).

Also (though it doesn't help your script directly) it looks like the cached 404 response does expire automatically, maybe after 1 hour.
