DEV Community

Stephen Senkomago Musoke

Best Practices for Receiving Files via REST API

I am looking for best practices for receiving zip or text files via a REST API built with DRF (Django REST Framework):

  • small files < 1MB to be stored on the file system (with authentication of course)

  • traffic will start at 25 to 100 clients a day and grow later

  • What responses do I send back?

  • Are there any performance issues that I will have to think about?

  • How do I verify the multipart files are actually received without corruption?

Top comments (1)

ZaneHannanAU • Edited
  • Authenticate the user as part of the upload request.
  • Don't unzip. If you must, use internal names only (e.g. UUIDs), rarely extract non-store files, and keep a record of each file's original path and metadata as well (e.g. permissions such as -rw-rw-rw). E.g.: 80dbf74f-8b72-40b1-ab62-7bd155188bc2 → /file/in/a/directory.txt.
  • Send back some information about the stored file, with semantic response codes: 201 Created (everything has worked correctly), 401 Unauthorized (requires auth), 413 Payload Too Large (if exceeding the maximum size), or 507 Insufficient Storage (server full).
  • If you store files on disk, pack small files (e.g. sub-4 KiB, or just below the block size) into a single file in a length-delimited format. This prevents some degree of abuse with zero- or 1-byte payloads. Spending 4 or 6 bytes of overhead per file, instead of a whole 4 KiB block (or more), will save a large amount of space.
  • Don't load a whole file into memory. Stream it to the user, or use a mechanism provided by the web server (e.g. Apache). The same applies while processing: work on a file in chunks rather than holding all of it in memory.
  • TCP has error detection (checksums) built in. Archive formats such as zip, rar and gz carry their own checksums as well; tar only checksums its headers.
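
Several of the points above (internal UUID names, chunked handling, semantic status codes, integrity checks) can be combined in one small helper. The following is only a sketch using the Python standard library, not a DRF implementation: `store_upload`, its parameters, the `.part` suffix, the client-supplied `expected_sha256`, and the 400 response for a checksum mismatch are all assumptions made for illustration. It accepts any binary file-like object with a `read(n)` method.

```python
import errno
import hashlib
import os
import uuid

CHUNK_SIZE = 64 * 1024  # bytes read per iteration; the upload is never held whole in memory


def _discard(path):
    """Remove a partial file, ignoring the case where it was never created."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def store_upload(stream, dest_dir, max_bytes=1024 * 1024, expected_sha256=None):
    """Stream `stream` into `dest_dir` under a UUID name (hypothetical helper).

    Returns (status_code, info). 201, 413 and 507 follow the list above;
    400 for a checksum mismatch is an assumption of this sketch.
    """
    name = str(uuid.uuid4())  # internal name; the client's filename is never used
    final_path = os.path.join(dest_dir, name)
    tmp_path = final_path + ".part"
    digest = hashlib.sha256()
    size = 0
    try:
        with open(tmp_path, "wb") as out:
            for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
                size += len(chunk)
                if size > max_bytes:
                    break  # stop reading as soon as the limit is exceeded
                digest.update(chunk)
                out.write(chunk)
    except OSError as exc:
        _discard(tmp_path)
        if exc.errno == errno.ENOSPC:  # disk full
            return 507, {"detail": "insufficient storage"}
        raise
    if size > max_bytes:
        _discard(tmp_path)
        return 413, {"detail": "payload too large"}
    if expected_sha256 is not None and digest.hexdigest() != expected_sha256.lower():
        _discard(tmp_path)
        return 400, {"detail": "checksum mismatch"}
    os.replace(tmp_path, final_path)  # rename only after the file is complete
    return 201, {"id": name, "size": size, "sha256": digest.hexdigest()}
```

Returning the SHA-256 in the 201 response lets the client compare it against its own copy, which is one way to answer the question about corruption even when the client does not send a checksum up front.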
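
The length-delimited storage for small files mentioned above could look roughly like this. It is a minimal sketch under stated assumptions: the 4-byte big-endian length prefix, and the names `append_record` and `read_record`, are choices made for this example and are not prescribed by the comment.

```python
import io
import struct

LENGTH_PREFIX = struct.Struct(">I")  # assumed format: 4-byte big-endian unsigned length


def append_record(pack, payload):
    """Append `payload` to the pack file and return the offset of its record."""
    pack.seek(0, io.SEEK_END)
    offset = pack.tell()
    pack.write(LENGTH_PREFIX.pack(len(payload)))
    pack.write(payload)
    return offset


def read_record(pack, offset):
    """Read back the payload whose record starts at `offset`."""
    pack.seek(offset)
    header = pack.read(LENGTH_PREFIX.size)
    if len(header) != LENGTH_PREFIX.size:
        raise ValueError("truncated length prefix")
    (length,) = LENGTH_PREFIX.unpack(header)
    payload = pack.read(length)
    if len(payload) != length:
        raise ValueError("truncated record")
    return payload


# io.BytesIO stands in for a real file opened in binary read/write mode.
pack = io.BytesIO()
first = append_record(pack, b"")            # an empty upload costs 4 bytes, not a block
second = append_record(pack, b"tiny file")
```

A real system would also need an index from each file's internal ID to its offset in the pack file; that part is left out here.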

After a few thousand users (or rather, concurrent connections), you will likely need multiple servers. Do not attempt to use sequential numbering for IDs, since every server would have to coordinate on the next number. Use multiple servers to their full extent: generate IDs randomly and agree on a pseudo-random scheme that all servers share.
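
As a rough illustration of coordination-free IDs, each server can generate random UUIDs on its own, and every server can derive the same storage location from an ID. The names `new_file_id` and `shard_for`, and the idea of mapping the first 32 bits of the ID onto a shard number, are assumptions for this sketch rather than something the comment specifies.

```python
import uuid


def new_file_id() -> str:
    """Random 128-bit ID as 32 hex characters; no shared counter is needed."""
    return uuid.uuid4().hex


def shard_for(file_id: str, shard_count: int) -> int:
    """Map an ID to a shard deterministically, so all servers agree on it."""
    return int(file_id[:8], 16) % shard_count
```

Because the mapping depends only on the ID, any server can locate a file without asking the others.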

This is more of a high-level overview of how I would personally perform it, as opposed to an actual implementation. The tl;dr is: don't over-complicate stuff, use the platform, and make documentation for what you're doing.