DEV Community

NAD

Best Practices to upload more than 100K PDFs through REST API

I am working on an application that needs to accept a list of files as input, do some edits on each file, and send them back through a callback URL.

  • Is there a better design?
  • How can I avoid data loss if the callback service is down?
  • What is the maximum file size over HTTP?

Top comments (5)

HS

An idea to try:
There's header called Content-Range and for list of files one by one upload can be combined front and back side with some "jobId" or such. So you can create job id on upload with list of file descriptors like name, total size, range, and some boolean uploaded. You make POST to create job id and file descriptors. You store progress of each file on both sides, frontend to know how to continue on crash, like which files to retry and from which point; back side on each chunk uploaded update Cache or DB record appropriately. Now this can only work if you have specific way of temp files and delete on complete strayegy (if you care about space). It's supper complicated and I'm not sure how useful. If you start uploading 2GB and temp file is persisted up to 1.6GB then you might see purposes as you can restart upload, server sends uploaded range and then that content range kicks in telling your frontend app from which point to read the file and continue the upload.

Note: Content-Range is mainly for resuming downloads or for server-to-server transfers, while you require the reverse: an upload from client to server. So you could potentially use any other header, but it might be worthwhile to investigate whether the framework you use (or could use, or some library) has solved this before.
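A minimal sketch of the server-side bookkeeping described above, assuming an in-memory store (the names `createJob`, `recordChunk`, and `resumeOffset` are made up for illustration; a real service would persist this in a cache or DB):

```javascript
// In-memory job store: jobId -> { files: { name -> descriptor } }.
const jobs = new Map();
let nextJobId = 1;

// POST /jobs would call this with the list of files the client plans to upload.
function createJob(fileList) {
  const jobId = String(nextJobId++);
  const files = {};
  for (const f of fileList) {
    files[f.name] = { totalSize: f.size, uploadedBytes: 0, uploaded: false };
  }
  jobs.set(jobId, { files });
  return jobId;
}

// Called for each uploaded chunk; the client sends Content-Range-style
// offsets so the server can track how far each file has gotten.
function recordChunk(jobId, fileName, start, end) {
  const file = jobs.get(jobId).files[fileName];
  if (start !== file.uploadedBytes) {
    throw new Error(`expected chunk at offset ${file.uploadedBytes}`);
  }
  file.uploadedBytes = end;
  file.uploaded = end >= file.totalSize;
  return file;
}

// After a crash, the client asks where to resume each file.
function resumeOffset(jobId, fileName) {
  return jobs.get(jobId).files[fileName].uploadedBytes;
}
```

The same descriptor shape (name, total size, uploaded range, uploaded flag) is mirrored on the frontend so it knows which files to retry and from which byte.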

Karan Gandhi • Edited

1.) Yeah. If the edits take a lot of time, the request will stay pending and the user might cancel it. Just upload the files and respond with "files uploaded successfully". Map the files to the user via auth or session and create a job ID for the edits.
Once the edits are done, notify the user that the files are ready for download.

2.) Limit the files uploaded in a single request, e.g. 5 files and a maximum of 250 MB. You can't do anything in case of service issues.
3.) The maximum file size probably depends on the framework. I have easily uploaded ~1 GB files in Express without any issues. According to the internet, 2-4 GB works depending on the OS.
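A rough sketch of that upload-then-process-asynchronously flow, with made-up names and an in-memory map standing in for a real DB or queue:

```javascript
// jobId -> { userId, files, status }; a stand-in for a real DB record.
const editJobs = new Map();
let counter = 0;

// Upload handler: don't run the edits inline. Store the files, create a
// job mapped to the user, and respond immediately.
function handleUpload(userId, files) {
  const jobId = `job-${++counter}`;
  editJobs.set(jobId, { userId, files, status: 'uploaded' });
  return { jobId, message: 'files uploaded successfully' };
}

// A background worker picks the job up later; when the edits are done it
// notifies the user (e.g. via the callback URL, email, or a websocket).
function completeJob(jobId, notify) {
  const job = editJobs.get(jobId);
  job.status = 'ready';
  notify(job.userId, `files for ${jobId} are ready for download`);
}
```

The point of the split is that the HTTP request only pays for the upload; the slow editing work never blocks a pending response the user could cancel.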

Michael Salaverry

Try using a queue to schedule the work, something like npmjs.com/package/p-queue, which is event-driven and pairs well with retry logic.
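A dependency-free sketch of the pattern (not p-queue's actual API, just the idea: run tasks through a queue and retry transient failures; p-queue adds concurrency control and events on top of this):

```javascript
// Retry a promise-returning task up to `attempts` times before giving up.
async function withRetry(task, attempts) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // transient failure: try again
    }
  }
  throw lastError;
}

// Run an array of task functions one at a time, retrying each.
async function runQueue(tasks, attempts = 3) {
  const results = [];
  for (const task of tasks) {
    results.push(await withRetry(task, attempts));
  }
  return results;
}
```

With 100K files, a queue like this (with a bounded concurrency instead of strictly serial execution) keeps the server from being flooded while still surviving intermittent callback failures.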

NAD

Thanks all for your replies.

One more clarification to add: the processing of the files will happen client-side, because I need to read a token on the client machine to sign the documents.

So the idea is: from a portal, I select the files to be signed; a service running locally takes those selected files, signs them, and sends them back to the portal.

The portal will connect to the local service (client-side) over REST through HTTPS.

Any other ideas that could be better than what's described above?

Karan Gandhi

Unless you are using a native application, this will be problematic.
Browsers don't have access to methods in the client OS.