Hi,
Glad that you liked the post and that it helped with your use case.
With this approach to streaming the data, you have to keep retrieving file chunks from S3 until you reach the total file size (a minimal sketch of that loop follows below). I would recommend cloning this repo and comparing it with your local code to see if you missed something 😉
Optionally, I would also recommend checking out the sequel to this post, which covers parallel processing 😁
Parallelize Processing a Large AWS S3 File
Idris Rampurawala ・ Jun 25 ・ 6 min read
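For readers following along, here is a minimal sketch of the retrieval loop described above, assuming boto3 and placeholder bucket/key arguments; the names here are illustrative, not the post's exact code:

```python
import boto3

s3 = boto3.client('s3')

def stream_s3_file(bucket, key, chunk_bytes=1_000_000):
    """Yield a large S3 object chunk by chunk until the total file size is reached."""
    file_size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
    start_byte = 0
    while start_byte < file_size:
        # S3 Range requests are inclusive on both ends.
        end_byte = min(start_byte + chunk_bytes - 1, file_size - 1)
        response = s3.get_object(
            Bucket=bucket, Key=key,
            Range=f'bytes={start_byte}-{end_byte}'
        )
        yield response['Body'].read()
        # The next chunk starts one byte past the previous inclusive end.
        start_byte = end_byte + 1
```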
Thank you! I found out what I was missing: I made start_byte = end_byte + 1, as I was losing one row per chunk. Your next article was exactly what I was looking for as the next step of my program.
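For anyone hitting the same off-by-one, a short illustration of why the +1 matters, with byte numbers chosen purely for the example:

```python
# S3 byte ranges are inclusive: a request for 'bytes=0-999' returns
# bytes 0 through 999, so the next chunk must begin at byte 1000.
end_byte = 999
start_byte = end_byte + 1   # next request: 'bytes=1000-1999'

# Advancing by anything else either re-reads or skips bytes; a skipped
# byte that happens to be a newline silently drops a whole row.
```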