Looking around it does seem to be able to read from a single gzip but it doesn't seem to be straightforward. If you need to unzip the file then any gains on storage would be nullified.
I would be very interested to try it if you know a way and compare size and performance. Normally storage is cheap, at least a lot cheaper than other resources so performance is in most cases the priority and the compression is a nice to have (depends on the use case of course)
Sadly I don't use Dask, but in the past I have used zcat on a Linux command line to stream a csv to stdin for a script to then process without needing the whole of the data uncompressed in memory/on disk.
Cool I can totally see a use case for that streaming into something like Apache Kafka. I will prototype a couple of things and see if it can become another little article. Thanks!
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
If you are worried about space wouldn't you work from a gzipped CSV file? I wonder what the size of the CSV file is when gzipped - 9?
I believe Dask doesn't support reading from zip. See here: github.com/dask/dask/issues/2554#i....
Looking around it does seem to be able to read from a single gzip but it doesn't seem to be straightforward. If you need to unzip the file then any gains on storage would be nullified.
I would be very interested to try it if you know a way and compare size and performance. Normally storage is cheap, at least a lot cheaper than other resources so performance is in most cases the priority and the compression is a nice to have (depends on the use case of course)
Sadly I don't use Dask, but in the past I have used zcat on a Linux command line to stream a csv to stdin for a script to then process without needing the whole of the data uncompressed in memory/on disk.
Cool I can totally see a use case for that streaming into something like Apache Kafka. I will prototype a couple of things and see if it can become another little article. Thanks!