DEV Community


Please ELI5 what Parquet is for, and NOT for

Pacharapol Withayasakpunt

I am trying to understand how well Apache Parquet works as a:

  • Data storage format (when you DO NOT have Hadoop, only your local computer)
    • How large are the files?
    • How reliable is it?
  • Queryable format
    • Do I have to build an index first? (Unique indices are probably not possible?)
    • Speed?
    • Resource usage?

As far as I understand, Parquet may not be good for frequent writes or updates; but is it good enough for a static database?

You can use the ever-popular SQLite as a benchmark, disregarding SQLite features such as foreign keys, unique indices, full-text search, and multiple tables.

BTW, I have seen a SQLite file grow to 700 MB when the final CSV data was only a few megabytes, and I am not sure it is reliable as storage anymore...
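
One possible explanation for a file that large, which is an assumption and not something confirmed for this particular database, is that SQLite with its default settings keeps the pages freed by deleted rows inside the file until `VACUUM` is run. A minimal sketch using only the Python standard library, with a hypothetical table and made-up data:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.db")
conn = sqlite3.connect(path)

# Hypothetical table filled with about 5 MB of dummy text.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO t (payload) VALUES (?)",
    (("x" * 1000,) for _ in range(5000)),
)
conn.commit()
size_full = os.path.getsize(path)

# Delete most rows; the freed pages stay in the file for later reuse.
conn.execute("DELETE FROM t WHERE id > 100")
conn.commit()
size_after_delete = os.path.getsize(path)

# VACUUM rebuilds the database file and releases the unused pages.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)
conn.close()

print(size_full, size_after_delete, size_after_vacuum)
```

If the sizes before and after `VACUUM` differ a lot on the real file, the growth came from free pages and not from the stored data itself.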
