DEV Community


Please ELI5 what Parquet is for, and NOT for

Pacharapol Withayasakpunt

I am trying to understand how good Apache Parquet is as a

  • Data storage format (when you DO NOT have Hadoop, only your local computer)
    • How large are the files?
    • How reliable is it?
  • Query-able format
    • Do I have to index first? (Unique indices are probably not possible?)
    • Speed?
    • Resource usage?

As far as I understand, Parquet may not be good for frequent writes or updates, but is it good enough for a static database?

You can use the ever-popular SQLite as a benchmark, disregarding SQLite-specific features such as foreign keys, unique indices, full-text search and multiple tables.

BTW, I have seen a SQLite file grow to 700 MB when the final CSV export of the same data was only a few megabytes, and I am not sure it is reliable as storage anymore...
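
To make the size comparison concrete, here is a minimal sketch of how the SQLite side could be measured using only the Python standard library. The function name `compare_sizes`, the table layout and the sample rows are all made up for illustration; they are not my real data. It writes the same rows to a CSV file and to a SQLite database, runs `VACUUM` so that pages left over from earlier deletes or updates do not inflate the number, and reports both file sizes. A Parquet file written from the same rows could be measured the same way with `os.path.getsize`.

```python
import csv
import os
import sqlite3
import tempfile


def compare_sizes(rows, workdir):
    """Write the same rows as CSV and as a SQLite table; return both sizes in bytes.

    The schema (id, name, value) is a hypothetical example, not real data.
    """
    csv_path = os.path.join(workdir, "data.csv")
    db_path = os.path.join(workdir, "data.sqlite")

    # Plain CSV with a header row.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "value"])
        writer.writerows(rows)

    # Same rows in a single SQLite table, with no indices.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE data (id INTEGER, name TEXT, value REAL)")
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        conn.commit()
        # VACUUM rebuilds the file and releases unused pages.
        conn.execute("VACUUM")
    finally:
        conn.close()

    return os.path.getsize(csv_path), os.path.getsize(db_path)


with tempfile.TemporaryDirectory() as workdir:
    rows = [(i, f"item-{i}", i * 0.5) for i in range(10_000)]
    csv_bytes, sqlite_bytes = compare_sizes(rows, workdir)
    print(f"CSV: {csv_bytes} bytes, SQLite: {sqlite_bytes} bytes")
```

If the SQLite file is still far larger than the CSV after `VACUUM`, the extra space is probably coming from indices or from how the values are stored rather than from free pages, but that is only my guess.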
