zchtodd

Posted on Oct 9, 2019

What database do you use?

#discuss

I'm curious what databases people are using for their projects, and why you might have decided on that.

I'll list some of the databases I've used and my experiences with them.

PostgreSQL

I'm using PostgreSQL for a Python project, alongside SQLAlchemy as the ORM. The data model for the project is fairly complex, so it's been good to have SQL for the flexibility.

One downside so far has been that some of the queries have gotten pretty gnarly, and modifying them can feel like moving blocks in a Jenga tower.

Performance can also be a little unpredictable. For instance, one query either took ~400ms or 20 seconds to execute. In that case, the query was executing immediately after a big insert, and the table statistics were out of date, causing the planner to generate a really poor plan.

Issuing an ANALYZE command refreshed the table stats, and the planner started producing the 400ms version of the query. As far as I know, it's not possible to pin a query plan in PostgreSQL.

Cassandra

My experience with Cassandra was mixed. I think this was because the project was not well suited to an eventually consistent key-value store. The application developers spent a lot of time coming to terms with eventual consistency, lack of transactions, and the rigidity of the data model.

Cassandra can, however, support an amazingly high write through-put. If you don't need the flexibility of SQL, and a single machine can't support your demands, then Cassandra might be a viable option.

What are you using?

I'd love to hear what databases you're using, as well as the positives and negatives that come with them.

Top comments (12)

rhymes • Oct 9 '19

I've been using PostgreSQL for a long time and it honestly hasn't disappointed me.

Usually when I had slow queries the problem was the query itself (including indexes on the columns) or the server load. To be honest I rarely found bugs that affected the outcome of a query or skyrocketed the execution time.

Are you having problems with the output of EXPLAIN ANALYZE ?

Also: since the introduction of JSONB in PostgreSQL there's a lot of stuff you can get away with it without having to introduce a second DB in your stack.

zchtodd • Oct 9 '19

PostgreSQL is probably my favorite database too, and I totally agree, it's almost always my crappy queries that are to blame!

The case I talked about in the post was really interesting though, because the query planner was producing radically different plans, all depending on how "fresh" its table stats were.

I learned that table stats are automatically refreshed on a commit, but my slow query was running inside the same transaction, after a huge insert. So the insert skewed the table data in a way that the stats didn't reflect, therefore producing a bad plan.

The ANALYZE command (which is different than EXPLAIN ANALYZE), when executed against a table, tells PG to sample the table contents and refresh its statistics.

Brad • Oct 9 '19 • Edited

With my personal projects I always go with Mongodb for the simple reason that there are free tier hosting solutions out there (Yes I am that cheap).

At work I'd go with what is in the requirements or whatever is supported with the cloud provider we have to build on.

Jeremy Law • Oct 9 '19

PostgreSQL if I need a relational db, Mongo if I don't.

zchtodd • Oct 9 '19

I'm curious about what Mongo offers in those situations, because I've never used it, and I think my first instinct is to just use PG no matter what. (And I don't mean that in any kind of flame-bait way, I've just never used it.)

Jeremy Law • Oct 9 '19

For me it's personal preference. I work mostly with Javascript, so Mongos query syntax and JSON like document structure feel familiar.

Avalander • Oct 9 '19

Same.

Marvin Trilles • Oct 9 '19

mssql at my 9-5 work, mongo and mysql for all others :)

zchtodd • Oct 9 '19

I've never used MS SQL myself. One really random feature that I wish Postgres had, that MS SQL actually has, is "temporal tables". You can basically write a query and say "as of" a certain time, and the results will reflect what was in the database as of that time.

System time in SQL Server