DEV Community

Cover image for Postgres Is Underrated—It Handles More than You Think

Postgres Is Underrated—It Handles More than You Think

Jason Skowronski on October 09, 2019

Thinking about scaling beyond your Postgres cluster and adding another data store like Redis or Elasticsearch? Before adopting a more complex infra...
Collapse
citizen428 profile image
Michael Kohl

Postgres 12 finally added generated columns, which also come in handy for fulltext search (no more triggers for updating tsvector columns):

pgdash.io/blog/postgres-12-generat...

Collapse
thojest profile image
thojest

Hi, this is a really nice article. The funny thing is, it comes at the right time. We use Postgres and are currently considering to add Redis behind our rust backend with websockets to enable push notifications for turtle (turtle.community), but are not quite sure if this is the right way.

Does anyone have some inspiration or experience on that topic?

Collapse
zchtodd profile image
zchtodd

I'm not sure if it fits your use case perfectly, but take a look at NOTIFY, introduced in PG 9.0 -- it's basically a push notification mechanism built into PostgreSQL.

Collapse
thojest profile image
thojest

thx for the info, will have a look!

Collapse
citizen428 profile image
Michael Kohl

Not Rust, but I wrote a post about using LISTEN/NOTIFY and a small Ruby app to push DB updates over a websocket:

Collapse
thojest profile image
thojest

thx a lot :)

Collapse
iamkarshe profile image
Utkarsh Kumar Raut

Really nice project, Turtle.

Collapse
markpieszak profile image
Mark Pieszak

I'm loving your articles Jason, great job again !
It's hard to find in-depth and super high-quality articles like these in this day and age :)

Collapse
mostlyjason profile image
Jason Skowronski Author

This is an amazing complement! Mind if I use this quote on my site?

Collapse
markpieszak profile image
Mark Pieszak

Of course! My pleasure :)

Anytime, keep up the great work!

Collapse
sonnk profile image
Nguyen Kim Son

Great article Jason! It's really important to limit the number of technologies used in a project and not follow trends blindly! This is quite a similar analogy to the microservice vs monolith debate.

Collapse
rhamedy profile image
Rafiullah Hamedy

Great write up Jason. I wasn't aware of PL/Python until now. Definitely a big fun of the JSON data types and have never used hstore but, thank you for sharing the use case.

There is also Postgres-XL that seems to offer a lot and only 5 years old compared to 3 decade-old PostgreSQL.

Collapse
davedecahedron profile image
David Howell • Edited on

This is a great article!

You mention “Use multi-column indexes sparingly“, which is generally good advice, however I would qualify that by saying avoid redundant indexes. It’s important to know that multicolumn (btree) indexes have a specific ordering to the columns. If you don’t filter or join a table using one or more of the columns in that order then it can’t use the index.

For example an index on columns a,b. If you only filter on column b then this index can’t be used. It can be used if you filter by just a, or both a and b. Following on from that, if you frequently filter by both a and b then this is a good index to have. If you also have both a single column index on a, and a multicolumn index on a,b then in that specific case the single column index is the redundant one.

Collapse
brianjohnsonsr profile image
Brian Johnson

I think another really important PG plugin to add is TimeScaleDB -- the ability to add efficient time series data collection/query to Postgres is awesome.

Collapse
robconery profile image
Rob Conery

Hey Jason great post! Quick thing on the full text example you have: if you use plianto_tsquery it will split the words for you and add the AND. If you use phraseto_tsquery it will apply a positional argument (<->) instead of the &, which is great if you're looking for a name or place. The new websearch_to_tsquery (in PG 11) is a great general purpose query builder as well.

Collapse
ccleary00 profile image
Corey Cleary

I wasn't aware of tsquery, thanks for pointing that out! - I'm working on a project right now where full on Elasticsearch is probably overkill, but needed something more robust than just doing LIKE

Collapse
scottw profile image
Scott Watermasysk

100% especially on search. Search in Postgres is very underrated.

Collapse
steph_baltus profile image
Stephanie Baltus

Hey, many thanks for this article, I'm working with Postgres for a while now, but still, I've learned a lot of great stuff here !

I just wanted to add that Postgres also offers HyperLogLog as a data type, just by adding an extension (see this great article for more details.)

Collapse
cwreacejr profile image
Charles Reace

Don't forget Materialized Views: can be an awesome way to optimize searches on things that you might be tempted to dump into some NoSQL text-based search tool.

Collapse
aurelmegn profile image
Aurel

Thanks for this post,
I wrote about using postgres to setup a distributed database. Here is the link for those who are interested dev.to/sh1ftsh/setting-up-distribu...

Collapse
fultonbrowne profile image
Fulton Browne

Great article, I use postgres on heroku for database stuff all the time, I love it and I recommend it all the time.

Collapse
amatosg profile image
Alejandro • Edited on

This article was definitive, it was the last straw. I have been thinking about switching my Java application to postgres from mysql and now I'm in the middle of the process using pgloader and everything has worked as expected.

Yet I have an unresolved question: does it make sense to have a 2nd level cache in Hibernate when postgresql already has caching? thanks!

Collapse
mostlyjason profile image
Jason Skowronski Author

Thanks I'm glad enjoyed it! The Hibernate second level cache lives on your application server, whereas the postgresql cache lives on the database of course. This matters because it's faster to retrieve or update data already stored on your application server. It reduces network and database load by removing duplicate queries and batching writes. This is great for applications with many reads and infrequent writes, or cases where eventual consistency on writes is acceptable.

Collapse
thejoezack profile image
Joe Zack

Thanks for the post, I learned a lot of new things! I'd also like to mention that OLAP style queries can be rough with lots of data.

Collapse
davedecahedron profile image
David Howell

I am also thinking about this. RDBMS generally are good at most types of workloads but mixed workloads like OLTP and OLAP on the same system will interfere with each other.

How do people do embedded reporting AND transactions in modern web apps?

NoSQL solutions like ElasticSearch are mentioned but they seem more appropriate for search. Data warehouse solutions like Snowflake, BigQuery, Redshift are good for internal analytics and reporting but they just don’t have the concurrency to support direct queries from customer facing apps.

How else to do this without complex data pipelines or complex infrastructure involving Kafka?

Collapse
thejoezack profile image
Joe Zack

I do it with a complex data pipeline and infrastructure involving Kafka!

I've got my eye on Apache Druid, though I haven't spent any real time (sorry, pun totally intended) with it.

Thread Thread
davedecahedron profile image
David Howell

I’ve looked at Druid, also considering MemSQL , ClickHouse and others

Collapse
idoshamun profile image
Ido Shamun

Wow! An awesome review of Postgres important features.
I can't stress out more how devs sometimes just follow the hype without thinking. When someone suggests a NoSQL data store, I always ask her/him to convince me why SQL doesn't work here.

Collapse
okbrown profile image
Orlando Brown

Where have I read this article before? Feels like either this is a clone or the other is a clone. Brb...

Collapse
okbrown profile image
Orlando Brown

Found it, it's the same guy. Fair play!
infoq.com/articles/postgres-handle...

Thanks for the other one, it was very useful.

Collapse
mostlyjason profile image
Jason Skowronski Author

Maybe InfoQ or Hackernoon? I posted it there as well :)

Collapse
hasii2011 profile image
Humberto A Sanchez II

Nice article

Collapse
zzcoder profile image
zzcoder

Just want point out one error (or misunderstanding), B-Tree is not binary as stated in "B-tree indexes B-tree indexes are binary trees"

Collapse
mostlyjason profile image
Jason Skowronski Author

Thanks I meant to say "balanced trees"

Collapse
codethug profile image
Nicolas Quijano

More in-depth articles of this kind will make this old code bum happy ;)

Collapse
zoechi profile image
Günter Zöchbauer

Nice article!

Nit - there is one redundant "are" in "Extensions are to Postgres are"

Collapse
mostlyjason profile image
Jason Skowronski Author

Thanks updated!

Collapse
fil2fip profile image
fil2fip

Nice article ! Thank you.

I think there'e a typo in the check strings length example (b- vs b=).

Collapse
mostlyjason profile image
Jason Skowronski Author

Thanks updated!

Collapse
kelsonpw profile image
Kelson Warner

The first paragraph is repeated twice

Collapse
mostlyjason profile image
Jason Skowronski Author

Doh! Thanks I fixed it :)

Collapse
vitorscassiano_34 profile image
Vitor Cassiano

Great article.
Congratz!

Collapse
pavelbisse profile image
Pavel Bisse

Hey Jason, thanks for nice post!
There is a HyperLogLog PostgreSQL extension btw.
github.com/citusdata/postgresql-hll

Collapse
alfuken profile image
Comment marked as low quality/non-constructive by the community. View Code of Conduct
Bohdan Schepansky • Edited on

Yyyyeah... You love PG... Until you need to do a simple dumb forking count on it. That's where PG sucks big time.

Collapse
davedecahedron profile image
David Howell

Is “forking” count some special operation or just a nice word in place of swearing?

Regarding (exact) count, pretty much every system has trouble doing this quickly and on most cases you really don’t need an exact count. This is especially true for medium to large data.

Table/index statistics that are kept up to date will give a good approximation.

HyperLogLog was one option mentioned which will give good-enough approximations, another approach is log-normal histograms. I don’t think this challenge is unique to PG.

Collapse
alfuken profile image
Comment marked as low quality/non-constructive by the community. View Code of Conduct
Bohdan Schepansky

Yes, it's a replacement word, and a reference to "The Good Place" TV series.

And regarding count, no system suffer as much from it as PG. Approximations are rarely an option, and while this challenge is definitely not unique to PG, the PG is doing it worst of all DBMS. Except, maybe, sqlite, and even that I'm not sure.
Such a simple functionality, but can't be implemented properly for more than a dozen of years already. PG's count can be 15x times slower than that of MySQL and 70x slower than Oracle or SQLServer. And that sheet has been there for ages. And is till there.