Thinking about scaling beyond your Postgres cluster and adding another data store like Redis or Elasticsearch? Before adopting a more complex infra...
Hi, this is a really nice article. The funny thing is, it comes at the right time. We use Postgres and are currently considering adding Redis behind our Rust backend with WebSockets to enable push notifications for turtle (turtle.community), but we are not quite sure if this is the right way.
Does anyone have some inspiration or experience on that topic?
I'm not sure if it fits your use case perfectly, but take a look at NOTIFY, introduced in PG 9.0 -- it's basically a push notification mechanism built into PostgreSQL.
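If it helps, here's a minimal sketch of the mechanism (the channel name and payload are made up): the backend keeps a connection listening on a channel, receives notifications asynchronously, and can then push them out over the WebSocket.

```sql
-- Session A (e.g. the backend holding the WebSocket connection):
LISTEN turtle_events;

-- Session B (e.g. after an insert):
NOTIFY turtle_events, 'new_post';

-- Equivalent form, callable from SQL functions and triggers:
SELECT pg_notify('turtle_events', '{"user_id": 42, "event": "new_post"}');
```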
thx for the info, will have a look!
Really nice project, Turtle.
I'm loving your articles Jason, great job again!
It's hard to find in-depth and super high-quality articles like these in this day and age :)
This is an amazing compliment! Mind if I use this quote on my site?
Of course! My pleasure :)
Anytime, keep up the great work!
I think another really important PG plugin to add is TimeScaleDB -- the ability to add efficient time series data collection/query to Postgres is awesome.
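For anyone who hasn't seen it, the setup is roughly this (table and column names are hypothetical, and it assumes the TimescaleDB extension is installed on the server):

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE metrics (
  time        timestamptz NOT NULL,
  device_id   int         NOT NULL,
  temperature double precision
);

-- Turn the plain table into a time-partitioned hypertable
SELECT create_hypertable('metrics', 'time');
```

After that, ordinary inserts and queries against `metrics` work as usual, with the time partitioning handled for you.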
This is a great article!
You mention "Use multi-column indexes sparingly", which is generally good advice; however, I would qualify that by saying avoid redundant indexes. It's important to know that multicolumn (btree) indexes have a specific ordering to the columns. If you don't filter or join a table using one or more of the columns in that order, then the planner can't use the index.
For example, take an index on columns (a, b). If you only filter on column b, then this index can't be used. It can be used if you filter by just a, or by both a and b. Following on from that, if you frequently filter by both a and b, then this is a good index to have. If you also have both a single-column index on a and a multicolumn index on (a, b), then in that specific case the single-column index is the redundant one.
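A quick illustration of the ordering rule, with a hypothetical table (check the plans with EXPLAIN on your own data, since the planner may still prefer a sequential scan):

```sql
CREATE TABLE orders (a int, b int, payload text);
CREATE INDEX orders_a_b_idx ON orders (a, b);

EXPLAIN SELECT * FROM orders WHERE a = 1;            -- can use orders_a_b_idx
EXPLAIN SELECT * FROM orders WHERE a = 1 AND b = 2;  -- can use orders_a_b_idx
EXPLAIN SELECT * FROM orders WHERE b = 2;            -- can't use it (no leading column)
```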
Great article Jason! It's really important to limit the number of technologies used in a project and not follow trends blindly! It's quite analogous to the microservice vs monolith debate.
Great write-up Jason. I wasn't aware of `PL/Python` until now. I'm definitely a big fan of the `JSON` data types and have never used `hstore`, but thank you for sharing the use case. There is also `Postgres-XL`, which seems to offer a lot and is only 5 years old compared to the 3-decade-old `PostgreSQL`.
Hey Jason, great post! Quick thing on the full text example you have: if you use `plainto_tsquery` it will split the words for you and add the AND. If you use `phraseto_tsquery` it will apply a positional argument (`<->`) instead of the `&`, which is great if you're looking for a name or place. The new `websearch_to_tsquery` (in PG 11) is a great general-purpose query builder as well.
I wasn't aware of `tsquery`, thanks for pointing that out! I'm working on a project right now where full-on Elasticsearch is probably overkill, but I needed something more robust than just doing `LIKE`.
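For anyone curious, the three builders differ roughly like this (the `articles` table and `doc` column at the end are hypothetical):

```sql
SELECT plainto_tsquery('english', 'quick brown fox');
-- 'quick' & 'brown' & 'fox'

SELECT phraseto_tsquery('english', 'quick brown fox');
-- 'quick' <-> 'brown' <-> 'fox'

SELECT websearch_to_tsquery('english', '"quick brown" -fox');  -- PG 11+
-- 'quick' <-> 'brown' & !'fox'

-- Typical usage against a precomputed tsvector column:
SELECT title
FROM articles
WHERE doc @@ websearch_to_tsquery('english', 'postgres full text search');
```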
100% especially on search. Search in Postgres is very underrated.
This article was the deciding factor, the last straw. I had been thinking about switching my Java application from MySQL to Postgres, and now I'm in the middle of the process using pgloader and everything has worked as expected.
Yet I have an unresolved question: does it make sense to have a 2nd level cache in Hibernate when postgresql already has caching? thanks!
Thanks, I'm glad you enjoyed it! The Hibernate second-level cache lives on your application server, whereas the PostgreSQL cache lives on the database server, of course. This matters because it's faster to retrieve or update data already stored on your application server. It reduces network and database load by removing duplicate queries and batching writes. This is great for applications with many reads and infrequent writes, or cases where eventual consistency on writes is acceptable.
Hey, many thanks for this article. I've been working with Postgres for a while now, but still, I've learned a lot of great stuff here!
I just wanted to add that Postgres also offers HyperLogLog as a data type, just by adding an extension (see this great article for more details.)
Great article. I use Postgres on Heroku for database stuff all the time; I love it and recommend it to everyone.
Thanks for this post,
I wrote about using Postgres to set up a distributed database. Here is the link for those who are interested: dev.to/sh1ftsh/setting-up-distribu...
Thanks for the post, I learned a lot of new things! I'd also like to mention that OLAP style queries can be rough with lots of data.
I am also thinking about this. RDBMSs are generally good at most types of workloads, but mixed workloads like OLTP and OLAP on the same system will interfere with each other.
How do people do embedded reporting AND transactions in modern web apps?
NoSQL solutions like ElasticSearch are mentioned but they seem more appropriate for search. Data warehouse solutions like Snowflake, BigQuery, Redshift are good for internal analytics and reporting but they just don’t have the concurrency to support direct queries from customer facing apps.
How else to do this without complex data pipelines or complex infrastructure involving Kafka?
I do it with a complex data pipeline and infrastructure involving Kafka!
I've got my eye on Apache Druid, though I haven't spent any real time (sorry, pun totally intended) with it.
I've looked at Druid, and I'm also considering MemSQL, ClickHouse, and others.
Is “forking” count some special operation or just a nice word in place of swearing?
Regarding (exact) count, pretty much every system has trouble doing this quickly, and in most cases you really don't need an exact count. This is especially true for medium to large data.
Table/index statistics that are kept up to date will give a good approximation.
HyperLogLog was one option mentioned which will give good-enough approximations, another approach is log-normal histograms. I don’t think this challenge is unique to PG.
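For example, Postgres keeps an estimate in the planner statistics that you can read directly when a rough number is enough (the table name here is just an example):

```sql
-- Fast approximate row count from planner statistics,
-- kept reasonably fresh by autovacuum / ANALYZE
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'events';
```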
Don't forget Materialized Views: they can be an awesome way to optimize searches on things that you might be tempted to dump into some NoSQL text-based search tool.
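A rough sketch of that pattern, assuming a hypothetical `products` table with a text description:

```sql
CREATE MATERIALIZED VIEW product_search AS
SELECT id,
       name,
       to_tsvector('english', name || ' ' || coalesce(description, '')) AS doc
FROM products;

-- Index the precomputed tsvector; the unique index also enables
-- REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON product_search (id);
CREATE INDEX ON product_search USING gin (doc);

-- Re-run the underlying query without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY product_search;
```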
Wow! An awesome review of Postgres important features.
I can't stress enough how devs sometimes just follow the hype without thinking. When someone suggests a NoSQL data store, I always ask her/him to convince me why SQL doesn't work here.
Where have I read this article before? Feels like either this is a clone or the other is a clone. Brb...
Found it, it's the same guy. Fair play!
infoq.com/articles/postgres-handle...
Thanks for the other one, it was very useful.
Maybe InfoQ or Hackernoon? I posted it there as well :)
Nice article! Thank you.
I think there's a typo in the check strings length example (b- vs b=).
Thanks updated!
thx a lot :)
The first paragraph is repeated twice
Doh! Thanks I fixed it :)
Nice article!
Nit - there is one redundant "are" in "Extensions are to Postgres are"
Thanks updated!
More in-depth articles of this kind will make this old code bum happy ;)
Just want to point out one error (or misunderstanding): B-tree is not binary, as stated in "B-tree indexes are binary trees".
Thanks, I meant to say "balanced trees".
Great article.
Congratz!
Hey Jason, thanks for the nice post!
There is a HyperLogLog PostgreSQL extension btw.
github.com/citusdata/postgresql-hll
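Usage looks roughly like this, going from the extension's docs (the `events` table and its columns are made up):

```sql
CREATE EXTENSION hll;

-- One HyperLogLog sketch per day, approximating distinct users
CREATE TABLE daily_uniques (
  day   date PRIMARY KEY,
  users hll
);

INSERT INTO daily_uniques
SELECT created_at::date,
       hll_add_agg(hll_hash_integer(user_id))
FROM events
GROUP BY 1;

-- Approximate distinct user count per day
SELECT day, hll_cardinality(users) FROM daily_uniques;
```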
Nice article