Discussion on: Choosing Your Data Warehouse

View post

"Our particular data schema didn't lend itself to easily use the lowest tier Redshift instances. This meant a 40x price increase for a cluster using the next instance size up."

That's... no. I've been using Redshift since it was released, and I'm here to help.

If you're able to run your reporting workload on MySQL at all, you could almost certainly run it on one (1) dense-storage Redshift node for $.85/hr. If for some reason your data set is smaller but your compute needs more intense, a small cluster of 4-8 dense-compute nodes (at $.25/hr each) would work. There is no way you would ever need either of the XL node types.

What you say about the schema not fitting suggests you got the wrong idea about how hard it is to tune data distribution in Redshift. It's not that difficult. For a typical workload involving large facts and small dimensions, just set everything to EVEN and you'll be fine.

It may ultimately be fine to run this workload on MySQL. If that's what your team has capability with, great. And since MySQL finally added hash joins, it's not literally impossible to run serious analytic queries on it, like it used to be. Just recognize that it's not the right tool for the job, and you're giving up a ton of functionality (and potentially performance) compared to Postgres, Redshift, Greenplum, Vertica ($$), or (blech) Oracle.