I had a bit too much 🍷 yesterday so I'm a bit foggy, but that's not stopping me from learning today!
Spark appears to use ParquetOutputCommitter, a class built on top of Hadoop's FileOutputCommitter, to write Parquet files to S3.
While digging into an issue we've had writing from Spark to S3, we came across a fix described here which involves setting a config value in Hadoop:
In testing, it appears to work.
I made a PR to set this going forward from the Hadoop / Hive side of things.
This can also be set in Spark with the following property:
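As a general note on how this works: Spark copies any property whose name starts with `spark.hadoop.` into the Hadoop configuration with that prefix stripped, which is why a Hadoop-side setting can also be passed per job. Here is a minimal sketch of building the matching `spark-submit` flags. The helper `to_spark_conf_args` is hypothetical, and `example.hadoop.property` with its value is a placeholder, not the actual setting from the fix.

```python
def to_spark_conf_args(hadoop_props):
    """Turn Hadoop properties into spark-submit --conf arguments.

    Each key gets the spark.hadoop. prefix so Spark forwards it
    to the Hadoop configuration used by the job.
    """
    args = []
    # Sort keys so the generated command line is stable between runs.
    for key, value in sorted(hadoop_props.items()):
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


# Placeholder property name and value, for illustration only.
print(to_spark_conf_args({"example.hadoop.property": "some-value"}))
```

The resulting list can be spliced into a `spark-submit` command line, so the setting travels with the job instead of living in the cluster-wide Hadoop config.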
The buzz around the office is that Helm 3 beta 2 has been released. Helm 3 is an important release for Helm as it removes the dependency on Tiller.
It is so important that it seems to have warranted a 7-part blog series on their website.
Also, Microsoft has a great article describing why Helm 3 is important.
Apache Zeppelin and Jupyter are both interactive notebooks that you can use to do data science things like performing calculations, plotting graphs, etc.
Jupyter notebooks run Python in the background, while Apache Zeppelin runs on the JVM under the hood.
As for features I enjoy, Jupyter is an offshoot of IPython, which I enjoy quite a bit for doing Python work.
Apache Zeppelin seems to be a little more robust for non-Python languages. Their demo is also pretty sweet, and being able to use their Angular graph UI is pretty swell.
I have been promising myself I'm going to learn Prometheus for too long. It's time to dig into the awesome-prometheus list...
I also created a new GKE cluster on my own Google Cloud account for testing. Compared to Azure and AWS, it's Kubernetes easy mode. More to come on monitoring with my new test cluster next week :)
Happy Labor Day, see ya Tuesday!