As the year draws to a close, I thought I would share some of the articles I bookmarked this year during my journey of learning and working with Databricks, a Unified Data Analytics Platform.
Using a remote Databricks cluster from a local Jupyter notebook. This article shows how data scientists can keep working in a familiar local JupyterLab environment while accessing remote data and remote clusters in a consistent way.
Data pipeline with Structured Streaming. This article illustrates how to build data pipelines for high-volume streaming use cases, such as mobile game analytics, using Databricks Delta.
Building a Machine Learning Data Pipeline with Delta Lake. This article demonstrates why Delta Lake is well suited to the machine learning life cycle: it offers tools and features that unify data science, data engineering, and production workflows.
Schema enforcement is the yin to schema evolution's yang. This article shows how Delta Lake validates the schema of every write, rejecting writes that are not compatible with the target table's schema.
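To make the idea of schema-on-write concrete, here is a minimal sketch in plain Python of what such validation does conceptually. This is an illustration of the principle only, not Delta Lake's actual implementation; the `TARGET_SCHEMA` and `validate_batch` names are my own.

```python
# Conceptual sketch of schema enforcement on write: every incoming batch
# must match the target table's schema, or the write is rejected.
# NOT Delta Lake's implementation -- just an illustration of the idea.

TARGET_SCHEMA = {"user_id": int, "event": str, "amount": float}

def validate_batch(rows):
    """Raise ValueError if any row has unexpected columns or wrong types."""
    for row in rows:
        extra = set(row) - set(TARGET_SCHEMA)
        if extra:
            raise ValueError(f"Unexpected columns: {sorted(extra)}")
        for col, expected_type in TARGET_SCHEMA.items():
            if col in row and not isinstance(row[col], expected_type):
                raise ValueError(
                    f"Column {col!r}: expected {expected_type.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return rows  # batch is safe to append to the target table

good = [{"user_id": 1, "event": "purchase", "amount": 9.99}]
validate_batch(good)  # passes silently

bad = [{"user_id": 1, "event": "purchase", "amount": "9.99"}]  # amount is a string
try:
    validate_batch(bad)
except ValueError as e:
    print(e)  # reports the type mismatch on 'amount'
```

In Delta Lake itself this check happens inside the write path, so a mismatched DataFrame never silently corrupts the table; with vanilla files on a data lake, the bad batch would simply land and surface as a read-time failure much later.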
Migrating Transactional Data to a Delta Lake. This article explains how to tackle some of the challenges of moving data from transactional databases into a data lake; the example uses the AWS Database Migration Service.
Migrating from Hadoop to modern cloud platforms. This article discusses the challenges of legacy Hadoop architectures and how to move toward modern cloud data platforms.
I hope these articles are useful to you as well. Keep an eye on the Databricks blog in 2020, and if you need help solving a big data problem, please reach out; I might be able to help. Happy New Year!