Rajnish
Posted on • Originally published at rajnishspandey.hashnode.dev

Databricks introduction

Databricks

Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining data, analytics, and AI solutions at scale.

Databricks Architecture and Services

Clusters

  • A cluster is a collection of virtual machine (VM) instances.

  • Computational workloads are distributed across the cluster's worker nodes.

Comparison

There are two types of clusters: all-purpose clusters and job clusters.

All-Purpose Clusters | Job Clusters
Analyze data collaboratively using interactive notebooks | Run automated jobs
Created from the workspace or through the API | Created by the Databricks job scheduler when a job runs
Configuration information is retained for up to 70 clusters for up to 30 days | Configuration information is retained for up to the 30 most recently terminated clusters
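
To make the "create from the API" and "created by the job scheduler" rows more concrete, here is a minimal Python sketch of the two request bodies. It is only an illustration under stated assumptions: the workspace URL, token, cluster name, runtime version, node type, and notebook path are all hypothetical placeholders. The functions build the requests but do not send them.

```python
import json
import urllib.request

# Hypothetical cluster settings, shared by both examples below.
CLUSTER_SPEC = {
    "spark_version": "13.3.x-scala2.12",  # assumed runtime version; use one your workspace offers
    "node_type_id": "i3.xlarge",          # assumed node type; cloud-specific
    "num_workers": 2,                     # worker VMs that share the workload
}


def _post(host: str, token: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{host}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def all_purpose_cluster_request(host: str, token: str) -> urllib.request.Request:
    """All-purpose cluster: created explicitly through the Clusters API."""
    payload = {
        "cluster_name": "demo-all-purpose",  # hypothetical name
        "autotermination_minutes": 30,       # stop the cluster when idle
        **CLUSTER_SPEC,
    }
    return _post(host, token, "/api/2.0/clusters/create", payload)


def job_with_job_cluster_request(host: str, token: str) -> urllib.request.Request:
    """Job cluster: described inside the job; the scheduler creates it per run."""
    payload = {
        "name": "demo-nightly-job",  # hypothetical job name
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": "/Users/someone@example.com/demo"},
                "new_cluster": CLUSTER_SPEC,  # exists only for the duration of the run
            }
        ],
    }
    return _post(host, token, "/api/2.1/jobs/create", payload)
```

To actually send either request, pass it to `urllib.request.urlopen(...)` with a real workspace URL and personal access token. The difference to notice is that the all-purpose cluster is its own resource, while the job cluster is only a `new_cluster` block nested in a job task.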