DEV Community

Cover image for How to Create Jobs With Multiple Tasks in Databricks
John Paul Ada
John Paul Ada

Posted on

How to Create Jobs With Multiple Tasks in Databricks

Original Blog Post: My Blog - How to Create Jobs With Multiple Tasks in Databricks

Note: This video is in Filipino/Tagalog.

When working with data it's very useful to be able to create data pipelines where we can have tasks in a pipeline run in sequence or in parallel or have dependencies witch each other, where the next tasks will only run when their dependencies or previous tasks have finished running.

To accomplish, usually we'd use external task orchestrators like Airflow, Prefect, Nifi, or Oozie.

In the version of Databricks as of this writing, by default we are unable to create jobs with multiple tasks as shown here:

Single Task Job

But there's a way to add multiple tasks to a job in Databricks, and that's by enabling Task Orchestration. At the time of this writing, Task Orchestration is a feature that's in public preview. This means that it's currently available for everyone, but it's not enabled by default.

To enable it we first go to the the Admin Console:

Admin Console

Then go to Workspace Settings tab:

Workspace Settings

Then we'll search Task on the search bar. We'll then be able to see the switch for Task Orchestration:

Enable Task Orchestration

It might take some time to take effect but once that's enabled, we will now be able to see a button for adding another task to our job:

Add Job Button

After that we can just add a task similar to what we're used to. The only difference is now there is a Depends on field where we can specify the dependencies of our current task:

Depends On

Once we're done adding that, we can run our job and see something like this when it's complete!

Complete Run

And that's a wrap! I hope this helps you setup your own data pipeline in Databricks!

Discussion (0)