Databricks comes with a CLI tool that lets you work with resources in Azure Databricks from the command line. It’s built on top of the Databricks REST API and can be used with the Workspace, DBFS, Jobs, Clusters, Libraries and Secrets APIs.
In order to install the CLI, you’ll need Python 2.7.9 or above if you’re using Python 2, or Python 3.6 or above if you’re using Python 3.
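If you'd like to check the version requirement in a script before installing, here is a minimal sketch. The helper name version_at_least is hypothetical (it is not part of the CLI or of pip), and it assumes purely numeric, dot-separated versions such as 3.8.10.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes both arguments are dot-separated integers (no "rc1" suffixes).
version_at_least() {
  have="$1"
  want="$2"
  while [ -n "$have" ] || [ -n "$want" ]; do
    h="${have%%.*}"   # leading component of the installed version
    w="${want%%.*}"   # leading component of the required version
    [ "${h:-0}" -gt "${w:-0}" ] && return 0
    [ "${h:-0}" -lt "${w:-0}" ] && return 1
    # Components are equal: drop them and compare the next pair.
    case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
    case "$want" in *.*) want="${want#*.}" ;; *) want="" ;; esac
  done
  return 0
}

# Example with a made-up installed version.
if version_at_least "3.8.10" "3.6"; then
  echo "Python 3 is new enough for the CLI"
fi
```

To feed it your real interpreter version, you could pass in the output of python3 -c 'import platform; print(platform.python_version())'.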
To install the CLI, use the following pip command:
pip install databricks-cli
or if you’re using Python 3:
pip3 install databricks-cli
You can confirm that everything is working by running the following command:
databricks --version
Before we can actually use the CLI and its commands, we’ll need to set up authentication for it. To do this, we use a Databricks personal access token. To generate a token, head into User Settings in your Azure Databricks profile and go to Access Tokens.
Click on Generate New Token and enter a description and expiration period.
Once you’ve done that, you’ll be presented with a token. Copy and paste it somewhere safe, as this is the only time you’ll see it. Once the token is created, you should see an entry for it in your Access Tokens list.
Now that we have a token, we can set up authentication to use the Databricks CLI. To do this, open a command prompt and type in the following command:
databricks configure --token
You’ll need to provide the host and token in order to authenticate. The host should just be https://<region>.azuredatabricks.net. You do not need to include the ?o=<number> part that follows it in your browser’s address bar. If you do, you’ll get JSONDecodeErrors when trying to call commands.
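If you copy the workspace URL straight from the browser, it can be handy to trim it down to just the host before pasting it into the prompt. Here is a small sketch of that. The function name normalize_host is hypothetical, and the region and number in the example are made up.

```shell
# Hypothetical helper: keep only the scheme and host of a workspace URL.
normalize_host() {
  url="${1%%[?]*}"   # drop everything from the first "?" onward
  url="${url%%#*}"   # drop any "#fragment" as well
  printf '%s\n' "${url%/}"   # drop a single trailing slash
}

# Example with a made-up region and workspace number.
normalize_host "https://westeurope.azuredatabricks.net/?o=1234567890"
```

The value it prints is what you would enter as the host when databricks configure --token asks for it.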
Once everything is set up, we can now use the Databricks CLI to interface with our resources in Databricks! To test it out, we can see what’s in our workspace directory just like we would on our local machine by running the following command:
databricks workspace ls /Users/<your_username>
You should get a list of all the resources inside your username folder.
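If you plan to call the CLI from a script, a small guard around it can give a clearer error when the tool isn't installed. This is only a sketch: dbx is a hypothetical wrapper name, not something the CLI provides. The subcommands in the comments are ones I'd expect from the pip-installed databricks-cli package, so check databricks -h on your own install before relying on them.

```shell
# Hypothetical wrapper: run a Databricks CLI subcommand only if the CLI is on PATH.
dbx() {
  if command -v databricks >/dev/null 2>&1; then
    databricks "$@"
  else
    echo "databricks CLI not found; install it with pip first" >&2
    return 127
  fi
}

# Example calls (commented out so this snippet has no side effects):
#   dbx workspace ls /Users/<your_username>
#   dbx fs ls dbfs:/
#   dbx clusters list
#   dbx jobs list
```

Each call passes its arguments straight through to databricks, so anything that works with the CLI directly should behave the same through the wrapper.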
If you want to get some help as to what command parameters you can use in the CLI, you can run the following command:
databricks <resource> -h
For example, if I wanted to know what commands I could use when working with the DBFS CLI, I’d run the following command:
databricks fs -h
And I’d get the following output:
In this short article, we’ve set up the Databricks CLI tool so we can interface with our Databricks workspace. Just remember that your access token has a limited lifespan, and once it expires you’ll need to generate a new one and run databricks configure --token again.
But if you love using a CLI instead of a UI (who doesn’t?) then this is a handy tool to use against your Databricks instance.
As an added bonus, the CLI project is hosted on GitHub! So if you want to develop more features for it, go ahead :)
I hope this article has helped you. If you have any questions, just comment below and I’ll do my best to answer them!