Darren Fuller
Nu Shell and Databricks

I'm a big fan of the command line. It can seem daunting at first, but with a little time and patience you can speed up many tasks just by knowing some useful commands and how to chain them together.

Most of the time I'm in PowerShell which, thanks to PowerShell Core, is now cross-platform and incredibly powerful. But I find myself using Nu more and more. In both shells I also use the Databricks CLI a lot. Want to check the status of jobs? Use the CLI. Want to upload and download data? Use the CLI. And so on.

Whilst the Databricks CLI is useful, there are times when I want a little more power over it. For example, using the CLI to find a Databricks runtime version which is under Long Term Support (LTS) and is Photon-enabled. I could do this with the Databricks CLI and some jq, but I'm also lazy and wanted something that's easier to query, displays more nicely, and is easier to output to something like CSV afterwards.

Well, I can get all of that from Nushell. The only downside is that it takes quite a few commands to get the data into the right shape to make querying it easy. So, instead, let's do the tedious bits once and save them as command aliases. Let's fire up Nushell and give it a go.

First up, let's find our config file.

> config path
C:\Users\DarrenFuller\AppData\Roaming\nushell\nu\config\config.toml

Yours will look different to this, but this is the file we need to add our command aliases to.

Now, let's work out what our command looks like. I want to create a command that calls the Databricks CLI for the runtime versions and adds some useful information, such as whether it's an LTS version. So what does that look like?

>  databricks clusters spark-versions 
    | from json 
    | get versions 
    | insert isLTS { get name | str contains "LTS" } 
    | insert isML { get name | str contains "ML" } 
    | insert photonEnabled { get name | str contains -i "Photon" }
    | insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" }
    | insert runtime { get details.runtime } 
    | insert spark { get details.spark }
    | reject details

I've put that over multiple lines to make it easier to read, but if you want to run it you'll need to have it all on the same line, like this.

> databricks clusters spark-versions | from json | get versions | insert isLTS { get name | str contains "LTS" } | insert isML { get name | str contains "ML" } | insert photonEnabled { get name | str contains -i "Photon" } | insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" } | insert runtime { get details.runtime } | insert spark { get details.spark } | reject details

So what's it doing? Let's break it down a bit.

databricks clusters spark-versions

Run the Databricks CLI to get the available runtime information

from json

Parses the JSON response into a table

get versions

Gets the "versions" array from the response object

insert isLTS { get name | str contains "LTS" }

Adds a new "isLTS" column by looking for the term "LTS" in the runtime name

insert isML { get name | str contains "ML" }

Adds a new "isML" column by looking for the term "ML" in the runtime name

insert photonEnabled { get name | str contains -i "Photon" }

Adds a new "photonEnabled" column by doing a case-insensitive search for "Photon" in the runtime name

insert details { get name | parse "{runtime} (includes Apache Spark {spark},{remainder}" }

Adds a new "details" column by parsing the name and extracting the key information (the parts in curly braces)
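To see what that pattern does, you can run it against a single runtime name taken from the output. This is just an illustration of parse on its own; the remainder capture keeps whatever is left after the Spark version, including the closing bracket.

```nu
# One runtime name from the CLI output, run through the same parse pattern
> "9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)"
    | parse "{runtime} (includes Apache Spark {spark},{remainder}"
```

This should give a one-row table with runtime holding "9.1 LTS", spark holding "3.1.2", and remainder holding the rest of the text.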

insert runtime { get details.runtime }

Adds a new "runtime" column by getting the runtime information from the details column

insert spark { get details.spark }

Adds a new "spark" column by getting the spark version information from the details column

reject details

Removes the "details" column

That's a lot of commands to run each time, so let's instead save this as a command alias in our config file.

startup = [
    "alias dbx-runtimes = ( databricks clusters spark-versions | from json | get versions | insert isLTS { get name | str contains \"LTS\" } | insert isML { get name | str contains \"ML\" } | insert photonEnabled { get name | str contains -i \"Photon\" } | insert details { get name | parse \"{runtime} (includes Apache Spark {spark},{remainder}\" } | insert runtime { get details.runtime } | insert spark { get details.spark } | reject details )"
]

Here I've aliased the command with the name dbx-runtimes, escaping the double-quotation marks so the whole pipeline fits in the startup string. Now we can run all of the above by simply calling the alias.

> dbx-runtimes
────┬──────────────────────────────────┬────────────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬────────────────────────────┬───────
 #  │               key                │                                name                                │ isLTS │ isML  │ photonEnabled │          runtime           │ spark
────┼──────────────────────────────────┼────────────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼────────────────────────────┼───────
  0 │ 6.4.x-esr-scala2.11              │ 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)     │ false │ false │ false         │ 6.4 Extended Support       │ 2.4.5
  1 │ 7.3.x-cpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, Scala 2.12)               │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  2 │ 7.3.x-hls-scala2.12              │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12)         │ true  │ false │ false         │ 7.3 LTS Genomics           │ 3.0.1
  3 │ 10.2.x-gpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
  4 │ 7.3.x-gpu-ml-scala2.12           │ 7.3 LTS ML (includes Apache Spark 3.0.1, GPU, Scala 2.12)          │ true  │ true  │ false         │ 7.3 LTS ML                 │ 3.0.1
  5 │ 8.4.x-photon-scala2.12           │ 8.4 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 8.4 Photon                 │ 3.1.2
  6 │ 10.1.x-photon-scala2.12          │ 10.1 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.1 Photon                │ 3.2.0
  7 │ 9.1.x-photon-scala2.12           │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)           │ true  │ false │ true          │ 9.1 LTS Photon             │ 3.1.2
  8 │ 10.2.x-photon-scala2.12          │ 10.2 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.2 Photon                │ 3.2.0
  9 │ 8.3.x-scala2.12                  │ 8.3 (includes Apache Spark 3.1.1, Scala 2.12)                      │ false │ false │ false         │ 8.3                        │ 3.1.1
 10 │ 9.0.x-photon-scala2.12           │ 9.0 Photon (includes Apache Spark 3.1.2, Scala 2.12)               │ false │ false │ true          │ 9.0 Photon                 │ 3.1.2
 11 │ 8.4.x-cpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 12 │ 10.1.x-gpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 13 │ 9.1.x-scala2.12                  │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)                  │ true  │ false │ false         │ 9.1 LTS                    │ 3.1.2
 14 │ 10.0.x-cpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 15 │ 9.0.x-gpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 16 │ 9.0.x-scala2.12                  │ 9.0 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 9.0                        │ 3.1.2
 17 │ 8.3.x-cpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, Scala 2.12)                   │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
 18 │ 10.1.x-cpu-ml-scala2.12          │ 10.1 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.1 ML                    │ 3.2.0
 19 │ 10.0.x-scala2.12                 │ 10.0 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.0                       │ 3.2.0
 20 │ apache-spark-2.4.x-esr-scala2.11 │ Light 2.4 Extended Support (includes Apache Spark 2.4, Scala 2.11) │ false │ false │ false         │ Light 2.4 Extended Support │ 2.4
 21 │ 10.1.x-scala2.12                 │ 10.1 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.1                       │ 3.2.0
 22 │ 9.1.x-cpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, Scala 2.12)               │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 23 │ 10.2.x-scala2.12                 │ 10.2 (includes Apache Spark 3.2.0, Scala 2.12)                     │ false │ false │ false         │ 10.2                       │ 3.2.0
 24 │ 10.2.x-cpu-ml-scala2.12          │ 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)                  │ false │ true  │ false         │ 10.2 ML                    │ 3.2.0
 25 │ 8.3.x-photon-scala2.12           │ 8.3 Photon (includes Apache Spark 3.1.1, Scala 2.12)               │ false │ false │ true          │ 8.3 Photon                 │ 3.1.1
 26 │ 10.0.x-photon-scala2.12          │ 10.0 Photon (includes Apache Spark 3.2.0, Scala 2.12)              │ false │ false │ true          │ 10.0 Photon                │ 3.2.0
 27 │ 10.0.x-gpu-ml-scala2.12          │ 10.0 ML (includes Apache Spark 3.2.0, GPU, Scala 2.12)             │ false │ true  │ false         │ 10.0 ML                    │ 3.2.0
 28 │ 8.4.x-scala2.12                  │ 8.4 (includes Apache Spark 3.1.2, Scala 2.12)                      │ false │ false │ false         │ 8.4                        │ 3.1.2
 29 │ 9.1.x-gpu-ml-scala2.12           │ 9.1 LTS ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)          │ true  │ true  │ false         │ 9.1 LTS ML                 │ 3.1.2
 30 │ apache-spark-2.4.x-scala2.11     │ Light 2.4 (includes Apache Spark 2.4, Scala 2.11)                  │ false │ false │ false         │ Light 2.4                  │ 2.4
 31 │ 7.3.x-scala2.12                  │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)                  │ true  │ false │ false         │ 7.3 LTS                    │ 3.0.1
 32 │ 8.4.x-gpu-ml-scala2.12           │ 8.4 ML (includes Apache Spark 3.1.2, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.4 ML                     │ 3.1.2
 33 │ 9.0.x-cpu-ml-scala2.12           │ 9.0 ML (includes Apache Spark 3.1.2, Scala 2.12)                   │ false │ true  │ false         │ 9.0 ML                     │ 3.1.2
 34 │ 8.3.x-gpu-ml-scala2.12           │ 8.3 ML (includes Apache Spark 3.1.1, GPU, Scala 2.12)              │ false │ true  │ false         │ 8.3 ML                     │ 3.1.1
────┴──────────────────────────────────┴────────────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴────────────────────────────┴───────

Your output might look different depending on when you run the command.
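As an aside, if you're on a newer Nushell release the TOML config and its startup list have been replaced by a config.nu file, and closure syntax has changed a little, so the alias above may need reworking. A rough, untested sketch of the same pipeline as a custom command, to be checked against your own Nushell version:

```nu
# In config.nu on newer Nushell releases; closure parameters are explicit here
def dbx-runtimes [] {
    databricks clusters spark-versions
    | from json
    | get versions
    | insert isLTS {|row| $row.name | str contains "LTS" }
    | insert isML {|row| $row.name | str contains "ML" }
    | insert photonEnabled {|row| $row.name | str contains -i "Photon" }
    | insert details {|row| $row.name | parse "{runtime} (includes Apache Spark {spark},{remainder}" }
    | insert runtime {|row| $row.details.runtime | first }
    | insert spark {|row| $row.details.spark | first }
    | reject details
}
```

A custom command defined with def behaves just like the alias here, but is easier to read and extend later.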

From this we can now start adding filters to get to the records we want. If I want to find all of the runtimes which are Long Term Support but aren't ML instances, I can do the following.

> dbx-runtimes | where isLTS | where isML == $false | sort-by key
───┬────────────────────────┬────────────────────────────────────────────────────────────┬───────┬───────┬───────────────┬──────────────────┬───────
 # │          key           │                            name                            │ isLTS │ isML  │ photonEnabled │     runtime      │ spark
───┼────────────────────────┼────────────────────────────────────────────────────────────┼───────┼───────┼───────────────┼──────────────────┼───────
 0 │ 7.3.x-hls-scala2.12    │ 7.3 LTS Genomics (includes Apache Spark 3.0.1, Scala 2.12) │ true  │ false │ false         │ 7.3 LTS Genomics │ 3.0.1
 1 │ 7.3.x-scala2.12        │ 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)          │ true  │ false │ false         │ 7.3 LTS          │ 3.0.1
 2 │ 9.1.x-photon-scala2.12 │ 9.1 LTS Photon (includes Apache Spark 3.1.2, Scala 2.12)   │ true  │ false │ true          │ 9.1 LTS Photon   │ 3.1.2
 3 │ 9.1.x-scala2.12        │ 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)          │ true  │ false │ false         │ 9.1 LTS          │ 3.1.2
───┴────────────────────────┴────────────────────────────────────────────────────────────┴───────┴───────┴───────────────┴──────────────────┴───────

A lot simpler to read, and very easy to work with. And if I want to save the results I can just add | save runtimes.csv and I'll have a CSV file with the same data in it.
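And the question that started all of this, finding an LTS runtime with Photon enabled, becomes a one-liner too:

```nu
# LTS runtimes with Photon support
> dbx-runtimes | where isLTS | where photonEnabled | sort-by key
```

Going by the output above, that should leave just 9.1.x-photon-scala2.12.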

I've done the same with the Databricks cluster node types as well. That one is a lot less complex than the above, but it makes querying the information much simpler. And with Nushell providing great features for filtering, displaying, and exporting data, it's a smooth and easy workflow.
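The node types alias follows the same shape. As a sketch only: the field names below come from the clusters API response, and the alias name dbx-nodes is just my choice, so check both against your own CLI output before relying on them.

```nu
# Hypothetical companion alias; pick whichever node type fields you care about
alias dbx-nodes = ( databricks clusters list-node-types | from json | get node_types | select node_type_id memory_mb num_cores )
```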
