
Ivan G

Setting up Scala for Spark Development

This post is merely a reference-like article to set up your Scala environment for Apache Spark development.

Simplest Thing

Your build.sbt should look like this:

name := "My Project"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"

Your Entry.scala:

import org.apache.log4j.{Level, LogManager}
import org.apache.spark.sql.SparkSession

object Entry {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .master("local")
      .getOrCreate()
    LogManager.getRootLogger.setLevel(Level.ERROR)

    // use spark variable here to write your programs

  }
}
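To illustrate what goes in place of that comment, here is a minimal sketch with made-up data (not part of the original post); the import spark.implicits._ line is what makes toDF and the $ column syntax available:

import spark.implicits._

val people = Seq(("Alice", 29), ("Bob", 34)).toDF("name", "age")
people.filter($"age" > 30).show()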

Integrating with Azure Pipelines

Azure Pipelines has built-in support for sbt, so you can build and package with the following command-line task (simplest version):

- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean

      sbt update

      sbt compile

      sbt package
    workingDirectory: 'project-dir'
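As an aside, sbt accepts several commands in a single invocation, so the script above can be collapsed into one line (which also avoids starting a fresh JVM for each step); this is purely an optimisation and the task works as written:

sbt clean update compile package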

To pass a version number, you can use a variable from your pipeline. Say it's called projectVersion; the pipeline task then becomes:

- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean

      sbt update

      sbt compile

      sbt package
    workingDirectory: 'project-dir'
  env:
    v: $(projectVersion)

which merely creates an environment variable called v for the sbt task. To pick it up, modify the version line in build.sbt:

version := sys.env.getOrElse("v", "0.1")
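You can check the same override locally by setting the variable before invoking sbt, e.g. v=1.2.3 sbt package in a Unix shell; if the variable is absent, the build falls back to 0.1.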

You can also create an uber JAR; however, they are relatively large (a 70 KB JAR grows to over 100 MB), so I'd try to avoid it.
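If you do need one, a common route (shown here only as a sketch, not part of the original setup) is the sbt-assembly plugin, with the Spark dependency marked as provided so the cluster's own Spark jars are not bundled into the artifact:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: "provided" keeps spark-sql out of the uber JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6" % "provided"

Running sbt assembly then produces the uber JAR, typically under target/scala-2.11/. Note that with provided scope the Spark classes are no longer on the classpath for sbt run, so plain local runs need the dependency added back.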

This article was originally published on my blog.
