
Ivan G

Setting up Scala for Spark Development

This post is merely a reference-like article to set up your Scala environment for Apache Spark development.

Simplest Thing

Your build.sbt should look like this:

name := "My Project"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"

Your Entry.scala:

import org.apache.log4j.{Level, LogManager}
import org.apache.spark.sql.SparkSession

object Entry {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .master("local")
      .getOrCreate()
    LogManager.getRootLogger.setLevel(Level.ERROR)

    // use spark variable here to write your programs

  }
}
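To illustrate what goes in place of that comment, here is a minimal sketch with made-up data (not part of the original post); the import spark.implicits._ line is what makes toDF and the $ column syntax available:

import spark.implicits._

val people = Seq(("Alice", 29), ("Bob", 34)).toDF("name", "age")
people.filter($"age" > 30).show()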

Integrating with Azure Pipelines

Azure Pipelines has built-in support for sbt, so you can build and package with the following command-line task (simplest version):

- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean

      sbt update

      sbt compile

      sbt package
    workingDirectory: 'project-dir'
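As an aside, sbt accepts several commands in a single invocation, so the script above can be collapsed into one line (which also avoids starting a fresh JVM for each step); this is purely an optimisation and the task works as written:

sbt clean update compile package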

To pass a version number, you can use a variable from your pipeline. Say it's called projectVersion; the pipeline task then becomes:

- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean

      sbt update

      sbt compile

      sbt package
    workingDirectory: 'project-dir'
  env:
    v: $(projectVersion)

which merely creates an environment variable called v for the sbt task. To pick it up, modify the version line in build.sbt:

version := sys.env.getOrElse("v", "0.1")
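You can check the same override locally by setting the variable before invoking sbt, e.g. v=1.2.3 sbt package in a Unix shell; if the variable is absent, the build falls back to 0.1.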

You can also create an uber JAR; however, they are relatively large (a 70 KB JAR grows to over 100 MB), so I'd try to avoid it.
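If you do need one, a common route (shown here only as a sketch, not part of the original setup) is the sbt-assembly plugin, with the Spark dependency marked as provided so the cluster's own Spark jars are not bundled into the artifact:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: "provided" keeps spark-sql out of the uber JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6" % "provided"

Running sbt assembly then produces the uber JAR, typically under target/scala-2.11/. Note that with provided scope the Spark classes are no longer on the classpath for sbt run, so plain local runs need the dependency added back.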

This article was originally published on my blog.
