Setting up IntelliJ IDEA for Apache Spark and Scala development

#spark #scala #intellij #tutorial

IntelliJ IDEA and Apache Spark work well together for Big Data development, whether you are using Scala, Java or Python. In this guide we will set up IntelliJ, Spark and Scala to support the development of Apache Spark applications in Scala.

Install IntelliJ Scala plugins

First of all, we need to install the required plugins in IntelliJ. Go to File -> Settings -> Plugins and look for both Scala and Sbt.

After installing them, you might need to restart your IDE. Do that if prompted.

Create a Spark with Scala project

Now we can start creating our first sample Scala project. Go to File -> New -> Project and then select Scala / Sbt.

On the next screen, choose the right version of Scala. Your chosen version should be compatible with the version of Spark you will be using, because each Spark release is built for specific Scala versions. In my case it was Scala 2.12.
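To make the compatibility rule concrete, here is a minimal sketch of a lookup that checks a Spark and Scala pairing. The object name ScalaSparkCompatibility and its helpers are hypothetical, and the table covers only three release lines (Spark 2.3 is published for Scala 2.11, Spark 2.4 for 2.11 and 2.12, Spark 3.0 for 2.12), so treat it as an illustration and check the Spark documentation for your exact release.

```scala
object ScalaSparkCompatibility {
  // Scala binary versions that each Spark release line is published for.
  // Illustrative subset only, not a complete compatibility matrix.
  val publishedFor: Map[String, Set[String]] = Map(
    "2.3" -> Set("2.11"),
    "2.4" -> Set("2.11", "2.12"),
    "3.0" -> Set("2.12")
  )

  // Reduce a full version such as "2.12.8" to its first two components, "2.12"
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True when the Spark release line is listed as published for the Scala binary version
  def isCompatible(sparkVersion: String, scalaVersion: String): Boolean =
    publishedFor
      .get(binaryVersion(sparkVersion))
      .exists(_.contains(binaryVersion(scalaVersion)))

  def main(args: Array[String]): Unit = {
    println(isCompatible("2.3.3", "2.12.8")) // false
    println(isCompatible("2.4.8", "2.12.8")) // true
  }
}
```

Release lines missing from the table are reported as incompatible, which is the cautious default for a sketch like this.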

Add Spark libraries to Sbt

Now, in our newly created project, find the build.sbt file and add the following lines:

name := "SparkTest"

version := "0.1"

scalaVersion := "2.12.8"

// Spark 2.3.x is published only for Scala 2.11; with Scala 2.12 use Spark 2.4 or later
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql" % "2.4.8"
)

After that, IntelliJ should ask if you want to download the new dependencies. If prompted, click yes. If the IDE does not ask, automatic downloading of dependencies is probably turned on, which is totally fine. These two libraries provide the Spark APIs used by the code we will be writing in a moment: spark-core for RDDs and spark-sql for SparkSession.
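As a rough illustration of what the %% operator does: sbt appends the project's Scala binary version to the artifact name, so with scalaVersion 2.12.x the two dependency forms below follow the same naming scheme. The sparkVersion value is an assumption for this sketch (any release published for Scala 2.12, meaning 2.4.0 or later, would do), and keeping it in a single val is just one way to keep both modules on the same version.

```scala
// build.sbt fragment (a sketch, not a complete build file)

// Assumed version for illustration; pick the Spark release you actually target
val sparkVersion = "2.4.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix, giving spark-core_2.12
  "org.apache.spark" %% "spark-core" % sparkVersion,
  // The same kind of dependency written out by hand with a single %
  "org.apache.spark" % "spark-sql_2.12" % sparkVersion
)
```

The %% form is usually preferable, since the suffix then follows scalaVersion automatically if you change it later.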

Run the Spark Scala application in IntelliJ

Let’s create a basic application and test if everything runs properly. Create an object named FirstSparkApplication and paste in the code below:

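import org.apache.spark.sql.SparkSession

object FirstSparkApplication {
  // Spark's documentation recommends an explicit main method over extending scala.App
  def main(args: Array[String]): Unit = {
    // Run Spark locally, using as many worker threads as there are cores
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Sample App")
      .getOrCreate()
    val data = spark.sparkContext.parallelize(
      Seq("I like Spark", "Spark is awesome", "My first Spark job is working now and is counting down these words")
    )
    // Keep only the lines that contain the word "awesome"
    val filtered = data.filter(line => line.contains("awesome"))
    filtered.collect().foreach(println)
    // Release the local Spark resources once the job is done
    spark.stop()
  }
}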

Now just run it, and in the run console you should see the string Spark is awesome among Spark's log messages.
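Once the filter example runs, a natural next step is a small word count over similar sample sentences. The sketch below is not part of the original walkthrough: the object name WordCountSketch and the tokenize helper are hypothetical, and it assumes the same spark-core and spark-sql dependencies from build.sbt. It uses the standard RDD operations flatMap, map and reduceByKey.

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  // Split a line into lowercase words; kept separate so it can be checked without Spark
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Word Count Sketch")
      .getOrCreate()

    val lines = spark.sparkContext.parallelize(
      Seq("I like Spark", "Spark is awesome")
    )

    // Emit (word, 1) for every word, then sum the ones per word
    val counts = lines
      .flatMap(tokenize)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    // Sort on the driver so the printed order is stable between runs
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }

    spark.stop()
  }
}
```

With these two input sentences, the word spark appears twice and every other word once.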

Summary

I hope you have found this post useful. If so, don’t hesitate to like or share this post. Additionally you can follow me on my social media if you fancy so :)

Bartosz Gajda

@bartoszgajda55
