DEV Community

kambala yashwanth

Need help dockerizing Spark

Need Help

I have been working with Docker, where I need to run a Spark application.
I tried the Spark images from the Docker repository but ran into issues, so I built my own.

It works, but every run downloads Spark again, and I lose the logs of previously run jobs.

My requirements

  1. Is it possible to have a separate Spark image and supply app.jar to it?

  2. Instead of writing logs inside the container, can I direct them to the host file system?

Dockerfile

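FROM alpine

# Install the tools and the Java runtime that Spark needs
RUN apk add --no-cache tar aria2 curl bash openjdk8-jre

# Versions taken from the install path used further down
ENV SPARK_VERSION=2.2.0 HADOOP_VERSION=2.7

# WORKDIR creates /spark if it does not exist, so no mkdir/cd is needed
WORKDIR /spark

# Copy the application jar into the image
# (host path: /home/exa9/SparkSubmit/App/target/App-0.0.1-SNAPSHOT.jar)
COPY target/App-0.0.1-SNAPSHOT.jar app.jar

# Download Apache Spark and extract it
# (the download URL is missing from the post as published)
RUN aria2c -x16

RUN tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

WORKDIR /spark/spark-2.2.0-bin-hadoop2.7/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master local[*] /spark/app.jar /spark/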

Top comments (1)

Shawon Ashraf • Edited

You can mount a host directory as a volume in your container and store the logs there. That way the logs survive after the container exits. As for the Spark re-download issue, you'll have to find another way to include the Spark binary. Since you're writing a Java application, using Maven or Gradle would make that a lot easier; it would be just a build script away!
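
As a rough sketch of the volume idea, the function below bind-mounts both the application jar and a log directory from the host at run time. Several names here are assumptions rather than anything from the post: the image tag `spark-base` (imagined as the image above built without the jar copied in), the host `logs` directory, and the container path `/spark/logs`, which only helps if the job is configured to write its log files there. The `DOCKER` variable is also just a convenience added for dry runs.

```shell
#!/bin/sh
# Assumptions (not from the original post): the image tag "spark-base",
# the host log directory, and the container path /spark/logs.
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the command instead

run_spark_job() {
  jar="$1"        # absolute path to the application jar on the host
  log_dir="$2"    # absolute path to a host directory for log files
  mkdir -p "$log_dir"

  # Each -v host:container flag bind-mounts a host path into the container:
  # the jar is supplied at run time (read-only), and anything written to
  # /spark/logs stays on the host after the container is removed.
  "$DOCKER" run --rm \
    -v "$jar":/spark/app.jar:ro \
    -v "$log_dir":/spark/logs \
    spark-base
}
```

It could be called as `run_spark_job "$PWD/target/App-0.0.1-SNAPSHOT.jar" "$PWD/logs"`. Because the jar is no longer copied into the image, rebuilding the jar does not require rebuilding the image, so the Spark download layer should not need to run again.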