Spark history logs are very valuable when you are trying to analyze the stats of a specific job. When you are working with a large cluster where multiple users are executing jobs, or when you have an ephemeral cluster and want to retain the logs for future analysis, here's a way to do it locally.
How do you analyze the Spark history logs locally?
The first and most important step is to download the Spark history logs from the Spark UI before your cluster goes down.
The steps below set up the history server locally so you can analyze the logs.
- On macOS, install Spark with Homebrew:
brew install apache-spark
- Create a directory for the logs.
- Move the logs downloaded in the previous step into that directory and unpack them.
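Concretely, those two steps might look like the sketch below. The paths are placeholders, and the archive name is hypothetical; the Spark UI usually downloads one archive per application.

```shell
# Hypothetical paths -- adjust to wherever you downloaded the logs.
mkdir -p "$HOME/spark-logs"
# Unpack the downloaded archive into the logs directory
# (uncomment and point at your actual download):
# unzip "$HOME/Downloads/application_1700000000000_0001.zip" -d "$HOME/spark-logs"
```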
- Create a file named log.properties.
- Inside log.properties, add:
spark.history.fs.logDirectory=<path to the spark-logs directory>
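A slightly fuller log.properties might look like the sketch below. spark.history.ui.port and spark.history.fs.update.interval are standard history-server properties; the directory path is a placeholder.

```properties
# Where the unpacked event logs live (placeholder path)
spark.history.fs.logDirectory=file:/Users/you/spark-logs
# Optional: change the UI port if 18080 (the default) is taken
spark.history.ui.port=18080
# Optional: how often the server re-scans the log directory for new files
spark.history.fs.update.interval=10s
```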
- Navigate to the sbin directory of your Spark installation (with a Homebrew install, it is under the apache-spark libexec directory) and run:
sh start-history-server.sh --properties-file <path to log.properties>
- Navigate to http://localhost:18080 in your browser.
Now you can view and analyze the logs locally.
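To confirm the server actually picked up your logs, you can also query its REST API, a standard history-server endpoint (this assumes the default port 18080):

```shell
# Lists the applications the history server has loaded; an empty [] means
# it is running but found no event logs in the configured directory.
curl -s http://localhost:18080/api/v1/applications || echo "history server not reachable"
```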