DEV Community

Ivan G
Using Deequ 1.1 with Spark 3

If you upgrade AWS Deequ to the latest version (1.1.0 at the time of writing) and use it with Spark 3.0.1, sbt fails with the following error:

[error] (update) Conflicting cross-version suffixes in: org.apache.spark:spark-launcher, org.apache.spark:spark-sketch, org.apache.spark:spark-kvstore, org.json4s:json4s-ast, org.apache.spark:spark-catalyst, org.apache.spark:spark-network-shuffle, com.twitter:chill, org.apache.spark:spark-sql, org.scala-lang.modules:scala-xml, org.json4s:json4s-jackson, com.fasterxml.jackson.module:jackson-module-scala, org.json4s:json4s-core, org.apache.spark:spark-unsafe, org.json4s:json4s-scalap, org.scala-lang.modules:scala-parser-combinators, org.apache.spark:spark-tags, org.apache.spark:spark-core, org.apache.spark:spark-network-common
[error] Total time: 6 s, completed 10-Feb-2021 13:07:46

This happens because Deequ 1.1.0 pulls in transitive dependencies built for Scala 2.11 for some unknown reason (possibly a bug), and they clash with the Scala 2.12 artifacts used by the rest of the build. You can fix that by excluding them in build.sbt:

name := "dq"

scalaVersion := "2.12.12"

val sparkVersion = "3.0.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"

// https://mvnrepository.com/artifact/com.amazon.deequ/deequ
// Deequ brings in transitive libraries built for Scala 2.11; exclude them
// so that only the Scala 2.12 artifacts remain in the build.
libraryDependencies += ("com.amazon.deequ" % "deequ" % "1.1.0_spark-3.0-scala-2.12")
  .exclude("org.scalanlp", "breeze_2.11")
  .exclude("com.chuusai", "shapeless_2.11")
  .exclude("org.apache.spark", "spark-core_2.11")
  .exclude("org.apache.spark", "spark-sql_2.11")
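
For context on why the exclusions help, here is a small build.sbt fragment showing how sbt names Scala artifacts. It is only an illustration under the same versions as above, not something to add to the build. With `%%`, sbt appends the Scala binary version to the artifact name, so Scala libraries in the build are expected to share the same `_2.12` suffix. Deequ is declared with a plain `%`, so sbt takes its coordinates as written, and the `_2.11` artifacts it brings in transitively are what trigger the "Conflicting cross-version suffixes" error.

```scala
// Illustrative build.sbt fragment (not part of the fix above).
scalaVersion := "2.12.12"

// `%%` appends the Scala binary version, so this resolves spark-sql_2.12.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1" % "provided"

// The same dependency spelled out by hand with a plain `%`.
// Shown commented out because only one of the two declarations is needed.
// libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "3.0.1" % "provided"
```

Once the `_2.11` artifacts are excluded, each module listed in the error is left with a single suffix and dependency resolution can complete.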

P.S. Originally published on my blog.