DEV Community

Bill Schneider
Bill Schneider

Posted on

Learning Scala for Spark, and the apply method

This article originally appeared on my blog

Sometimes in Spark you will see code like

val df1 = ...
val df2 = ...
val df3 = df1.join(df2, df1("col") === df2("col"))
Enter fullscreen mode Exit fullscreen mode

It is a little odd at first to use DataFrame objects like methods.

What's going on here?

In Scala, objects have an apply method, which allows any object to be invoked like a method. obj(foo) is equivalent to obj.apply(foo). DataFrame's apply method is the same as col, so df("col") is equivalent to df.col("col").

This is also related to why you can create instances of case classes without new -- a case class defines a companion object with the same name, and that
companion object has an apply method that returns new ClassName().

Personally I haven't learned to like Scala's apply feature, because it's not entirely obvious what obj(foo) is supposed to do. But in this case,
it makes sense to have shortcuts like that when I'm thinking of Scala as a DSL for Spark.

Oldest comments (0)