DEV Community

Discussion on: Consider SQL when writing your next processing pipeline

David Roher

I'm curious what you think about the functional API on top of SQL that Spark provides. It complicates the "imperative vs. SQL" framing, because with the DataFrame API you're metaprogramming SQL while still writing Scala/Python/Java. From my perspective, the real tough choice is the DataFrame API vs. plain SQL (and it doesn't have to be a choice, since Spark supports both in the same pipeline). Thanks for the article!
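For anyone who hasn't seen the two styles side by side, here is a rough sketch of what "metaprogramming SQL" from a host language looks like. It is deliberately not Spark: it uses Python's standard-library `sqlite3` so it runs anywhere, and the `Query` builder, the `events` table, and its columns are all made up for illustration. Spark's DataFrame API is far richer and goes through a query optimizer; this only shows the shape of the idea.

```python
import sqlite3


class Query:
    """Hypothetical mini builder (not a real library): every method returns a
    new Query, and nothing touches the database until the SQL is executed."""

    def __init__(self, table, where=None, group=None, aggs=None):
        self.table = table
        self.where = where or []
        self.group = group or []
        self.aggs = aggs or []

    def filter(self, condition):
        return Query(self.table, self.where + [condition], self.group, self.aggs)

    def group_by(self, *cols):
        return Query(self.table, self.where, list(cols), self.aggs)

    def agg(self, *exprs):
        return Query(self.table, self.where, self.group, list(exprs))

    def to_sql(self):
        # Illustration only: fragments are concatenated without any escaping.
        select = ", ".join(self.group + self.aggs) or "*"
        sql = f"SELECT {select} FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.group:
            sql += " GROUP BY " + ", ".join(self.group)
        return sql


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 5), ("a", 20), ("b", 30), ("b", 1)],
)

# Style 1: plain SQL.
raw_sql = (
    "SELECT customer, SUM(amount) AS total "
    "FROM events WHERE amount > 3 GROUP BY customer"
)

# Style 2: the same query assembled with method calls in the host language.
built_sql = (
    Query("events")
    .filter("amount > 3")
    .group_by("customer")
    .agg("SUM(amount) AS total")
    .to_sql()
)

# Sort in Python so the comparison does not depend on row order.
rows_raw = sorted(conn.execute(raw_sql).fetchall())
rows_built = sorted(conn.execute(built_sql).fetchall())
print(built_sql)
print(rows_raw == rows_built)
```

Both styles end up as the same query against the same engine, which is why mixing them in one pipeline is workable; the trade-off is mostly about composability and tooling in the host language versus the readability of plain SQL.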

BenBirt (Author)

Honestly, anything that makes it easier/simpler to write pipelines is good in my book! (And Spark's SQL APIs definitely count there.)

That said, at least for data that can easily be brought into a single data warehouse, my preference is still to get rid of that extra layer - reducing complexity - and let the highly scalable warehouse do the grunt work.