Discussion on: Which language(s) would you recommend to Transform a large volume of data?

Tobias Salzmann

If the transformation doesn't benefit from a cluster (which you should definitely investigate first), you could write an application built on Akka Streams.

doc.akka.io/docs/akka/2.5.4/scala/...

It offers multiple APIs for building streaming computations and graphs, with transformation operators at different levels of abstraction. If you need even more flexibility, you can fall back to actors as a last resort.
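
For illustration, here's a minimal sketch of a linear stream (the data and stages are made up, but the pattern is the same for real records):

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object StreamExample extends App {
  implicit val system = ActorSystem("transform")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  // Stages run with backpressure, so memory use stays bounded
  // no matter how many elements flow through.
  val done = Source(1 to 1000000)
    .map(_ * 2)                       // a simple per-record transformation
    .filter(_ % 3 == 0)               // drop records we don't care about
    .grouped(1000)                    // batch for a downstream sink
    .runWith(Sink.foreach(batch => println(s"wrote ${batch.size} records")))

  done.onComplete(_ => system.terminate())
}
```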

Many connectors are available via Alpakka, so there's a good chance that integrating with your sources and targets is quite easy.
developer.lightbend.com/docs/alpak...
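
As a rough sketch, reading from and writing to files looks like this with the built-in FileIO connector; Alpakka connectors (Kafka, S3, JDBC, ...) plug in as Sources and Sinks in the same way. The file names are placeholders:

```scala
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Framing}
import akka.util.ByteString

object FileTransform extends App {
  implicit val system = ActorSystem("file-transform")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  val result = FileIO.fromPath(Paths.get("input.csv"))
    .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 8192, allowTruncation = true))
    .map(_.utf8String.toUpperCase)            // stand-in for a real per-record transformation
    .map(line => ByteString(line + "\n"))
    .runWith(FileIO.toPath(Paths.get("output.csv")))

  result.onComplete(_ => system.terminate())
}
```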

If you can justify running your solution on a cluster, Apache Spark might be what you're looking for. Once you have access to your data in the form of an RDD, DataFrame, or Dataset, you can treat it almost like a collection or a SQL table.
A multitude of functional operations is available, some of which are specifically designed to run on a cluster and minimize shuffling (transferring large amounts of data between nodes).
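
A hedged sketch of what that looks like with the DataFrame API (the input path and column names are made up):

```scala
import org.apache.spark.sql.SparkSession

object SparkTransform extends App {
  // local[*] is just for trying it out; on a real cluster you'd use spark-submit.
  val spark = SparkSession.builder()
    .appName("transform")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val events = spark.read.parquet("events.parquet")

  val summary = events
    .filter($"status" === "ok")   // collection-like predicates
    .groupBy($"userId")
    .count()                      // partial aggregation happens before the shuffle,
                                  // so only per-key counts move between nodes

  summary.write.parquet("summary.parquet")
  spark.stop()
}
```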

spark.apache.org/