Discussion on: Which language(s) would you recommend to Transform a large volume of data?

ImTheDeveloper • Edited

I think you are probably going to benefit from using a streaming technology here. There are a fair few options around, and I'll throw out a few names for you to take a look at.

Spark Streaming - It actually treats your data as lots of tiny batches and performs the ETL on each batch. Micro-batching allows it to be back-pressure aware.
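
A minimal sketch of what that micro-batching looks like in practice, using the classic DStream API. The socket source, port, and uppercase transformation are placeholders, not part of anyone's real pipeline:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("etl-sketch").setMaster("local[2]")
    // Every 5 seconds the incoming data is cut into one micro-batch,
    // and the transformations below run against that batch.
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("localhost", 9999)
    val transformed = lines.map(_.toUpperCase) // placeholder ETL step

    transformed.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```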

Flink Streams - Similar to the above, but more "true" streaming: no micro-batches here.
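
For comparison, a Flink sketch of the same kind of job, processing record-at-a-time rather than in batches. Again, the socket source and transformation are just placeholders:

```scala
import org.apache.flink.streaming.api.scala._

object FlinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Each record flows through the pipeline as it arrives;
    // there is no batch interval to configure.
    val lines = env.socketTextStream("localhost", 9999)
    val transformed = lines.map(_.toUpperCase) // placeholder ETL step

    transformed.print()
    env.execute("flink-etl-sketch")
  }
}
```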

Akka Streams - Which I believe someone else has already mentioned.
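
A tiny sketch of the Akka Streams model, assuming Akka 2.6+ where the materializer is derived implicitly from the ActorSystem. The numeric source and doubling step are made-up placeholders:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object AkkaStreamsSketch extends App {
  implicit val system: ActorSystem = ActorSystem("etl-sketch")

  // The sink pulls elements only as fast as it can handle them,
  // so back pressure propagates upstream automatically.
  Source(1 to 1000000)
    .map(_ * 2) // placeholder ETL step
    .runWith(Sink.foreach(println))
}
```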

Kafka Streams - If you wish to keep the data in an immutable log so it can be replayed on error or during migrations, or fanned out to one-to-many subscribers, then Kafka as a tech is good, and it comes with its own stream-processing library.
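
And a Kafka Streams sketch that reads from one topic of the log and writes transformed records to another. The topic names are invented for illustration, and the serde import path assumes a recent Kafka version (2.7+):

```scala
import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object KafkaStreamsSketch extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "etl-sketch")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder()
  // The source topic is an immutable log, so this pipeline can be
  // replayed from any offset after a failure or during a migration.
  builder.stream[String, String]("input-topic")
    .mapValues(_.toUpperCase) // placeholder ETL step
    .to("output-topic")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```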

I've worked with each of the above, so if you have any questions don't hesitate to ask. To save you compiling a complete list yourself, I've always found the big data landscapes useful; there is a whole section dedicated to stream-processing frameworks:

Big data 2017

Without compression: mattturck.com/wp-content/uploads/2...
I would gravitate toward the green open-source section and look at streaming.

Miguel Barba

Hi!

Although it's not likely that we'll end up going with such technologies, I'll surely have a look so that we can make the most informed decision possible.

Thanks!