Striving to become a master Go/Cloud developer; Father ๐จโ๐งโ๐ฆ; ๐ค/((Full Stack Web|Unity3D) + Developer)/g; Science supporter ๐ฉโ๐ฌ; https://coder.today
I am familiar with the technologies, but I have not used them yet. Because your requirements are very vague, I will list the most popular Apache solutions (there are other alternatives).
Even Google (its creator) does not use MapReduce anymore, they made a new framework, more flexible that is under the Apache umbrella (Beam): beam.apache.org/
So just a quick oversight:
to move the data from your data-lake to the processing units, and back: Apache NiFi or Apache Airflow, perhaps with a Kafka on the way, if needed
These tools also allows Data Enrichment!
to process your data: Beam, Flink (they both support batch + streaming), or Spark (especially if you have any ML algorithms). If it is text based you may need something on Lucene (Solr or ElasticSearch).
I am familiar with the technologies, but I have not used them yet. Because your requirements are very vague, I will list the most popular Apache solutions (there are other alternatives).
Even Google (its creator) does not use MapReduce anymore, they made a new framework, more flexible that is under the Apache umbrella (Beam): beam.apache.org/
So just a quick oversight:
to move the data from your data-lake to the processing units, and back: Apache NiFi or Apache Airflow, perhaps with a Kafka on the way, if needed
These tools also allows Data Enrichment!
to process your data: Beam, Flink (they both support batch + streaming), or Spark (especially if you have any ML algorithms). If it is text based you may need something on Lucene (Solr or ElasticSearch).
Managed solutions would be BigQuery/BigTable, managed Spark and more: cloud.google.com/products/big-data/
Thanks!
Requirements are vague because I just didn't want to go too much into details.
I haven't heard about Apache Beam yet, but this looks quite interesting. Will definitely look at it!