In my case, streaming data is a good fit for building cumulative reports and for real-time monitoring of players in our BI (for games). In the batch layer, we've been using Apache Hive to build the daily aggregates, user archives, revenue figures, predictions, and other things; these are long-running queries. In the batch layer it's also much easier to combine data from S3, Facebook, and other services.
Apache Kafka is working well for us, and I've started thinking about replacing our existing collector with the Kafka S3 connector, but I haven't had enough time to try it yet.