Hi
Have you had any experience with storing events to S3 (the S3 sink connector)? We're doing the streaming app, but it would be good to have the stored events in the bucket for batch processing.
Thank you!
Hi Alexandr, I haven't had experience with sinking to S3 — yet. In fact Kafka Connect is one of the things on my learning list. :-)
The fact that Kafka was not primarily designed for persistent storage (although features such as compacted topics kind of enable that capability) certainly raises its own challenges.
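For what it's worth, log compaction is just a topic-level setting, so keeping the latest record per key is cheap to enable (a sketch of the relevant topic config; values are illustrative):

```
# Topic-level settings enabling log compaction:
# the broker retains at least the latest record for each key.
cleanup.policy=compact
# start compacting once at least half of the log is "dirty" (uncompacted)
min.cleanable.dirty.ratio=0.5
```

That gives you a keyed snapshot, though not a full event history, which is where an S3 sink starts to look attractive.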
I'm curious, though — what would be your use case for batch processing from S3? Offline analysis, testing, reprocessing?
I think it's processing data to S3, not from S3. 🤔
In my case, streaming data is suitable for building accumulative reports and real-time monitoring of players for our BI (for games). In the batch layer, we use Apache Hive for processing/building the aggregated data per day, user archives, revenues, predictions, and other stuff; these are long-running queries. In the batch layer it's also much easier to combine data from S3, Facebook, and other services.
Apache Kafka is working well for us, and I've started thinking about replacing the existing collector with the Kafka S3 connector. But I haven't had enough time to give it a try yet.
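For reference, a rough sketch of what such a sink might look like, assuming the Confluent S3 sink connector (the connector name, topic, and bucket here are placeholders):

```json
{
  "name": "s3-events-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "game-events",
    "s3.bucket.name": "my-events-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC",
    "timestamp.extractor": "Record"
  }
}
```

The time-based partitioner writes objects under date/hour prefixes, which lines up nicely with per-day batch jobs like the Hive aggregations mentioned above.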