You have a point there. I thought about using RabbitMQ, which is similar. This way the Raw.API would publish an event containing the data that is to be written to the database. But I'm a bit afraid that this might complicate the solution a lot?
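To make the idea concrete, here is a minimal sketch of the event such an API could publish instead of writing to the database directly. All names (event type, payload fields, `build_measurement_event`) are illustrative assumptions, not part of any existing codebase; a real publisher would hand the serialized bytes to a RabbitMQ client library.

```python
import json
from datetime import datetime, timezone

# Hypothetical sketch: the event the API could publish to a message broker
# instead of writing to the database itself. Field names are illustrative.
def build_measurement_event(sensor_id, value):
    """Wrap a raw reading in an event envelope for the message bus."""
    return {
        "eventType": "RawMeasurementReceived",
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "payload": {"sensorId": sensor_id, "value": value},
    }

event = build_measurement_event("sensor-42", 23.5)
body = json.dumps(event)  # a real publisher would send these bytes to the broker
print(body)
```

A separate consumer service would then subscribe to these events and perform the actual database write, which decouples ingestion from persistence, at the cost of the extra moving part you're worried about.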
Disclaimer: I'm not a big data expert! To my understanding, Apache Spark is an analytics engine that helps you process large datasets in near real time. Processing happens in memory, so it is fast. As your data volume grows, you can add more nodes to your Spark cluster to distribute the workload. Lucky for you, there is a connector for MongoDB and also a .NET SDK.
It's pronounced Diane. I do data architecture, operations, and backend development. In my spare time I maintain Massive.js, a data mapper for Node.js and PostgreSQL.
It's a complicated problem! You should be equally wary of simple solutions that try to do too much with too little.