I have been assigned to a large new project at my current company. The project will collect a huge number of news articles from different sources.
The full requirements are not clear yet, but we can anticipate some of them. For example:
Dashboards that display statistics about the collected news.
Full-text search (exact, fuzzy, and synonym).
A way for other teams (specifically the data analysis team) to query the data.
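To make the search requirement concrete, here is a rough sketch of the three search modes as Elasticsearch request bodies, built as plain Python dictionaries. The field name `body`, the filter name `news_synonyms`, and the analyzer name `news_text` are placeholders I made up for illustration, and the text field would still have to be mapped to the custom analyzer for the synonym part to take effect.

```python
FIELD = "body"  # hypothetical name of the field holding the article text


def exact_query(phrase):
    # match_phrase requires the terms to appear together, in order
    return {"query": {"match_phrase": {FIELD: phrase}}}


def fuzzy_query(text):
    # a match query with fuzziness tolerates small typos in each term
    return {"query": {"match": {FIELD: {"query": text, "fuzziness": "AUTO"}}}}


def synonym_settings(synonyms):
    # index settings defining a synonym token filter and an analyzer using it;
    # the article text field must be mapped to this analyzer separately
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "news_synonyms": {"type": "synonym", "synonyms": synonyms}
                },
                "analyzer": {
                    "news_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "news_synonyms"],
                    }
                },
            }
        }
    }
```

For example, `synonym_settings(["car, automobile"])` would treat "car" and "automobile" as equivalent terms.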
What would you suggest as a datastore for such a project?
I believe there is no one-size-fits-all solution to this type of project.
As a start, I am thinking of using Elassandra, since it combines Cassandra and Elasticsearch and may therefore satisfy the first two points (Cassandra for aggregation and analytics, Elasticsearch for full-text search).
The third point is still not satisfied, though. The data analysis team is more familiar with SQL, which neither Cassandra nor Elasticsearch fully supports.
The other approach I am considering is to have a separate store for the analysis team, with the application that ingests the data writing to both stores.
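To show what I mean by writing to both stores, here is a minimal Python sketch. Everything in it is hypothetical: `InMemoryStore` stands in for the real search and SQL stores, and `DualWriter` is just a name I picked. My main worry is that the two writes are not atomic, so the sketch records writes that failed on the second store so they can be retried later.

```python
class InMemoryStore:
    """Stand-in for a real datastore client (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def put(self, article_id, article):
        self.rows[article_id] = article

    def get(self, article_id):
        return self.rows[article_id]


class DualWriter:
    """Writes each article to a primary store, then to a secondary store."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = []  # ids written to primary but not yet to secondary

    def write(self, article_id, article):
        # A failure on the primary store propagates to the caller.
        self.primary.put(article_id, article)
        try:
            self.secondary.put(article_id, article)
        except Exception:
            # Remember the id so the secondary store can catch up later.
            self.pending.append(article_id)

    def retry_pending(self):
        still_pending = []
        for article_id in self.pending:
            try:
                self.secondary.put(article_id, self.primary.get(article_id))
            except Exception:
                still_pending.append(article_id)
        self.pending = still_pending
```

In a real system the pending list would need to be durable (for example a queue), otherwise a crash of the writer loses track of which articles the analysis store is missing.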
What do you think?