What are the most suitable datastores for storing a huge number of news articles?
I have been assigned to a large new project at my current company. The project will collect a huge number of news articles from different sources.
The full requirements are not clear yet, but we can anticipate some of them. For example:
- Building dashboards that display statistics about the collected news.
- Full-text search (exact, fuzzy, and synonym)
- Providing a way for other teams (specifically the data analysis team) to query the data.
What would you suggest as a datastore for such a project?
I believe there is no one-size-fits-all solution to this type of project.
As a start, I am thinking of using Elassandra, as it combines Cassandra and Elasticsearch, which may satisfy the first two points (Cassandra for aggregation and analytics, Elasticsearch for full-text search).
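To make the full-text requirement concrete, here is a rough sketch of how I imagine the three search modes mapping onto Elasticsearch request bodies, written as plain Python dicts. The field name `body`, the analyzer name `news_text`, the filter name `news_synonyms`, and the sample synonym list are all assumptions for illustration; nothing here is tied to a decided schema.

```python
def exact_phrase_query(text: str) -> dict:
    # match_phrase requires the terms to appear together, in order.
    return {"query": {"match_phrase": {"body": text}}}


def fuzzy_query(text: str) -> dict:
    # fuzziness "AUTO" lets Elasticsearch pick an edit distance per term length.
    return {"query": {"match": {"body": {"query": text, "fuzziness": "AUTO"}}}}


def synonym_index_settings(synonyms: list[str]) -> dict:
    # Synonyms are handled at analysis time by a custom analyzer
    # that chains a lowercase filter with a synonym filter.
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "news_synonyms": {"type": "synonym", "synonyms": synonyms}
                },
                "analyzer": {
                    "news_text": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "news_synonyms"],
                    }
                },
            }
        }
    }


# Hypothetical sample values, only to show the shape of the requests.
settings = synonym_index_settings(["car, automobile"])
phrase = exact_phrase_query("interest rate")
fuzzy = fuzzy_query("goverment")
```

The `body` field would still have to be mapped to the `news_text` analyzer for the synonyms to take effect; I left the mapping out because its shape depends on the Elasticsearch version.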
The third point is still not satisfied. The data analysis people are more familiar with SQL, which neither Cassandra nor Elasticsearch fully provides.
The other approach I am thinking of is to have a separate store for the analysis team, with the application responsible for writing the data writing it to both stores.
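To illustrate what I mean by writing to both stores, here is a minimal sketch. It uses in-memory stand-ins instead of real clients, and every name in it (`Article`, `InMemoryStore`, `DualWriter`, `retry_queue`) is hypothetical. The retry queue is my assumption about how a failed write to the second store might be reconciled, not a settled design.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Article:
    article_id: str
    source: str
    title: str
    body: str


class ArticleStore(Protocol):
    def save(self, article: Article) -> None: ...


@dataclass
class InMemoryStore:
    """Stand-in for a real store (search cluster, SQL database, ...)."""
    name: str
    rows: dict = field(default_factory=dict)
    fail: bool = False  # flip to simulate an outage

    def save(self, article: Article) -> None:
        if self.fail:
            raise ConnectionError(f"{self.name} is unavailable")
        # Keyed by article_id so a retried write overwrites instead of duplicating.
        self.rows[article.article_id] = article


@dataclass
class DualWriter:
    primary: ArticleStore
    secondary: ArticleStore
    retry_queue: list = field(default_factory=list)

    def write(self, article: Article) -> None:
        # A failure on the primary store propagates to the caller.
        self.primary.save(article)
        try:
            self.secondary.save(article)
        except ConnectionError:
            # Remember the article so the two stores can be reconciled later.
            self.retry_queue.append(article)

    def retry_failed(self) -> int:
        # Re-attempt queued writes; return how many succeeded this time.
        pending, self.retry_queue = self.retry_queue, []
        recovered = 0
        for article in pending:
            try:
                self.secondary.save(article)
                recovered += 1
            except ConnectionError:
                self.retry_queue.append(article)
        return recovered


search_store = InMemoryStore("search")
sql_store = InMemoryStore("sql", fail=True)  # simulate the SQL side being down
writer = DualWriter(primary=search_store, secondary=sql_store)

article = Article("a1", "example-source", "Sample title", "Sample body")
writer.write(article)

sql_store.fail = False  # the SQL side comes back
recovered = writer.retry_failed()
```

My worry with this approach is exactly the case the sketch simulates: the two stores can drift apart when one write fails, so some reconciliation step seems unavoidable.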
What do you think?