Hello,
At my company we have some problems storing large amounts of data, and I'd like to know which technologies/databases/applications are best to focus on. I've read about Apache Hive and maybe Elasticsearch, but I don't know if anyone here has experience with them or can shed some light.
Thanks!
AWS RDS could be an idea. I haven't had much experience with it myself, but I've heard good things about it for large databases and the like.
@sdecandelario I forgot about Aurora and the others! Good find.
stuff ;)
The question, as always, is: what do you need it for?
For example, some problems can be solved more cheaply with pure command-line tools, since streaming is very resource-friendly: aadrake.com/command-line-tools-can-be-235x-faster…
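To illustrate what "streaming is resource-friendly" means, here's a minimal Python sketch (the file name and word-count task are made up for the demo): it reads a file line by line, so memory use stays roughly constant no matter how big the file gets.

```python
import os
import tempfile
from collections import Counter

def count_words_streaming(path):
    """Stream the file line by line; we never hold more than one
    line in memory, so a multi-GB log costs the same RAM as a tiny one."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts.update(line.split())
    return counts

# Tiny stand-in for what would be a huge log file in practice.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("error ok error\nok ok\n")
    path = tmp.name

counts = count_words_streaming(path)
print(counts["ok"])     # 3
print(counts["error"])  # 2
os.unlink(path)
```

The same idea is what `awk`, `sort`, and `uniq` pipelines exploit in the linked article.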
Even some SQLite experiments have turned out way faster than big-data clusters: chrisstucchio.com/blog/2013/hadoop_hatred.html
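As a sketch of that SQLite point: a single indexed SQLite file can answer aggregate queries over millions of rows with no cluster at all. This toy uses an in-memory database and invented table/column names, but on-disk it's the same idea.

```python
import sqlite3

# ":memory:" is a stand-in; point this at a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 7)],
)
# An index on the filter column is what makes single-file SQLite
# competitive for lookup-heavy workloads.
conn.execute("CREATE INDEX idx_events_user ON events(user)")

total = conn.execute(
    "SELECT SUM(amount) FROM events WHERE user = ?", ("alice",)
).fetchone()[0]
print(total)  # 17
```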
But if you've got the volume (multiple terabytes), of course you can resort to the classics. You could even build your own MapReduce on top of MySQL; the distributed-system problems would remain, but the mathematical operations would be supported... anyhow, I'm sidetracking.
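For anyone unfamiliar with what "build your own MapReduce" actually involves, here's a toy single-process sketch of the three phases (map, shuffle, reduce) using word count; in a real system the shards live on different machines and the shuffle crosses the network, which is exactly the hard part.

```python
from collections import defaultdict
from itertools import chain

def map_shard(lines):
    # Map phase: each "node" turns its shard into (key, value) pairs.
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs):
    # Shuffle phase: group values by key (network traffic in real life).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_groups(groups):
    # Reduce phase: combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

shards = [["big data big"], ["data data"]]
mapped = chain.from_iterable(map_shard(s) for s in shards)
result = reduce_groups(shuffle(mapped))
print(result)  # {'big': 2, 'data': 3}
```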
You can use Cassandra, Aerospike, or Spark+Hadoop; there are also specialized services like Treasure Data, or you can use Kafka for big-data distribution.
Whichever technology you pick, you should check the Jepsen blog entries, because they explain the failure modes of database systems and message queues in terms of the CAP theorem. That's important to understand, because you have to know what can fail in a distributed system, and big data = distributed system :)