The question, as always, is: what do you need it for?
For example, some jobs can be solved more cheaply with plain command-line tools, because streaming is very resource-friendly: aadrake.com/command-line-tools-can-be-235x-faster…
Some SQLite experiments have even turned out much faster than big-data clusters: chrisstucchio.com/blog/2013/hadoop_hatred.html
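To illustrate why streaming is so cheap, here is a minimal Python sketch (not taken from either linked article) that aggregates one column while reading line by line, so memory use depends on the number of distinct keys and not on the file size. The function name `count_by_key`, the comma separator and the sample data are all assumptions made up for this example.

```python
import io
from collections import Counter

def count_by_key(lines, field=0, sep=","):
    """Count the values of one column in a single pass over the input."""
    counts = Counter()
    for line in lines:  # one line at a time, the input is never loaded whole
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip("\n").split(sep)
        if len(parts) > field:
            counts[parts[field]] += 1
    return counts

# Hypothetical sample data; in practice pass an open file or sys.stdin.
sample = io.StringIO("win,a\nloss,b\nwin,c\n")
print(count_by_key(sample).most_common())
```

The same function works unchanged on `sys.stdin`, so it can sit at the end of a shell pipeline.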
But if you really have the volume (multiple terabytes), you can of course resort to the classics. You could even build your own MapReduce on top of MySQL: the distributed-system issues would remain, but the math itself should be supported. Anyhow, I'm sidetracking.
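For anyone unsure what "build your own MapReduce" would involve, here is a rough single-process sketch of the three phases (map, shuffle, reduce) using word counting. All function names are hypothetical and nothing here is distributed; in a SQL database the shuffle and reduce steps correspond roughly to a GROUP BY with an aggregate function.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit one (key, value) pair per word in the record."""
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

# Hypothetical sample input.
print(map_reduce(["big data big cluster", "small data"]))
```

The hard part in a real cluster is everything this sketch leaves out: partitioning the input, moving the shuffled groups between machines, and recovering from node failures.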
You can use Cassandra, Aerospike, or Spark + Hadoop. There are also several specialized services such as Treasure Data, or you can use Kafka to distribute the data.
Whichever technology you pick, check the Jepsen blog entries, because the author explains the faults of database systems and message queues. The explanations are based on the CAP theorem, which is important to understand: you have to know what can fail in a distributed system, and using big data = using a distributed system :)