My #1 concern is capturing all the requests... could Amazon's Elastic Load Balancer pull this off?
And how would you store massive amounts of web-request data from several thousand users?
How would you index that data for search, query it fast, back it up, and visualize it?
More importantly, how would you catch malicious bot traffic?
I'm fishing for some ideas on how to implement your own 'kissmetrics', because it seems like an interesting project that I can dig my teeth into.
Jan Vladimir Mostert
Idea Incubator
If you're worried about the load, I've successfully implemented something similar using a queue: queue all the tracking requests, then consume a few thousand at a time, process them in memory, and write the result to the DB. I've easily handled 20,000 requests per minute with a single RabbitMQ instance, and you can cluster RabbitMQ if your traffic becomes too much for one node.
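To make the batch-consume idea concrete, here's a minimal sketch of the pattern in Python. It's an illustration only: I've swapped in a plain in-process `queue.Queue` and an in-memory SQLite table where the real system would use RabbitMQ and an actual database, and the batch size and `hits` schema are hypothetical choices, not anything from a real deployment.

```python
import queue
import sqlite3

BATCH_SIZE = 1000  # hypothetical; tune to your workload


def consume_batches(q, db, batch_size=BATCH_SIZE):
    """Drain up to batch_size tracking events at a time, aggregate
    them in memory, then write one summary row per URL to the DB."""
    while True:
        batch = []
        try:
            while len(batch) < batch_size:
                batch.append(q.get_nowait())
        except queue.Empty:
            pass
        if not batch:
            break
        # Aggregate hit counts in memory before touching the DB,
        # so thousands of requests become a handful of writes.
        counts = {}
        for event in batch:
            counts[event["url"]] = counts.get(event["url"], 0) + 1
        db.executemany(
            "INSERT INTO hits(url, n) VALUES (?, ?) "
            "ON CONFLICT(url) DO UPDATE SET n = n + excluded.n",
            counts.items(),
        )
        db.commit()


# Demo: queue a few fake tracking requests and process them in batches.
q = queue.Queue()
for i in range(2500):
    q.put({"url": "/page/%d" % (i % 3)})

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hits(url TEXT PRIMARY KEY, n INTEGER)")
consume_batches(q, db)
print(dict(db.execute("SELECT url, n FROM hits")))
```

With real RabbitMQ you'd do the same thing from a consumer callback (e.g. via a client library like pika), acknowledging messages only after the batch is committed, so nothing is lost if the consumer dies mid-batch.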
How you extract data from that depends on what you want to do with it.