For monitoring the business/functional aspects, we're still using Pingdom. From an external point of view, we're only interested in whether, when a specific request is sent, an appropriate response comes back, which means all the layers (auth, service discovery, load balancing, the service, the data) are working together as expected.
For more atomic monitoring (i.e. does this specific service reply as expected), we're experimenting with Runscope in a kind of mixed mode: slower monitoring (less frequent checks) but more in-depth (testing that expected errors do return the expected bad results, ...). I do love Runscope: their sequences/scripts, with easy-but-rich assertions, variables, and conditions, let you create nice test scenarios. (And we also use more complex scenarios in Runscope for integration/end-to-end tests.)
In my previous job I used New Relic (in the free tier) to monitor the instances. It was great, and I was pleasantly surprised by the feature set in the free tier. Docker support wasn't there yet, though (that has changed since then, but I haven't had a chance to use it again).
We use a colorful mixture of different tools. It depends on the need for accurate data, alerts, and types of queries.
Personally, I like Facebook's OSQuery even though it's SQL ;-) (I love SQL, not).
Our old schoolers still believe in Nagios.
For our flagship product we use Datadog and New Relic. For other projects it depends on the particular type of server (e.g. Kadira for Meteor).
Datadog monitors for slow SQL queries on MySQL
Pingdom makes sure the servers are responding
Redundancy is provided by Rackspace monitoring for both the dedicated DB server as well as the load balancer and servers
Then, I run a custom cron job on each server every minute that sends the server's vitals to Firebase (attached), which I watch during the day.
We're registered on the status pages of our 3rd-party services (Authorize.net, PayPal, etc.), so we get an email if something is wrong with them.
Goaccess.io runs on each server and is open in terminal all day to show me real time traffic from top IP addresses
And we all have a monitor hanging on the wall in each of our offices to show us realtime Google Analytics.
It's basically impossible for the site to have a problem without someone noticing it within 1 minute :)
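
For anyone wanting to try the per-minute cron job idea mentioned above, here is a minimal sketch of what such a script might look like. It is not the poster's actual script: the project URL, the `vitals/<host>` path, and the choice of vitals are all assumptions, and authentication is left out entirely. It relies on the Firebase Realtime Database REST interface, where a PUT to a path ending in `.json` overwrites the data at that path.

```python
import json
import os
import shutil
import socket
import time
import urllib.request

# Hypothetical Realtime Database URL; replace with your own project.
FIREBASE_URL = "https://example-project.firebaseio.com"


def firebase_key(name):
    """Replace characters that Firebase does not allow in keys."""
    for ch in ".$#[]/":
        name = name.replace(ch, "_")
    return name


def collect_vitals(path="/"):
    """Gather a few basic vitals using only the standard library (Unix)."""
    load1, load5, _load15 = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        "host": socket.gethostname(),
        "timestamp": int(time.time()),
        "load_1m": load1,
        "load_5m": load5,
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


def build_request(vitals, base_url=FIREBASE_URL):
    """Build a PUT request that overwrites this host's latest vitals."""
    url = f"{base_url}/vitals/{firebase_key(vitals['host'])}.json"
    body = json.dumps(vitals).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send_vitals():
    """Collect and upload the vitals; returns the HTTP status code."""
    request = build_request(collect_vitals())
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A script that calls `send_vitals()` could then be scheduled with a crontab entry along the lines of `* * * * * /usr/bin/python3 /opt/vitals.py` (the path is made up). A real setup would also need some form of authentication, depending on the database rules.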

We have our own home-made tools. Each application sends a heartbeat to a queue; our monitoring application picks up the heartbeat and registers the application as online upon the first heartbeat. As soon as it's no longer receiving heartbeats, it sounds the alarms.
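
To make the heartbeat idea concrete, here is a rough sketch of the bookkeeping the monitoring side might do. The class name, the 30-second timeout, and the injected clock are assumptions for illustration; the queue itself is left out, and `record` stands in for whatever handler consumes heartbeat messages.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per application and flags silent ones."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed threshold
        self.clock = clock                      # injectable for testing
        self.last_seen = {}

    def record(self, app_name):
        """Call for every heartbeat taken off the queue.

        Returns True when this is the first heartbeat from the application,
        i.e. the moment it is registered as online.
        """
        first = app_name not in self.last_seen
        self.last_seen[app_name] = self.clock()
        return first

    def silent_apps(self):
        """Names of registered applications whose heartbeat is overdue."""
        now = self.clock()
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A periodic task would call `silent_apps()` and raise an alert for each name returned. Since absence of a heartbeat is the signal, a crashed application and a broken network path look the same to the monitor, which is exactly what makes the scheme simple.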
The email and SMS applications each monitor that the monitoring system is up and running; if not, they send an alert.
One application does a SELECT 1 FROM TABLE every 15 seconds. If it gets a result, it sends a heartbeat; if not, it doesn't. If the monitoring application doesn't receive a heartbeat, it sounds the alarm that the database is offline or not responding.
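
The database probe could be sketched roughly as follows. This uses `sqlite3` purely as a stand-in so the example is self-contained (the database actually in use is not stated), and `connect`, `send_heartbeat`, and the `"DB"` label are hypothetical names.

```python
import sqlite3


def database_is_alive(connect):
    """Return True if a trivial query comes back with a row."""
    try:
        conn = connect()
        try:
            row = conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return row is not None
    except sqlite3.Error:
        # Connection or query failure: treat the database as down.
        return False


def probe_once(connect, send_heartbeat, label="DB"):
    """Send a heartbeat only when the database answers; otherwise stay silent."""
    if database_is_alive(connect):
        send_heartbeat(label)
        return True
    return False
```

Running `probe_once` in a loop with a 15-second sleep would reproduce the schedule described above. Note that the probe never reports a failure itself; it simply stops sending heartbeats and lets the monitor notice the silence.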
We have an application pinging the frontend applications from outside, from different regions. If a frontend application receives the ping, it sends a heartbeat. If there's a network issue, for example from Germany to our software, the frontend application will no longer receive pings from Germany and will no longer send heartbeats labeled DE-PING, and the monitor application will alert us that it's no longer receiving heartbeats from Germany via that frontend.
Based on the monitoring service, we can decide when to switch load, start up more instances etc.
We're planning to take it further by installing certain things on the VMs that will send CPU / Memory information every 15 seconds to the heartbeat queue so that the monitoring system can alert us on high-CPU / memory issues.
Funnily enough, we're very close to having replicated what Datadog offers, but we have full control of everything.
Update: we also have a canary system. Since all our logging is centralised via queues, we have an application that can analyze the logs. If we get a spike in error logs after a deploy, we can quickly swing back to the old version, analyze the logs, and fix the problem before redeploying.
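
One simple way to detect such a post-deploy spike is to compare the error count in a window after the deploy with the count in the same-sized window before it. The sketch below does that; the function name, the 5-minute window, the 3x factor, and the minimum of 5 errors are all made-up values for illustration, not what the system described above uses.

```python
def error_spike_after_deploy(error_timestamps, deploy_time,
                             window=300, factor=3.0, min_errors=5):
    """Return True if errors after the deploy clearly exceed the baseline.

    error_timestamps: Unix timestamps of error log entries.
    deploy_time:      Unix timestamp of the deploy.
    """
    before = sum(
        1 for t in error_timestamps
        if deploy_time - window <= t < deploy_time
    )
    after = sum(
        1 for t in error_timestamps
        if deploy_time <= t < deploy_time + window
    )
    # Require an absolute minimum so a jump from 0 to 1 error is ignored.
    return after >= min_errors and after > factor * before
```

A `True` result would be the trigger to swing back to the old version. The `min_errors` floor is there so that a quiet service with one stray error does not cause a rollback.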
Pierre
DevOps JS
Since our team moved from AWS to Google Compute Engine, we have free access to Stackdriver monitoring and deep monitoring using their collectd agent (mongodb, nginx...).
Stackdriver with GCE also provides uptime checks with free SMS worldwide!! It's a Pingdom killer...
We also use Sentry to monitor app errors and get alerts with Slack, email, SMS...