We have our own home-made tools: each application sends a heartbeat to a queue. Our monitoring application picks up the heartbeats and registers the application as online upon the first one; as soon as it stops receiving heartbeats, it sounds the alarm.
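The core of that logic is just a last-seen timestamp per application. A minimal sketch in Python, assuming a hypothetical timeout and class name (not our actual code):

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat per application and flags stale ones.

    Illustrative sketch: the 45-second timeout and names are assumptions,
    not the real system's values.
    """

    def __init__(self, timeout_seconds=45):
        self.timeout = timeout_seconds
        self.last_seen = {}  # app name -> timestamp of last heartbeat

    def on_heartbeat(self, app_name, now=None):
        now = time.time() if now is None else now
        if app_name not in self.last_seen:
            print(f"{app_name} registered as online")  # first heartbeat = online
        self.last_seen[app_name] = now

    def stale_apps(self, now=None):
        """Applications whose heartbeats have gone quiet -> sound the alarm."""
        now = time.time() if now is None else now
        return [app for app, seen in self.last_seen.items()
                if now - seen > self.timeout]
```

The consumer on the queue calls `on_heartbeat` per message, and a timer periodically checks `stale_apps`.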
Email / SMS applications each monitor that the monitoring system itself is up and running; if it isn't, they send an alert.
One application does a SELECT 1 FROM TABLE every 15 seconds. If it gets a result, it sends a heartbeat; if not, it doesn't. When the monitoring application stops receiving that heartbeat, it sounds the alarm that the database is offline or not responding.
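The probe itself is tiny. A sketch using the stdlib sqlite3 module as a stand-in for the real database driver (the function names and the use of sqlite3 are illustrative):

```python
import sqlite3

def database_alive(conn):
    """Return True only if SELECT 1 actually comes back with a row.

    Any driver error (connection dropped, timeout, ...) counts as 'not
    responding' -- in that case no heartbeat is sent, and the monitor's
    silence-detection does the alerting.
    """
    try:
        row = conn.execute("SELECT 1").fetchone()
        return row is not None
    except Exception:
        return False

def probe_once(conn, send_heartbeat):
    """Run every 15 seconds: heartbeat only on a successful probe."""
    if database_alive(conn):
        send_heartbeat("DB-PROBE")
```

Note that the probe never sends a "database is down" message itself; the absence of the heartbeat is the signal.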
We also have an application pinging our frontend applications from outside, from different regions. When a frontend application receives a ping, it sends a heartbeat labeled with the region, e.g. DE-PING. If there's a network issue between, say, Germany and our software, the frontend stops receiving pings from Germany, stops sending DE-PING heartbeats, and the monitoring application alerts us that it's no longer receiving heartbeats from Germany via that frontend.
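The region-labeled heartbeats reuse the same silence-based mechanism. A small sketch of the frontend side, with hypothetical function names:

```python
def heartbeat_label(region_code):
    """Label heartbeats by the region the ping came from, e.g. 'de' -> 'DE-PING',
    so the monitor can tell which network path went quiet."""
    return f"{region_code.upper()}-PING"

def handle_ping(region_code, send_heartbeat):
    # The frontend only emits this heartbeat when it actually received a
    # ping from that region; a Germany-to-us network issue therefore shows
    # up on the monitor as missing DE-PING heartbeats, with no extra logic.
    send_heartbeat(heartbeat_label(region_code))
```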
Based on the monitoring data, we can decide when to shift load, spin up more instances, etc.
We're planning to take it further by installing agents on the VMs that send CPU / memory information every 15 seconds to the heartbeat queue, so the monitoring system can also alert us on high CPU / memory usage.
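The agent side could look something like this. The payload shape and thresholds are assumptions; the samplers are injected so the sketch stays self-contained (in practice they might read from psutil or /proc):

```python
import json
import time

def resource_heartbeat(host, cpu_percent, mem_percent, now=None):
    """Build the message a VM agent would push to the heartbeat queue
    every 15 seconds. Field names are illustrative."""
    return json.dumps({
        "type": "RESOURCE",
        "host": host,
        "cpu_percent": cpu_percent,
        "mem_percent": mem_percent,
        "ts": time.time() if now is None else now,
    })

def breaches_threshold(payload, cpu_limit=90.0, mem_limit=90.0):
    """Monitor side: alert when either resource crosses its limit.
    The 90% limits are example values, not ours."""
    data = json.loads(payload)
    return data["cpu_percent"] > cpu_limit or data["mem_percent"] > mem_limit
```

Because the resource messages ride on the same queue as liveness heartbeats, a silent agent also doubles as a "VM is down" signal.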
Funnily enough, we're very close to having replicated what DataDog offers, but with full control of everything.
Update: we also have a canary system. Since all our logging is centralised via queues, we have an application that analyzes the logs; if we see a spike in error logs after a deploy, we can quickly swing back to the old version, analyze the logs, and fix the issue before redeploying.
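The spike detection can be as simple as comparing the post-deploy error rate to the pre-deploy baseline. A sketch, with made-up threshold values:

```python
def error_spike(baseline_errors_per_min, current_errors_per_min,
                factor=3.0, floor=5):
    """Flag a spike when the post-deploy error rate exceeds the baseline
    by `factor`, with an absolute floor so a jump from 0 to 2 errors on a
    quiet service doesn't trip a rollback. Both thresholds are illustrative.
    """
    return (current_errors_per_min >= floor and
            current_errors_per_min > factor * baseline_errors_per_min)
```

If this returns True shortly after a deploy, the new version gets rolled back and the logs get a human's attention.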