First, have a central place for ALL your logs, and by ALL I mean everything: application logs, database logs, server logs, access logs, and so on, so that you can easily tell whether a problem is infrastructure or application related. I've built my own solutions for this in the past and used Graylog at some point as well, but these days it's just very convenient to use Stackdriver: one place for Google Cloud and AWS logs, for literally everything.
For integrations into third-party systems, log everything, even if it means you need to log it to a separate log system. When something goes wrong, you have a trail of data to figure out if the problem was on your side or the third-party system's side.
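A rough sketch of what that trail could look like in JavaScript: a wrapper that records the request, the response (or the error), and the duration of every third-party call. The names callWithAuditLog and sink, and the shape of the log entry, are made up for illustration; in practice you would also redact secrets before logging.

```javascript
// Hypothetical helper: wraps any async call to a third party and logs
// what was sent, what came back, and how long it took.
async function callWithAuditLog(name, request, call, sink = console.log) {
  const startedAt = Date.now();
  try {
    const response = await call(request);
    sink({
      integration: name,
      outcome: "ok",
      request,
      response,
      durationMs: Date.now() - startedAt,
    });
    return response;
  } catch (err) {
    sink({
      integration: name,
      outcome: "error",
      request,
      error: String(err),
      durationMs: Date.now() - startedAt,
    });
    throw err; // the caller still sees the failure
  }
}
```

Because both the request and the outcome end up in the same entry, you can later show whether a bad request left your side or a bad response came back from theirs.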
Your log system should be able to switch from INFO to DEBUG mode at runtime without restarting any services. This lets you switch to DEBUG the moment something goes wrong and figure out right away what is causing the problem. If changing the log level requires a restart, the problem is often long gone by the time the service is back up, and it can take ages to reappear if it's hard to reproduce.
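A minimal sketch of the idea, not any particular library's API: the logger reads its threshold on every call, so changing it takes effect immediately. LEVELS, createLogger, and setLevel are hypothetical names chosen for this example.

```javascript
// Hypothetical minimal logger with a level that can be changed at runtime.
const LEVELS = { ERROR: 0, WARN: 1, INFO: 2, DEBUG: 3 };

function createLogger(sink = console.log) {
  let current = LEVELS.INFO; // default threshold

  function log(levelName, message) {
    // The threshold is checked on every call, so no restart is needed.
    if (LEVELS[levelName] <= current) {
      sink(`${new Date().toISOString()} ${levelName} ${message}`);
    }
  }

  return {
    setLevel(levelName) {
      if (!Object.prototype.hasOwnProperty.call(LEVELS, levelName)) {
        throw new Error(`Unknown level: ${levelName}`);
      }
      current = LEVELS[levelName];
    },
    error: (message) => log("ERROR", message),
    warn: (message) => log("WARN", message),
    info: (message) => log("INFO", message),
    debug: (message) => log("DEBUG", message),
  };
}
```

In a real service, setLevel would be wired to something you can reach while the process is running, for example an admin endpoint or a watched config value.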
For user-facing systems, it's nice to be able to switch a specific user's log level and tag their logs with their email address or something similar, so that you can follow them in your log system by filtering on that tag. This allows you to enable very fine-grained logging, even finer than DEBUG (for example, user clicked on button class="update account"), so when a user is having issues, you can immediately switch logs on for them and trace exactly what they're doing and what's going wrong.
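One way this could be sketched: keep a global threshold plus a map of per-user overrides, and attach the user tag to every entry so it can be filtered on. The TRACE level, createUserAwareLogger, and the entry fields are assumptions made for this example.

```javascript
// Hypothetical logger with per-user level overrides and tagged entries.
const LEVELS = { ERROR: 0, WARN: 1, INFO: 2, DEBUG: 3, TRACE: 4 };

function levelOf(levelName) {
  if (!Object.prototype.hasOwnProperty.call(LEVELS, levelName)) {
    throw new Error(`Unknown level: ${levelName}`);
  }
  return LEVELS[levelName];
}

function createUserAwareLogger(sink = (entry) => console.log(JSON.stringify(entry))) {
  const globalLevel = LEVELS.INFO;
  const userLevels = new Map(); // user tag (e.g. email) -> threshold

  return {
    setUserLevel(userTag, levelName) {
      userLevels.set(userTag, levelOf(levelName));
    },
    clearUserLevel(userTag) {
      userLevels.delete(userTag);
    },
    log(levelName, userTag, message) {
      // A per-user override wins over the global threshold.
      const threshold = userLevels.has(userTag) ? userLevels.get(userTag) : globalLevel;
      if (levelOf(levelName) <= threshold) {
        sink({ time: new Date().toISOString(), level: levelName, user: userTag, message });
      }
    },
  };
}
```

With this, calling setUserLevel("steven@example.com", "TRACE") turns on click-level logging for that one user only, and everyone else stays at INFO.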
Track browser errors alongside your normal backend logging. Why would you do such a bizarre thing? Often a browser plugin causes problems on your page that you're not even aware of. It's rather awesome when Steven phones saying he can't print and you're like: yes, we're almost done fixing that, your browser is using pluginXYZ which is causing the issue, and we're busy deploying a workaround for you.
I wrote a mini library for this 4 years ago which does exactly this (it could probably be improved, but it gets the job done for most projects):
github.com/JVAAS/onerrorjs/blob/master/onerror.js
Example index.html page with plenty of things that can cause errors:
<html>
<head>
  <title>Test window.onerror</title>
  <script type="text/javascript" src="onerror.js"></script>
</head>
<body>
  <button type="button" onclick="substringNullPointer()">
    Substring Nullpointer
  </button>
  <button type="button" onclick="callInvalidFunction()">
    Call Invalid Function
  </button>
  <!-- More Error Testing Buttons Here -->
  <script type="text/javascript">
    // function that causes null pointer exception
    function substringNullPointer() {
      var a = null;
      var b = a.substr(3);
    }
    // implement submit error
    jvaas.submitError = function (error) {
      //console.log("Attempting to submit error ...");
      //console.info(error);
      // different error handler urls
      var javaUrl = "localhost/error";
      var phpUrl = "localhost/error.php";
      // post error to controller
      var xhr = new XMLHttpRequest();
      xhr.open('POST', javaUrl);
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.onerror = function () {
        console.info("Error Failed To Submit, retrying ...");
        console.log(error);
        jvaas.resubmitError(error);
      };
      xhr.onload = function () {
        console.info("Error Submitted");
        console.log(error);
      };
      xhr.send(JSON.stringify(error));
    };
  </script>
</body>
</html>
onerror.js:
// create jvaas namespace
var jvaas = jvaas || {};
jvaas.resubmitError = function (error) {
  window.setTimeout(function () {
    if (jvaas.submitError) {
      jvaas.submitError(error);
    }
  }, 10000);
}
// make sure that window.onerror is supported as precaution we don't want to cause errors when trying to capture errors
if (window.hasOwnProperty("onerror")) {
  window.onerror = function (a, b, c, d, e, f, g, h) {
    // all our code should be in a try-catch to not trigger errors - if we don't, expect an infinite loop
    try {
      // uncomment to test the try-catch handler
      //invalidFunctionThatWillCauseError();
      // only add params to array if they are present
      var data = [];
      a && data.push(a);
      b && data.push(b);
      c && data.push(c);
      d && data.push(d);
      e && data.push(e);
      f && data.push(f);
      g && data.push(g);
      h && data.push(h);
      var error = {};
      error["date"] = new Date().toISOString();
      error["data"] = data;
      error["url"] = window.location.href;
      error["userAgent"] = navigator.userAgent;
      // submit error if handler is implemented
      if (jvaas.submitError) {
        jvaas.submitError(error);
      }
    } catch (error) {
      // any errors in our try catch should be shown no matter what, even if we have to alert()
      if (window.console) {
        console.error(error);
      } else {
        window.alert(error);
      }
    }
    // return false so that we don't swallow the errors
    return false;
  };
}
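The example page posts errors to an /error URL, but the receiving side isn't shown. Here is a rough sketch of a receiver in Node.js, assuming same-origin requests and the payload shape built in onerror.js (date, data, url, userAgent). The names toLogEntry and createErrorServer and the source: "browser" tag are hypothetical, picked so the entries can sit next to backend logs and still be filtered out.

```javascript
const http = require("http");

// Turn the payload posted by jvaas.submitError into one structured log entry.
function toLogEntry(payload) {
  return {
    source: "browser", // tag so these can be filtered among backend logs
    level: "ERROR",
    time: payload.date,
    message: Array.isArray(payload.data) ? payload.data.map(String).join(" | ") : "",
    url: payload.url,
    userAgent: payload.userAgent,
  };
}

// Build (but do not start) an HTTP server that accepts POST /error.
function createErrorServer(sink = (entry) => console.log(JSON.stringify(entry))) {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/error") {
      res.writeHead(404).end();
      return;
    }
    let body = "";
    req.on("data", (chunk) => { body += chunk; });
    req.on("end", () => {
      try {
        sink(toLogEntry(JSON.parse(body)));
        res.writeHead(204).end();
      } catch (err) {
        // Malformed JSON from the browser should not crash the receiver.
        res.writeHead(400).end();
      }
    });
  });
}
```

Calling createErrorServer().listen(8080) would start it on a port of your choice; the default sink just prints JSON lines, which you would swap for whatever ships logs to your central system.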
Make sure devs have access to the logs so they can fix issues before they become a huge problem. At one company, we also implemented error tracking inside Google Analytics (more as a gimmick to see if it was possible), so the marketing department could see if there was a spike in errors after a deploy and raise the alarm. Back in the day, console.log wasn't supported in all browsers, and one dev forgot a console.log in his code. Within 5 minutes of a production deploy, Google Analytics showed a spike of "invalid function console.log" errors next to the specific browsers that didn't support console.log, and we sorted the issue out immediately.