What are some best practices for logging?

Logging is vital for finding bugs and understanding failure modes during crashes and malfunctions. However, logging seems to be a subject nobody talks about (apart from the cloud services selling their logging repositories). So what are some best practices for logging? Do you add logging to your applications? What information do you log, and how densely?

Gergely Polonkai

My rule of thumb is to log a bit more than I see fit, while keeping in mind that too few or too many logs == no logs.

We live in the era of Big Data. As such, a lot of developers log stuff carelessly, thinking “the data team will sort it out”. However, when you need to debug your application, there won’t be a data team around (or it would be too expensive to get them involved in every little bug you encounter).

Logging is especially important in environments where you don't have ready access to every part of the running application (like Kubernetes). So log everything important, but think twice before deeming something important.

On the other hand, debug logging is a handy tool for developers, but it will most likely be turned off in production. Thus, important information should never be logged at DEBUG level. Debug logs should still be dense enough for a programmer to follow code execution by reading them.

So, for example, your production log should look like this:

INFO User X placed order Y
ERROR Could not verify payment from X2 for order Y2 after 7 days

…your debug log might look like this:

DEBUG Created order Y in orders.py:312
DEBUG Added item Z1 to order Y in orders.py:192
DEBUG Added item Z2 to order Y in orders.py:192
DEBUG Set order state to Ordered in orders.py:721
DEBUG Sent confirmation about order Y to user X in notifications.py:19
INFO User X placed order Y
DEBUG Fetching payment records for order Y2
ERROR Could not verify payment from X2 for order Y2 after 7 days
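
A minimal sketch of the level threshold behind these two logs, assuming a hypothetical `createLogger` helper (not from any particular library): each message carries a severity, and anything below the configured threshold is dropped. The same calls therefore produce the short production log at INFO and the fuller trace at DEBUG.

```javascript
// Numeric severities: a message is emitted only if its severity is >= the threshold.
const LEVELS = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// Hypothetical helper: `write` receives each formatted line (defaults to console.log).
function createLogger(threshold, write = console.log) {
  if (!(threshold in LEVELS)) {
    throw new Error(`Unknown log level: ${threshold}`);
  }
  const log = (level, message) => {
    if (LEVELS[level] >= LEVELS[threshold]) {
      write(`${level} ${message}`);
    }
  };
  return {
    debug: (msg) => log("DEBUG", msg),
    info: (msg) => log("INFO", msg),
    warn: (msg) => log("WARN", msg),
    error: (msg) => log("ERROR", msg),
  };
}

// Usage: run the same calls at two thresholds, collecting lines instead of printing them.
const production = [];
const debugging = [];
for (const [threshold, sink] of [["INFO", production], ["DEBUG", debugging]]) {
  const logger = createLogger(threshold, (line) => sink.push(line));
  logger.debug("Created order Y in orders.py:312");
  logger.info("User X placed order Y");
  logger.error("Could not verify payment from X2 for order Y2 after 7 days");
}
```

At INFO, `production` holds only the INFO and ERROR lines; at DEBUG, `debugging` also contains the DEBUG line, in call order.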

cedric simon

You are right, logging is the minimum we can do to trace an error or simply check that an application runs well.

For example, we log the results of critical payment transactions, even the successful ones. We also have home-made logging systems and admin panels so customers can check for themselves which transactions succeeded and which did not.

Those logs contain unique reference numbers in the form "123-1234" that are easy to remember and look up. Customers call us with this number, so we can double-check and investigate more quickly.
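
A minimal sketch of issuing "123-1234"-style references, assuming a hypothetical in-memory store (`logTransaction` and `lookUp` are illustrative names, not the poster's system). Seven digits allow only ten million references, so a real system would probably persist entries and might widen the number or add a date part.

```javascript
// Hypothetical in-memory store; a real system would persist entries in its log database.
const entriesByReference = new Map();

// Builds a reference like "123-1234" from a random 7-digit number.
function randomReference() {
  const n = Math.floor(Math.random() * 10_000_000); // 0 .. 9,999,999
  const digits = String(n).padStart(7, "0");
  return `${digits.slice(0, 3)}-${digits.slice(3)}`;
}

// Logs a transaction result and returns a reference unique within this store.
function logTransaction(entry) {
  if (entriesByReference.size >= 10_000_000) {
    throw new Error("Reference space exhausted");
  }
  let reference;
  do {
    reference = randomReference(); // retry on the (rare) collision
  } while (entriesByReference.has(reference));
  entriesByReference.set(reference, { ...entry, loggedAt: new Date().toISOString() });
  return reference;
}

// When a customer calls with their reference, look the entry up directly.
function lookUp(reference) {
  return entriesByReference.get(reference);
}
```

Random rather than sequential numbers make it harder for someone to guess other customers' references, at the cost of the collision check.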

For such transactions, we log everything, even the data we send (POST data in general). This can help identify a malfunction on our side or on the third-party side.
We, of course, log the received data "as is", with no processing, plus a parsed version: the parsed version is easier to read, and comparing the two can help identify whether our parser failed on something.
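
A minimal sketch of the raw-plus-parsed pattern, assuming JSON payloads and a hypothetical `log(label, data)` sink. The raw body is logged before any processing, so even when parsing fails, the log shows exactly what arrived.

```javascript
// Logs a third-party response twice: verbatim, then parsed.
// `log` is any function taking (label, data); JSON is assumed as the payload format.
function logResponse(log, rawBody) {
  // 1. Keep the raw body exactly as received, before any processing.
  log("received.raw", rawBody);
  try {
    const parsed = JSON.parse(rawBody);
    // 2. The parsed version is easier to read and search.
    log("received.parsed", parsed);
    return parsed;
  } catch (err) {
    // 3. If parsing fails, the raw entry above shows exactly what the parser choked on.
    log("received.parse_error", { message: err.message });
    throw err;
  }
}
```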

Lastly, data retention: it depends on how much activity you have.
In the last system we built, log files were rotated monthly because we don't expect much traffic, but daily rotation could be necessary with heavy traffic. We archive those logs externally after one or two months and keep about one year of logs.
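
A minimal sketch of the rotation and retention rules just described, assuming hypothetical file names like `app-2024-03.log` and UTC dates. The thresholds (archive after two months, delete after twelve) are the figures from this post, exposed as defaults.

```javascript
// Name of the log file an entry belongs to, for monthly or daily rotation (UTC dates).
function logFileName(date, rotation) {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  if (rotation === "monthly") return `app-${yyyy}-${mm}.log`;
  if (rotation === "daily") return `app-${yyyy}-${mm}-${dd}.log`;
  throw new Error(`Unknown rotation: ${rotation}`);
}

// Whole calendar months between two dates (UTC), ignoring the day of month.
function monthsBetween(from, to) {
  return (to.getUTCFullYear() - from.getUTCFullYear()) * 12 +
         (to.getUTCMonth() - from.getUTCMonth());
}

// Archive after ~2 months, keep ~1 year in total, then delete.
function retentionAction(fileDate, now, archiveAfter = 2, deleteAfter = 12) {
  const age = monthsBetween(fileDate, now);
  if (age >= deleteAfter) return "delete";
  if (age >= archiveAfter) return "archive";
  return "keep";
}
```

A nightly job could walk the log directory, compute `retentionAction` for each file, and move or remove it accordingly.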

Those are our basics.

On top of that, we rely on MySQL logs and Apache logs as well.

Jan Vladimir Mostert

Firstly, have a central place for ALL your logs, and by ALL, I mean everything: application logs, database logs, server logs, access logs, etc., so that you can easily tell whether a problem is infrastructure- or application-related. Previously, I've built my own solutions for this and have also used Graylog at some point, but these days it's just very convenient to use Stackdriver: one place for Google Cloud and AWS logs, for literally everything.
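
Correlation is much easier when every source writes structured entries with shared fields. A minimal sketch, assuming JSON-lines output and a hypothetical `requestId` field that each component can attach (database and web-server logs often need extra configuration for this, or a timestamp-based join instead). The field names are illustrative, not any particular log service's schema.

```javascript
// One JSON object per line, with a `source` field so a central store can
// filter and correlate application, database and server logs.
function formatEntry({ source, level, message, ...fields }, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    source, // e.g. "app", "mysql", "apache"
    level,
    message,
    ...fields, // any extra context, such as a request id
  });
}

// Given lines from many sources, pull out everything that touched one request,
// oldest first (ISO-8601 timestamps in the same format sort lexicographically).
function entriesForRequest(lines, requestId) {
  return lines
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.requestId === requestId)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```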

For integrations into third-party systems, log everything, even if it means you need to log it to a separate log system. When something goes wrong, you have a trail of data to figure out if the problem was on your side or the third-party system's side.
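
A minimal sketch of such a trail, assuming the integration is an async function you can wrap (`loggedCall`, `log` and `call` are hypothetical names). The request is logged before the call, and either the response or the failure is logged after it, with the elapsed time, so the trail exists whether or not the call succeeds.

```javascript
// Wraps an async call to a third-party system so that every request, response,
// failure and duration is logged. `log` and `call` are injected by the caller.
async function loggedCall(log, name, call, request) {
  const started = Date.now();
  log({ integration: name, event: "request", request });
  try {
    const response = await call(request);
    log({ integration: name, event: "response", response, ms: Date.now() - started });
    return response;
  } catch (err) {
    log({ integration: name, event: "error", message: err.message, ms: Date.now() - started });
    throw err; // the caller still sees the failure
  }
}
```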

Your log system should be able to switch from INFO to DEBUG mode at runtime without restarting any services. This lets you switch to DEBUG mode the moment anything goes wrong, so you can quickly figure out what is causing the problem. If changing the log level requires a restart, a hard-to-reproduce problem will often be long gone by then and may take ages to reappear.
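
A minimal sketch of a runtime-switchable level, assuming a hypothetical `createSwitchableLogger` helper. The threshold is ordinary mutable state read on every call, so `setLevel` takes effect for the next log line. How `setLevel` gets triggered (admin endpoint, config watcher, signal) is left as an assumption in the comment.

```javascript
const SEVERITY = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// The threshold lives in mutable state and is read on every call, so changing
// it takes effect for the very next log line without restarting the process.
function createSwitchableLogger(initialLevel = "INFO", write = console.log) {
  let current = "INFO";
  const emit = (level, msg) => {
    if (SEVERITY[level] >= SEVERITY[current]) write(`${level} ${msg}`);
  };
  const logger = {
    setLevel(level) {
      if (!(level in SEVERITY)) throw new Error(`Unknown log level: ${level}`);
      current = level;
    },
    getLevel: () => current,
    debug: (msg) => emit("DEBUG", msg),
    info: (msg) => emit("INFO", msg),
    warn: (msg) => emit("WARN", msg),
    error: (msg) => emit("ERROR", msg),
  };
  logger.setLevel(initialLevel); // validates the initial level too
  return logger;
}

// Hypothetical wiring: call setLevel from an admin endpoint, a config watcher,
// or (in Node on POSIX systems) a signal handler, for example:
//   process.on("SIGUSR2", () =>
//     logger.setLevel(logger.getLevel() === "DEBUG" ? "INFO" : "DEBUG"));
```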

For user-facing systems, it's nice to be able to switch a specific user's log level and tag their logs with their email address (or a similar identifier) so that you can follow them in your log system by filtering on that tag. This allows very fine-grained logging, even finer than DEBUG (for example, user clicked on button class="update account"), so when a user is having issues, you can immediately switch logging on for them and trace exactly what they're doing and what's going wrong.
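
A minimal sketch of per-user levels, assuming a hypothetical `createUserAwareLogger` and a hypothetical `TRACE` level below DEBUG for UI events such as button clicks. Users without an override get the global level; every emitted entry is tagged with the user so the log system can filter on it.

```javascript
// "TRACE" is a hypothetical level below DEBUG for very fine-grained UI events.
const RANK = { TRACE: 5, DEBUG: 10, INFO: 20, ERROR: 40 };

function createUserAwareLogger(globalLevel, write) {
  const overrides = new Map(); // email -> level for users being traced
  return {
    setUserLevel(email, level) { overrides.set(email, level); },
    clearUserLevel(email) { overrides.delete(email); },
    log(email, level, message) {
      const threshold = overrides.get(email) ?? globalLevel;
      if (RANK[level] >= RANK[threshold]) {
        // Tag every entry with the user so the log system can filter on it.
        write({ user: email, level, message });
      }
    },
  };
}
```

Keeping overrides in a map keyed by user means tracing one person never raises the log volume for everyone else.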

Track browser errors alongside your normal backend logging. Why would you do such a bizarre thing? Often, a browser plugin causes problems on your page that you're not even aware of. It's rather awesome when Steven phones saying he can't print and you can answer: yes, we're almost done fixing that issue; your browser is using pluginXYZ, which is causing it, and we're busy deploying a workaround for you.

I wrote a mini library for this 4 years ago (it could probably be improved, but it gets the job done for most projects):

github.com/JVAAS/onerrorjs/blob/master/oner..

Example index.html page with plenty of things that can cause errors:

<html>
<head>
    <title>Test window.onerror</title>
    <script type="text/javascript" src="onerror.js"></script>
</head>
<body>

<button type="button" onclick="substringNullPointer()">
    Substring Nullpointer
</button>

<button type="button" onclick="callInvalidFunction()">
    Call Invalid Function
</button>

<!-- More Error Testing Buttons Here -->
<script type="text/javascript">
    // function that throws a TypeError (calling a method on null, JavaScript's "null pointer" error)
    function substringNullPointer() {
        var a = null;
        var b = a.substr(3);
    }
    // implement submit error
    jvaas.submitError = function (error) {
        //console.log("Attempting to submit error ...");
        //console.info(error);
        // different error handler urls
        var javaUrl = "http://localhost:8080/error";
        var phpUrl = "http://localhost/error.php";
        // post error to controller
        var xhr = new XMLHttpRequest();
        xhr.open('POST', javaUrl);
        xhr.setRequestHeader('Content-Type', 'application/json');
        xhr.onerror = function () {
            console.info("Error Failed To Submit, retrying ...");
            console.log(error);
            jvaas.resubmitError(error);
        };
        xhr.onload = function () {
            console.info("Error Submitted");
            console.log(error);
        };
        xhr.send(JSON.stringify(error));
    };
</script>
</body>
</html>

onerror.js:

// create jvaas namespace
var jvaas = jvaas || {};

// retry a failed submission after 10 seconds
jvaas.resubmitError = function (error) {
    window.setTimeout(function () {
        if (jvaas.submitError) {
            jvaas.submitError(error);
        }
    }, 10000);
};

// make sure window.onerror is supported: as a precaution, we don't want to cause errors while capturing errors
if (window.hasOwnProperty("onerror")) {
    // browsers call window.onerror with (message, source, lineno, colno, error)
    window.onerror = function (message, source, lineno, colno, err) {
        // all our code should be in a try-catch so it can't trigger errors itself - if it does, expect an infinite loop
        try {

            // uncomment to test the try-catch handler
            //invalidFunctionThatWillCauseError();

            // only add params to the array if they are present; an Error object has no
            // enumerable properties (JSON.stringify would turn it into {}), so keep its stack as text
            var data = [];
            message && data.push(message);
            source && data.push(source);
            lineno && data.push(lineno);
            colno && data.push(colno);
            err && data.push(String(err.stack || err.message || err));

            var error = {};
            error["date"] = new Date().toISOString();
            error["data"] = data;
            error["url"] = window.location.href;
            error["userAgent"] = navigator.userAgent;

            // submit error if handler is implemented
            if (jvaas.submitError) {
                jvaas.submitError(error);
            }

        } catch (error) {
            // any errors in our try catch should be shown no matter what, even if we have to alert()
            if (window.console) {
                console.error(error);
            } else {
                window.alert(error);
            }
        }

        // return false so that we don't swallow the errors
        return false;
    };
}

Make sure devs have access to the logs; this way they can fix issues before they become huge problems. At one company, we also implemented error tracking inside Google Analytics (more as a gimmick to see if it was possible), so the marketing department could see if there was a spike in errors after a deploy and raise the alarm. Back in the day, console.log wasn't supported in all browsers, and one dev forgot a console.log in his code. Within 5 minutes of a production deploy, Google Analytics showed a spike of "invalid function console.log" errors from the specific browsers that didn't support console.log, and we sorted the issue out immediately.