Happy holidays, everyone! I'm doing some research on live monitoring and logging services for Node.js, like PM2. Do you have a setup for this in production? What service do you use?
PM2 is good. I recommend bunyan + Papertrail.
We use bunyan for collecting logs. You can configure bunyan to write logs to a file or send them to external services.
And for log management we use Papertrail. You get live log tailing, search, graphs, Slack notifications, and much more.
As mentioned by Gijo Varghese, bunyan is really nice for preparing the logs.
Then, it really depends on your needs and resources. As soon as your server runs on multiple hosts, you'll find yourself in need of a log collection/aggregation solution. And from there, it's no longer a NodeJS-specific question: all servers produce logs to help you investigate when you need to.
You have a few managed solutions available:
You also have open source solutions:
The choice between an open source solution - which you'll have to maintain - and a managed one will be up to you, depending on your resources and criteria. For small teams, the managed approach is usually nice: you get all the benefits for a modest cost, and it lets you focus on your core product, where you build value for your business.
Another thing to consider, which isn't specific to NodeJS, is to produce your logs in JSON. The old Apache format was nice, but we've evolved since then ;-) All modern logging platforms support it. The key benefit is that extracting the fields no longer requires string parsing (which often fails because of all the noise in the logs), and you can get a ton of metadata this way. It's been more than a year since I last logged from NodeJS, but if I remember correctly it gives good data, as in Python with Driftforatter: you can get your logs decorated with the filename, path and line which produced the log entry (and many other things). Finally, having clean tags/fields in your logs is really helpful to query them efficiently. That means being able to search through them, or having more reliable alerts based on them.
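A small sketch of why JSON logs beat plain-text lines: fields come out with a single `JSON.parse`, no fragile regexes. The field names below (`durationMs`, `userId`, ...) are just illustrative, not a standard schema.

```javascript
// A classic text log line forces you to parse with a regex that breaks
// as soon as the message contains unexpected characters:
const textLine = 'GET /login 200 13ms user=42';

// The same event as a JSON line:
const jsonLine = JSON.stringify({
  level: 'info',
  time: '2023-01-01T12:00:00.000Z',
  method: 'GET',
  path: '/login',
  status: 200,
  durationMs: 13,
  userId: 42,
});

// Extracting a field is now trivial and keeps its type:
const event = JSON.parse(jsonLine);
console.log(event.status);          // 200 — a number, not a string
console.log(event.durationMs > 10); // true — directly usable in queries and alerts
```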
You can read more about this in many places, like:
I'm currently using Datadog for both metrics and logs, because the feature/price ratio is really good for us, and it makes the setup even simpler: you have a single solution for metrics, events and logs, which reduces the overall ops complexity.
Logs are useful for troubleshooting issues. They can be useful for alerting too, but I learned the hard way that the noise in logs makes them less reliable than pure metrics. Most of the time, you'll prefer to be alerted on metrics rather than on logs. But logs are a good place to start (and they will force you to keep your logs clean and remove the clutter...). Since we're speaking of alerting, please keep in mind that alerting a human should only be done when:
Finally, please keep in mind that logs are needed, but are only a small part of your system monitoring/observability solution. (Some will argue that logs are not even real observability!)
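The earlier point about preferring metrics over logs for alerting can be sketched in a few lines: instead of paging someone whenever a noisy error string appears, aggregate events over a time window into a rate and compare it to a threshold. The names here (`errorRate`, `THRESHOLD`, the 5% value) are illustrative, not a recommendation.

```javascript
// Compute an error-rate metric from a window of request events,
// then alert only when the aggregate crosses a threshold.
function errorRate(events) {
  // events: [{ status: number }, ...] collected over a time window
  const errors = events.filter((e) => e.status >= 500).length;
  return events.length === 0 ? 0 : errors / events.length;
}

const recentEvents = [
  { status: 200 }, { status: 200 }, { status: 500 },
  { status: 200 }, { status: 503 },
];

const THRESHOLD = 0.05; // page a human only above 5% errors
const rate = errorRate(recentEvents);
console.log(rate);             // 0.4
console.log(rate > THRESHOLD); // true -> this window would trigger an alert
```

A single transient 500 in a quiet hour stays below the threshold and wakes nobody up, while a sustained failure does — exactly the noise-filtering that raw log matching lacks.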
You also need meaningful metrics from the systems, and meaningful metrics from the end-user view of the system (the real measure of your system's latency, error rate, ...). To wrap up on the limits of simple logs and the need for better system observability, I'll share this must-read thread from Charity Majors (from Honeycomb), which explains all this much better than I'll ever be able to. If you're starting out, logs will be helpful, and reading this will help you keep things in perspective and stay aware of the increasing complexity ahead, and of the tools and questions that come with it (you won't need them for now, but knowing what's ahead is always better than being surprised later): twitter.com/mipsytipsy/status/1042817542648082432