Happy holidays, everyone! I'm doing some research on live monitoring and logging services for Node.js, like PM2. Do you have a setup for this in production? What service do you use?
PM2 is good. I recommend bunyan + Papertrail.
We use bunyan for collecting logs. You can configure bunyan to write logs to a file or send them to external services.
And for log management we use Papertrail. You get live log tailing, search, graphs, Slack notifications, and much more.
As mentioned by Gijo Varghese, bunyan is really nice for preparing the logs.
Then, it really depends on your needs and resources. As soon as your server runs on multiple hosts, you'll find yourself in need of a log collection/aggregation solution. And from there, it's no longer a NodeJS-specific question: all servers produce logs to help you investigate when you need to.
You have a few managed solutions available:
You also have open source solutions:
The choice between an open source solution - which you'll have to maintain - and a managed one will be up to you, depending on your resources and criteria. For small teams, the managed approach is usually nice: you get all the benefits for a modest cost, and it lets you focus on your core product, where you build value for your business.
Another thing to consider, which isn't specific to NodeJS, is to produce your logs in JSON. The old Apache format was nice, but we've evolved since then ;-) All modern logging platforms support it. The key benefit is that extracting the fields no longer requires string parsing (which often fails because of all the noise in the logs), and you can get a ton of metadata this way. It's been more than a year since I last logged from NodeJS, but if I remember correctly it gives good data, as in Python with Driftforatter: you can get your logs decorated with the filename, path and line which produced the log entry (and many other things). Finally, having clean tags/fields in your logs is really helpful to query them efficiently. That means being able to search through them, or having more reliable alerts based on them.
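A small sketch of why JSON logs beat plain-text lines: fields come out with a single `JSON.parse`, no fragile regexes. The field names below (`durationMs`, `userId`, ...) are just illustrative, not a standard schema.

```javascript
// A classic text log line forces you to parse with a regex that breaks
// as soon as the message contains unexpected characters:
const textLine = 'GET /login 200 13ms user=42';

// The same event as a JSON line:
const jsonLine = JSON.stringify({
  level: 'info',
  time: '2023-01-01T12:00:00.000Z',
  method: 'GET',
  path: '/login',
  status: 200,
  durationMs: 13,
  userId: 42,
});

// Extracting a field is now trivial and keeps its type:
const event = JSON.parse(jsonLine);
console.log(event.status);          // 200 — a number, not a string
console.log(event.durationMs > 10); // true — directly usable in queries and alerts
```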
You can read more about this in many places, like:
I'm currently using Datadog for both metrics and logs, because the feature/price ratio is really good for us, and it makes the setup even simpler: you have a single solution for metrics, events and logs, which reduces the overall ops complexity.
Logs are useful for troubleshooting issues. They can be useful for alerting too, but I learned the hard way that the noise in logs makes them less reliable than pure metrics. Most of the time, you'll prefer to be alerted on metrics rather than on logs. But logs are a good place to start (and they will force you to keep your logs clean and remove the clutter...). Since we're speaking of alerting, please keep in mind that alerting a human should only be done when:
Finally, please keep in mind that logs are needed, but are only a small part of your system monitoring/observability solution. (Some will argue that logs are not even real observability!)
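The earlier point about preferring metrics over logs for alerting can be sketched in a few lines: instead of paging someone whenever a noisy error string appears, aggregate events over a time window into a rate and compare it to a threshold. The names here (`errorRate`, `THRESHOLD`, the 5% value) are illustrative, not a recommendation.

```javascript
// Compute an error-rate metric from a window of request events,
// then alert only when the aggregate crosses a threshold.
function errorRate(events) {
  // events: [{ status: number }, ...] collected over a time window
  const errors = events.filter((e) => e.status >= 500).length;
  return events.length === 0 ? 0 : errors / events.length;
}

const recentEvents = [
  { status: 200 }, { status: 200 }, { status: 500 },
  { status: 200 }, { status: 503 },
];

const THRESHOLD = 0.05; // page a human only above 5% errors
const rate = errorRate(recentEvents);
console.log(rate);             // 0.4
console.log(rate > THRESHOLD); // true -> this window would trigger an alert
```

A single transient 500 in a quiet hour stays below the threshold and wakes nobody up, while a sustained failure does — exactly the noise-filtering that raw log matching lacks.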
You also need meaningful metrics from the systems, and meaningful metrics from the end-user view of the system (the real measure of your system's latency, error rate, ...). To wrap up on the limits of simple logs and the need for better system observability, I'll share this must-read thread from Charity Majors (from Honeycomb), which explains all this much better than I'll ever be able to. If you're starting out, logs will be helpful, and reading this will help you keep things in perspective and stay aware of the increasing complexity ahead, and of the tools and questions that come with it (you won't need them for now, but knowing what's ahead is always better than being surprised later): twitter.com/mipsytipsy/status/1042817542648082432