10 things you shouldn't do while running Node.js in production
At Hashnode we heavily use Node.js. I am a big fan of it and have learned a lot of things while running Hashnode. When I hang out with other developers, I notice that many people don't utilise Node to its full potential and do certain things the wrong way. So, this article is going to be about things which you shouldn't do while running Node in production. Let's get started!
Not using Node.js Cluster
Node.js runs your JavaScript on a single thread, and V8's default heap limit has historically been around 1.5GB on 64-bit systems. A single process therefore can't take advantage of multiple CPU cores on its own. The good news is that the Cluster module lets you fork multiple worker processes, which communicate with the parent process over IPC. The master process controls the workers, and incoming connections are distributed across them in a round-robin fashion (the default on every platform except Windows).
Clustering improves your app's performance and lets you achieve zero-downtime (hot) deployments easily (more on this later). Also keep in mind that the number of workers you can create is not limited by the number of CPU cores on the machine.
I feel clustering is a must-have for any production Node.js app and there is no reason not to use it.
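Here is a minimal sketch of the idea using the built-in cluster module. The helper names (pickWorkerCount, startCluster), the WEB_CONCURRENCY variable and port 3000 are assumptions made for illustration, and the restart-on-exit policy is just one reasonable choice.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: use the requested worker count if it is a positive
// integer, otherwise fall back to one worker per CPU core.
function pickWorkerCount(requested, cpuCount) {
  const n = Number.parseInt(requested, 10);
  return Number.isInteger(n) && n > 0 ? n : cpuCount;
}

function startCluster(port, requested) {
  // cluster.isPrimary exists in newer Node versions; older ones expose isMaster.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    const count = pickWorkerCount(requested, os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    // Replace any worker that dies so capacity stays constant.
    cluster.on('exit', (worker) => {
      console.error(`worker ${worker.process.pid} exited, forking a replacement`);
      cluster.fork();
    });
  } else {
    // Every worker listens on the same port; the primary hands out connections.
    http
      .createServer((req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      })
      .listen(port);
  }
}

// Usage (not run here): startCluster(3000, process.env.WEB_CONCURRENCY);
```

In practice a process manager can do the forking for you, which is what the pm2 section further down relies on.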
Performing heavy lifting inside web servers
Node/Express servers aren't meant to perform heavy, computationally intensive tasks. For instance, a typical web app has to send bulk emails to users. Although you can perform this task in the Node.js web server itself, it will degrade performance significantly. It is always better to break these heavy tasks out into microservices and deploy them as separate Node apps. You can then use a message queue like RabbitMQ to communicate with these microservices.
So, the key takeaway is that Node.js is best suited for event handling and non-blocking I/O. Any task that takes a long time to complete should be handled by a separate process.
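To make the hand-off concrete, here is a rough sketch of the web-server side: the request handler only enqueues small batches and returns, while a separate worker app would consume them. InMemoryQueue, the 'bulk-email' queue name and the batch size are all hypothetical; in a real deployment the publish call would go through a RabbitMQ client library instead.

```javascript
// Hypothetical stand-in for a real broker client, used only so the sketch
// runs on its own. A real implementation would publish to RabbitMQ.
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }
  publish(queueName, payload) {
    this.messages.push({ queueName, body: JSON.stringify(payload) });
  }
}

// Split a list into batches so each message is a small unit of work.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Called from the web server: it only enqueues work and returns the number
// of batches, leaving the actual sending to a separate process.
function enqueueBulkEmail(queue, templateId, recipients, batchSize = 100) {
  const batches = chunk(recipients, batchSize);
  batches.forEach((batch) => {
    queue.publish('bulk-email', { templateId, recipients: batch });
  });
  return batches.length;
}
```

Keeping each message small means a slow or failed batch can be retried by the worker without touching the web process.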
Not using a process manager
While it's obvious that process managers have a lot to offer, many first time Node users deploy their apps to production without a process manager. At Hashnode we have been using pm2, a powerful process manager for Node.js.
Also, if you use pm2 you can start your app in cluster mode very easily:
pm2 start app.js -i 2
In the above example, -i specifies the number of workers you want to run in cluster mode. The best part is that you can now reload the workers one after another so that your app doesn't suffer any downtime during deployment. The following command does it:
pm2 reload app
If you happen to use pm2, do check out Keymetrics which is a monitoring service for Node.js (based on pm2).
Not using a reverse proxy
I have seen developers running Node-based apps on port 80 and serving static files through them. Running a Node app on port 80 is not a good idea and is dangerous in most cases, because binding to a port below 1024 normally requires running the process as root. Instead, run the app on a different port like 3000 and use nginx (or something like HAProxy) as a reverse proxy in front of the Node.js app.
The above setup protects your application servers from direct exposure to internet traffic and helps you scale the servers and load balance them easily.
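One practical consequence of this setup is that the app no longer sees the client's address directly, only the proxy's. Below is a small sketch of recovering it, assuming the proxy appends the address it saw to the X-Forwarded-For header (nginx can be configured to do this). The clientIp helper is hypothetical, and it is only safe when the proxy is the sole way to reach the app.

```javascript
// Hypothetical helper: work out the client address when running behind a
// single trusted reverse proxy.
function clientIp(req) {
  const header = req.headers['x-forwarded-for'];
  if (typeof header === 'string') {
    const parts = header
      .split(',')
      .map((part) => part.trim())
      .filter(Boolean);
    if (parts.length > 0) {
      // The last entry is the one appended by our own proxy. Earlier entries
      // come from the client and could have been forged.
      return parts[parts.length - 1];
    }
  }
  // No proxy header: fall back to the socket's peer address.
  return req.socket.remoteAddress;
}
```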
Lack of monitoring
Bad things like unexpected errors and exceptions will keep happening. You know what's worse? Not knowing that something bad happened in your Node process. Now that you are using a process manager, your Node process will be restarted whenever an unhandled exception occurs, so unless you check the logs you won't find out about the issue. The solution is to use a monitoring service and have it alert you via email/SMS whenever your process gets killed and restarted.
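A monitoring service normally handles this for you, but the underlying idea can be sketched in a few lines: report the crash before the process exits, then let the process manager restart it. The installCrashReporter helper and the notify callback are hypothetical, and notify is assumed to be synchronous or fire-and-forget (for example, writing a line that your monitoring agent picks up).

```javascript
// Hypothetical helper: report fatal errors, then exit so that the process
// manager can start a fresh process.
function installCrashReporter(notify, exit = (code) => process.exit(code)) {
  const makeHandler = (kind) => (err) => {
    const report = {
      kind,
      message: err instanceof Error ? err.message : String(err),
      pid: process.pid,
      time: new Date().toISOString(),
    };
    try {
      notify(report);
    } catch (notifyErr) {
      console.error('could not send alert', notifyErr);
    }
    // State may be corrupted after an unhandled error, so do not keep serving.
    exit(1);
  };

  const onException = makeHandler('uncaughtException');
  const onRejection = makeHandler('unhandledRejection');
  process.on('uncaughtException', onException);
  process.on('unhandledRejection', onRejection);

  // Return a function that removes the handlers again (handy in tests).
  return () => {
    process.removeListener('uncaughtException', onException);
    process.removeListener('unhandledRejection', onRejection);
  };
}

// Usage (not run here):
// installCrashReporter((report) => console.error(JSON.stringify(report)));
```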
Not removing console.log statements
While developing an app, we use console.log statements to test things out. But sometimes we forget to remove these log statements in production, where they consume CPU time and waste resources. The best way to avoid this is to use the debug module: unless you start your app with the DEBUG environment variable set, nothing will be printed to the console.
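The debug package is a third-party module, so to keep this sketch runnable without installing anything it uses Node's built-in util.debuglog instead. It follows the same idea but is switched on through NODE_DEBUG rather than DEBUG. The section name myapp and the findUser function are made up for the example.

```javascript
const util = require('util');

// Prints to stderr only when NODE_DEBUG includes "myapp", for example:
//   NODE_DEBUG=myapp node app.js
// With the debug package the equivalent would be
//   const debug = require('debug')('myapp:users');
// enabled by running: DEBUG=myapp:* node app.js
const debugLog = util.debuglog('myapp');

// Hypothetical function that logs diagnostics without using console.log.
function findUser(users, id) {
  debugLog('looking up user %s among %d users', id, users.length);
  return users.find((user) => user.id === id) || null;
}
```

Without the environment variable the log call does almost nothing, so it can stay in the code when you ship.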
Maintaining global states inside the Node web processes
Sometimes developers store things like session ids, socket connections etc. in the memory of the web process. This is a big NO and should be avoided at all costs. If you do store session ids in memory, you will see that your users are logged out as soon as you restart the server. This also causes problems when you scale the app and add more servers, because each process only knows about its own sessions. Your web servers should just handle web traffic and should not maintain any kind of state in memory.
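One way to picture the alternative is to put sessions behind a small store interface, so that every web process talks to the same external store. Everything named here (MapStore, Sessions, the session: key prefix) is hypothetical; MapStore only stands in for something like Redis or a database so that the sketch runs by itself.

```javascript
const crypto = require('crypto');

// Hypothetical store interface: async get/set. The Map-backed version is a
// placeholder for an external store shared by all web processes.
class MapStore {
  constructor() {
    this.map = new Map();
  }
  async get(key) {
    return this.map.has(key) ? this.map.get(key) : null;
  }
  async set(key, value) {
    this.map.set(key, value);
  }
}

// Session logic that keeps no state of its own; it only talks to the store.
class Sessions {
  constructor(store) {
    this.store = store;
  }
  async create(userId) {
    const id = crypto.randomBytes(16).toString('hex');
    await this.store.set(`session:${id}`, JSON.stringify({ userId }));
    return id;
  }
  async userFor(id) {
    const raw = await this.store.get(`session:${id}`);
    return raw ? JSON.parse(raw).userId : null;
  }
}
```

Because Sessions holds nothing itself, a session created through one instance can be read through another, which is what lets you restart or add web servers without logging users out.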
Not using SSL
For a user-facing website, there is no reason not to force SSL by default. Sometimes I also see developers reading SSL keys from a file and using them in the Node process. Instead, you should always put a reverse proxy in front of your Node.js app and terminate SSL there.
Also, keep checking for the latest SSL vulnerabilities and apply fixes ASAP.
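If the proxy forwards both HTTP and HTTPS traffic to the app, the app can still force SSL by checking how the original request arrived. This sketch assumes the proxy sets the X-Forwarded-Proto header; the two function names are hypothetical. Many setups do the redirect in the proxy itself, in which case none of this is needed.

```javascript
// Hypothetical helper: return the HTTPS URL to redirect to, or null when the
// request already arrived over HTTPS (as reported by the proxy).
function httpsRedirectTarget(req) {
  if (req.headers['x-forwarded-proto'] === 'https') {
    return null;
  }
  const host = req.headers.host;
  if (!host) {
    return null; // cannot build a redirect without a Host header
  }
  return `https://${host}${req.url}`;
}

// Connect/Express-style middleware built on the helper above.
// Note: without a proxy in front (for example in local development) the
// header is missing, so every request would be redirected.
function forceHttps(req, res, next) {
  const target = httpsRedirectTarget(req);
  if (target === null) {
    return next();
  }
  res.writeHead(301, { Location: target });
  res.end();
}
```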
Lack of basic security measures
Security is always important, and it's good to be paranoid about your app's security. In addition to the basic security checks, you should use something like NSP to discover known vulnerabilities in your project's dependencies (NSP itself has since been retired, and npm audit now covers the same ground).
Also, don't use outdated versions of Node and Express in production. They are no longer maintained and don't receive security updates.
Not using a VPN
Always deploy your app inside a private network, so that only trusted clients can communicate with your servers. Often while deploying, people forget this simple thing and face a lot of problems later. It's always a good practice to think of the infrastructure and architecture in advance, before deploying your app.
For instance, if your Node server runs on port 8080 and you have set up nginx as a reverse proxy, it's important to make sure that only nginx can connect to your app on that specific port. It should be isolated from the rest of the world.
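When nginx runs on the same machine as the app, one simple way to get part of this isolation is to bind the Node server to the loopback interface only. The sketch below assumes that layout; the PORT and BIND_HOST variable names and the helper functions are hypothetical. If nginx lives on another host you would bind to a private address and still need firewall rules.

```javascript
const http = require('http');

// Hypothetical helper: choose where to listen. Defaults to loopback so that
// only local processes (such as nginx on the same machine) can connect.
function listenOptions(env) {
  return {
    port: Number.parseInt(env.PORT, 10) || 8080,
    host: env.BIND_HOST || '127.0.0.1',
  };
}

function start(handler, env = process.env) {
  const server = http.createServer(handler);
  server.listen(listenOptions(env));
  return server;
}

// Usage (not run here): start((req, res) => res.end('ok'));
```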
So, this was a list of 10 things you shouldn't do while running Node.js in production. I'll keep updating it as I think of more items. What are the checklists that you follow in production Node.js deployments? Let me know in comments!
Hey, nice list there, thanks a lot! I agree with most of your points. But I want to add my little feedback :)
Not using a reverse proxy
Most reverse proxies are not able to fully leverage the power of HTTP/2 and other modern standards. Sometimes, there is trouble with reverse-proxies and (web-)socket connections. I think there are cases where you really want to expose your Node.JS application to the internet.
Maintaining global states inside Node web processes
How to maintain state really depends on the kind of application you develop. I work on an application which keeps alive websocket connections with some 500 devices. They have to be manageable over a separate GUI, so I do not have any other choice than to store the connections, if I want to reuse them (and send commands) based on the device-name.
Hey, nice article. I am a junior Node developer and I don't know any of the things you spoke of above. What references should I search for to start learning this from the beginning? I'm kind of clueless on where to start, as you went over a lot of things. I created, say, a social media blog / video app with user auth using passport and tokens.
You mentioned using clusters? I guess all of this is in the devops department? Where do you suggest I start to learn this stuff? It all feels really scattered. How can I cut down my learning time for this stuff?
Lack of Monitoring
Can you suggest a tool to send/log these errors, or does the pm2 monitoring tool do this?
Hi, sorry to bump up on an old post. But this shows up when I search for clustering in PM2.
I'm still new to Node.js. My question is, is it really needed in every production use case?
Yes, I'm aware that it improves the speed a lot, but in my use case I use chokidar to watch a directory and then fork a child_process to upload to S3 every time a new file is added.
When I tried to run PM2 in cluster mode, it did the same thing twice. How can I improve the app's performance without getting this?
Hi, nice article, I found it very useful. If I use cluster, how does socket communication work?
There will be one main process which has the connection and distributes the requests to one of your workers. You do not even have to care, just use the socket connections as if there was only one process. You can read more about it in the official docs.
Excellent post. I'm building a Node app which is completely stateless and uses JWT for authentication. I'm planning to host the app on 10 servers, each having only 1 CPU core (as it is stateless, there is no need to bother about session sync). Each server has nginx in front of the Node app on that particular server, and all these servers sit behind 1 load balancer. So my question is: in my scenario, do I really need the cluster module?
As far as I understand, even with only 1 CPU core, cluster still brings you the benefit of having no downtime when reloading the application on a specific server. Yeah, I know... you already have the other 9 servers to take over while 1 server is restarting the app... It may be a "micro-benefit", but it is still a benefit. (Especially if you start with one server alone before adding the others.)
I would like to add the following topics as what you shouldn't do in production:
- Serve static assets not minified and gzip'ed (see grunt)
- Disallow local caching for those static assets
- Keep access logging enabled
- Store uploaded files on disk folders instead of a virtual file system like GridFS
Why use GridFS if the file is not meant to be streamed back to the client and is only used for processing? Just curious. At our startup we take in a video and generate metadata from it with AI: summary, questions, chapters etc. (everything patented). We only need to process the video and not serve it back. Even if we did need to stream it, I'm not sure GridFS would do it justice. Or would it?
Also some thoughts from here:
Node.js is single threaded in nature
This is not entirely true, since Node.js of course uses multiple threads underneath: libuv keeps a thread pool, 4 threads by default, which can be increased (UV_THREADPOOL_SIZE). The pool is used for file I/O and a few other blocking operations, while network I/O relies on the operating system's non-blocking facilities. And as you already suggested not doing heavy lifting (= CPU-intensive work), it means that node.js is already very well prepared for parallel processing and utilizing multiple CPUs/cores. Therefore, I would disagree that the clustering module is a must.
Not using Node.js Cluster
You already suggested nginx as a reverse proxy. It should be mentioned that nginx can also be used very well as a load balancer for node.js applications. For a comparison see this link: keithcirkel.co.uk/load-balancing-node-js
Especially its capability to dispatch requests to node.js processes according to their current number of pending requests can probably be superior to the common round-robin approach.