Multithreaded languages like Java often have concepts called concurrency, thread safety, etc. What do they mean, and why don't we have such concepts on other platforms like Node.js?
What are threads?
When programming in a classical imperative language like Java, the commands of your program are sent to the CPU one after another. In today's world, however, we have multi-core processors. Let's assume you have a CPU with four cores. If commands are processed one at a time, one core works on them while the other three sit idle. Is that efficient? No. Wouldn't it be better to issue four commands simultaneously to four cores, so each core can work on one command? There are actually two ways to accomplish just that. One: you start a second instance of the program (a second process). Some programs do that internally, like Google Chrome. Just take a look at your task manager and you will see all its processes.
The second way: processes carry a lot of overhead, since each one needs its own memory region. That's why programs can use a more lightweight method to work on several tasks at a time: threads, which all share the memory of the main program! Imagine a program as one thread of some candy, but full of commands. One core eats one thread, executing all the commands in it while doing so. Multi-threading means that you split up that one thread into multiple threads, feeding more cores.
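To make "feeding more cores" a bit more concrete, here is a minimal Java sketch. The class name ParallelSum and the method parallelSum are made up for this illustration; the idea is simply to split one big job (summing the numbers 1 to n) into a few threads that each take a slice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not taken from any library.
public class ParallelSum {

    // Sums the numbers 1..n by splitting the range across `threadCount` threads.
    static long parallelSum(int n, int threadCount) {
        long[] partial = new long[threadCount]; // one slot per thread, so no slot is shared
        List<Thread> threads = new ArrayList<>();
        int chunk = (n + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++) {
            final int index = t;
            final int from = t * chunk + 1;
            final int to = Math.min(n, (t + 1) * chunk);
            Thread thread = new Thread(() -> {
                long sum = 0;
                for (int i = from; i <= to; i++) {
                    sum += i;
                }
                partial[index] = sum; // each thread writes only its own slot
            });
            threads.add(thread);
            thread.start();
        }

        long total = 0;
        for (int t = 0; t < threadCount; t++) {
            try {
                threads.get(t).join(); // wait until thread t has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000, 4)); // 500000500000
    }
}
```

Note that every thread writes to its own array slot here, so the threads never touch the same memory location. That is what keeps this particular example free of the problems described next.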
So what is thread safety?
As I just said, processes each have their own memory region, so they can do whatever they want with it. Threads, however, share one memory region. That means they can communicate more efficiently, as they do not have to send data over to some other place. The downside of that is: What happens when two threads want to write to the same memory location? What happens if one thread writes to a memory location another thread is just reading from? What happens if one thread reads a value, increments it and wants to write it back, but before it does, another thread writes a new value, which is then overwritten? Want my answer to those questions? SCARY STUFF happens (the usual name for it is "race condition"). That's why you need some kind of memory guard, which makes sure that only one thread works on a memory location at a time. Code that is guarded like this is called thread-safe. There are different concepts for guarding memory. Just google "mutex", to name one.
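The read-increment-write scenario can be sketched in Java roughly like this. CounterDemo and its method names are invented for the example, and ReentrantLock from the standard library plays the role of the mutex; other guards (synchronized, AtomicInteger) would work as well.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class, not taken from any library.
public class CounterDemo {
    static int unguardedCounter;
    static int guardedCounter;
    static final ReentrantLock mutex = new ReentrantLock();

    // Starts `threadCount` threads that all run `task`, then waits for all of them.
    static void runOnThreads(int threadCount, Runnable task) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // No guard: counter++ is "read, increment, write back", so updates can get lost.
    static int countUnguarded(int threadCount, int incrementsPerThread) {
        unguardedCounter = 0;
        runOnThreads(threadCount, () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                unguardedCounter++;
            }
        });
        return unguardedCounter;
    }

    // Guarded: only the thread holding the mutex may touch the counter.
    static int countGuarded(int threadCount, int incrementsPerThread) {
        guardedCounter = 0;
        runOnThreads(threadCount, () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                mutex.lock();
                try {
                    guardedCounter++;
                } finally {
                    mutex.unlock();
                }
            }
        });
        return guardedCounter;
    }

    public static void main(String[] args) {
        // The unguarded result is at most 400000 and can differ from run to run.
        System.out.println("unguarded: " + countUnguarded(4, 100_000));
        // The guarded result is always 400000.
        System.out.println("guarded:   " + countGuarded(4, 100_000));
    }
}
```

The price of the guard is that threads have to wait for each other at the lock, which is why you only guard what really is shared.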
Why does NodeJS not have any multi-threading concepts?
Easy. JS is SINGLE-THREADED. Your code runs on one thread, one operation at a time. Since only one operation ever works on a memory region at any moment, there is no need to guard it.
What about async code in JS? Actually, the JS interpreter is implemented in a very specific way. It uses multiple threads in its own low-level implementation. Whenever your code has to wait for a database, an HTTP request or whatever, the interpreter hands the waiting off to that low-level layer (for example, a background thread), and the JS thread just works on something else in the meantime. Once the result of whatever you asked for arrives, it is put into a big queue. The JS interpreter has exactly one thread which is allowed to take items out of that queue and work on them. So when it pulls the result of your DB query from the queue, it runs your handler function until that function returns. At that point, the interpreter just pulls the next item from the queue and works on that. This way of working is the exact reason why you never know which result the interpreter will work on next, and why JS is asynchronous. That's also the reason why I prefer the term "pseudo async": JS is not really async, but the interpreter is. However, when writing JS, you don't care about the interpreter and just write single-threaded code.
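To stay with Java as the example language, the queue mechanism described above can be modelled roughly as follows. This is only a toy model under my own assumptions, not how Node.js is actually implemented: EventLoopModel and its methods are invented names, Thread.sleep stands in for a slow database, and the calling thread plays the role of the single JS thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical toy model of "background threads wait, one thread runs the handlers".
public class EventLoopModel {

    // Simulates `requests` slow operations and returns a log of the handlers,
    // in the order in which the single loop thread executed them.
    static List<String> run(int requests) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        // Only the loop thread ever touches this list, so it needs no lock.
        List<String> handled = new ArrayList<>();

        for (int i = 0; i < requests; i++) {
            final String name = "request-" + i;
            final long delayMs = 10L * (requests - i); // later requests get shorter waits
            // Background thread: waits for the "database", then queues the handler.
            new Thread(() -> {
                sleep(delayMs);
                queue.add(() -> handled.add(name + " on " + Thread.currentThread().getName()));
            }).start();
        }

        // The loop: take one handler from the queue, run it to completion, repeat.
        for (int done = 0; done < requests; done++) {
            try {
                queue.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handled;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // The order depends on which "database call" finishes first.
        run(3).forEach(System.out::println);
    }
}
```

Every handler reports the same thread name, because only the loop thread runs handlers. The waiting happens in parallel, the handling does not, and that is why the handlers can share data without any mutex.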
Can I run code in parallel in NodeJS?
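Yes. Please take a look at the cluster module, which allows you to spawn multiple processes. Since your JS code itself runs on a single thread, spawning processes is the classic way to go. Newer Node.js versions also ship a worker_threads module if you really need threads.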
@fibric Thank you for your comment. I actually wrote that NodeJS uses multiple threads further down (see Why does NodeJS not have any multi-threading concepts? second paragraph) :) Is that explanation satisfying?
@maruru I read it a second time, and I can live with it. I was a bit confused at first because I translated multi-threading into concurrent processing - that is what we Node.js developers do every time. Sometimes I read what is clearly not written... My bad.
@fibric no problem. You are one of the awesome guys developing what we build our solutions on. So if my lingo is not correct, please help me improve it!
Marco Alka
Software Engineer, Technical Consultant & Mentor