Introduction to Node.js and what I find so great about it

Posted on September 21, 2012

My knowledge of the inner workings of Node.js and threading is pretty basic, so I can’t guarantee everything written here is accurate. If you spot an error, please let me know!

I’ve been playing around with Node.js for a while now. Even though the community behind it keeps growing, a lot of people dislike a lot of things about it. In this post I will explain some of its inner workings and what I like about them.

I hear you asking: “Why JavaScript?” Even though a lot of people think JavaScript is slow, it’s actually not so bad. Note that a lot of the things that make it slow in the browser (like the really slow DOM API) don’t exist on the server.

From the Node.js homepage:

Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

Node.js allows you to write your server-side scripts in JavaScript. Google’s V8 engine (which is also part of Chrome, and one of the big reasons why Chrome is so fast) takes your JavaScript and compiles it on the fly to machine code.

But what sets Node apart most from the traditional server-side languages used for web development is the way it works.

Threading

When writing a webpage in PHP, one might write something like this:

echo 'Welcome to my awesome website';

When someone visits the page, the script gets executed, simple as that. But what if 100 users visit the page? The script gets executed 100 times, and each time the server (probably Apache) handles it in a separate thread (or process, depending on how it is configured), which is sort of like a process on the webserver.

While this is all fine and dandy, creating a new thread has some overhead: it takes time and memory. And as the web shifts away from static websites towards web apps, clients connect to the webserver more and more often.

Node.js is non-blocking and event driven

If you were to write the same website in Node, you would write something like this:

var http = require('http');

// The function passed to createServer runs once for every incoming request.
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Welcome to my awesome website\n');
}).listen(1337, "127.0.0.1"); // listen on port 1337, local connections only
console.log('Server running at http://127.0.0.1:1337/');

We tell Node to create a webserver (handled by Apache in the PHP example), and we define a function which gets called on every request.

The moment you run this, Node creates a single process. Instead of spawning a new thread every time someone makes a request, it uses that same process and just calls a function. As you can imagine, this approach is (in this simple example) more scalable and memory efficient.
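
A quick way to see that every request runs inside the same process is to keep a counter in an ordinary variable. The sketch below is a small variation on the server above; the `requestCount` variable and the message are made up for illustration. Because no new thread or process is created per request, every call to the handler sees and updates the same variable.

```javascript
const http = require('http');

// Plain module-level variable: it lives in the single Node process,
// so every call of the request handler reads and updates the same value.
let requestCount = 0;

const server = http.createServer(function (req, res) {
  requestCount += 1;
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('You are visitor number ' + requestCount + '\n');
});
```

Start it with `server.listen(1337, '127.0.0.1')` as before and reload the page a few times: the number keeps going up. Browsers may also request `/favicon.ico`, which counts as a request too.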

Because there is only one thread, we must make sure we don’t do anything that takes a long time; otherwise everyone connected has to wait for it.
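
The effect of blocking is easy to observe. In this sketch, `blockFor` is a made-up helper that simulates a long computation by busy-waiting, and a timer scheduled with a 0 ms delay stands in for any other work waiting its turn. The timer cannot fire until the busy loop has finished.

```javascript
// Hypothetical helper: simulates a long computation by busy-waiting.
// While it runs, the single thread can do nothing else.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// Schedule a callback "as soon as possible", then block the thread.
// The callback can only run after blockFor has returned.
function measureTimerDelay(blockMs, done) {
  const scheduledAt = Date.now();
  setTimeout(function () {
    done(Date.now() - scheduledAt);
  }, 0);
  blockFor(blockMs);
}

measureTimerDelay(200, function (delay) {
  console.log('A 0 ms timer fired after ' + delay + ' ms');
});
```

Replace the timer with a request from another visitor and you get the situation described above: that visitor waits for the whole 200 ms.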

Callbacks and events

That’s where callbacks and events come in. The main thread only runs JavaScript, but what if you need to get a record from a database? You run the query and pass it a callback (a function to execute once the result comes in). This is called non-blocking IO: the main thread does not wait for things happening outside the Node environment.
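
The same pattern can be tried without a database. In the sketch below, `fs.readFile` stands in for the query: it starts the IO, returns immediately and calls the callback once the data is available. The temporary file and its name are arbitrary choices for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a small file to read (the file name is arbitrary).
const file = path.join(os.tmpdir(), 'node-intro-example.txt');
fs.writeFileSync(file, 'Hello from disk');

const events = [];

// Start the read and hand over a callback; readFile returns right away.
fs.readFile(file, 'utf8', function (err, contents) {
  if (err) throw err;
  events.push('callback got: ' + contents);
  console.log(events);
});

// This runs before the callback, because the main thread does not wait for the IO.
events.push('main thread kept going');
```

The logged array lists `'main thread kept going'` first, even though that line comes after the `readFile` call in the source.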

Asynchronous code

This model has an advantage and a disadvantage. The disadvantage is that instead of writing:

var result = SQL.query('SELECT * FROM mytable');
doSomethingAwesome(result);

You now have to write:

SQL.query('SELECT * FROM mytable', function (err, result) {
  if (err) throw err;
  doSomethingAwesome(result);
});

This example doesn’t look too bad, but trust me: when you need to do ten queries (not uncommon), it looks like hell, callback hell to be exact. Even though there are libraries that try to make this as workable as possible, JavaScript itself does not offer anything to help us, and it will never be as simple as synchronous code.
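
To make the nesting concrete, here is a sketch with three queries that have to run one after another. `fakeQuery` is a made-up stand-in for a database driver that answers after a short delay, using the error-first callback convention common in Node; the table names and results are invented.

```javascript
// Hypothetical stand-in for a database driver: answers after 10 ms
// using the error-first callback convention (err, result).
function fakeQuery(sql, callback) {
  setTimeout(function () {
    callback(null, 'result of ' + sql);
  }, 10);
}

// Run in sequence, each query has to live inside the previous query's callback.
function loadPage(done) {
  fakeQuery('SELECT * FROM users', function (err, users) {
    if (err) return done(err);
    fakeQuery('SELECT * FROM posts', function (err, posts) {
      if (err) return done(err);
      fakeQuery('SELECT * FROM comments', function (err, comments) {
        if (err) return done(err);
        done(null, [users, posts, comments]);
      });
    });
  });
}

loadPage(function (err, results) {
  if (err) throw err;
  console.log(results);
});
```

With ten queries the code drifts ten levels to the right, and the error check has to be repeated at every level.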

Don’t be afraid: there is also an advantage!

If you do IO (an SQL query, for example) in PHP, your whole script waits for the result. In Node that would be a disaster: while it waits for a slow query, no one else could connect to your website. In PHP this doesn’t matter, because each request has its own thread.

A CPU (core) is not really good at multitasking; in fact it can only do one thing at a time. When your PHP script waits for IO, the operating system simply lets the CPU work on another thread in the meantime. That is what threading gives you.

Threading solves the problem, but we have no control over it. With Node, we can do other things ourselves while waiting for IO, which gives control over our server back to us developers. That can also be seen as a disadvantage, though, because it is more difficult and the code is less abstract.

Here is an easy example that shows the power of giving the control back:

Imagine you have to do five heavy SQL queries on each request, each of which takes 0.5 seconds to complete. Done synchronously, one after the other, the request takes 2.5 seconds. Node, however, can fire the five queries right after each other and respond when the last one comes back, so the request takes roughly 0.5 seconds (assuming the database can actually run the queries in parallel).
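
A sketch of both approaches, assuming a hypothetical `slowQuery` function that simply calls back after a fixed delay (a real database may not run queries fully in parallel). The delay is scaled down from 0.5 seconds to 100 ms so the example runs quickly; the sequential version should take about 500 ms and the parallel one about 100 ms.

```javascript
// Hypothetical slow query: calls back after QUERY_MS milliseconds.
const QUERY_MS = 100;
function slowQuery(n, callback) {
  setTimeout(function () {
    callback(null, 'result ' + n);
  }, QUERY_MS);
}

// Synchronous-style: start the next query only after the previous one returned.
function runSequential(count, done) {
  const results = [];
  function next(i) {
    if (i === count) return done(null, results);
    slowQuery(i, function (err, result) {
      if (err) return done(err);
      results.push(result);
      next(i + 1);
    });
  }
  next(0);
}

// Fire all queries right after each other; finish when the last one comes back.
function runParallel(count, done) {
  if (count === 0) return done(null, []);
  const results = new Array(count);
  let pending = count;
  let failed = false;
  for (let i = 0; i < count; i++) {
    slowQuery(i, function (err, result) {
      if (failed) return;
      if (err) {
        failed = true;
        return done(err);
      }
      results[i] = result; // keep results in query order, not arrival order
      pending -= 1;
      if (pending === 0) done(null, results);
    });
  }
}

function timed(label, run) {
  const start = Date.now();
  run(5, function (err, results) {
    if (err) throw err;
    console.log(label + ': ' + results.length + ' queries in ' + (Date.now() - start) + ' ms');
  });
}

timed('sequential', runSequential);
timed('parallel', runParallel);
```

The counter `pending` is the whole trick: each callback decrements it, and only the last one to arrive calls `done`.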

Connecting the dots

Node has an event loop. This basically comes down to Node repeatedly picking up events (a new request, a finished query, an expired timer) and running the callbacks that belong to them, one after another. One turn of that loop is often called a tick.

If the result of your SQL query comes in while Node is running some other function, the callback gets queued and Node first finishes what it was doing. Only when the current code is done does it move on to the queued callbacks; when there is nothing left to do, the loop waits until something new comes in.
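
The ordering can be checked with a few lines. In this sketch a 0 ms `setTimeout` stands in for a query result arriving; `process.nextTick` is included as well, since its callbacks run as soon as the current code finishes, before timers and IO callbacks. The function name `busyFunction` is made up.

```javascript
const order = [];

function busyFunction() {
  // Like a finished query, this callback is only queued, not run immediately.
  setTimeout(function () {
    order.push('timer callback');
  }, 0);
  // nextTick callbacks run right after the current code, before timers.
  process.nextTick(function () {
    order.push('nextTick callback');
  });
  order.push('still inside busyFunction');
}

busyFunction();
order.push('rest of the current code');

setTimeout(function () {
  console.log(order);
}, 10);
```

The printed array is `['still inside busyFunction', 'rest of the current code', 'nextTick callback', 'timer callback']`: nothing interrupts the code that is already running.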

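When doing CPU-intensive work (computation, parsing, etc.), you could split it into pieces and put a process.nextTick() between them to make sure you don’t block the main thread for too long. (In newer versions of Node, process.nextTick() callbacks run before any pending IO, so setImmediate() is the better choice for this.)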
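
A sketch of splitting a computation into chunks, using `setImmediate()`, which lets pending IO and timers run between chunks. The task (summing 1 to n), the function name and the chunk size are arbitrary choices for the example.

```javascript
// Sum the numbers 1..n in chunks of chunkSize, yielding to the event loop
// between chunks so other callbacks (requests, IO) can run in the meantime.
function sumInChunks(n, chunkSize, done) {
  let total = 0;
  let i = 1;
  function doChunk() {
    const end = Math.min(i + chunkSize - 1, n);
    for (; i <= end; i++) {
      total += i;
    }
    if (i > n) {
      done(total);
    } else {
      setImmediate(doChunk); // continue in a later turn of the event loop
    }
  }
  doChunk();
}

sumInChunks(1000000, 10000, function (total) {
  console.log('sum = ' + total);
});
```

Smaller chunks keep other requests more responsive but add overhead; the total amount of computation stays the same.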

Because of this model, Node.js is great at dealing with IO, but really bad at heavy computation.