6 Node.js Modules You Should Be Using

Node.js was created six years ago and took some time to gain popularity. But as its kinks have been worked out, many websites are finding success with it. One aspect of Node.js that makes it popular is the large number of third-party modules available. The list continues to grow, and, as a result, it's hard to figure out which modules you need and which you don't need. There are many “best modules” lists out there, but they tend to be quite random. In this story, we worked to develop a list of modules that would be highly valuable to the ProgrammableWeb audience.

Some Words About Asynchronous Programming

Remember that Node.js is a platform that uses JavaScript for its language. To master Node.js, you need to master JavaScript. JavaScript includes the ability to pass functions as arguments to other functions. Node.js takes advantage of this by using a callback mechanism. For example, if you want to make a database call, you provide your query parameters along with a function that gets called after the database call completes. But this opens up some oddities that, if you're not careful, will result in bugs.

If you have a line of code that follows immediately after your database call, will that line of code execute before or after the callback function occurs? Under normal situations, the line following will indeed execute before the callback. And what if in that following line you're trying to access some of the data you were planning to retrieve from the call to the database? You can't, because in all likelihood the data hasn't been retrieved yet.
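
To make the ordering concrete, here is a minimal sketch. The lookupUser function is a hypothetical stand-in for a database call (it uses setTimeout to imitate the delay), and the events array exists only to record what happens in which order.

```javascript
// Hypothetical stand-in for a database driver call: it hands back its
// result through a callback after a short delay, as real drivers do.
function lookupUser(id, callback) {
    setTimeout(function () {
        callback(null, { id: id, name: 'Ada' });
    }, 10);
}

var events = [];   // records the order in which things happen
var user;          // filled in only once the callback runs

lookupUser(42, function (err, result) {
    user = result;
    events.push('callback ran, user is ' + user.name);
    // prints "next line ran, user is undefined -> callback ran, user is Ada"
    console.log(events.join(' -> '));
});

// This line runs first: the callback above has not fired yet,
// so user is still undefined here.
events.push('next line ran, user is ' + user);
```

Any code that needs the retrieved data therefore belongs inside the callback, or in a function the callback calls.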

This can sound like a mess, but it really isn't. It just means that to get the most out of Node.js, you need to master its asynchronous nature and how your JavaScript code works together with that asynchronicity. The article Understanding process.nextTick is, in my opinion, one of the best explanations of how Node.js uses what's called an event loop to handle its asynchronous nature. (The author himself notes that the article is a bit outdated, but it is still an excellent explanation.)

You also need to understand the setImmediate function that Node.js now provides, and how it differs from nextTick. You can learn that by reading the entire (but short) page on timers in the Node.js documentation.
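
A small sketch of the difference, run from the top level of a script. The order array is only there for illustration; process.nextTick and setImmediate are the real Node.js functions.

```javascript
var order = [];

// Scheduled first, but runs last: setImmediate callbacks wait for a
// later phase of the event loop.
setImmediate(function () {
    order.push('setImmediate');
    // prints "synchronous code, nextTick, setImmediate"
    console.log(order.join(', '));
});

// Runs as soon as the current code finishes, before the event loop
// moves on to anything else.
process.nextTick(function () {
    order.push('nextTick');
});

// Ordinary synchronous code always finishes before either callback.
order.push('synchronous code');
```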

And finally, once you fully understand how all that works, you'll understand why the following modules will help you get your work done. When I write a Web application, I typically perform several steps for each incoming request. I might look up something in a database. I might save something else into a database table. I might log something, and so on. These steps happen in sequence, often requiring the results of the previous step, and most of them deal with callback functions. This can make for a mess in your code if you nest your callback functions:

lookupsomething(query, function(err, resp) {
    storesomething(data, function(err, resp) {
        logsomething(message, function(err, resp) {
        });
    });
});

With just a couple of steps, it's not awful, but if you need to insert a step between two existing ones, it can become a headache. For this I use the module called async.js. This is our first module, which I discuss next.

Async.js Module

async.js has several functions that help you organize what might otherwise be a mess of callback functions with ordering and timing problems. One function is called waterfall. You provide waterfall with an array of functions, and waterfall calls them in sequence, one after the other. Each function receives a callback function, and when the function finishes its work it calls that callback, passing along any results, which causes the next function in the array to be called. The beautiful thing here is that if you need to add a step in between two steps, you just insert it between two functions in the array. Even better: While you're making calls to a callback, async.js helps manage your program's call stack and timing appropriately.
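
To show the idea without requiring the library, here is a simplified stand-in for waterfall. It is a sketch of the mechanism, not async.js's own code: it takes the same arguments as async.waterfall(tasks, done), but it does nothing to protect the call stack or guard against a callback being called twice. The three step functions are hypothetical and use setTimeout to imitate asynchronous work.

```javascript
// Simplified stand-in for async.waterfall(tasks, done).
function waterfall(tasks, done) {
    var index = 0;
    function next(err) {
        // Everything after err is a result to hand to the next task.
        var results = Array.prototype.slice.call(arguments, 1);
        if (err || index === tasks.length) {
            return done.apply(null, [err || null].concat(results));
        }
        var task = tasks[index++];
        task.apply(null, results.concat(next));
    }
    next(null);
}

// Hypothetical steps; each one calls its callback when it is finished.
function lookupSomething(callback) {
    setTimeout(function () { callback(null, 'row 7'); }, 5);
}
function storeSomething(row, callback) {
    setTimeout(function () { callback(null, row + ' stored'); }, 5);
}
function logSomething(status, callback) {
    setTimeout(function () { callback(null, status + ' and logged'); }, 5);
}

var outcome;
waterfall([lookupSomething, storeSomething, logSomething], function (err, result) {
    outcome = err ? 'failed: ' + err.message : result;
    console.log(outcome);   // prints "row 7 stored and logged"
});
```

Adding a step means adding one more function to the array; none of the other functions change.
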

async.js also has a handy function called eachSeries, which takes an array of data and calls a single function repeatedly, each time passing the next element of the array into the function. You need this kind of thing, again, because of callbacks. Without the help of async.js, imagine you have an array of items and you want to insert each item into a database, with each insert requiring a separate database call. This might appear to work at first:

for (var i=0; i<data.length; i++) {
    writedata(data[i], function(err, resp) {
        // runs later, once this item has been written
    });
}
console.log('Finished saving!');

The writedata function writes the data, and here it gets called once per item. However, because Node.js database drivers are asynchronous (assuming they were coded correctly), each call only requests the write and returns right away. The writes themselves complete, and their callbacks run, only after the current code has finished and control has returned to the event loop. So when the console.log line runs, the data hasn't been saved yet; the writes may not even have started.

And what do you do if one of the writes fails? That's where the eachSeries function comes in. Instead of writing the loop yourself, you provide a single function, which receives as its parameter the next value from the array. At the end of your function you call a callback that async.js supplies, which tells async.js to start the next iteration. If you pass an error to that callback, the iteration stops and your final callback receives the error. You can learn about it and see examples in the async.js documentation.
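
As a rough illustration, here is a simplified stand-in with the same calling shape as async.eachSeries(items, iteratee, done). It is not the library's implementation, and the writedata function below is a hypothetical writer that fails on one item so the error path is visible.

```javascript
// Simplified stand-in for async.eachSeries(items, iteratee, done).
function eachSeries(items, iteratee, done) {
    var index = 0;
    function step(err) {
        // Stop on the first error, or when every item has been handled.
        if (err || index === items.length) {
            return done(err || null);
        }
        iteratee(items[index++], step);
    }
    step(null);
}

// Hypothetical writer: setTimeout imitates the database call.
var saved = [];
function writedata(item, callback) {
    setTimeout(function () {
        if (item === 'bad') {
            return callback(new Error('could not save ' + item));
        }
        saved.push(item);
        callback(null);
    }, 5);
}

var summary;
eachSeries(['a', 'b', 'bad', 'c'], writedata, function (err) {
    // Runs once, after the last item or after the first failure.
    summary = err ? 'Stopped: ' + err.message : 'Finished saving!';
    console.log(summary, saved);
});
```

Here the final callback reports the failure after 'a' and 'b' have been saved, and 'c' is never attempted. With the real library installed, the same three arguments go to async.eachSeries.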

Q Module

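While async.js is great, another approach to Node.js programming is to use what are called promises. A promise is an object that stands in for the eventual result of an asynchronous operation; you attach functions to it, and those functions get called once the operation completes. The approach looks like this: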

someDataLookup()
    .next(anotherfunction)
    .next(yetanotherfunction);

The first function, someDataLookup, gets called, and, when it's finished, the function called anotherfunction gets called. Then after that's finished, the function called yetanotherfunction gets called.

As an exercise to fully understand how promises work, I'd like to challenge you to first consider how the promise function itself comes into existence and how it could get called, and then try building a mechanism yourself without using a helper library. This will help you understand the callback nature and timing nature of both Node.js and JavaScript itself.

Look at the first function; you call it manually by adding the parentheses after it. But look at the next two functions: anotherfunction and yetanotherfunction. In these two cases, you're not immediately calling them; instead, you're leaving off the parentheses and passing them into the next function. The next() function is what calls them. And where does the next function come from? The someDataLookup function must return an object containing a next function. And in the second call to next, the next function must also return an object containing a next function, so that it can be called again. (In Q, as in JavaScript's built-in promises, this chaining method is named then.)
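
One possible way to approach the exercise is sketched below. Everything here is hypothetical and hand-built: the chainable helper, the done callback that each step receives, and the sample data are my own assumptions, and this is not how Q implements promises. It only shows how an object with a next function can come into existence and how the registered functions end up being called.

```javascript
// Hypothetical helper: starts an asynchronous operation and returns an
// object whose next method registers the functions to run afterwards.
function chainable(start) {
    var queue = [];   // functions registered through next(), in order

    function runNext(value) {
        var fn = queue.shift();
        if (fn) {
            fn(value, runNext);   // each step calls done(value) to continue
        }
    }

    var chain = {
        next: function (fn) {
            queue.push(fn);
            return chain;   // returning the object allows .next().next()
        }
    };

    // Start only after the current code has finished, so every next()
    // call in the chain has already registered its function.
    setImmediate(function () { start(runNext); });
    return chain;
}

var steps = [];

function someDataLookup() {
    return chainable(function (done) {
        setTimeout(function () { done('raw data'); }, 5);   // imitates a lookup
    });
}
function anotherfunction(data, done) {
    steps.push('anotherfunction got ' + data);
    done(data.toUpperCase());
}
function yetanotherfunction(data, done) {
    steps.push('yetanotherfunction got ' + data);
    console.log(steps.join('; '));
    done();
}

someDataLookup().next(anotherfunction).next(yetanotherfunction);
```

Notice that next never calls the functions itself; it only stores them, and they run later when the lookup calls back. Real promise libraries go further than this sketch by passing return values along the chain and by routing errors to a separate handler.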

Jeff Cogswell