Storing tweets using nodejs

Posted at April 20, 2012

Since I have my own VPS I’m constantly trying out new stuff. After messing around with nodejs I have my own little script (just over 100 lines) that connects to Twitter using their streaming API, get’s all tweets that contain certain words, do a minimal amount of parsing and store it all in a SQL database.

Nodejs does not contain a lot of stuff out of the box which I need, the power lies in the huge amount of ‘modules’ (extenstions) for node. They even got their own package manager! In this example I use the modules node-mysql, ntwitter and forever.

Installing all those couldn’t be easier:

npm install mysql ntwitter forever

In my script I fetch all tweets by people who use nikeplus (app) to keep track of their running results. By storing all those tweets I can later create something cool with all the data. I use the streaming API to keep a connection open, to do this you need to register your own app at dev.twitter.com to get the required keys.

Setup:

var sys = require('util');

/* setup mysql connection */
var Client = require('mysql').Client;
var client = new Client();
client.user = 'sql username';
client.password = 'sql password';
console.log('Connected to MySQL');

Now we have a we have twit object for opening a Twitter stream and a client object for talking to our SQL server.

Connect to mysql

ClientConnectionReady = function(client) {
   // nikeplus is the name of the database
   client.query('USE nikeplus', function(error, results) {
      if(error) {
         console.log('ClientConnectionReady Error: ' + error.message);
         client.end();
         return;
      }
      // start the stream
      watchTwitter();
   });
};

This function connects to the database ‘nikeplus’ and gives us a callback function for what to do when the connection is set up. In the callback we call for watchTwitter. The last line runs the function (and starts the process).

Connect to the Twitter stream API

watchTwitter = function() {
   twit.stream('statuses/filter', {track:'makeitcount,nike+'}, function(stream) {
      stream.on('data', function (data) {
         parseTweet(data);
      });
      stream.on('end', function (response) {
         console.log('disconnected :(');
         end();
      });
      stream.on('destroy', function (response) {
          console.log('disconnected :(');
          end();
         // Handle a 'silent' disconnection from Twitter, no end/error event fired
      });
   });
}

This function sets up a stream to the Twitter API. Everytime a new tweets comes in we can do something with it by using a callback. In this callback I call for the function parseTweet and feed it the tweet data.

Parsing the tweets

The nikeplus app send out tweets for your running prestations. I want to parse the distance runned. Here is an example of such a tweet:

I just finished a 13,0 km run with a pace of 4’27″/km with Nike+ GPS. #nikeplus #makeitcount

— Mike van Rossum (@mikevanrossum) April 21, 2012

The pace is quite rare (it’s opt-in) but I’m very interested in the distance which can be either in kilometer or in mile. The function below parses the tweet, the distance and a couple of other things I want to store.

parseTweet = function(tweet) {
   if(!tweet.text) {
      return;
   }

// store the body of the tweet
   var text = tweet.text;

// if this fails these checks it does not hold any data I want
   // for now I only work with english tweets.
   if(text.match(/bnikeplusb/) && text.match(/bfinishedb/)) {

  var complete = 0,
     words = text.split(' '),
     distance,
     i = words.length;

  // check for every word if its 'km' / 'mi' followed by 'run'
  // if it is parse the value (in km) * 100 to int
  while(--i) {
     // check for metric system
     if(words[i] === 'km' && words[i+1] === 'run') {

        // parse the distance
        var d = words[i-1];
        d = d.replace(',', '.');
        d = parseFloat(d).toFixed(2);
        distance = parseInt(d * 100);

        // else check for english system
     } else if(words[i] === 'mi' && words[i+1] === 'run') {

        // parse the distance
        var d = words[i-1];
        d = d.replace(',', '.');
        // 1.609344 mile = 1 km
        d = parseFloat(d).toFixed(2) * 1.609344;
        distance = parseInt(d * 100);
     }

  }

  // get everything we want to store
  var user = tweet.user,
     t = [ 
     tweet.user.screen_name, 
     user.id, 
     tweet.id_str, 
     tweet.created_at, 
     text, 
     distance, 
     user.followers_count, 
     user.statuses_count,
     tweet.coordinates 
  ];
  // add it to the DB
  insertTweet(t);

Storing the tweet in the database

The parse function ended with an array of stuff about the tweet. This function stuffs that array into my SQL database.

insertTweet = function(tweet) {
   // build query and run
   client.query('INSERT INTO tweet SET username = ?, userid = ?, fid = ?, date = ? , tweet = ?, distance = ?, userfollowercount = ?, userstatuscount = ?, geo = ? ', tweet,
      function(error, results) {
         if(error) {
            console.log("ClientReady Error: " + error.message + 'at: ' + tweet[3]);
         } else {
            console.log('tweet added at: ' + tweet[3]);
         }
      }
   );
}

And that’s about it, The script has been running for 2 hours and it already fetched 797 tweets.