Storing tweets using nodejs
Posted at April 20, 2012
Since I have my own VPS I’m constantly trying out new stuff. After messing around with nodejs I have my own little script (just over 100 lines) that connects to Twitter using their streaming API, get’s all tweets that contain certain words, do a minimal amount of parsing and store it all in a SQL database.
Nodejs does not contain a lot of stuff out of the box which I need, the power lies in the huge amount of ‘modules’ (extenstions) for node. They even got their own package manager! In this example I use the modules node-mysql, ntwitter and forever.
Installing all those couldn’t be easier:
npm install mysql ntwitter forever
In my script I fetch all tweets by people who use nikeplus (app) to keep track of their running results. By storing all those tweets I can later create something cool with all the data. I use the streaming API to keep a connection open, to do this you need to register your own app at dev.twitter.com to get the required keys.
Setup:
var sys = require('util');
/* setup mysql connection */
var Client = require('mysql').Client;
var client = new Client();
client.user = 'sql username';
client.password = 'sql password';
console.log('Connected to MySQL');
Now we have a we have twit
object for opening a Twitter stream and a client
object for talking to our SQL server.
Connect to mysql
ClientConnectionReady = function(client) {
// nikeplus is the name of the database
client.query('USE nikeplus', function(error, results) {
if(error) {
console.log('ClientConnectionReady Error: ' + error.message);
client.end();
return;
}
// start the stream
watchTwitter();
});
};
This function connects to the database ‘nikeplus’ and gives us a callback function for what to do when the connection is set up. In the callback we call for watchTwitter
. The last line runs the function (and starts the process).
Connect to the Twitter stream API
watchTwitter = function() {
twit.stream('statuses/filter', {track:'makeitcount,nike+'}, function(stream) {
stream.on('data', function (data) {
parseTweet(data);
});
stream.on('end', function (response) {
console.log('disconnected :(');
end();
});
stream.on('destroy', function (response) {
console.log('disconnected :(');
end();
// Handle a 'silent' disconnection from Twitter, no end/error event fired
});
});
}
This function sets up a stream to the Twitter API. Everytime a new tweets comes in we can do something with it by using a callback. In this callback I call for the function parseTweet
and feed it the tweet data.
Parsing the tweets
The nikeplus app send out tweets for your running prestations. I want to parse the distance runned. Here is an example of such a tweet:
I just finished a 13,0 km run with a pace of 4’27″/km with Nike+ GPS. #nikeplus #makeitcount
— Mike van Rossum (@mikevanrossum) April 21, 2012
The pace is quite rare (it’s opt-in) but I’m very interested in the distance which can be either in kilometer or in mile. The function below parses the tweet, the distance and a couple of other things I want to store.
parseTweet = function(tweet) {
if(!tweet.text) {
return;
}
// store the body of the tweet
var text = tweet.text;
// if this fails these checks it does not hold any data I want
// for now I only work with english tweets.
if(text.match(/bnikeplusb/) && text.match(/bfinishedb/)) {
var complete = 0,
words = text.split(' '),
distance,
i = words.length;
// check for every word if its 'km' / 'mi' followed by 'run'
// if it is parse the value (in km) * 100 to int
while(--i) {
// check for metric system
if(words[i] === 'km' && words[i+1] === 'run') {
// parse the distance
var d = words[i-1];
d = d.replace(',', '.');
d = parseFloat(d).toFixed(2);
distance = parseInt(d * 100);
// else check for english system
} else if(words[i] === 'mi' && words[i+1] === 'run') {
// parse the distance
var d = words[i-1];
d = d.replace(',', '.');
// 1.609344 mile = 1 km
d = parseFloat(d).toFixed(2) * 1.609344;
distance = parseInt(d * 100);
}
}
// get everything we want to store
var user = tweet.user,
t = [
tweet.user.screen_name,
user.id,
tweet.id_str,
tweet.created_at,
text,
distance,
user.followers_count,
user.statuses_count,
tweet.coordinates
];
// add it to the DB
insertTweet(t);
Storing the tweet in the database
The parse function ended with an array of stuff about the tweet. This function stuffs that array into my SQL database.
insertTweet = function(tweet) {
// build query and run
client.query('INSERT INTO tweet SET username = ?, userid = ?, fid = ?, date = ? , tweet = ?, distance = ?, userfollowercount = ?, userstatuscount = ?, geo = ? ', tweet,
function(error, results) {
if(error) {
console.log("ClientReady Error: " + error.message + 'at: ' + tweet[3]);
} else {
console.log('tweet added at: ' + tweet[3]);
}
}
);
}
And that’s about it, The script has been running for 2 hours and it already fetched 797 tweets.