How to use Geospatial Indexing in mongo using nodejs and mongoose

Posted at January 26, 2013

A couple of weeks ago I built a mobile app for school that shows you cultural things to do around you. It’s called randomapp (the assignment was to build an app using the Artsholland API, so it only works in the Netherlands).

I decided to build the backend in Mongo because of its recently added geospatial support. I have more experience with MySQL, but the idea of having to write all the logic involved in querying location data myself was enough for me to choose Mongo.

The backend for the app is a very simple API written in Node that queries the location data in a Mongo collection. In this tutorial I’ll show you some easy steps to get started asking Mongo questions about locations. I definitely recommend you check out the official documentation, which covers everything I’m about to explain plus a lot more; this tutorial is more of an introduction to the subject.

Setup

For this tutorial I used a couple of node modules. Install them by running:

npm install mongoose underscore artsholland
  • Mongoose is an ODM (object-document mapper) for talking to MongoDB.
  • Underscore is a toolbox with utility functions.
  • Artsholland is a simple wrapper around the Artsholland API. I only need this to get some geo data into Mongo. You can also download a dataset from the web or get it somewhere else.

Prepare Mongoose

I’m going to store venues in Mongo. For the sake of simplicity a venue only consists of a URL and a geo location. This code assumes you have Mongo running on your local machine.

var mongoose = require('mongoose');
// connect with a MongoDB connection string: host first, then the database name
mongoose.connect('mongodb://localhost/your-geo-app');

First let’s create a schema. The docs give us some info on how we can store geo data. It’s recommended to store the long and lat in an array. Note that you should always store longitude first!

var venueSchema = mongoose.Schema({
  url: String,
  loc: []
});
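Because a swapped pair is easy to miss, a small guard can catch at least the obvious mistakes before saving. This is only a sketch: `isValidLoc` is a hypothetical helper name, not something Mongoose provides. It also can't detect a swap when both numbers fit in the latitude range, which is the case for most Dutch coordinates.

```javascript
// Hypothetical helper: checks that loc is a [longitude, latitude] pair
// of numbers that both fall inside their valid geographic ranges.
var isValidLoc = function(loc) {
  if (!Array.isArray(loc) || loc.length !== 2) return false;
  var lng = loc[0];
  var lat = loc[1];
  if (typeof lng !== 'number' || typeof lat !== 'number') return false;
  if (isNaN(lng) || isNaN(lat)) return false;
  return lng >= -180 && lng <= 180 && lat >= -90 && lat <= 90;
};

console.log(isValidLoc([4.937161, 52.34592]));  // longitude first: true
console.log(isValidLoc([4.937161, 152.34592])); // latitude out of range: false
```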

Now let’s tell Mongo the loc field contains geo data that we want to query later on.

venueSchema.index({ loc: '2d' });

And create a Venue model which we can use to create new documents or query the existing ones.

var Venue = mongoose.model('venues', venueSchema);

Grab the data

The Artsholland wrapper makes it really easy to get some data into node.

var ah = require('artsholland');
ah.api.key = 'my secret key';
ah.venue({per_page: 250}, console.log);

Now we get a JSON response containing a bunch of venues like so:

{ metadata: { per_page: '250', page: '1' },
  results: 
   [ { uri: 'http://data.artsholland.com/venue/16ae85e1-ac48-4f33-be13-3c09edf78070',
     attachment: 'http://data.artsholland.com/venue/16ae85e1-ac48-4f33-be13-3c09edf78070/attachment/1',
     cidn: '16ae85e1-ac48-4f33-be13-3c09edf78070',
     locationAddress: 'http://data.artsholland.com/address/1097ga124',
     telephone: '020 - 694 04 82',
     venueType: 'http://purl.org/artsholland/1.0/VenueTypeMuseum',
     created: '2012-03-07T21:42:35Z',
     modified: '2012-07-03T10:44:41Z',
     type: 'http://purl.org/artsholland/1.0/Venue',
     sameAs: 'http://resources.uitburo.nl/locations/16ae85e1-ac48-4f33-be13-3c09edf78070',
     geometry: 'POINT(4.937161 52.34592)',
     lat: '52.34592',
     long: '4.937161',
     email: 'info@totzover.nl',
     homepage: 'http://www.totzover.nl' },
     // etc..

But we don’t really care about anything except for the lat, long and homepage. Using underscore we can transform this array really quickly into something we want:

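var _ = require('underscore');

ah.venue({ per_page: 250 }, function(venues) {
  var formatted = _.compact(
    _.map(venues.results, function(v) {
      // skip venues that miss any of the properties we need
      if(!v.homepage || !v.long || !v.lat )
        return;
      // same shape as venueSchema: longitude first, both as numbers
      return {
        url: v.homepage,
        loc: [ parseFloat(v.long), parseFloat(v.lat) ]
      };
    })
  );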

We use map to loop over the results array and run the function we provide on every item. If an item has all the required properties we return an object that looks exactly like what we defined in our schema; if it doesn’t, we return nothing (undefined). By wrapping the resulting array in compact, which removes all falsy values, we filter out the items that were missing properties.
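If you’d rather not pull in underscore for this, the same idea can be sketched with the array methods built into JavaScript. The two venues below are made-up sample input in the shape of the API response (the second one deliberately lacks a homepage), so the snippet runs without an API key.

```javascript
// Made-up sample input: two trimmed-down venues; the second has no
// homepage and should be dropped.
var results = [
  { homepage: 'http://www.totzover.nl', lat: '52.34592', long: '4.937161' },
  { lat: '52.388065', long: '4.926208' }
];

var formatted = results
  .filter(function(v) {
    // keep only venues that have all three properties
    return v.homepage && v.long && v.lat;
  })
  .map(function(v) {
    // same shape as the schema: longitude first, both as numbers
    return { url: v.homepage, loc: [ parseFloat(v.long), parseFloat(v.lat) ] };
  });

console.log(formatted);
```

Filtering first and mapping second means there are no empty slots to clean up afterwards, so compact isn’t needed in this variant.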

Now formatted is an array like this:

{ url: 'http://www.museumamsterdamnoord.nl',
  loc: [ 4.926208, 52.388065 ] },
{ url: 'http://www.nai.nl', loc: [ 4.471547, 51.914227 ] },
{ url: 'http://www.couperusmuseum.org',
  loc: [ 4.3077016, 52.088654 ] },
{ url: 'http://www.dickbrunahuis.nl',
  loc: [ 5.12489, 52.08352 ] },
{ url: 'http://www.scoutmuseum.nl',
  loc: [ 4.454781, 51.91882 ] },
// etc..

Insert the data in Mongo

This is great! Now we can just loop over all the items and add them to Mongo.

var i = 0; // counts the saved venues for the log line

var storeVenue = function(v) {
  new Venue(v).save(function(err) {
    if(err) throw err;
    console.log('saved #' + (i++) + ':\t' + v.url);
  });
};

And at the end of our previous function we can just loop over the formatted array, which also closes the callback we left open:

  _.each(formatted, storeVenue);
});

We can verify we added the objects to Mongo correctly by running:

Venue.find({}, console.log);

Which should log all the Venues.
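One thing to keep in mind is that save is asynchronous, so running the check directly after the loop may log fewer venues than you expect. A minimal sketch of waiting until every save has reported back could look like this. Both `saveAll` and `fakeSave` are hypothetical names; `fakeSave` stands in for `new Venue(v).save(cb)` so the example runs without a database.

```javascript
// Hypothetical helper: calls save(item, cb) for every item and calls done
// exactly once, either on the first error or after the last successful save.
var saveAll = function(items, save, done) {
  var pending = items.length;
  var failed = false;
  if (pending === 0) return done(null, 0);
  items.forEach(function(item) {
    save(item, function(err) {
      if (failed) return;          // an earlier save already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      pending--;
      if (pending === 0) done(null, items.length);
    });
  });
};

// Stand-in for `new Venue(v).save(cb)`; always succeeds.
var fakeSave = function(item, cb) { cb(null); };

saveAll([{ url: 'a' }, { url: 'b' }], fakeSave, function(err, count) {
  console.log(err ? 'failed: ' + err : 'saved ' + count + ' venues');
});
```

With the real model you would pass a function that wraps `new Venue(v).save(cb)` and run the find query inside the done callback.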

The whole point of doing this in Mongo is that we can skip writing any distance logic ourselves, yay! We already told Mongo that the loc field holds geo data, so Mongo does the rest.

Querying location data

If you need to get some sample geo locations for testing your queries you can use Google Maps (Right click – what’s here – the lat/long is in the search bar), but note that those numbers are lat/long and we stored everything long/lat, so you have to flip them.
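To avoid flipping the numbers by hand every time, a tiny helper can do it for you. This is just a sketch that assumes you paste the coordinates as a "lat, long" string; `fromLatLongString` is a hypothetical name.

```javascript
// Hypothetical helper: turns a "lat, long" string, as copied from a map,
// into the [long, lat] order we use in the loc field.
var fromLatLongString = function(text) {
  var parts = text.split(',').map(function(p) { return parseFloat(p); });
  if (parts.length !== 2 || isNaN(parts[0]) || isNaN(parts[1])) {
    throw new Error('expected "lat, long", got: ' + text);
  }
  return [ parts[1], parts[0] ]; // flip: longitude first
};

console.log(fromLatLongString('52.366455, 4.881213')); // [ 4.881213, 52.366455 ]
```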

We can use the query API provided by Mongoose to check which venue in our collection is closest to point x. In the case of RandomApp point x was the location of the user (provided by the HTML5 Geolocation API). Example:

Venue.find({ loc : { '$near' : [4.881213, 52.366455] } }, console.log);

This returns a lot of venues, sorted by how close they are to the provided location. If you only want the top 5, for example, you can just do:

Venue.find({ loc : { '$near' : [4.881213, 52.366455] } }).limit(5).exec(console.log);

The main problem is that you don’t get the distance back as well. You could calculate the distances by hand, but then you’re just calculating stuff Mongo already calculated for you. If you also want that data back from Mongo you have to use another way of querying Mongo, a way not yet provided by the Mongoose query API. But because Mongoose gives us access to the db object, we can basically do everything by hand too.

An important thing to note is that Mongo only has the coordinates to calculate the distance with. A difference of 1.0 in coordinates (one degree) does not always cover the same number of miles or meters; that depends on where you are on the globe (as long as the coordinates you store are actual geographic locations). Mongo takes this into account by expressing the distance between coordinates in radians. We can then tell Mongo how many ‘human’ units go into one radian, and it will convert the distance to that unit instead.
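The conversion itself is nothing more than a multiplication or division by the radius of the earth. A minimal sketch, assuming the common mean radius of 6371 kilometers (for miles you would use roughly 3959); the function names are made up for this example:

```javascript
var EARTH_RADIUS_KM = 6371; // mean radius of the earth in kilometers

// One radian along the surface spans one earth radius, so:
var kmToRadians = function(km) { return km / EARTH_RADIUS_KM; };
var radiansToKm = function(radians) { return radians * EARTH_RADIUS_KM; };

console.log(kmToRadians(1));               // a 1 km search radius, in radians
console.log(radiansToKm(kmToRadians(5)));  // and back to kilometers again
```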

mongoose.connection.db.executeDbCommand({
  geoNear : "venues",  // the mongo collection
  near : [4.881213, 52.366455], // the geo point, longitude first
  spherical : true,  // tell mongo the earth is round, so it calculates based on a
                     // spherical location system
  distanceMultiplier: 6371, // tell mongo how many kilometers go into one radian
                            // (the radius of the earth in km)
  maxDistance : 1/6371 // the max distance in radians, here 1 kilometer
}, function(err, result) {
  if(err) throw err;
  console.log(result.documents[0].results);
});

The results we get back are more raw and carry more information, including the distance (now in kilometers). The array logged looks like this:

[ { dis: 0.14177381336640812,
  obj: 
   { url: 'http://www.melkweg.nl',
     _id: 5105417a62923ed51f000046,
     loc: [ 4.881217, 52.36518 ],
     __v: 0 } },
  // etc..

And we can see that de Melkweg is only 0.14 kilometers away! (The search location was my home, and the result is around the corner :))
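If you want to double check a distance like this yourself, the haversine formula gives the great-circle distance between two points. The sketch below is my own sanity check, not something Mongo or Mongoose ships; `distanceKm` is a hypothetical helper and it assumes the same 6371 km radius as the query.

```javascript
var EARTH_RADIUS_KM = 6371; // same radius as the distanceMultiplier

var toRad = function(deg) { return deg * Math.PI / 180; };

// Great-circle distance in km between two [long, lat] points (haversine).
var distanceKm = function(a, b) {
  var dLat = toRad(b[1] - a[1]);
  var dLng = toRad(b[0] - a[0]);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) *
          Math.pow(Math.sin(dLng / 2), 2);
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
};

// search location and the loc of the first result
console.log(distanceKm([4.881213, 52.366455], [4.881217, 52.36518]));
```

For the two points above this should come out at roughly 0.14 kilometers, in line with the dis value Mongo returned.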