My fourth open source release this week comes from work I’ve done for my employer, The Bivings Group. Today, we are releasing a set of code that assists with aggregating markers on a Google Map. Our clients wanted to be able to display markers on a map reflecting the locations of people who provided their location (city, state, zip, and in some cases, street address), but with tens of thousands of expected sign-ups, it’s not feasible to display all the points on the map at once.
One of the challenges with displaying a map with lots of markers is that if there are too many markers in one location, the placemarks themselves quickly become an unmanageable sea that obscures the map. In some cases, this is the desired effect, but in our case, we wanted to present a cleaner map that showed a single marker that provided an indication of the number of people present in an area. (Less markers also has the advantage of loading faster.)
One solution to this problem is, instead of showing all of the markers all at once, you can group all of the markers that are close together into one. But, how do you determine if two locations are “close together” so that you can do that aggregation?
Geohash
One way is to use a geohash, which takes a latitude and longitude coordinate and, using a hashing algorithm, mixes the two numbers together into a a string that represents the position. More accurately, it represents a box which contains the coordinate in question. For example, the coordinates of the TBG office (38.9193540 °N, 77.0705180 °W) is represented by the geohash dqcjqjncwv2t9
. This string has an interesting property: as you remove characters from the string, you get a less accurate representation of the location – the box gets larger. With a list of addresses stored in geohash format, you can do a search for all of the addresses whose geohashes all start with the same characters.
However, because of the way geohashes are encoded, different geohashes have different shapes, there is some overlap between different geohashes, and in some cases, removing a character from a geohash can result in a larger shape on the map that doesn’t cover the entirety of the more specific geohash. As a result, we determined that they weren’t suitable for our needs: because of the overlap, it was possible for one location to appear to be in the space covered by multiple geohashes, thus potentially leading to unreliable counting.
Quadtree
Some further searching led me to maptiler.org, which provided an alternate answer. Since the Earth is a sphere, it can be broken up into quadrants (corresponding to the four hemispheres: north west, north east, south west, and south east), which (with some stretching) are square. Each of those quadrants can be subdivided into subquadrants, and those into more subquadrants, each of which contains one quarter of the total area covered by the quadrant that contains it.
In fact, this is how Google Maps map tiles work. When zoomed all the way out, the entire earth is represented as a 256 x 256 pixel map tile image. Each larger zoom level contains four times the number of map tiles as the previous zoom level, with no overlap. At the maximum zoom level (20), the entire Earth is represented by 419 = 274,877,906,944 256 x 256 pixel images.
How does that help us? Well, if we number each of the quadrants (0 = NW, 1 = NE, 2 = SW, 3 = SE), and add a digit for each sub-quadrant we “zoom in” to, we get a string that represents a square on the map at a zoom level equivalent to the length of the string. For example, the coordinates of the TBG office can be represented as 0320100322312003121
. If we remove a character from the hash, it’s guaranteed that the larger space contains all the possible locations in the smaller space. There’s no overlap between hashes of the same length. This means we’re guaranteed the accurate counting we wanted. And we can very conveniently scale our searches of the database of addresses with the zoom level of the map window we’re viewing the results in.
Pulling it all Together: GlobalMapTiles
On the client’s map page, whenever the map position or zoom level changes, we send a few ajax queries to the web server with quadtree codes covering the map window. The server obligingly searches the database for the number of people in the area represented by the code, and returns a result list containing the number of people in the quadtree a level or two deeper. We then take the results, convert the quadtree code back into a coordinate on the map, and add a marker at its center.
As luck would have it, there’s a Python script, globalmaptiles.py
on the maptiler.org page that handles the conversion from latitude and longitude to these quadtree coordinates. Although Python is a fine language (and some of the most fun code I’ve written has been in Python), we needed this code in PHP and JavaScript. But that’s ok. The code is almost entirely simple mathematical operations, and that’s very easy to port from Python to JavaScript and PHP. This port, plus some additions of my own to facilitate turning a quadtree string back into its bounding coordinates on the map is what we’re releasing today.
This is, of course, only half of the solution. We now have a system by which we can identify locations on the map which are nearby and aggregate them together, but we still need a way to pull that information out of the database and send it on-demand to the web browser to be displayed. That’s a more interesting problem (and solution). I’m still refining that code and adding additional functionality, and we’ll release that soon, when we’re satisfied with it.
As the original implementation was licensed with the X11 license, GlobalMapTiles is also licensed under the X11 license. For more details on how to use it, and to download, see the Aggregate Map Tools page.
Tags: google maps, JavaScript, open source, PHP, python