Who's on First? It's Mapzen Search

When we launched the first version of the Mapzen Search API in October 2015, we planned to keep improving the service under the hood. We’re about to make our biggest change yet, adding the data service Who’s on First to Mapzen Search. Who’s on First is a gazetteer - a directory of places, linking their names to their locations - one that’s built to enable all kinds of forward-looking utilities and features. As with all data powering Mapzen Search, Who’s on First is 100% open data. And while this change is important in situating Mapzen Search for the future, users won’t need to make any changes to start seeing its benefits.

Switching to Who’s on First as our core gazetteer means there will be some other implications for our underlying data. We’ll be removing our current gazetteer, Quattroshapes (though it will live on through its data’s inclusion in Who’s on First). Now, when you do a search or geocode, you’ll be getting results by default from OpenStreetMap (addresses, streets, places of interest), OpenAddresses (addresses), Geonames (venues, areas of interest), and Who’s on First (named areas). Existing users don’t need to worry: you’ll be able to continue making the exact same API calls you’re making today; you’ll be getting the same data from Who’s on First.

For self-hosted users of Pelias, we’re upgrading our data importers to use Who’s on First, but they’ll still work if you rely on Quattroshapes for your services as well.

You might be wondering what part something that sounds like a 19th-century newspaper plays in a modern geocoder.

Brookes Gazetteer

Title page of the Brookes Gazetteer, via GEDCOM Index

Gazetteers are indexes that connect the names of places (and information about those places) to their geographic locations. In Mapzen Search the gazetteer plays a special role: we use it to find “named places” (countries, regions, continents, cities, and neighborhoods), but we also use it to find which places fall inside of those places. That way, when we get a data source like OpenAddresses or OpenStreetMap that’s not guaranteed to have the city or state for a particular venue or address, we can add that information in automatically.
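To make the “add that information in automatically” step concrete, here is a minimal sketch of a point-in-polygon lookup. It is not how Mapzen Search or Pelias actually implements this; the place names, the square boundaries, and the `fill_admin_fields` helper are all made up for illustration, and real boundaries are far more detailed.

```python
# Hypothetical, heavily simplified gazetteer: each place has a name, a
# placetype, and a polygon given as (lon, lat) vertices. All values are
# invented for this example.
GAZETTEER = [
    {"name": "Exampleland", "placetype": "country",
     "polygon": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"name": "Sample City", "placetype": "locality",
     "polygon": [(2, 2), (5, 2), (5, 5), (2, 5)]},
]


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: toggle 'inside' for each edge crossed to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def fill_admin_fields(record, gazetteer=GAZETTEER):
    """Return a copy of record with missing placetype fields filled in."""
    enriched = dict(record)
    for place in gazetteer:
        if point_in_polygon(record["lon"], record["lat"], place["polygon"]):
            # setdefault keeps any value the source data already supplied
            enriched.setdefault(place["placetype"], place["name"])
    return enriched


# An address record with coordinates but no city or country attached
address = {"name": "12 Main St", "lon": 3.0, "lat": 4.0}
print(fill_admin_fields(address))
```

The address comes back with a `country` and a `locality` that its source never provided, which is the role the gazetteer plays for OpenAddresses and OpenStreetMap records.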

Geonames is perhaps the most used open data gazetteer in the world, originating 125 years ago as part of the United States’ Board on Geographic Names. It forms a remarkably comprehensive directory of place names, place types, some name translations, and the point on the map that represents that place. Lots of folks use it.

But it’s not the only open gazetteer out there.

There’s Where on Earth, Yahoo!’s gazetteer (also known as Yahoo! Geoplanet), which took the time to translate most worldwide placenames into seven different languages and find common nicknames or colloquial names. In addition, it recognized the importance of situating each place in a global hierarchy (e.g. locality:Boston>county:Suffolk>region:Massachusetts>country:United States of America>continent:North America>planet:Earth), and of ensuring it provided the records for those places, not just their names. For several years Yahoo! released this dataset under a Creative Commons license.
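One way to picture such a hierarchy is as records that each point to a parent record. The sketch below is only an illustration: the numeric ids and the `parent` field are invented here and do not reflect the actual Where on Earth or Who’s on First data model.

```python
# Hypothetical records keyed by made-up ids. Each place stores its
# placetype, its name, and the id of its parent (None at the top).
PLACES = {
    1: {"placetype": "planet", "name": "Earth", "parent": None},
    2: {"placetype": "continent", "name": "North America", "parent": 1},
    3: {"placetype": "country", "name": "United States of America", "parent": 2},
    4: {"placetype": "region", "name": "Massachusetts", "parent": 3},
    5: {"placetype": "county", "name": "Suffolk", "parent": 4},
    6: {"placetype": "locality", "name": "Boston", "parent": 5},
}


def ancestors(place_id, places=PLACES):
    """Return the place itself followed by each ancestor, up to the top."""
    chain = []
    current = place_id
    while current is not None:
        place = places[current]
        chain.append(place)
        current = place["parent"]
    return chain


def format_hierarchy(place_id, places=PLACES):
    """Render the chain in the placetype:name>placetype:name form used above."""
    return ">".join(
        f'{p["placetype"]}:{p["name"]}' for p in ancestors(place_id, places)
    )


print(format_hierarchy(6))
```

Because every level is a full record rather than just a name, each ancestor can carry its own translations, nicknames, and geometry.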

There’s also Natural Earth, a copyright-free global dataset from Nathaniel Vaughn Kelso and Tom Patterson. It represents most places as polygons, which is a far better way to represent things with borders than points, which many of these other gazetteers use. This was further evolved with Quattroshapes (which built on Alphashapes from Flickr and Betashapes from SimpleGeo in addition to Natural Earth), created by Vaughn Kelso and David Blackman at Foursquare. Foursquare wanted a gazetteer that could be used to reverse geocode places worldwide, to find out where their users were, but also to find out which cities and neighborhoods places were located in.

All of these form the core of Who’s on First, aggregating all their best properties.

From Natural Earth and Quattroshapes, Who’s on First has polygons to represent (most) places. That means we don’t just find the named places, we can also find what places are within those places. From Where on Earth we have the rich global place hierarchy, so we can represent the complex political structures of governance. This lets us have places like the United Nations headquarters be legally run like its own country while falling within the territorial boundaries of New York City. It also lets us represent places that have many names, with all of those names, in many languages.

The Future

What excites us most about Who’s on First is what it portends for the future of Mapzen Search. It’s the open data platform we’ll need to make a world class geocoder that works around the world.

For the first time, we can use a polygon-based gazetteer that’s getting constant editorial attention, so as the world changes, we can keep it up-to-date with the times. New places, updated populations, even temporary places will make their way to Mapzen Search through Who’s on First.

Who’s on First will also help us bring multiple languages and common nicknames into Mapzen Search. With multiple languages for nearly every country, region, and major city, you’ll be searching for “The Big Apple” or “Nueva York” in no time.

And Who’s on First is the new home of over 8 million venues from SimpleGeo, which we hope to make available soon for more point-of-interest searches based on open data.

Who’s on First is exactly what’s next for Mapzen Search. And it’s going to make finding places around the world a lot easier.