So You Want To Use A Metro Extract

Mapzen’s Metro Extracts download page accounts for a huge amount of traffic to our website—it’s more popular than our actual home page. This is awesome, because they are a super-useful resource, and it’s cool to see them being used by so many people.

For the most part, people visiting the Metro Extracts page know what they want and know their way around geodata. If you’re among the remaining two-thirds of our traffic that doesn’t go to the Metro Extracts page, you might not know exactly why or how to use the extracts, which file format to download, or why there might be what look like inconsistencies in the downloaded data. This post is an overview of what metro extracts are, the file formats we offer, and what you might want to know to work with them.

Background: Metro Extract History

Mike “King Among Cartographers” Migurski initially created OSM Metro Extracts in 2011, responding to a really basic problem: if someone wanted to use OSM data to make a map of, say, just the New York City metro area, she could either do a total planet download or download individual states from Geofabrik and cobble together a metro area.

Basically, a workflow that leads to making this facial expression:

seriously?

So Migurski set up and, for a while, hosted Metro Extract downloads. Awesome people like Nelson Minar, Smart Chicago, and a ton of others in the mapping community contributed to the project and helped maintain it. With Extractotron (the tool behind those downloads), if that same user wanted to get OSM data of just the New York City metro area, she could do so really easily.

And in its time the Extracts came to reflect the size of the City, then the Province. And slowly but surely Migurski found himself building a map of the Empire which coincided point for point with Empire itself.

(Actually, it was just a kind of unwieldy project to maintain, but I had to hit my weekly Borges jokes quota.) Eventually Mapzen took on running and releasing Metro Extracts. We made a chef recipe to do it (if those words just mean food to you, go here), which makes maintenance and updating the extracts easier.

Which brings us to now, and to our download page.

File formats

Let’s say that I want to work on a mapping project and it only concerns New York City. Here’s what I see when I look at the available NYC metro extracts:

NYC Extracts

The file formats as read from left to right basically go from the rawest form available for OSM data to the most structured, prepackaged format. (At Mapzen we make the “server farm to data table” joke a lot; in this case imagine we’re moving from a bowl of coagulating soymilk to neat pre-packaged, pre-flavored tofu cubelets.)

OSM PBF and OSM XML

OSM is a special community. Likewise, OSM data is really special. So special, it gets its own file format that nobody else uses, .osm. These files come in two compact forms: bzip2-compressed XML (.bz2) or the binary .pbf encoding. There are grimier details of .pbf versus XML, but for the purposes of this post let’s just note that .pbf is smaller (more on .pbf here).

Why would I want this format? Let’s say I don’t want to deal with messy admin polygons in my OSM data, and that I also want to filter for some specific tagged OSM data, like amenity=police. I could use some of the same command line tools that generate our Metro Extracts (like Osmosis, osm2pgsql, and ogr2ogr) to generate a GeoJSON with an OSM dataset custom to my needs. If you’re real particular about what you need to extract from a metro extract, this is probably for you.
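To make the amenity=police idea concrete, here is a minimal sketch in plain Python rather than the command line tools named above. It assumes you have already decompressed the OSM XML extract into a string, and it only looks at nodes, so anything mapped as a way or relation (a police station drawn as a building outline, say) would be missed. The function name and the sample data are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET


def nodes_with_tag(osm_xml, key, value):
    """Return a GeoJSON FeatureCollection of OSM nodes tagged key=value."""
    root = ET.fromstring(osm_xml)
    features = []
    for node in root.iter("node"):
        # Collect this node's <tag k="..." v="..."/> children into a dict.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get(key) != value:
            continue
        features.append({
            "type": "Feature",
            "properties": {"osm_id": int(node.get("id")), **tags},
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(node.get("lon")), float(node.get("lat"))],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data, not taken from a real extract.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="40.7128" lon="-74.0060">
    <tag k="amenity" v="police"/>
    <tag k="name" v="Example Precinct"/>
  </node>
  <node id="2" lat="40.7300" lon="-73.9900">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>"""

police = nodes_with_tag(SAMPLE_OSM, "amenity", "police")
print(json.dumps(police, indent=2))
```

For a full metro extract you would probably want a streaming parser or one of the dedicated tools instead, since this version loads the whole document into memory.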

But if you want everything and if you don’t really want to do more refining of the data yourself, maybe one of these shapefiles or GeoJSONs will serve your needs.

OSM2PGSQL and IMPOSM

If you’re working with spatial data, you’re most likely working with SQL data (listen, we can talk about hipster NoSQL stuff some other time, right now let’s stick to file formats). osm2pgsql and imposm are tools for importing .osm data into PostGIS. Mapzen’s chef recipe then generates shapefiles using the PostGIS command pgsql2shp and GeoJSONs using ogr2ogr. osm2pgsql and imposm carve up .osm data in different ways that you can configure yourself; for now let’s just talk about what Mapzen’s configuration generates.
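If you want to try the PostGIS-to-GeoJSON step yourself, something like the following sketch could be a starting point. It only builds the ogr2ogr argument list and prints it; it is not Mapzen’s actual recipe. The database name `osm` and the output filename are assumptions, and `planet_osm_point` is the table name osm2pgsql uses for points by default, so adjust all three to match your own import.

```python
import shlex


def ogr2ogr_geojson_command(dbname, table, out_path, srs="EPSG:4326"):
    """Build (but do not run) an ogr2ogr call that dumps a PostGIS table to GeoJSON."""
    return [
        "ogr2ogr",
        "-f", "GeoJSON",        # output format
        "-t_srs", srs,          # reproject the output to this coordinate system
        out_path,               # destination file
        f"PG:dbname={dbname}",  # source: PostgreSQL connection string
        table,                  # layer (table) to export
    ]


# Hypothetical database and file names; change them for your setup.
cmd = ogr2ogr_geojson_command("osm", "planet_osm_point", "points.geojson")
print(shlex.join(cmd))

# To actually run it (requires GDAL and a populated database):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list instead of one long string means you can hand it straight to `subprocess.run` without worrying about shell quoting.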

Our osm2pgsql export chops up OSM data into 3 datasets: lines, points, and polygons. Let’s take a look at the point GeoJSON:

{
"type": "Feature",
"properties": {
    "osm_id": 368395980,
    "access": null,
    "aerialway": null,
    "aeroway": "helipad",
    "amenity": null,
    "area": null,
    "barrier": null,
    "bicycle": null,
    "brand": null,
    "bridge": null,
    "boundary": null,
    "building": null,
    "capital": null,
    "covered": null,
    "culvert": null,
    "cutting": null,
    "disused": null,
    "ele": "33",
    "embankment": null,
    "foot": null,
    "harbour": null,
    "highway": null,
    "historic": null,
    "horse": null,
    "junction": null,
    "landuse": null,
    "layer": null,
    "leisure": null,
    "lock": null,
    "man_made": null,
    "military": null,
    "motorcar": null,
    "name": "Unisys Heliport",
    "natural": null,
    "oneway": null,
    "operator": null,
    "poi": null,
    "population": null,
    "power": null,
    "place": null,
    "railway": null,
    "ref": null,
    "religion": null,
    "route": null,
    "service": null,
    "shop": null,
    "sport": null,
    "surface": null,
    "toll": null,
    "tourism": null,
    "tower:type": null,
    "tunnel": null,
    "water": null,
    "waterway": null,
    "wetland": null,
    "width": null,
    "wood": null,
    "z_order": null
},
"geometry": {
    "type": "Point",
    "coordinates": [
        -74.50099,
        40.3709408
    ]
}
}

So that’s a lot of information to explain that this is a helipad. Basically, every OSM tag that the export tracks for points, lines, or polygons is stored as a property on each feature, and the value is null wherever the feature doesn’t carry that tag.
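If all those nulls get in your way, one option is to drop them after loading the GeoJSON. The sketch below does that for a single feature; the helper name is made up, and the sample is a shortened copy of the helipad feature shown above.

```python
def drop_null_properties(feature):
    """Return a copy of a GeoJSON feature without properties whose value is None."""
    slim = dict(feature)
    slim["properties"] = {
        key: value
        for key, value in feature.get("properties", {}).items()
        if value is not None
    }
    return slim


# A shortened version of the helipad feature (JSON null becomes None in Python).
helipad = {
    "type": "Feature",
    "properties": {
        "osm_id": 368395980,
        "access": None,
        "aeroway": "helipad",
        "amenity": None,
        "ele": "33",
        "name": "Unisys Heliport",
        "z_order": None,
    },
    "geometry": {"type": "Point", "coordinates": [-74.50099, 40.3709408]},
}

print(drop_null_properties(helipad)["properties"])
```

The original feature is left untouched, so you can keep the full record around if another tool expects every column to be present.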

imposm exports are a little more granular: there are 18 separate datasets. They cover important OSM tags that intuitively make sense to separate out (administrative polygons, waterways, roads), plus versions of the same datasets that have been “generalized,” i.e., simplified (if the filename has the suffix “gen” that’s what it means).

So should I download imposm files or osm2pgsql? It depends what you want to do and whether you prefer a slightly more granular extract.

Some more persnickety projection information you might want to know

  • What projections do shapefiles and GeoJSONs use?
    • imposm shapefiles: EPSG:4326 (EPSG:3857 for downloads prior to June 20, 2015)
    • osm2pgsql shapefiles: EPSG:4326
    • GeoJSONs (imposm and osm2pgsql): EPSG:4326
  • Why is the imposm projection different? UPDATE: We’re using imposm3, which previously supported only EPSG:3857. Because the imposm3 export tool now has EPSG:4326 capabilities, we updated the Metro Extracts imposm shapefiles to use EPSG:4326, too. This means all extracted shapefiles and GeoJSONs use EPSG:4326.
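If you are still holding an older imposm shapefile in EPSG:3857, the coordinates are in meters rather than degrees. In practice you would likely reproject with a GIS tool, but the math for single points is short enough to sketch. This assumes the standard spherical Web Mercator formulas; the function names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 meters to EPSG:4326 (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def lonlat_to_web_mercator(lon, lat):
    """Convert EPSG:4326 degrees to EPSG:3857 (x, y) in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


# Round trip using the helipad coordinates from the osm2pgsql example.
x, y = lonlat_to_web_mercator(-74.50099, 40.3709408)
print(x, y)
print(web_mercator_to_lonlat(x, y))
```

The forward conversion breaks down at the poles, where the tangent blows up, which is one reason Web Mercator maps stop short of them.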

Adding new cities

Adding a city to Metro Extracts is as simple as updating this cities.json file and issuing a pull request. The cities are nested within regions (i.e. continents), and the bounding box coordinates should be rounded to the third decimal place. If you don’t know how to determine the bounding box for your metro area, try using the ‘export’ tool on openstreetmap.org (instructions here).
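Rounding the bounding box by hand is easy to get wrong, so here is a small sketch that does it for you. The key names (`bbox`, `top`, `left`, `bottom`, `right`) and the choice to store values as strings are assumptions about the layout of cities.json, so check the existing entries in the file before copying the output. The coordinates are illustrative values only, not an actual Metro Extracts entry.

```python
import json


def city_entry(top, left, bottom, right):
    """Format a bounding box to three decimal places (assumed cities.json layout)."""
    if bottom >= top or left >= right:
        raise ValueError("expected bottom < top and left < right")
    corners = {"top": top, "left": left, "bottom": bottom, "right": right}
    # Fixed-point formatting keeps trailing zeros, e.g. 5.22 becomes "5.220".
    return {"bbox": {name: f"{value:.3f}" for name, value in corners.items()}}


# Illustrative coordinates for a hypothetical metro area.
entry = city_entry(top=41.09712, left=-74.50138, bottom=40.34481, right=-73.22612)
print(json.dumps(entry, indent=2))
```

The sanity check rejects boxes that cross the antimeridian, which would need separate handling.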

Cool. If you landed on this page, you might be new to working with OSM data. Welcome to a weird, wonky world. It’s got lots of helpful people and tools, and it probably needs you. Go forth and map.

Doing more with Metro Extracts

If you want to learn more about the Metro Extracts formats and what you can do with the data, follow this tutorial. In the lesson, you will review the available file formats, load the Metro Extracts data into QGIS, perform attribute queries, and change the symbols used to draw the features.

You can find the tutorial at http://git.io/vmN0x.


This post, like all detailed introductory blog posts before it, is indebted to other detailed introductory blog posts. Fittingly, perhaps the most useful point of reference for this one is by Mike Migurski.

Note: This post was updated on June 20, 2015 to include revised coordinate system information. It was also updated on July 27, 2015 to include links to a tutorial on using Metro Extracts.