The Mapzen Libpostal API

We are happy to announce that the Libpostal address parsing and expansion services are now available via the Mapzen API.

Al Barrentine (Libpostal’s author) has written an exhaustive blog post describing what Libpostal is and how it works. The post is aimed at a technical audience, so we’ll just excerpt the short version here:

Libpostal uses machine learning and is informed by tens of millions of real-world addresses from OpenStreetMap. The entire pipeline for training the models is open source. Since OSM is a dynamic data set with thousands of contributors and the models are retrained periodically, improving them can be as easy as contributing addresses to OSM.

Each country’s addressing system has its own set of conventions and peculiarities and libpostal is designed to deal with practically all of them. It currently supports normalizations in 60 languages and can parse addresses in more than 100 countries. Geocoding using libpostal as a preprocessing step becomes drastically simpler and more consistent internationally.

Who’s On First is already using Libpostal for its internal editorial tool (more about that from Dan over here) and the Search team has started work on integrating Libpostal with the Pelias geocoder.

Now you can use Libpostal in your projects too, simply by calling the Mapzen API!

Some of you may have noticed that, in the example above, the newly parsed address string has grown a postal code. This is not something that Libpostal does, but rather the result of another little piece of data-magic that we’ll talk more about soon.

cURL or it didn’t happen

To unwind and normalize all the possible variations for parts of an address string that may be encoded using abbreviations or some other context-specific shorthand, call the /expand endpoint. Like this:

curl -s 'https://libpostal.mapzen.com/expand?address=475+Sansome+St+San+Francisco+CA' | python -mjson.tool
[
    "475 sansome saint san francisco california",
    "475 sansome saint san francisco ca",
    "475 sansome street san francisco california",
    "475 sansome street san francisco ca"
]
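
One way to picture why a single input yields four strings: each abbreviation can expand to more than one candidate, and the output covers every combination. The sketch below is a toy illustration in Python, not Libpostal's actual implementation. The ALTERNATIVES table is a hypothetical stand-in for Libpostal's dictionaries and covers only the two ambiguous tokens in this example.

```python
from itertools import product

# Hypothetical, hand-written table for illustration only.
# Libpostal's real dictionaries are far larger and language-aware.
ALTERNATIVES = {
    "st": ["saint", "street"],
    "ca": ["california", "ca"],
}


def toy_expand(address):
    """Return every combination of candidate expansions, token by token."""
    tokens = address.lower().split()
    # Tokens without an entry in the table expand only to themselves.
    choices = [ALTERNATIVES.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*choices)]


for expansion in toy_expand("475 Sansome St San Francisco CA"):
    print(expansion)
```

Two tokens with two candidates each give 2 x 2 = 4 strings, which is the shape of the response above. Telling "saint" from "street" is then a job for whatever consumes the expansions.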

To explode an address string into its component parts, call the /parse endpoint. Like this:

curl -s 'https://libpostal.mapzen.com/parse?address=475+Sansome+St+San+Francisco+CA' | python -mjson.tool
[
    {
        "label": "house_number",
        "value": "475"
    },
    {
        "label": "road",
        "value": "sansome st"
    },
    {
        "label": "city",
        "value": "san francisco"
    },
    {
        "label": "state",
        "value": "ca"
    }
]

By default, both Libpostal and the Libpostal API return results as a list of dictionaries, each containing a label key and a value key. This is because there are occasions when a given label may have multiple values, for example an address that contains a cross-street.
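
To make the repeated-label point concrete, here is a minimal Python sketch that folds the list of label/value pairs into a dictionary on the client side. The helper name group_by_label is ours, and the cross-street input is a made-up example rather than real API output.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Fold [{'label': ..., 'value': ...}, ...] into {label: [values]}."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair["label"]].append(pair["value"])
    return dict(grouped)


# The /parse response shown above.
parsed = [
    {"label": "house_number", "value": "475"},
    {"label": "road", "value": "sansome st"},
    {"label": "city", "value": "san francisco"},
    {"label": "state", "value": "ca"},
]
print(group_by_label(parsed))

# Hypothetical input (not real API output): a repeated "road" label,
# as might happen for an address with a cross-street.
cross = [
    {"label": "road", "value": "sansome st"},
    {"label": "road", "value": "clay st"},
]
# A plain label -> value dict silently keeps only the last road.
print({pair["label"]: pair["value"] for pair in cross})
# Grouping into lists keeps both roads.
print(group_by_label(cross))
```

Keeping values in lists is what prevents a second road from overwriting the first.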

If you would prefer to have API results returned as a simple dictionary, with labels as keys and lists of possible strings as values, simply append the format=keys parameter. Like this:

curl -s 'https://libpostal.mapzen.com/parse?address=475+Sansome+St+San+Francisco+CA&format=keys' | python -mjson.tool
{
    "city": [
        "san francisco"
    ],
    "house_number": [
        "475"
    ],
    "road": [
        "sansome st"
    ],
    "state": [
        "ca"
    ]
}
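
The three calls above differ only in the endpoint name and the query parameters, so they are easy to assemble in code. Below is a small Python sketch using only the standard library. The helper name build_url is our own, and the host and parameter names are simply copied from the curl examples rather than taken from any client library.

```python
from urllib.parse import urlencode

# Host taken from the curl examples above.
BASE_URL = "https://libpostal.mapzen.com"


def build_url(endpoint, address, **params):
    """Build a request URL for an endpoint such as 'expand' or 'parse'."""
    query = {"address": address}
    query.update(params)
    # urlencode() encodes spaces as '+', matching the curl examples.
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


address = "475 Sansome St San Francisco CA"
print(build_url("expand", address))
print(build_url("parse", address))
print(build_url("parse", address, format="keys"))
```

From there, the response body can be fetched with urllib.request.urlopen and decoded with json.loads, which does the same job as piping curl output through python -mjson.tool.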

Tell me more

The Mapzen Libpostal API is available for testing and experimental use without the need for an API key, so you can get started testing addresses right away. As with all keyless API access, usage is limited, so if you want to do more serious work we encourage you to sign up for a Mapzen API key. It only takes a minute (or two) and you can get yours here.

Complete documentation for the Mapzen Libpostal API is available from: https://github.com/whosonfirst/go-whosonfirst-libpostal/blob/master/docs/index.md

The splash image for this blog post is a crop from a World War II escape map, courtesy of the Cooper Hewitt Smithsonian Design Museum.