How to Geocode Data for Location-Friendly Applications

This is the second part of our series How to Add Postcode-Based Proximity Search With Open Data. In part 1, we explained how to geocode a dataset using open data.

Previously in this series, I showed how to set up databases for a location-based application.

In this article, I explain three ways to geocode your data using Open Postcode Geo. This involves finding coordinates for each record pinpointing that record’s position. Once we know a record’s position, we can work out the distance between that record and another position, such as our current location. In this article, I show you how to find the position of each pub in a dataset of pubs; our application uses this dataset to find the nearest pub to a position supplied by the user.

The three geocoding methods are:

  • Using a SQL join
  • Using dedicated fields
  • Using the Open Postcode API

Each has pros and cons.

Geocoding with a SQL join

This is a great method if:

  • You have the Open Postcode Geo dataset available as a table within in your database.
  • You expect your own data to get regularly updates and you don’t want to geocode it each time that happens.

There are better methods if:

  • You need the very best performance.
  • You cannot install the Open Postcode Geo dataset locally.
  • You need up-to-date data, but do not want the overhead of updating your local Open Postcode Geo dataset from time to time.

This method works by joining your dataset table on the open_postcode_geo table using postcode as the common key. This join creates a dataset containing both your existing data and the coordinates for each record from the open_postcode_geo table. Here’s an example using the sample pubs dataset we loaded earlier:

mysql> select open_pubs.name, open_postcode_geo.easting, open_postcode_geo.northing from open_pubs join open_postcode_geo on open_pubs.postcode = open_postcode_geo.postcode limit 10;

+----------------------+---------+----------+
| name                 | easting | northing |
+----------------------+---------+----------+
| Anchor Inn           |  604749 |   234404 |
| Angel Inn            |  582889 |   247367 |
| Black Boy Hotel      |  587356 |   241327 |
| Black Horse          |  604271 |   233919 |
| Black Lion           |  582751 |   248296 |
| Bristol Arms         |  624667 |   233743 |
| Butt and Oyster Inn  |  620644 |   237973 |
| Carriers Arms        |  607464 |   235395 |
| Chenery Catering Ltd |  612239 |   234480 |
| Clubhouse            |  609842 |   235185 |
+----------------------+---------+----------+
10 rows in set (0.00 sec)

Open Postcode Geo allows you to join on postcodes in a variety of different formats. For fast joining, ensure that the common postcode key is indexed in both tables.

Geocoding dedicated fields with a SQL join

Use this method to add location information to your database when you want to store the easting and northing within your main dataset. For example, it is a good idea if you are creating a derived static dataset containing permanently geocoded addresses. It’s not the best path to take if postcodes in your data change from time to time, as you are storing the coordinates within your dataset and you will therefore need some process to update them should the postcode change.

First, create easting and northing fields in your main dataset table. Let’s do this for our open_pubs table:

mysql> alter table open_pubs add easting mediumint;
Query OK, 54368 rows affected (2.63 sec)

mysql> alter table open_pubs add northing mediumint;
Query OK, 54368 rows affected (2.79 sec)

Then set open_pubs.easting=open_postcode_geo.easting and open_pubs.northing=open_postcode_geo.northing using a SQL join with postcode as the common key:

mysql> update open_pubs join open_postcode_geo on open_pubs.postcode = open_postcode_geo.postcode set open_pubs.easting = open_postcode_geo.easting, open_pubs.northing = open_postcode_geo.northing;
Query OK, 51838 rows affected (20.12 sec)

You may have noticed that not all rows were affected: 51,838 of 54,368. This is because not every pub in the sample dataset has a valid postcodes.

You should now have a single table containing the geocoded data:

mysql> select name, easting, northing from open_pubs limit 10;
+----------------------+---------+----------+
| name                 | easting | northing |
+----------------------+---------+----------+
| Anchor Inn           |  604749 |   234404 |
| Angel Inn            |  582889 |   247367 |
| Black Boy Hotel      |  587356 |   241327 |
| Black Horse          |  604271 |   233919 |
| Black Lion           |  582751 |   248296 |
| Bristol Arms         |  624667 |   233743 |
| Butt and Oyster Inn  |  620644 |   237973 |
| Carriers Arms        |  607464 |   235395 |
| Chenery Catering Ltd |  612239 |   234480 |
| Clubhouse            |  609842 |   235185 |
+----------------------+---------+----------+
10 rows in set (0.00 sec)

For fast joining, ensure that the common postcode key is indexed in both tables.

Geocoding using the Open Postcode Geo API

The API is a great option if you don’t want to keep a local copy of the Open Postcode Geo dataset up to date within your application. For example, imagine a situation wherein an application’s addresses are user generated, and each time an address is submitted it needs to be geocoded with current data.

Another strength of using the API is it takes postcodes in any standard format. As a result, you do not need to worry about the formats of the postcode data with which you’re supplied.

It’s less recommended as an option if you need the best possible realtime performance without the overhead of an API lookup, or if your application needs to work on a device with unpredictable connectivity.

To use the API, you make a request to the API over HTTP with the postcode in the URI:

http://api.getthedata.com/postcode/[postcode]

For example:

http://api.getthedata.com/postcode/SW1A+1AA

If you click the above example you see the API response, which is returned in JSON:

{
 "status": "match",
 "match_type": "unit_postcode",
 "input": "SW1A 1AA",
 "data": {
   "postcode": "SW1A 1AA",
   "status": "live",
   "usertype": "large",
   "easting": 529090,
   "northing": 179645,
   "positional_quality_indicator": 1,
   "country": "England",
   "latitude": "51.501009",
   "longitude": "-0.141588",
   "postcode_no_space": "SW1A1AA",
   "postcode_fixed_width_seven": "SW1A1AA",
   "postcode_fixed_width_eight": "SW1A 1AA",
   "postcode_area": "SW",
   "postcode_district": "SW1A",
   "postcode_sector": "SW1A 1",
   "outcode": "SW1A",
   "incode": "1AA"
 },
 "copyright": [
   "Contains OS data (c) Crown copyright and database right 2016",
   "Contains Royal Mail data (c) Royal Mail copyright and database right 2016",
   "Contains National Statistics data (c) Crown copyright and database right 2016"
 ]
}

The important fields for our purposes are:

  • tatus: This is either “match” or “no_match”. Where status=no_match your input failed to match a postcode, postcode sector, postcode district, or postcode area, and error is provided.
  • match_type: This should be “unit_postcode” unless you are trying to match a part-postcode, for example an outcode.
  • data > easting: The easting of the postcode.
  • data > northing: The northing of the postcode.

Here is a simple PHP example demonstrating how you might call the API:

// Set $postcode to your postcode.
$postcode = 'SW1A 1AA';

// This is the Open Postcode Geo API endpoint.
$endpoint = 'http://api.getthedata.com/postcode/';

// $postcode should be URL encoded.
$url = $endpoint . urlencode($postcode);

// This fetches the API response and loads it into $response as a string of JSON.
$response = file_get_contents($url) or die('Cannot fetch ' . $url);

// Decode the JSON into an associative array.
$arr = json_decode($response, true) or die('Cannot decode ' . $response);

// Check status and match to ensure we have a suitable response for geocoding.
if($arr['status'] == 'match' and $arr['match_type'] == 'unit_postcode'){
   print 'input:' . $postcode . "\n";
   print 'postcode:' . $arr['data']['postcode'] . "\n";
   print 'easting:' . $arr['data']['easting'] . "\n";
   print 'northing:' . $arr['data']['northing'] . "\n";
}
else {
   die('Input "' . $postcode . '" does not match valid postcode');
}

If you run this code using PHP’s command line interface, you should see the output:

input:SW1A1AA
postcode:SW1A 1AA
easting:529090
northing:179645

To run the code on a webserver, wrap it in HTML <pre></pre> tags to preserve the formatting.

Stepping through the code:

$postcode = 'SW1A1AA';
$endpoint = 'http://api.getthedata.com/postcode/';
$url = $endpoint . urlencode($postcode);

Set $postcode to your postcode. Any common postcode format is acceptable.

The postcode must be URL encoded before being included in the URL. In practice, this means replacing any [space] characters with + signs.

$response = file_get_contents($url) or die('Cannot fetch ' . $url);

This line fetches the response from the API. If you need more control or error handling, you could use PHP’s cURL functions instead of file_get_contents().

$arr = json_decode($response, true) or die('Cannot decode ' . $response);

This line takes the text of the JSON response and loads it into an associative array.

Dan Winchester

Comments