How to Build Proximity Search into Your Application

This is the fourth part of our series How to Add Postcode-Based Proximity Search With Open Data. In part 3, we showed how to implement the common application of nearest locations to a postcode provided by a user.

So far in this series we have geocoded a dataset (that is to say, found the position of each record in the dataset), geocoded user input, and looked at the maths which allows us to find the distance between two sets of coordinates: Pythagorean Theorem.

In this article, I show you how to take a geocoded dataset and geocoded user input (a postcode), and order the dataset by proximity to the postcode. Our example application uses a dataset of pubs, hence the purpose of the application is to find the nearest pub to a location specified by the user.

Ready to put all the pieces to work? Let’s build the search function to include in the geolocation application.

Taking our example pub finder application, let’s imagine the user is planning a night out in central London. She wants to find the nearest pubs to Piccadilly Circus, which happens to have a postcode of W1J 9HP. Using the Open Postcode Geo API we can retrieve the easting and northing for that postcode:

http://api.getthedata.com/postcode/SW1Y+4QF

{
 "status": "match",
 "match_type": "unit_postcode",
 "input": "SW1Y 4QF",
 "data": {
   "postcode": "SW1Y 4QF",
   "status": "live",
   "usertype": "large",
   "easting": 529577,
   "northing": 180605,
   "positional_quality_indicator": 1,
   "country": "England",
   "latitude": "51.509525",
   "longitude": "-0.134223",
   "postcode_no_space": "SW1Y4QF",
   "postcode_fixed_width_seven": "SW1Y4QF",
   "postcode_fixed_width_eight": "SW1Y 4QF",
   "postcode_area": "SW",
   "postcode_district": "SW1Y",
   "postcode_sector": "SW1Y 4",
   "outcode": "SW1Y",
   "incode": "4QF"
 },
 "copyright": [
   "Contains OS data (c) Crown copyright and database right 2016",
   "Contains Royal Mail data (c) Royal Mail copyright and database right 2016",
   "Contains National Statistics data (c) Crown copyright and database right 2016"
 ]
}

So we now have the coordinates of our search centre:

Easting=529577
Northing=180605

In the last article, we found earlier that, using the Pythagorean Theorem, we can calculate the distance between two sets of coordinates in SQL thusly:

sqrt(pow(abs([easting_a] - [easting_b]),2) + pow(abs([northing_a] - [northing_b]),2))

Where [easting_a] and [northing_a] are one set of coordinates, and [easting_b] and [northing_b] are the other set.

To build a proximity search, we need to find the distance to each record in our dataset from the search centre, and then order the results by that distance. We can do that by including the SQL code above in a more general SQL select statement:

select sqrt(pow(abs([easting_a] - [easting_b]),2) + pow(abs([northing_a] - [northing_b]),2)) as distance from [table] order by distance

Where [table] is a table of geocoded data.

Let’s fill in values for our Piccadilly Circus search, using our example open_pubs dataset. We assume that we already geocoded this dataset with eastings and northings for each record:

select sqrt(pow(abs(529577 - easting),2) + pow(abs(180605 - northing),2)) as distance from open_pubs order by distance

Here we have set [easting_a] and [northing_a] to the coordinates of our Piccadilly Circus search centre. We replace [easting_b] and [northing_b] with the names of the easting and northing fields in our open_pubs table.

If you run this query, you should retrieve all the records in the table, with distance expressed either as null (for records which have not been geocoded) or a decimal number equal to the distance in metres.

This query makes some changes to produce a more useful set of results:

select name, round(sqrt(pow(abs(529577 - easting),2) + pow(abs(180605 - northing),2))) as distance from open_pubs where easting is not null and northing is not null order by distance limit 10

With this change, we now select name as well as distance; we round distance to the nearest metre using MySQL’s round() function; we exclude any records which have either easting=null or northing=null (which would be pointless data to share with the user, because they do not lead her to a pub); and we limit the results to 10 pubs for user experience reasons.

The result should look something like the below:

mysql> select name, round(sqrt(pow(abs(529577 - easting),2) + pow(abs(180605 - northing),2))) as distance from open_pubs where easting is not null and northing is not null order by distance limit 10;
+-------------------------------+----------+
| name                          | distance |
+-------------------------------+----------+
| Captains Cabin Public House   |       99 |
| Three Crowns Public House     |      101 |
| Apollo Theatre                |      103 |
| Jamies Italian                |      148 |
| St James' Tavern Public House |      156 |
| Tom Cribb Public House        |      160 |
| The Queen's Head              |      168 |
| Prince Of Wales Theatre       |      176 |
| Lyric Public House            |      195 |
| Waxy O'Connors                |      205 |
+-------------------------------+----------+
10 rows in set (0.08 sec)

If you didn’t geocode your dataset, but you did load the Open Postcode Geo dataset, you can achieve the same results with a SQL join:

mysql> select open_pubs.name, round(sqrt(pow(abs(529577 - open_postcode_geo.easting),2) + pow(abs(180605 - open_postcode_geo.northing),2))) as distance from open_pubs join open_postcode_geo on open_pubs.postcode = open_postcode_geo.postcode where open_postcode_geo.easting is not null and open_postcode_geo.northing is not null order by distance limit 10;
+-------------------------------+----------+
| name                          | distance |
+-------------------------------+----------+
| Captains Cabin Public House   |       99 |
| Three Crowns Public House     |      101 |
| Apollo Theatre                |      103 |
| Jamies Italian                |      148 |
| St James' Tavern Public House |      156 |
| Tom Cribb Public House        |      160 |
| The Queen's Head              |      168 |
| Prince Of Wales Theatre       |      176 |
| Lyric Public House            |      195 |
| Waxy O'Connors                |      205 |
+-------------------------------+----------+
10 rows in set (15.77 sec)

The join is faster if the common key (in this case the postcode field) is indexed in both tables.

Optimizing performance

This may be all you need, but in some applications there is a lot of data to sort through. If you need to improve the speed of the proximity search query, there are two optimisations you can make: database indexes, and adding a bounding box to your query.

Applying database indexes

If all your data is in one table, including coordinates, you can use a covering index to speed up the query. This indexes both the fields in your where clause, and the fields you select, enabling the entire query to be carried out on the index.

Dan Winchester

Comments