Analysing walks on the South West Coast Path

2018

May

26

Tags: cartopy (2), fiona (1), folium (1), shapely (2)

I'm fortunate to live near to one of the best continuous coastal walking/hiking routes in the world. The UK's South West Coast Path is a scenic, cultural and culinary delight spanning 630 miles from Minehead to Poole. It is steeped with history, and apparently was originally created by the coastguard who used it to patrol the south west peninsula looking out for smugglers.

I don't consider myself to be a "serious" hiker, but I do enjoy going for a walk and being by the sea. When I remember to do so, I also use a GPS tracker so that I can see how far/high/fast I walked. This is somewhat of a defensive strategy, as I've a bit of a reputation with friends and family for under-estimating exactly how far it is I intend to drag them along for :).

Over the last few walks I've been thinking about how much of the 630 miles I've actually done. I've also noticed that I get into a bit of a rut with regards to going to new sections of the path. To address both of these, I figured I needed to visualise the data and to do some calculations to figure out how much of it I've walked (whilst using a GPS tracker). This might spur me on to do new sections, as well as giving me some more motivation for remembering to track my walks in the future.

Working with the coast path geometry itself

First up, I need to get hold of the South West Coast Path geometry. Luckily, the excellent National Trail website has a GPX file that I can download. Let's do that.

In [1]:
from pathlib import Path
from urllib.request import urlretrieve

# From the fantastic https://nationaltrail.co.uk website.
URL = "https://www.nationaltrail.co.uk/sites/default/files/south_west_coast_path.gpx"

coast_path_gpx = Path('south_west_coast_path.gpx')

if not coast_path_gpx.exists():
    urlretrieve(URL, coast_path_gpx)

GPX in-hand, Fiona and Shapely are my go-to tools for loading and manipulating vector data like this. Luckily they don't disappoint:

In [2]:
import fiona
import shapely.geometry as sgeom

layers = fiona.listlayers(str(coast_path_gpx))
print('Layers available: {}'.format(', '.join(layers)))

with fiona.open(str(coast_path_gpx), "r", layer='tracks') as records:
    coast_path_tracks = [
        sgeom.shape(record['geometry']) for record in records]
Layers available: waypoints, routes, tracks, route_points, track_points

This has created a bunch of Shapely geometries. We really only need one mega-geometry, so let's take their union.

In [3]:
import shapely.ops

coast_path = shapely.ops.unary_union(coast_path_tracks)

Let's figure out how many coordinate points that geometry is:

In [4]:
def n_coords(geometry):
    """
    A function to figure out the number of coordinate
    points are in the given geometry.

    """
    if hasattr(geometry, 'geoms'):
        return sum(n_coords(geom) for geom in geometry.geoms)
    else:
        return len(geometry.coords)

    
print(f'N coords in the coast path geometry: { n_coords(coast_path) }')
N coords in the coast path geometry: 45066

That's quite a lot of coordinate points. We are going to want to do things like intersections and distances from this geometry, and can happily accept a reduction in fidelity for an increase in performance, so let's also generate a "simplified" version of this geometry.

In [5]:
SIMPLIFICATION_FACTOR = 0.0005

coast_path_simplified = coast_path.simplify(SIMPLIFICATION_FACTOR)

print(f'N coords in the simplified geometry: { n_coords(coast_path_simplified) }')

coast_path_simplified
N coords in the simplified geometry: 3287
Out[5]:

Let's take a look at this simplified geometry in detail. Some interactivity would be nice and I don't need any fancy projection stuff at this scale, so Folium is a good choice.

In [6]:
import folium 


def coast_path_map():
    """Setup a Folium map with the simplified coast path geometry"""
    m = folium.Map()
    
    map_set_bounds_from_geometry(m, coast_path_simplified)

    style_function = lambda x: {'color': "#808000",
                                'line_opacity': 0.5}
    m.add_child(
        folium.GeoJson(
            coast_path_simplified.__geo_interface__,
            name='South West Coast Path',
            style_function=style_function,
        )
    )
    return m


def map_set_bounds_from_geometry(m, geom):
    """
    Given a folium map an a geometry, set the map's
    bounds to show it
    
    """
    geom_bounds = geom.bounds
    # Folium is looking for [[y0, x0], [y1, x1]]
    bounds = [geom_bounds[:2][::-1],
              geom_bounds[2:][::-1]]
    m.fit_bounds(bounds)


coast_path_map()
Out[6]:

Even at this level of simplification, it is clear that there is enough detail allow us to continue the analysis.

Calculating the distance of the coast path

Earlier, I quoted a distance of 630 miles for the coast path, but how accurate is that in reality? Geodetic distances can be well approximated with Vincenty's formula, but even better accuracy can be achieved with Charles Karney's algorithm and its associated implementation found in geographiclib. This is now a standard part of the proj.4 library, and can be used with tools such as pyproj and cartopy. Here I'll use Cartopy, but this isn't a documented interface, and I think the API is likely to be integrated with Cartopy's Globe in the near future:

In [7]:
from cartopy.geodesic import Geodesic
import numpy as np

geod = Geodesic()

def distance(pt1, pt2):
    """
    Return the distance, in meters, between the given points.
    Note, this function is vectorized, and can accept multiple points.

    """
    result = np.array(
        geod.inverse(np.asanyarray(pt1),
                     np.asanyarray(pt2)))
    # Inverse returns [[distance, azim1, azim2], ...].
    # We just want to return the distance
    return result[:, 0]

Let's take that function for a spin with some random locations:

In [8]:
result = distance([40.6, -73.8], [49.1, 2.5])
print(f'Geodetic distance between two random points: ~{ result[0]//1000 }km')
Geodetic distance between two random points: ~8489.0km

The coast path geometry is made up of a bunch of individual geometries, so let's work out their constituent distances:

In [9]:
def linestring_distance(geom):
    if hasattr(geom, 'geoms'):
        return sum(linestring_distance(subgeom) for subgeom in geom.geoms)
    else:
        points = np.array(geom.coords)
        return distance(points[:-1, :2],
                        points[1:, :2]).sum()

    
coast_path_distance_km = linestring_distance(coast_path) // 1000

print(f'Coast path distance {coast_path_distance_km} km '
      f'({0.621371 * coast_path_distance_km//1} miles)')
Coast path distance 1073.0 km (666.0 miles)

Great to see that my calculation is so close to the advertised distance for the coast path!

Getting hold of my recorded tracks

Even though it is now deprecated, I've used Google's "My Tracks" since the beginning of time (or at least, since I got my first Android device). It has/had this really neat feature of backing up my tracks to my Google Drive. I also automatically sync my Google Drive with my laptop, so I don't need to do any work to get my tracks as they are already on my machine in KMZ format.

Whilst I paint a rosy picture of the backup mechanics, I have had some difficulties syncing from "My Tracks" to my Google Drive since it was deprecated, and suspect it may have stopped working altogether. In that situation, I simply used "My Tracks" export all functionality on my phone, and put that into my Google Drive directory.

Let's try opening this with Fiona:

In [10]:
my_tracks_dir = Path('/Users/pelson/Documents/GDrive pelson/My Tracks')

example_file = str(my_tracks_dir / 'Newborough.kmz')

with fiona.open(example_file, "r") as records:
    track = [
        sgeom.shape(record['geometry']) for record in records]
---------------------------------------------------------------------------
DriverError                               Traceback (most recent call last)
<ipython-input-10-d510555a9088> in <module>()
      3 example_file = str(my_tracks_dir / 'Newborough.kmz')
      4 
----> 5 with fiona.open(example_file, "r") as records:
      6     track = [
      7         sgeom.shape(record['geometry']) for record in records]

~/miniconda/envs/gis/lib/python3.6/site-packages/fiona/__init__.py in open(path, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt)
    163         c = Collection(path, mode, driver=driver, encoding=encoding,
    164                        layer=layer, vsi=vsi, archive=archive,
--> 165                        enabled_drivers=enabled_drivers)
    166     elif mode == 'w':
    167         if schema:

~/miniconda/envs/gis/lib/python3.6/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, **kwargs)
    160 
    161         if self.session is not None:
--> 162             self.guard_driver_mode()
    163             if not self.encoding:
    164                 self.encoding = self.session.get_fileencoding().lower()

~/miniconda/envs/gis/lib/python3.6/site-packages/fiona/collection.py in guard_driver_mode(self)
    174         driver = self.session.get_driver()
    175         if driver not in supported_drivers:
--> 176             raise DriverError("unsupported driver: %r" % driver)
    177         if self.mode not in supported_drivers[driver]:
    178             raise DriverError("unsupported mode: %r" % self.mode)

DriverError: unsupported driver: 'LIBKML'

Oh dear. KML&KMZ are not supported by Fiona it seems. :(

How about something like fastkml? From a brief inspection, it looks like fastkml wants to deal with strings, so we need to unzip the kmz file and read the contents of doc.kml ourselves.

In [11]:
from fastkml import kml
from zipfile import ZipFile

with ZipFile(example_file, 'r') as kmz:
    with kmz.open('doc.kml', 'r') as  kml_fh:
        kml_doc = kml_fh.read()

tracks_kml = kml.KML()
tracks_kml.from_string(kml_doc)
No geometries found

Turns out this KML isn't understood by fastkml either.

Perhaps I shouldn't have been so quick to dismiss Fiona here. There is some secret-source to enable the LIBKML driver within Fiona:

In [12]:
# There was a helpful hint for geopandas to enable the functionality within fiona:
fiona.drvsupport.supported_drivers['LIBKML'] = 'r'
In [13]:
with fiona.open(example_file, "r") as records:
    geometries = [
        sgeom.shape(record['geometry']) for record in records]

This gives me back 3 geometries: the start point, the track, and the end point. I think the start and end are redundant, as they are also in the track itself, so let's take a look at the middle one of these geometries:

In [14]:
geometries[1]
Out[14]:

And looking at this geometry on a Folium map:

In [15]:
m = coast_path_map()

m.add_child(
    folium.GeoJson(
        geometries[1].__geo_interface__,
        name='Example track'
    )
)
map_set_bounds_from_geometry(m, geometries[1].buffer(0.01))

m
Out[15]:

Looks great! Unfortunately the example file I've chosen isn't actually on the south west coast path - it is a lovely walk that I did up in North Wales a few years ago to an island associated with Wales' patron saint of lovers, Saint Dwynwen.

Let's go through all of my walks, and just pull out the ones that are somewhat close to the coast path. I'll also drop KMZ that don't look like the one I've already processed, and get rid of those that are unrealistically long (>0.6 degrees):

In [16]:
tracks_kmz = sorted(my_tracks_dir.glob('*.kmz'))

records = {}
for track_kmz in tracks_kmz:
    with fiona.open(str(track_kmz), "r") as kml_records:
        try:
            start, record, end = kml_records
        except ValueError:
            # Not the My Tracks format we were expecting...
            print('Dropping (due to error): {}'.format(track_kmz.parts[-1]))
            continue

        name = record['properties'].get('Name', str(track_kmz)).strip()
        geom = sgeom.shape(record['geometry'])
        
        record['geometry'] = geom

        distance_from_path = coast_path.distance(sgeom.shape(start['geometry']))
        
        # If the distance is less than a small number of degrees, let's include it
        if distance_from_path < 0.6 and geom.length < 0.8:
            records[name] = record
        else:
            print('Dropping (due to distance): {}'.format(track_kmz.parts[-1]))

            
print(f'\nFound {len(records)} walks')
Dropping (due to distance): 12_11_2017 10_31.kmz
Dropping (due to error): 25_07_2015 09:30.kmz
Dropping (due to error): 27_09_2014 14:58.kmz
Dropping (due to distance): 29_07_2014 14:14.kmz
Dropping (due to distance): Aber Falls.kmz
Dropping (due to distance): Adelaide Run.kmz
Dropping (due to distance): Amalfi Coast.kmz
Dropping (due to distance): Boar's Hill.kmz
Dropping (due to distance): Bryher.kmz
Dropping (due to error): Budleigh To Sidmouth .kmz
Dropping (due to distance): Geirionydd .kmz
Dropping (due to distance): Half Scilly.kmz
Dropping (due to distance): Koala Walk.kmz
Dropping (due to distance): Newborough.kmz
Dropping (due to distance): Obelisk.kmz
Dropping (due to distance): Sault .kmz
Dropping (due to distance): Snowdon.kmz
Dropping (due to distance): St Mary's.kmz
Dropping (due to distance): Tresco.kmz
Dropping (due to distance): Un Mapped Coast Path.kmz
Dropping (due to distance): Unnamed Road.kmz
Dropping (due to distance): Via Pennino.kmz
Dropping (due to distance): Vicar Of Dibley .kmz

Found 45 walks

In the interests of being a little bit discrete about exactly where I live in the South West of England, I'll mask out a few locations from the tracks:

In [17]:
filter_locations = [sgeom.Point(-3.4792, 50.7179).buffer(0.025),
                    sgeom.Point(-3.4910, 50.7020).buffer(0.025),
                    sgeom.Point(-3.3333, 50.6333).buffer(0.015)]

filter_locations = shapely.ops.unary_union(filter_locations)

# Update tracks to remove those locations, and add a simplified geometry to the record
for record in records:
    geom = records[record]['geometry'].difference(filter_locations)
    records[record]['geometry'] = geom
    records[record]['simplified_geometry'] = geom.simplify(SIMPLIFICATION_FACTOR)

Let's take a look at these tracks:

In [18]:
def add_records(m, records):
    """
    Draw the simplified geometries of the given records.

    """
    for name, record in records.items():
        m.add_child(
            folium.GeoJson(
                record['simplified_geometry'].__geo_interface__,
                name='Example track'
            )
        )
    return m

m = coast_path_map()
add_records(m, records)
Out[18]:

That is a lovely record of some of the walks I've done in the South West, but some of those are definitely not coastal.

Let's just get the tracks that are really on the coast-path:

In [19]:
coast_path_records = {
    name: record for name, record in records.items()
    if coast_path_simplified.distance(record['simplified_geometry']) < 0.01}

print(f'Number of tracks: { len(records) }')
print(f'Number of coast path tracks: { len(coast_path_records) }')
Number of tracks: 45
Number of coast path tracks: 22

Let's take a look at those coast path tracks:

In [20]:
m = coast_path_map()
add_records(m, coast_path_records)
m
Out[20]:

There is clearly at least one geometry that could still be tidied up (I suspect I left the tracker on whilst driving home!), but I can still use what I have to answer the question "How much of the coast path have I walked/tracked?".

First, let's combine together all of those walks into a single geometry.

In [21]:
coast_path_walks_union = shapely.ops.unary_union(
    [record['simplified_geometry'] for record in coast_path_records.values()])

Next, let's buffer around the coast path walks (by 0.001 degrees). This is going to be fairly slow:

In [22]:
coast_path_walks_buffered = coast_path_walks_union.buffer(0.001)

Finally, take the buffered region away from the coast path track.

Taking the difference from the coast path line-string ensures that the result result remains a line-string. The alternative approach of taking a union or intersection would not have this effect, and would result in a polygon instead.

In [23]:
unwalked_coast_path = coast_path.difference(coast_path_walks_buffered)

# The walked section is therefore the coast path minus the unwalked coast path:
walked_coast_path = coast_path.difference(unwalked_coast_path)

Let's look at how much coast path there is left to walk:

In [24]:
remaining_km = linestring_distance(unwalked_coast_path) // 1000

print(f'Distance of South West Coast Path unwalked: { remaining_km }km')
Distance of South West Coast Path unwalked: 954.0km

It turns out that there is quite a lot of coast path left for me to do...

In [25]:
full_length_km = linestring_distance(coast_path) // 1000
completed_km = full_length_km - remaining_km

print(f'Total coast path distance: { full_length_km } km')
print(f'Distance walked: { completed_km } km')
print(f'Percentage of coast path complete: { np.round(completed_km / full_length_km * 100) }%')
Total coast path distance: 1073.0 km
Distance walked: 119.0 km
Percentage of coast path complete: 11.0%

Finally, let's look at those sections that I have done:

In [26]:
m = folium.Map()
map_set_bounds_from_geometry(m, coast_path_simplified)

m.add_child(
    folium.GeoJson(
        walked_coast_path.__geo_interface__,
        name='Example track'
    )
)
print("Sections of the South West Coast Path I've walked (and tracked via GPS):")
m
Sections of the South West Coast Path I've walked (and tracked via GPS):
Out[26]:

Conclusion

I got hold of some GPX data from the UK's National Trails website, and combined this with my own GPS logger data (KMZ/KML via My Tracks) to compute how much of the South West Coast Path I've walked (and tracked).

I was able to use OGR/Fiona to read in the GPS logger data (in KML/KMZ), but it wasn't able to keep track of the timestamp of each coordinate. For that, I would need to roll my own KML parser. This will be necessary if I want a more sophisticated means of filtering those tracks that include both walking and driving (i.e. when I forget to turn off the tracker). I will want to do this in the future in order to answer my remaining question "How far have I walked in total whilst walking the South West Coast Path?", but I will have to keep that for a future blog post.

Folium was a great tool to make use of for visualising this data. My day job often deals with geometries on a global scale, at which point one really needs Cartopy's clever treatment of the antimeridian and poles, but when the problem lies in a space that Cartesian coordinates are appropriate, the existing tools (Shapely, Fiona, GeoJSON, Folium) really do shine.

So it turns out that I've walked just 11%, or 120km, of the South West Coast Path with my GPS tracker on. I've probably completed quite a significant amount more, but clearly failed to track it. Perhaps my next step should be to dig into my Google location data to see how much detail they have on me. Either that, or maybe I should just go for a walk...

A typical coast path sight