Plotting Tour de France stages onto a single map
Approximately in October the new route of the tour de France will be announced. In order to spark some excitement explore past editions a bit more, and plot all stages of past editions onto a single map.
If you want to reproduce this post, you have to perform the following steps:
renv::restore()
targets::tar_make()
In this post the following libraries are used:
library(rnaturalearthdata)
library(rnaturalearth)
library(tarchetypes)
library(conflicted)
library(reactable)
library(tidyverse)
library(leaflet)
library(targets)
library(assertr)
library(janitor)
library(polite)
library(rvest)
library(fs)
library(sf)
conflicts_prefer(dplyr::filter)
conflicts_prefer(dplyr::lag)
Define the website (Lustig, Eppinga, and Fox (2018)) where the data is available from:
base_url <- "https://www.cyclingstage.com/"
Define paths to editions:
editions <- c("tour-de-france-2022-gpx", "tour-de-france-2023-gpx",
"tour-de-france-2024-gpx", "tour-de-france-2025-route")
In the following section, get all links to so called ‘.gpx’ files. These file represent the geographic data of the routes from the different editions. To do this perform the following steps:
gpx_paths <- function(base_url, editions, links_css) {
host <- bow(base_url)
sessions <- map(editions, \(x) nod(host, x))
overview <- map(
sessions,
\(x) scrape(x, content = "text/html; charset=iso-8859-1", verbose = TRUE))
link_nodes <- map(overview, \(x) html_elements(x, links_css))
gpx_paths <- map(
link_nodes, \(x) html_attr(x[html_text2(x) == "GPX"], "href"))
tibble(url = map_chr(sessions, "url")) |>
mutate(
gpx_path = gpx_paths, edition = str_extract(url, "\\d+"),
.keep = "none") |>
unnest(gpx_path)
}
Define the CSS expression to identify relevant links. This expression was determined using techniques described in this vignette.
links_css <- ".data a"
Execute the above described function and make sure that every resulting path is unique (Fischetti (2023)):
# A tibble: 66 × 2
gpx_path edition
<chr> <chr>
1 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
2 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
3 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
4 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
5 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
6 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
7 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
8 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
9 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
10 https://cdn.cyclingstage.com/images/tour-de-france/2022/st… 2022
# ℹ 56 more rows
Now all the links to the relevant files are available.
The next step is to read in all the files and determine relevant elements from these files. To do this perform the following steps:
track_points <- function(base_url, df_gpx_paths, track_point_css) {
host <- bow(base_url)
sessions <- map(pull(df_gpx_paths, gpx_path), \(x) nod(host, x))
gpx_html <- map(
sessions,
\(x) scrape(x, content = "text/html; charset=iso-8859-1", verbose = TRUE))
track_points <- map(gpx_html, \(x) html_elements(x, track_point_css))
list_track_points <- map(
track_points, \(x) tibble(
lat = html_attr(x, "lat"),
lon = html_attr(x, "lon"),
elevation = html_text(x)))
df_track_points <- df_gpx_paths |>
mutate(df_track_points = list_track_points) |>
unnest(df_track_points) |>
mutate(across(c(lat, lon, elevation), \(x) parse_number(x)))
}
Before we call the function define the CSS that will select the right elements. Again this was determined using the SelectorGadget tool (described in detail here):
track_point_css <- "trkpt"
Insert the base url, the links to the gpx files and the mentioned CSS selector into the function and call it:
df_track_points <- track_points(base_url, df_gpx_paths, track_point_css)
# A tibble: 737,011 × 5
gpx_path edition lat lon elevation
<chr> <chr> <dbl> <dbl> <dbl>
1 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
2 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
3 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
4 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
5 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
6 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
7 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
8 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
9 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
10 https://cdn.cyclingstage.com/images/… 2022 55.7 12.6 7
# ℹ 737,001 more rows
To ease the use of spatial data, turn the data frame into a ‘sf’ (Pebesma (2018)) object:
stages_sf <- function(df_track_points) {
sf_points <- df_track_points |>
filter(!is.na(elevation)) |>
st_as_sf(coords = c("lon", "lat"), crs = st_crs(4326))
sf_multipoints <- sf_points |>
group_by(gpx_path, edition) |>
mutate(
distance_delta = as.numeric(st_distance(
geometry, lag(geometry), by_element = TRUE))) |>
mutate(
distance = cumsum(if_else(is.na(distance_delta), 0, distance_delta)),
.keep = "unused") |>
summarise(
geometry = st_combine(geometry),
elev_profile = list(tibble(distance = distance, elevation = elevation)),
.groups = "drop")
st_cast(sf_multipoints, "LINESTRING")
}
If we apply the function, we get an data frame with one row per stage and a geometry list column, that represents the spatial data of that stage:
sf_tdf_stages <- stages_sf(df_track_points)
Simple feature collection with 66 features and 3 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -3.01601 ymin: 42.72618 xmax: 12.6 ymax: 55.85376
Geodetic CRS: WGS 84
# A tibble: 66 × 4
gpx_path edition elev_profile geometry
<chr> <chr> <list> <LINESTRING [°]>
1 https://cdn.cycling… 2022 <tibble> (12.56353 55.6796, 12.56…
2 https://cdn.cycling… 2022 <tibble> (6.67252 46.21029, 6.672…
3 https://cdn.cycling… 2022 <tibble> (6.34834 45.63295, 6.348…
4 https://cdn.cycling… 2022 <tibble> (6.62028 44.91842, 6.620…
5 https://cdn.cycling… 2022 <tibble> (6.00916 45.10007, 6.008…
6 https://cdn.cycling… 2022 <tibble> (4.30587 45.43429, 4.305…
7 https://cdn.cycling… 2022 <tibble> (2.55865 44.32343, 2.558…
8 https://cdn.cycling… 2022 <tibble> (2.3313 43.18762, 2.3312…
9 https://cdn.cycling… 2022 <tibble> (0.69061 43.09445, 0.690…
10 https://cdn.cycling… 2022 <tibble> (-0.04125 43.07966, -0.0…
# ℹ 56 more rows
Before we plot the data download the outlines of countries with the help of the ‘rnaturalearth’ (Massicotte and South (2023)) package.
world <- ne_countries(scale = "medium", returnclass = "sf")
Combine the stage data with the country outline data and plot the result using ggplot techniques:
vis_tdf_stages <- function(sf_tdf_stages, world) {
tdf_bbox <- st_bbox(sf_tdf_stages)
ggplot(data = world) +
geom_sf() +
geom_sf(data = sf_tdf_stages, mapping = aes(color = edition)) +
coord_sf(
xlim = c(tdf_bbox[["xmin"]], tdf_bbox[["xmax"]]),
ylim = c(tdf_bbox[["ymin"]], tdf_bbox[["ymax"]])) +
theme_minimal() +
labs(
title = str_glue(
"TdF Stages {str_flatten(unique(sf_tdf_stages$edition), ",
"', ', ' and ')}"),
color = "Edition",
subtitle = "Color indicates Year") +
theme(legend.position = "bottom")
}
gg_tdf_stages <- vis_tdf_stages(sf_tdf_stages, world)
We downloaded the spatial data of stages from multiple editions of the tour the
France. By transforming the data into a sf
object, we could use known
techniques to analyze and visualise the data.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/duju211/tour_de_france_map, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
During (2024, Oct. 15). Datannery: Tour de France Route Analysis. Retrieved from https://www.datannery.com/posts/tour-de-france-route-analysis/
BibTeX citation
@misc{during2024tour, author = {During, Julian}, title = {Datannery: Tour de France Route Analysis}, url = {https://www.datannery.com/posts/tour-de-france-route-analysis/}, year = {2024} }