Validating OpenStreetMap for detecting cycling-infrastructure change: A Barcelona pilot using Google Street View (2015–2023)
Cities are expanding their cycling networks rapidly, yet longitudinal data describing how these networks change remain limited. Volunteered Geographic Information, particularly OpenStreetMap (OSM), offers time-stamped spatial data on urban infrastructure, but its temporal reliability for detecting cycling-infrastructure change is uncertain. This study evaluates how well OSM identifies additions and removals of cycling infrastructure in Barcelona between 2015 and 2023, using Google Street View (GSV) as ground truth.
Baseline and follow-up cycling networks were derived from OSM snapshots dated 1 January 2016 and 1 January 2024, approximating conditions in 2015 and 2023, and geometric differencing was used to detect apparent additions and removals. Barcelona’s 1,063 census tracts were stratified by population density and centrality into nine groups, from which six tracts per stratum were selected (54 in total). Within these tracts, candidate additions, removals and non-cycling controls were sampled, yielding 105 locations, of which 96 had usable GSV imagery. Two trained coders independently classified each site to identify true additions and removals, false positives and false negatives.
OSM indicates an 88% increase in cycling-infrastructure length, with most additions located in denser and more central areas. Validation shows that OSM captures all observed additions and removals in our sample (recall = 1.00) but includes several false positives, especially for removals. Precision is about 0.73 for additions and 0.29 for removals, producing an overall precision of 0.67 and an F1 score of 0.80.
The study presents a reproducible workflow for assessing temporal OSM data and supports broader efforts to evaluate the validity of volunteered geographic information for urban and transport research.
Keywords: OpenStreetMap (OSM), Cycling infrastructure, Active travel, Built environment change, Data validation, Reproducible workflows, Barcelona
Introduction
In recent years, many cities worldwide have expanded their cycling networks in pursuit of cleaner, healthier and more equitable mobility (Szell et al. 2022; Buehler and Pucher 2021). Reliable data on how these networks evolve are essential to guide fair and evidence-based planning and to enable robust longitudinal research on their impacts. Yet longitudinal information is often missing: official inventories are rarely maintained consistently, while local field audits, though accurate, are costly and difficult to reproduce at scale. As a result, even in cities with substantial cycling expansion, consistent and longitudinally comparable data on infrastructure change are often limited and fragmented.
The growing availability of Volunteered Geographic Information (VGI) (Goodchild 2007) offers new opportunities to address these data gaps. Among such sources, OpenStreetMap (OSM) stands out for providing open, editable and time-stamped spatial data on a wide range of urban features. In principle, OSM could enable the reconstruction of past infrastructure networks and support empirical studies of built-environment transformations.
However, the use of OSM for longitudinal analysis faces several challenges. Previous research has shown that not all edits correspond to physical change (false positives), some genuine changes may remain unmapped (false negatives), and even valid updates are not always recorded at the time they occur (Barron, Neis, and Zipf 2014; Ferster et al. 2020; Vierø, Vybornova, and Szell 2025). These uncertainties raise questions about whether OSM’s temporal record can be trusted to detect infrastructure additions and removals.
This study addresses these questions by evaluating the reliability of OSM for detecting cycling-infrastructure change in Barcelona between 2015 and 2023, using Google Street View (GSV) imagery as ground truth. It assesses OSM’s ability to identify added and removed cycle lanes and introduces a simple calibration framework to adjust OSM-based estimates to real-world conditions. The work forms part of the ATRAPA project (The Active Travel Backlash Paradox), which studies how people perceive and respond to built-environment sustainable-travel interventions across European cities (GEMOTT Research Group 2025).
Data and methods
The analysis followed a five-step reproducible workflow (Figure 1) with two main components: temporal differencing of dated OSM extracts to detect additions and removals of cycling infrastructure, and stratified GSV validation to check a sample of these detected changes. All stages were implemented in R using open-source packages, including osmextract (Gilardi et al. 2025) to retrieve dated OSM snapshots.
OSM temporal differencing
We constructed baseline and follow-up cycling networks from OSM extracts dated 1 January 2016 and 1 January 2024, approximating conditions in 2015 and 2023. From each extract, road segments were classified as cycling infrastructure (CI) or non-cycling (NONCI) based on their OSM tags.
A segment was classified as CI if any of the following conditions were met.
- highway = cycleway
- Any of the tags cycleway, cycleway:left, cycleway:right, or cycleway:both had a value in {lane, track, opposite_lane, opposite_track, separate}
- bicycle_road = yes
- highway was in {service, unclassified} and bicycle was in {yes, designated} and motor_vehicle = no
A segment was classified as NONCI if it did not meet any of the CI criteria above and its highway tag was in {primary, secondary, tertiary, unclassified, residential, primary_link, secondary_link, tertiary_link, living_street, pedestrian}.
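The tag rules above can be expressed as a small classifier. The following is a minimal Python sketch for illustration only (the study's workflow was implemented in R and its code is not reproduced here); the function name `classify_segment` and the convention of returning `None` for segments outside both classes are assumptions.

```python
from typing import Optional

CYCLEWAY_KEYS = ("cycleway", "cycleway:left", "cycleway:right", "cycleway:both")
CYCLEWAY_VALUES = {"lane", "track", "opposite_lane", "opposite_track", "separate"}
NONCI_HIGHWAYS = {
    "primary", "secondary", "tertiary", "unclassified", "residential",
    "primary_link", "secondary_link", "tertiary_link",
    "living_street", "pedestrian",
}


def classify_segment(tags: dict) -> Optional[str]:
    """Return "CI", "NONCI", or None (hypothetical convention: neither class)."""
    highway = tags.get("highway")
    # Rule 1: dedicated cycleway.
    if highway == "cycleway":
        return "CI"
    # Rule 2: any cycleway* tag with a qualifying value.
    if any(tags.get(key) in CYCLEWAY_VALUES for key in CYCLEWAY_KEYS):
        return "CI"
    # Rule 3: bicycle road.
    if tags.get("bicycle_road") == "yes":
        return "CI"
    # Rule 4: service/unclassified road open to bicycles and closed to motor vehicles.
    if (
        highway in {"service", "unclassified"}
        and tags.get("bicycle") in {"yes", "designated"}
        and tags.get("motor_vehicle") == "no"
    ):
        return "CI"
    # NONCI applies only when no CI rule matched.
    if highway in NONCI_HIGHWAYS:
        return "NONCI"
    return None
```

Because the CI rules are checked first, a residential street tagged `cycleway:right = lane` is classified as CI rather than NONCI, which matches the precedence stated in the text.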
After cleaning and projecting the networks, the 2015 and 2023 layers were compared geometrically. Segments present only in 2023 were classified as additions and those present only in 2015 were classified as removals. Cases where an apparent addition lay immediately adjacent to a corresponding removal were excluded, since these typically reflected minor positional realignments in OSM rather than actual physical change.
Stratified GSV validation
To evaluate the accuracy of OSM-detected cycling-infrastructure changes, we implemented a stratified GSV validation across Barcelona’s 1,063 census tracts (2015; Figure 2 a). Population density was calculated as the 2022 resident population per square kilometre for each tract, using official census counts and tract area. Centrality was expressed as straight-line distance from the tract centroid to Plaça Catalunya, treated as the city centre. Both variables were divided into terciles, ranked from 1 (lowest density / most peripheral) to 3 (highest density / most central). Combining density and centrality terciles yielded nine density–centrality strata labelled D1_C1 to D3_C3, where Di indicates density tercile i and Cj centrality tercile j. From each stratum, six tracts were randomly selected (54 in total).
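The stratification step can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the field names (`id`, `density`, `dist_centre_m`), the rank-based tercile rule, the tie-breaking by sort order and the fixed seed are all assumptions.

```python
import random


def tercile_ranks(values, reverse=False):
    """Assign ranks 1..3 by sorted position (ties broken by sort order).

    With reverse=True the largest values receive rank 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for pos, idx in enumerate(order):
        ranks[idx] = pos * 3 // len(values) + 1
    return ranks


def stratify(tracts):
    """Map tract id -> label such as "D1_C3" (density tercile, centrality tercile)."""
    density = tercile_ranks([t["density"] for t in tracts])
    # Centrality: the farthest tracts are rank 1 (most peripheral),
    # the nearest are rank 3 (most central).
    centrality = tercile_ranks([t["dist_centre_m"] for t in tracts], reverse=True)
    return {
        t["id"]: f"D{d}_C{c}" for t, d, c in zip(tracts, density, centrality)
    }


def sample_tracts(strata, per_stratum=6, seed=42):
    """Randomly pick up to `per_stratum` tract ids from each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for tract_id, label in sorted(strata.items()):
        by_stratum.setdefault(label, []).append(tract_id)
    return {
        label: rng.sample(ids, min(per_stratum, len(ids)))
        for label, ids in sorted(by_stratum.items())
    }
```

Because the two terciles are computed independently and then crossed, the nine strata generally contain different numbers of tracts; drawing a fixed six per stratum therefore implies unequal sampling fractions across strata.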
Within each sampled tract, we drew up to two OSM-detected additions, two removals and one non-cycling control, with selection probabilities proportional to segment length (Figure 2 b). The midpoint of each selected segment served as the validation location and was linked to the nearest available GSV panorama. An interactive version of the validation-points map, including clickable links to the corresponding GSV panoramas for every sampled location, is provided in Supplement S1.
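One way to implement the within-tract draw is successive sampling without replacement, where each draw is proportional to length among the segments still available. The text does not specify the exact scheme, so the Python sketch below is an assumption, as are the field names `class` and `length_m`; it also assumes all lengths are positive.

```python
import random

# Quotas per tract as described in the text.
QUOTAS = {"ADD": 2, "REMOVE": 2, "NONCI": 1}


def sample_by_length(segments, k, rng):
    """Draw up to k distinct segments, weighting each draw by segment length."""
    pool = list(segments)
    chosen = []
    while pool and len(chosen) < k:
        weights = [s["length_m"] for s in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen


def sample_tract(segments, rng):
    """Apply the per-class quotas to one tract's candidate segments."""
    selected = []
    for cls, k in QUOTAS.items():
        candidates = [s for s in segments if s["class"] == cls]
        selected.extend(sample_by_length(candidates, k, rng))
    return selected
```

A tract with fewer candidates than the quota simply contributes fewer points, which is why the 54 tracts yielded 105 locations rather than the maximum of 270.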
Each point was independently coded by two trained coders following a standardised protocol. Coders examined the GSV panoramas for two periods corresponding to the OSM reference dates: a follow-up around 1 January 2024 and a baseline around 1 January 2016. Because GSV does not provide panoramas for the exact OSM reference dates, we used the closest available imagery surrounding each period. To avoid coder judgement about temporal proximity, coders inspected a predefined set of allowable years—2024 and 2023 for the follow-up and 2016 and 2015 for the baseline. For each of these years, they recorded whether cycling infrastructure was present (1/0/blank) and the month number displayed in GSV.
The year hierarchy was fixed: coders attempted the main years first, and if neither was interpretable, they reviewed a fallback year (2022 for follow-up, 2014 for baseline). If none of the allowed years yielded a usable view, the period was coded as missing.
Following coding, the spreadsheet automatically selected, for each period, the GSV month closest to the relevant OSM reference date. These closest-in-time observations formed the adjudicated baseline and follow-up values used to classify OSM-detected additions, removals and non-cycling controls.
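The closest-in-time rule was applied in a spreadsheet; the logic can be sketched in Python as below. This is an illustrative reconstruction under assumptions: observations are represented as hypothetical `(year, month, present)` tuples, distance is measured in whole months from the reference month, and ties go to the first listed observation.

```python
# OSM reference dates as (year, month), plus allowed years per period.
REFERENCE = {"baseline": (2016, 1), "followup": (2024, 1)}
MAIN_YEARS = {"baseline": (2016, 2015), "followup": (2024, 2023)}
FALLBACK_YEAR = {"baseline": 2014, "followup": 2022}


def months_from_reference(period, year, month):
    """Absolute distance in months between an image date and the reference date."""
    ref_year, ref_month = REFERENCE[period]
    return abs((year - ref_year) * 12 + (month - ref_month))


def adjudicate(period, observations):
    """Return 1/0 from the usable observation closest to the reference date.

    observations: list of (year, month, present) with present in {1, 0, None}.
    Returns None when no allowed year yields a usable view (period missing).
    """
    usable = [o for o in observations if o[2] in (0, 1) and o[1] is not None]
    # Main years take priority; the fallback year is used only if both fail.
    candidates = [o for o in usable if o[0] in MAIN_YEARS[period]]
    if not candidates:
        candidates = [o for o in usable if o[0] == FALLBACK_YEAR[period]]
    if not candidates:
        return None
    closest = min(
        candidates, key=lambda o: months_from_reference(period, o[0], o[1])
    )
    return closest[2]
```

For example, a baseline coded as absent in June 2015 and present in March 2016 resolves to present, because March 2016 is two months from the reference date and June 2015 is seven.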
Intercoder agreement was high (XXX %), with XXX of XXX cases requiring reconciliation.
Using these adjudicated values, each OSM-detected event was categorised as an addition (ADD), removal (REMOVE) or non-cycling control (NONCI):
- ADD (OSM reports addition)
  - True Positive (TP): verifiable 0→1 transition
  - False Positive (FP): any other interpretable pattern
  - NA: if either period lacked a usable view
- REMOVE (OSM reports removal)
  - TP: verifiable 1→0 transition
  - FP: otherwise
  - NA: if either period lacked a usable view
- NONCI (OSM reports no change)
  - Used to identify False Negatives (FN): genuine 0→1 or 1→0 transitions visible in GSV but not detected in OSM.
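The decision rules above map directly onto a small function. The Python sketch below is illustrative: the label "TN" for unchanged controls and the "NA" outcome for controls with a missing period are assumptions, since the text names only TP, FP, NA and FN.

```python
def classify_outcome(osm_class, baseline, followup):
    """Classify one validation point from adjudicated GSV values.

    baseline / followup: 1 (infrastructure present), 0 (absent), None (no usable view).
    """
    if baseline is None or followup is None:
        return "NA"
    transition = (baseline, followup)
    if osm_class == "ADD":
        return "TP" if transition == (0, 1) else "FP"
    if osm_class == "REMOVE":
        return "TP" if transition == (1, 0) else "FP"
    if osm_class == "NONCI":
        # Any real change at a control point is a change OSM missed.
        return "FN" if baseline != followup else "TN"
    raise ValueError(f"unknown OSM class: {osm_class!r}")
```

Under these rules an OSM-reported addition where infrastructure is visible in both periods (1→1) counts as a false positive, as does one where it is visible in neither (0→0).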
Examples of TP additions and removals are shown in Figure 3. Full protocol details and the coding workbooks are provided in Supplement S2 and Supplement S3. The joined results are available in Supplement S4.
Validation accuracy was then assessed using standard metrics. Precision (TP/(TP+FP)) captures the share of OSM-detected changes that were real. Recall (TP/(TP+FN)) captures the share of real changes correctly detected by OSM. The F1 score is the harmonic mean of precision and recall. Ninety-five per cent Wilson confidence intervals (CI) were computed for each measure.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
Percentage of OSM-detected changes that were correct (how many OSM-detected changes are real).
\[ \text{Recall} = \frac{TP}{TP + FN} \]
Percentage of real changes detected by OSM (how many real changes OSM detected).
\[ F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
Balanced indicator combining both metrics.
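These metrics and their Wilson intervals can be computed with the Python standard library alone. The sketch below is illustrative rather than the authors' R implementation; the function names are hypothetical, and undefined ratios (zero denominators) are returned as `None` by assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def validation_metrics(tp, fp, fn):
    """Precision, recall and F1 with 95 % Wilson intervals for the two ratios."""
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "precision_ci": wilson_interval(tp, tp + fp),
        "recall": recall,
        "recall_ci": wilson_interval(tp, tp + fn),
        "f1": f1,
    }
```

The Wilson interval remains informative when the observed proportion is exactly 1. For instance, a recall of 2 out of 2 gives a lower bound of about 0.34 rather than collapsing to a zero-width interval.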
Results
Network change detected from OSM (2015–2023)
Between 2015 and 2023, the OSM-derived cycling network in Barcelona increased from 153.6 to 288.4 km, representing an 88 % expansion (Figure 4). Geometric differencing indicated 155.7 km of added and 24.6 km of removed infrastructure, corresponding to a net gain of about 131 km. The small gap between this estimate and the change in totals (–3.7 km) suggests good internal consistency (Table 1).
| Metric | Value (km) |
|---|---|
| Total 2015 | 153.6 |
| Total 2023 | 288.4 |
| Net growth | 134.8 |
| Added | 155.7 |
| Removed | 24.6 |
| Added − Removed | 131.1 |
| Gap: (Added − Removed) − Net | -3.7 |
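The consistency check in Table 1 can be written out as a short worked calculation (all values in km, taken from the table):

\[
\begin{aligned}
\text{Net growth} &= 288.4 - 153.6 = 134.8 \\
\text{Added} - \text{Removed} &= 155.7 - 24.6 = 131.1 \\
\text{Gap} &= 131.1 - 134.8 = -3.7 \\
\text{Relative growth} &= 134.8 / 153.6 \approx 0.878 \approx 88\,\%
\end{aligned}
\]

The gap is under 3 % of net growth (\(3.7 / 134.8 \approx 0.027\)).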
OSM-detected additions were not evenly distributed across the city. Low-density strata (D1_C1, D1_C2 and D1_C3) accounted for around 60 % of all added length, with broadly similar shares in each (19–21 %). Within the medium- and high-density strata, additions were concentrated in intermediate and central tracts (8–11 % each) rather than peripheral ones (under 2 % each). Removals were far smaller in total (24.6 km) and were concentrated in low-density tracts, chiefly D1_C3 (44.5 %) and D1_C1 (27.4 %), followed by D2_C3 (10.2 %) (Table 2). Among denser tracts, the gradient towards the centre is consistent with Barcelona’s recent focus on expanding cycling provision in dense, transit-rich areas close to the core while redesigning only selected streets elsewhere.
| Stratum | Definition | Added (km) | Removed (km) | Added (%) | Removed (%) |
|---|---|---|---|---|---|
| D1_C1 | Low density, peripheral | 31.1 | 6.7 | 20.0 | 27.4 |
| D1_C2 | Low density, intermediate | 29.2 | 0.9 | 18.8 | 3.8 |
| D1_C3 | Low density, central | 33.3 | 10.9 | 21.4 | 44.5 |
| D2_C1 | Medium density, peripheral | 2.3 | 0.5 | 1.5 | 2.0 |
| D2_C2 | Medium density, intermediate | 14.4 | 0.6 | 9.3 | 2.3 |
| D2_C3 | Medium density, central | 16.4 | 2.5 | 10.5 | 10.2 |
| D3_C1 | High density, peripheral | 1.8 | 0.3 | 1.2 | 1.2 |
| D3_C2 | High density, intermediate | 12.5 | 0.6 | 8.0 | 2.6 |
| D3_C3 | High density, central | 14.6 | 1.5 | 9.4 | 5.9 |
| TOTAL | All strata combined | 155.7 | 24.6 | 100.0 | 100.0 |
Validation of OSM-detected changes using GSV
Of the 105 sampled sites, 96 (91 %) provided usable GSV panoramas: 44 additions, 7 removals and 45 non-cycling controls. These points were drawn across all density–centrality strata following the stratified sampling design. Because validation points could only be selected where OSM indicated a candidate segment of the relevant class, the distribution across strata in Table 3 reflects the availability of additions, removals and non-cycling segments in the sampled tracts rather than any imbalance in the sampling procedure.
| Stratum | Definition | Add | Remove | Nonci | Total |
|---|---|---|---|---|---|
| D1_C1 | Low density, peripheral | 5 | 1 | 5 | 11 |
| D1_C2 | Low density, intermediate | 8 | 1 | 4 | 13 |
| D1_C3 | Low density, central | 6 | 1 | 5 | 12 |
| D2_C1 | Medium density, peripheral | 2 | 2 | 6 | 10 |
| D2_C2 | Medium density, intermediate | 5 | 0 | 6 | 11 |
| D2_C3 | Medium density, central | 4 | 1 | 4 | 9 |
| D3_C1 | High density, peripheral | 1 | 0 | 5 | 6 |
| D3_C2 | High density, intermediate | 7 | 0 | 5 | 12 |
| D3_C3 | High density, central | 6 | 1 | 5 | 12 |
| TOTAL | All strata combined | 44 | 7 | 45 | 96 |
Table 4 summarises the validation outcomes. OSM captures all observed additions (recall = 1.00, 95 % CI [0.89–1.00]) but also includes some FP (precision \(\approx\) 0.73). For removals, recall was likewise 1.00 but with a wide CI ([0.34–1.00]) due to the small sample size, while precision drops sharply to around 0.29, meaning that most apparent deletions (5 of 7) do not correspond to true infrastructure loss. Overall precision across all changes is 0.67 with an F1 of 0.80.
| Class | n (usable) | TP | FP | FN | Precision (95% CI) | Recall (95% CI) | F1 |
|---|---|---|---|---|---|---|---|
| ADD | 44 | 32 | 12 | 0 | 0.73 [0.58-0.84] | 1.00 [0.89-1.00] | 0.84 |
| REMOVE | 7 | 2 | 5 | 0 | 0.29 [0.08-0.64] | 1.00 [0.34-1.00] | 0.44 |
| Pooled | 51 | 34 | 17 | 0 | 0.67 [0.53-0.78] | 1.00 [0.90-1.00] | 0.80 |
Discussion
These results suggest that OSM is a strong proxy for additions but a weaker one for removals. Although all real changes were captured, OSM tended to flag more additions and removals than were visible in GSV. Because the volume of apparent additions greatly exceeded that of removals, this imbalance likely produces a slight overestimation of net network growth when using raw OSM differencing.
A simple calibration using the observed precision values (0.73 for additions, 0.29 for removals) can adjust OSM-based estimates of change, reducing bias in net growth calculations. As validation was stratified by density and centrality, calibration factors can be adapted to specific urban contexts.
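One possible form of such a calibration is to scale each OSM-detected length by the corresponding precision. The text does not give the exact formula, so the Python sketch below is an assumption. It also assumes that precision estimated on sampled points transfers to segment length, and that no upward correction for missed changes is needed because recall was 1.00 in the sample. The function name is hypothetical.

```python
def calibrated_net_change(added_km, removed_km, precision_add, precision_remove):
    """Scale OSM-detected added/removed length by the validated precision.

    Returns (calibrated_added, calibrated_removed, calibrated_net) in km.
    """
    calibrated_added = added_km * precision_add
    calibrated_removed = removed_km * precision_remove
    return (
        calibrated_added,
        calibrated_removed,
        calibrated_added - calibrated_removed,
    )


# City-wide lengths from Table 1 and pooled counts from Table 4 (TP / usable n).
added, removed, net = calibrated_net_change(155.7, 24.6, 32 / 44, 2 / 7)
print(round(added, 1), round(removed, 1), round(net, 1))  # 113.2 7.0 106.2
```

Given the wide confidence interval on removal precision, the calibrated removal length in particular should be read as indicative only.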
Methodologically, the study demonstrates a transparent, open-source framework for assessing the temporal reliability of OSM data on cycling infrastructure. The workflow, combining historical differencing, stratified sampling, and visual inspection, is reproducible in any city with sufficient Street View coverage. It underpins the ATRAPA Built Environment Transformations Dataset, which will extend this validation approach to other European cities including Milan, Ljubljana, Warsaw, Utrecht, Malmö, and Paris.
Theoretically, this work supports the view of OSM as a dynamic socio-technical system rather than a static dataset. Temporal variation in OSM reflects both genuine infrastructure evolution and the rhythms of community mapping activity. By distinguishing true from apparent edits, the approach contributes to a broader framework for assessing the temporal validity of VGI, which is an essential step for longitudinal research in transport, health, and environmental studies.
Several limitations should be acknowledged. True removals were few, producing wide confidence intervals for that category. GSV imagery dates vary within each anchor year, occasionally creating minor temporal mismatches. Although inter-rater agreement was high, subtle interpretive differences – for example, in deciding whether a feature should count as cycling infrastructure or not – remain possible. Finally, Barcelona’s large and active mapping community likely ensures relatively high data quality; replication in less-mapped cities will be required to test generalisability.
Conclusions
This Barcelona pilot demonstrates that OSM can reliably detect additions to cycling infrastructure but is much less reliable for removals: it captured every removal observed in the sample, yet most of the removals it reported did not correspond to physical change. Overall, OSM slightly overstates network growth. Despite these limitations, OSM remains a valuable, low-cost resource for longitudinal analysis of cycling infrastructure when accompanied by empirical calibration.
The study introduces a transparent validation framework that combines OSM historical data, stratified sampling, and GSV-based inspection to generate quantitative precision and recall metrics. These can be used to correct OSM-derived measures of change and to support comparisons within the ATRAPA framework. More broadly, the findings strengthen confidence in using VGI for temporal urban analysis while highlighting the need to understand the social and temporal dynamics behind its production.
Acknowledgements
This research forms part of the ATRAPA project and is funded by the European Research Council (grant 101117700). We also thank Víctor González Parra for his collaboration as second coder during the Google Street View validation and the OpenStreetMap community for their contributions.
Supplements
S1. Interactive maps (HTML)
View: Interactive validation points map (with GSV links)
View: Interactive OSM change-detection map
These maps allow zooming, panning and inspection of individual validation points and network segments.
S2. Google Street View validation protocol (PDF)
S3. Raw validation workbooks: coder 1 and coder 2 (XLSX)
S4. Final adjudicated validation results (XLSX)
Declaration of Generative AI and AI-assisted technologies in the writing process
During the preparation of this work, specifically in the revision phase after peer review, the author(s) used ChatGPT 5.1 in order to improve the readability and language of the revised manuscript in response to reviewer feedback. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.