information on how to check gemeenteblad source data for spelling mistakes

knoflook 2025-02-18 13:12:00 +01:00
parent ae7babeca1
commit 50fb1297c8

30
info.md

@@ -47,6 +47,36 @@ done
```
compare against the number of lines in `working-data/02/`, using `wc -l` for each file, to see which sets don't return all data. those sets can then be investigated further and the query adjusted.
## checking the data from gemeenteblad for mistakes
some of the street names are misspelled. to check for that i use the following process.
in the `source-data/by-district` directory i run a quick sed command for each neighbourhood to format the data for pasting into a tool, e.g. `cat 10-ijsselmonde | sed 's/, /, Rotterdam\n/g'`
this returns a list like this (abbreviated for readability):
```
Aesopusviaduct, Rotterdam
Anthony Tijkenstraat, Rotterdam
Bierens de Haanweg, Rotterdam
Bollandstraat, Rotterdam
Bolnesserkade, Rotterdam
Burgemeester Molenaarstraat, Rotterdam
Cannenburghstraat, Rotterdam
Zuidkreek.
```
then i go to https://www.mapcustomizer.com/ and use the bulk import option to mark all these streets. the last street name ends in a period instead of `, Rotterdam`, so i fix that by hand first and the list looks like this:
```
Aesopusviaduct, Rotterdam
Anthony Tijkenstraat, Rotterdam
Bierens de Haanweg, Rotterdam
Bollandstraat, Rotterdam
Bolnesserkade, Rotterdam
Burgemeester Molenaarstraat, Rotterdam
Cannenburghstraat, Rotterdam
Zuidkreek, Rotterdam
```
if you try these streets, 2 of them will fail (Bolnesserkade and Cannenburghstraat). look those up manually in a search engine. in our case, Bolnesserkade is marked as being in Ridderkerk and Cannenburghstraat is misspelled. the first one i just ignore (it's an edge case and can be added manually later); the second one i correct in the source file and make a comment about it.
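the sed step and the manual fix of the last street name could probably be combined into one small helper. this is only a sketch: `format_streets` is a made-up name, and it assumes each district file is a single comma-separated list of street names ending in a period (which is what the output above suggests). it would break on street names that contain a comma.

```shell
# format_streets: read a comma-separated street list on stdin and
# print one "Street, Rotterdam" line per street.
# hypothetical helper, not part of the repo.
format_streets() {
  # 1. drop the trailing period (and any trailing whitespace) at the end of the list
  # 2. put every street on its own line
  # 3. trim leading spaces, skip empty lines, append the city
  sed 's/\.[[:space:]]*$//' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; /^$/d; s/$/, Rotterdam/'
}
```

usage would be something like `format_streets < 10-ijsselmonde`, and the result can be pasted straight into the bulk import box.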
## AI-generated guide