You can try the tool online, or check out this screencast for an overview of its main features. The interface is also available through the R package
Scientists across different disciplines are increasingly taking up the welcome habit of publishing the datasets that emerge from their research, and these often include geolocated data. The same is true of open data distributed by public administrations: more and more, such datasets include geocoded information.
All of this sounds like a great starting point for data journalists. If the data is available for a large area, for example, surely it will take just a minute to extract the data for a given region or municipality and use it to write a story for a local media outlet. Right?
I feel that every single person who has worked with open data to any meaningful extent has gone through such moments of naive optimism at least once when finding out about the availability of a given data source. Particularly when the dataset is the result of scientific research, excitement quickly turns into despair: accessing such open data often requires using different programming languages, familiarity with coding, and at least the basics of geocomputation.
In some instances, the complexity of the data may make this inevitable. In other cases, the issue is simply that data formats perfectly readable to one target audience need to be translated into something that non-professionals can understand.
Latitude, longitude, and data
Geo-located data are distributed in a variety of formats. For many of them, just double-clicking on the downloaded file will bring up… well, nothing. But sometimes the relevant dataset is distributed as a spreadsheet. And data journalists love spreadsheets.
In the roller coaster of hope and despair that characterizes data journalism endeavours, a spreadsheet is a good sign, but often not quite enough. For example, let's take a nicely formatted spreadsheet with three columns: latitude, longitude, and a value. Journalists need to report on places, locations, and administrative units, anything that will make sense to their readers, and then often need to aggregate data. So, how do we get from here to there?
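To make that step concrete, here is a minimal sketch of going from a lat/lon/value table to per-region totals. In real workflows you would use a GIS library (for example sf in R or geopandas in Python) together with actual administrative boundary polygons; the region names and bounding boxes below are made up purely for illustration.

```python
# Minimal sketch: assign (lat, lon, value) rows to regions and aggregate.
# Real region boundaries are polygons; here each hypothetical region is
# approximated by a bounding box: (lat_min, lat_max, lon_min, lon_max).
from collections import defaultdict

REGIONS = {
    "Region A": (45.0, 47.0, 7.0, 10.0),
    "Region B": (41.0, 44.0, 11.0, 14.0),
}

def region_of(lat, lon):
    """Return the first region whose box contains the point, else None."""
    for name, (lat_min, lat_max, lon_min, lon_max) in REGIONS.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None

def aggregate(rows):
    """Sum the value column per region; rows are (lat, lon, value) tuples."""
    totals = defaultdict(float)
    for lat, lon, value in rows:
        name = region_of(lat, lon)
        if name is not None:
            totals[name] += value
    return dict(totals)

rows = [(45.5, 8.2, 3), (46.1, 9.0, 1), (42.0, 12.5, 5), (60.0, 25.0, 7)]
print(aggregate(rows))  # the point at (60.0, 25.0) falls in no region
```

The same point-in-polygon logic, applied with real boundaries instead of toy boxes, is what a spatial join does for you in one call.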
In this post, I will go through this relatively common case, and introduce a tool that will hopefully make it much easier to overcome the initial difficulties.
What about some ornithology?
To showcase this scenario, I will use an openly-licensed ornithological dataset of bird sightings. This is mainly to keep things simple, without giving too much thought to substantive analysis. Besides, the pandemic pushed more people to look out for birds and care about them. As their numbers are decreasing so rapidly, more of us should care about birds, really. Also, I like birds.
Let’s move on to data analysis.
I have a dataset with all sightings of Alcedo atthis (better known as the kingfisher) and of Phoenicopterus roseus (the greater flamingo) recorded on the eBird platform over the last two weeks in and around Europe.
I open the spreadsheet, and it looks like this: