An article by Monika Sengul-Jones titled “The promise of Wikidata”, published on datajournalism.com a couple of months ago, highlighted how Wikidata (a sort of database associated with Wikipedia) could be used by data journalists in a number of ways. In recent years, using Wikidata as a source has come up in various brainstorming sessions with colleagues contributing to EDJNet, and we will soon publish new material that makes extensive use of Wikidata.
Why aren’t more data journalists using Wikidata? Beyond the issues Monika Sengul-Jones highlights in her piece, such as (unevenly) incomplete data, we have identified two additional obstacles to wider adoption of Wikidata in this context.
Firstly, getting data out of Wikidata can be an intimidating task even for people who are familiar with coding. Besides data wrangling, one needs some familiarity with the data structure of Wikidata (this is unavoidable, but it’s not too bad) and with SPARQL database queries, a major pain for those unaccustomed to database languages (see Wikidata’s instructions). Exploration of data — a typical component of data journalism — remains complex, and iterative processes less than intuitive.
Secondly, matching Wikidata identifiers to lists of individuals or objects as found “in the wild” is error-prone, and manual checks can be extremely time-consuming.
To deal with this, we have been working on an interface to facilitate matching lists of strings to relevant Wikidata identifiers; we will release it soon and announce it in a dedicated post.
Today, we are instead presenting a new tool, or rather, a package for the R programming language, tidywikidatar, that facilitates interacting with Wikidata for the many data journalists who use R and are familiar with its established data wrangling tools. In brief, tidywikidatar makes it easier to get data from Wikidata and explore them, without having to deal with complex database queries or nested data structures.
To see it in action, in this post we will outline a basic routine for exploring information stored on Wikidata, and find out what Wikidata knows about members of the European Parliament.
Setting up the package
First, of course, you need to install the package.