Roel Hogervorst

Originally published at blog.rmhogervorst.nl

Tweeting wikidata info

In this explainer I walk you through the steps I took to create a twitter bot that tweets daily about people who died on that date.

I created a script that queries wikidata, takes that information and creates a sentence. That sentence is then tweeted.

For example:

A tweet I literally just sent out from the docker container

I hope you are as excited as I am about this project. Here it comes!

There are 4 parts:

  1. Talk to wikidata and retrieve information about 10 people that died today
  2. Grab one of the deaths and create a sentence
  3. Post that sentence to twitter in the account wikidatabot
  4. Throw it all into a docker container so it can run on the computer of someone else (AKA: THA CLOUD)

You might wonder, why people who died? To which I answer, emphatically but not really helpfully: ‘valar morghulis’.

1. Talk to wikidata and retrieve information

I think wikidata is one of the coolest knowledge bases in the world. It contains facts about people, other animals, places, and the world. It powers many of the boxes you see in Wikipedia pages. For instance, this random page about Charles the first has a box on the right that says something about his ancestors, successors and coronation. The same information can be displayed in Dutch. This is very cool and saves Wikipedia a lot of work. However, we can also use it!

You can create your own query about the world in the query editor. But it is quite hard to figure out how to do that. These queries need to be written in a specific way. I just used an example from wikidata: ‘whose birthday is it today?’ and modified it to search for people’s death (that’s how I learn: modify something and see if I broke it). The query language (SPARQL) looks a lot like SQL, but is slightly different.

Of course this editor is nice for us humans, but we want the computer to do it, so we send the query to wikidata from a script. I was extremely lazy and used the WikidataQueryServiceR package created by wiki-guru Mikhail Popov @bearlogo.

This is the query I ended up using (it looks very much like the birthdays one, but with added information):

querystring <- 
'SELECT # what variables do you want to return (defined later on)
  ?entityLabel (YEAR(?date) AS ?year) 
  ?cause_of_deathLabel 
  ?place_of_deathLabel 
  ?manner_of_deathLabel  
  ?country_of_citizenshipLabel 
  ?country_of_birth
  ?date_of_birth
WHERE {
  BIND(MONTH(NOW()) AS ?nowMonth) # this is a very cool trick
  BIND(DAY(NOW()) AS ?nowDay)
  ?entity wdt:P570 ?date.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?entity wdt:P509 ?cause_of_death.
  OPTIONAL { ?entity wdt:P20 ?place_of_death. }
  OPTIONAL { ?entity wdt:P1196 ?manner_of_death. }
  FILTER(((MONTH(?date)) = ?nowMonth) && ((DAY(?date)) = ?nowDay))
  OPTIONAL { ?entity wdt:P27 ?country_of_citizenship. }
  OPTIONAL { ?entity wdt:P19 ?country_of_birth. } # property IDs are case sensitive; P19 is "place of birth"
  OPTIONAL { ?entity wdt:P569 ?date_of_birth.}
}
LIMIT 10'

Try this in the query editor

When I created this blog post (every day the result will be different) the result looked like this:

library(WikidataQueryServiceR)
result <- query_wikidata(querystring)

## 10 rows were returned by WDQS

result[1:3, 1:3] # first 3 rows, first 3 columns

## entityLabel year cause_of_deathLabel
## 1 Rafael García-Plata y Osma 1918 influenza
## 2 Dobroslava Menclová 1978 traffic crash
## 3 Alan J. Pakula 1998 traffic crash

The query returns name, year, cause of death, manner of death (I didn’t know which one to use), place of death, country of citizenship, country of birth and date of birth. I can now glue all these parts together to create a sentence of sorts.

2. Grab one of the deaths and create a sentence

I will use glue to make the text, but the paste functions from base R are also fine.

For instance, these are the sentences for the first two rows:

library(glue)
glue_data(result[1:2,], "Today in {year} in the place {place_of_deathLabel} died {entityLabel} with cause: {cause_of_deathLabel}. {entityLabel} was born on {as.Date(date_of_birth, '%Y-%m-%d')}. Find more info on this cause of death on www.en.wikipedia.org/wiki/{cause_of_deathLabel}. #wikidata")

## Today in 1918 in the place Cáceres died Rafael García-Plata y Osma with cause: influenza. Rafael García-Plata y Osma was born on 1870-03-04. Find more info on this cause of death on www.en.wikipedia.org/wiki/influenza. #wikidata
## Today in 1978 in the place Plzeň died Dobroslava Menclová with cause: traffic crash. Dobroslava Menclová was born on 1904-01-02. Find more info on this cause of death on www.en.wikipedia.org/wiki/traffic crash. #wikidata
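The example above formats two rows at once, while the bot only needs one. A minimal base R sketch of that step follows; it is not the script from the repository. The helper names `make_sentence()` and `pick_one()` are made up for illustration, the small data frame is a stand-in for the real query result (the missing place of death is an assumption, added to show how an empty OPTIONAL field could be handled), and the sentence wording is slightly shortened. Spaces in the cause of death are replaced with underscores, because that is how Wikipedia page URLs are written.

```r
# Stand-in for the `result` data frame returned by query_wikidata().
# The NA place of death is an assumption to demonstrate the fallback.
result <- data.frame(
  entityLabel         = c("Alan J. Pakula", "Dobroslava Menclová"),
  year                = c(1998, 1978),
  cause_of_deathLabel = c("traffic crash", "traffic crash"),
  place_of_deathLabel = c(NA, "Plzeň"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: pick one random row (assumes at least one row).
pick_one <- function(result) {
  result[sample.int(nrow(result), 1), , drop = FALSE]
}

# Hypothetical helper: turn a single row into a tweetable sentence.
make_sentence <- function(row) {
  place <- row$place_of_deathLabel
  # Leave the place out when wikidata has no place of death
  where <- if (is.na(place)) "" else paste0(" in ", place)
  # Wikipedia page URLs use underscores instead of spaces
  cause_url <- paste0("en.wikipedia.org/wiki/",
                      gsub(" ", "_", row$cause_of_deathLabel))
  sprintf("Today in %s%s died %s with cause: %s. More on this cause of death: %s #wikidata",
          row$year, where, row$entityLabel, row$cause_of_deathLabel, cause_url)
}

set.seed(1) # only to make the random pick reproducible in this example
tweettext <- make_sentence(pick_one(result))

# A tweet can hold 280 characters, so fail early if the text is too long
stopifnot(nchar(tweettext) <= 280)
```

Checking the length before posting is a design choice: a long place name or cause of death would otherwise only surface as an error from twitter.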

3. Post that sentence to twitter in the account wikidatabot

I created the twitter account wikidatabot and added pictures, 2FA and some bio information. I wanted to make it clear that it was a bot. Posting something on your behalf on twitter requires a developer account. Go to https://developer.twitter.com and create that account. In my case I had to manually verify twice because apparently everything I did screamed bot activity to twitter (they were not entirely wrong). You have to tick some boxes, acknowledge the code of conduct and understand twitter’s terms.

The next step is to create a twitter app, but I will leave that explanation to rtweet, because that vignette is very, very helpful.

When you’re done, you can post to twitter on your account with the help of a consumer key, consumer secret, access token and access secret. You will need them all and you will have to keep them a secret (or other people can post on your account, and that is something you really don’t want).

With those secrets and the rtweet package you can create a token that enables you to post to twitter.

And it is seriously as easy as:

rtweet::post_tweet(status = tweettext, token = token )
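The four secrets do not have to live in the script itself. Below is a minimal sketch of reading them from environment variables. The helper `read_twitter_keys()` and the variable names (`TWITTER_CONSUMER_KEY` and so on) are made up for this example, and the commented-out `rtweet::create_token()` call assumes the rtweet version from around the time of writing, so check the rtweet documentation for the version you have installed.

```r
# Hypothetical helper: collect the four twitter secrets from environment
# variables and stop with a clear message when one is missing.
read_twitter_keys <- function(prefix = "TWITTER_") {
  fields <- c("consumer_key", "consumer_secret", "access_token", "access_secret")
  vars   <- paste0(prefix, toupper(fields)) # e.g. TWITTER_CONSUMER_KEY
  values <- Sys.getenv(vars, unset = NA, names = FALSE)
  absent <- vars[is.na(values) | values == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  stats::setNames(as.list(values), fields)
}

# Possible usage (assumption: rtweet 0.6.x style authentication,
# the app name "wikidatabot" is a placeholder):
# keys  <- read_twitter_keys()
# token <- rtweet::create_token(
#   app             = "wikidatabot",
#   consumer_key    = keys$consumer_key,
#   consumer_secret = keys$consumer_secret,
#   access_token    = keys$access_token,
#   access_secret   = keys$access_secret
# )
```

With docker the variables can be passed in when the container starts (for example with `docker run -e`), so the secrets stay out of both the script and the image.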

Again the same tweet

4. Throw it all into a docker container

I want to post this every day, but to make it run in the cloud it would be nice if R and the code were packed together. That is where docker comes in: you define what packages you want and a mini operating system is created that will run for everyone on every computer (if they have docker). The whole example script and docker file can be found here on github.

And that’s it. If you have suggestions on how to run it every day in the cloud for cheap, let me know by twitter or by opening an issue on github.

Things that could be done better:

  • I can run the container, but I don’t know how to make it run in the cloud
  • I ask for 10 deaths and pick one randomly, I don’t know if there is a random function in sparql
  • I put the (twitter) keys into the script, it would be better to use environment variables for that
  • rtweet and WikidataQueryServiceR have lots of dependencies that make the docker container difficult to build (mostly time consuming)
  • I guess I could just build the query and post to wikidata, but using WikidataQueryServiceR was much faster
  • I wish I knew how to use the rocker/tidyverse container to run a script, but I haven’t figured that out yet
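On building the query and sending it to wikidata without WikidataQueryServiceR: the query service accepts a SPARQL query as a URL parameter, so the request URL can be put together in base R. This is a rough sketch under that assumption. The helper name `build_wdqs_url()` is made up, and the actual download and JSON parsing (for instance with httr and jsonlite) is left out.

```r
# Hypothetical helper: build a GET url for the wikidata query service.
# Assumes the public endpoint https://query.wikidata.org/sparql, which takes
# the query in the `query` parameter and can return JSON with `format=json`.
build_wdqs_url <- function(query,
                           endpoint = "https://query.wikidata.org/sparql") {
  # reserved = TRUE also encodes characters such as ?, {, } and #
  paste0(endpoint, "?format=json&query=",
         utils::URLencode(query, reserved = TRUE))
}

# A tiny query for illustration; the full querystring from part 1 works the same way
small_query <- "SELECT ?entity WHERE { ?entity wdt:P570 ?date. } LIMIT 1"
url <- build_wdqs_url(small_query)
```

The resulting url can then be fetched by any HTTP client. The package still saves work, because it also turns the JSON response into a data frame.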

State of the machine

At the moment of creation (when I knitted this document) this was the state of my machine:

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
## setting value                       
## version R version 3.5.1 (2018-07-02)
## os Ubuntu 16.04.5 LTS          
## system x86_64, linux-gnu           
## ui X11                         
## language en_US                       
## collate en_US.UTF-8                 
## tz Europe/Amsterdam            
## date 2018-11-19                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
## package * version date source         
## backports 1.1.2 2017-12-13 CRAN (R 3.5.0) 
## blogdown 0.8 2018-07-15 CRAN (R 3.5.1) 
## bookdown 0.7 2018-02-18 CRAN (R 3.5.0) 
## clisymbols 1.2.0 2017-05-21 CRAN (R 3.5.0) 
## crayon 1.3.4 2017-09-16 CRAN (R 3.5.0) 
## curl 3.2 2018-03-28 CRAN (R 3.5.0) 
## digest 0.6.15 2018-01-28 CRAN (R 3.5.0) 
## evaluate 0.11 2018-07-17 CRAN (R 3.5.1) 
## glue * 1.3.0 2018-07-17 CRAN (R 3.5.1) 
## htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0) 
## httr 1.3.1 2017-08-20 CRAN (R 3.5.0) 
## knitr 1.20 2018-02-20 CRAN (R 3.5.0) 
## magrittr 1.5 2014-11-22 CRAN (R 3.5.0) 
## R6 2.2.2 2017-06-17 CRAN (R 3.5.0) 
## Rcpp 0.12.18 2018-07-23 cran (@0.12.18)
## rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0) 
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0) 
## sessioninfo 1.0.0 2017-06-21 CRAN (R 3.5.1) 
## stringi 1.2.4 2018-07-20 cran (@1.2.4)  
## stringr 1.3.1 2018-05-10 CRAN (R 3.5.0) 
## WikidataQueryServiceR * 0.1.1 2017-04-28 CRAN (R 3.5.1) 
## withr 2.1.2 2018-03-15 CRAN (R 3.5.0) 
## xfun 0.3 2018-07-06 CRAN (R 3.5.1) 
## yaml 2.2.0 2018-07-25 CRAN (R 3.5.1)