In this blog, we will explore the geopolitical data from GDELT and see how it can be used for analysis.
The GDELT Project, created by Kalev H. Leetaru, monitors the world's news from every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images, and events driving our global society.
Our focus here is the Events database of GDELT.
The GDELT Event Database catalogs events into 20 main categories and more than 300 subcategories. Each category is assigned a CAMEO code. We will be looking at the 20 main (root) CAMEO codes, which include:
- Make Public Statement
- Express intent to cooperate
- Engage in diplomatic cooperation
- Engage in material cooperation
- Provide aid
- Exhibit military posture
- Reduce relations
- Use unconventional mass violence
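To work with these categories in code, it helps to keep a small lookup from root code to label. The sketch below covers only the categories listed above, not all 20. The code numbers follow the CAMEO root-code numbering as we understand it, so check them against the CAMEO codebook before relying on them. The names `CAMEO_ROOT_LABELS` and `label_for` are our own, not part of any library.

```python
# Root codes for the categories listed above (a subset of the 20 CAMEO root codes).
# Assumption: numbering taken from the CAMEO codebook; verify before use.
CAMEO_ROOT_LABELS = {
    "01": "Make public statement",
    "03": "Express intent to cooperate",
    "05": "Engage in diplomatic cooperation",
    "06": "Engage in material cooperation",
    "07": "Provide aid",
    "15": "Exhibit military posture",
    "16": "Reduce relations",
    "20": "Use unconventional mass violence",
}


def label_for(code) -> str:
    """Return a readable label for a root code given as int or str."""
    # Root codes are two characters wide, so pad values such as 1 or "1" to "01".
    key = str(code).zfill(2)
    return CAMEO_ROOT_LABELS.get(key, f"Unknown ({key})")


print(label_for(1))
print(label_for("20"))
```

Padding matters because a root code read from a CSV file often arrives as an integer and loses its leading zero.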
Let's see how we can get the data for these events for all countries.
- BigQuery: You can query exactly the data you need. Here is an example query.
select SQLDATE,EventRootCode,Actor1CountryCode,NumMentions from gdeltv2.events;
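If you want BigQuery to do the date filtering and aggregation for you, the query can be built from Python as a string. This is only a sketch: the helper name `build_events_query` is ours, the fully qualified table name `gdelt-bq.gdeltv2.events` is an assumption about the public dataset, and we assume SQLDATE is stored as a YYYYMMDD integer.

```python
def build_events_query(start: int, end: int,
                       table: str = "gdelt-bq.gdeltv2.events") -> str:
    """Build a query that sums mentions per day, root code, and country.

    start and end are YYYYMMDD integers (assumed to match the SQLDATE column).
    The default table name is an assumption; change it to match your project.
    """
    # int() rejects anything that is not a plain number before it reaches the SQL text.
    return (
        "SELECT SQLDATE, EventRootCode, Actor1CountryCode, "
        "SUM(NumMentions) AS NumMentions\n"
        f"FROM `{table}`\n"
        f"WHERE SQLDATE BETWEEN {int(start)} AND {int(end)}\n"
        "GROUP BY SQLDATE, EventRootCode, Actor1CountryCode"
    )


print(build_events_query(20200101, 20200107))
```

The resulting string can be pasted into the BigQuery console or passed to whichever client you use.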
- gdelt Python package: Install the package with pip.
pip install gdelt
- Import the package and create an object for version 2 of the GDELT database.
import gdelt
gd2 = gdelt.gdelt(version=2)
- Use the gd2 object to fetch the data for a given date, with table set to events. Setting coverage=True asks for every 15-minute update from that day rather than a single file.
results = gd2.Search(['2020-01-01'], table='events', coverage=True)
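- Alternatively, if you saved the data to a CSV file (for example, a BigQuery export named gdelt.csv), load it into the notebook with pandas.
import pandas as pd
results = pd.read_csv("gdelt.csv")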
- The output of the gdelt object contains every column in the events table. Filter it down to the columns we need: SQLDATE, EventRootCode, Actor1CountryCode, and NumMentions. Taking a copy avoids pandas warnings when we modify a column in the next step.
results = results[['SQLDATE','EventRootCode','NumMentions','Actor1CountryCode']].copy()
- Convert SQLDATE from the 'YYYYMMDD' form to a proper date, which displays as 'YYYY-MM-DD'. The format string must describe the input, so it is '%Y%m%d'.
results['SQLDATE'] = pd.to_datetime(results['SQLDATE'].astype(str), format='%Y%m%d')
- Aggregate the data based on SQLDATE, EventRootCode, and Actor1CountryCode.
results = results.groupby(['SQLDATE','EventRootCode','Actor1CountryCode']).agg('sum').reset_index()
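To see what the conversion and aggregation steps produce, here is a small self-contained sketch on made-up rows (not real GDELT data). It uses the same four columns as above. The variable names `sample` and `daily` are ours.

```python
import pandas as pd

# Made-up rows for illustration only; real GDELT output has many more columns.
sample = pd.DataFrame({
    "SQLDATE": [20200101, 20200101, 20200101, 20200102],
    "EventRootCode": ["01", "01", "03", "01"],
    "Actor1CountryCode": ["USA", "USA", "USA", "IND"],
    "NumMentions": [4, 6, 2, 5],
})

# SQLDATE arrives as YYYYMMDD, so the format string describes that input.
sample["SQLDATE"] = pd.to_datetime(sample["SQLDATE"].astype(str), format="%Y%m%d")

# One row per (date, root code, country), with mentions summed.
daily = (
    sample
    .groupby(["SQLDATE", "EventRootCode", "Actor1CountryCode"], as_index=False)["NumMentions"]
    .sum()
)
print(daily)
```

The two USA rows with root code 01 on the same day collapse into a single row, so four input rows become three.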
Mapping the top CAMEO codes in a country, ranked by the number of mentions of each code.
Example: Top Trends in USA (Last Week)
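One way to compute such a ranking from the aggregated table is sketched below. It assumes a DataFrame with the columns SQLDATE (as dates), EventRootCode, Actor1CountryCode, and NumMentions. The function name `top_codes` and its parameters are our own, and the rows are made up for illustration.

```python
import pandas as pd


def top_codes(daily: pd.DataFrame, country: str, end_date: str,
              days: int = 7, n: int = 5) -> pd.Series:
    """Rank root codes for one country by total mentions over the last `days` days."""
    end = pd.Timestamp(end_date)
    start = end - pd.Timedelta(days=days - 1)  # window includes both endpoints
    mask = (
        (daily["Actor1CountryCode"] == country)
        & (daily["SQLDATE"] >= start)
        & (daily["SQLDATE"] <= end)
    )
    return (
        daily.loc[mask]
        .groupby("EventRootCode")["NumMentions"]
        .sum()
        .sort_values(ascending=False)
        .head(n)
    )


# Made-up aggregated rows, not real GDELT data.
daily = pd.DataFrame({
    "SQLDATE": pd.to_datetime(
        ["2020-01-01", "2020-01-03", "2020-01-05", "2020-01-05", "2019-12-01"]
    ),
    "EventRootCode": ["01", "03", "01", "07", "20"],
    "Actor1CountryCode": ["USA", "USA", "USA", "IND", "USA"],
    "NumMentions": [10, 7, 5, 9, 100],
})

top = top_codes(daily, "USA", "2020-01-07")
print(top)
```

The resulting Series is indexed by root code, so it can be passed straight to a bar chart or joined with the code labels.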