Jan Tschada

Posted on May 19, 2023

Mapping the geospatial patterns of broadcasted news

#geoint #python #spatial #arcgis

Mapping the geospatial patterns of broadcasted news allows a geospatial analyst to gain insights into common and unusual geospatial patterns. We decided to use one of the most comprehensive news collections, the "Global Database of Events, Language, and Tone" (GDELT), as the ground truth.

The geography of worldwide news coverage

Understanding the geospatial patterns of such a massive knowledge graph is often difficult for non-geospatial experts. The machine learning algorithms extract articles and features from websites in real time. The geocoding engine matches extracted locations against over eleven million well-known place names. An article mentioning a place like "New York" leads to an extracted feature location. However, the article does not have to be specifically about the named location. For instance, an article titled "The impact of the COVID pandemic on capital markets" that states "New York is no exception" leads to a location match. We should expect some false positives, but the sum of all extracted locations should reflect a geospatial pattern and give us a coarse-grained overview.
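To make the matching behavior more tangible, here is a minimal sketch of name-based location extraction. It is not the GDELT geocoding engine. The two-entry gazetteer, the `extract_locations` helper, and the sample articles are hypothetical, and plain substring matching stands in for whatever the real engine does. It only illustrates why a passing mention still counts as a location match.

```python
from collections import Counter

# Hypothetical miniature gazetteer: place name -> (longitude, latitude).
# The real engine matches against millions of place names.
GAZETTEER = {
    "New York": (-74.0060, 40.7128),
    "Moscow": (37.6156, 55.7522),
}


def extract_locations(text):
    """Return every gazetteer name that occurs anywhere in the text."""
    lowered = text.lower()
    return [name for name in GAZETTEER if name.lower() in lowered]


# Hypothetical article snippets; the first one is not about New York at all.
articles = [
    "The impact of the COVID pandemic on capital markets: New York is no exception.",
    "Protesters gathered in Moscow on Saturday.",
    "Markets in New York reacted to the demonstrations in Moscow.",
]

# Sum the mentions per place over all articles.
mention_counts = Counter(
    name for article in articles for name in extract_locations(article)
)
```

The first article is a false positive in the sense described above, yet the summed counts per place still give a usable coarse-grained picture.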

Accessing the geospatial features using the geoprotests API

The geoprotests API offers ready-to-use geospatial features representing broadcasted news related to protests and demonstrations. You can use these geospatial features to build various mapping and geospatial applications.

Every geospatial result supports the GeoJSON and Esri FeatureSet formats out of the box. All endpoints support an optional date parameter for filtering the results. For best performance, the serverless cloud backend calculates the geospatial aggregations of the last 24 hours between midnight and 1 AM. The serverless functions save these geospatial features, so yesterday should be the latest available date. If you do not specify a date, the geospatial features of the last 24 hours are calculated on the fly.
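The date handling described above could be sketched as follows. The helper names `latest_precomputed_date` and `build_querystring` are hypothetical, and the sketch assumes that the backend's "midnight" refers to UTC, which the text does not state. Only the `date` and `format` parameters and the values `geojson` and `esri` come from this article.

```python
from datetime import date, datetime, timedelta, timezone


def latest_precomputed_date(now=None):
    """Return yesterday's date as an ISO string (assumption: the backend uses UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now.date() - timedelta(days=1)).isoformat()


def build_querystring(day=None, fmt="geojson"):
    """Build the query parameters; omitting the date means 'last 24 hours'."""
    params = {"format": fmt}
    if day is not None:
        # Raises ValueError for a malformed date before any request is sent.
        params["date"] = date.fromisoformat(day).isoformat()
    return params


# Query the most recent precomputed day in Esri FeatureSet format.
querystring = build_querystring(latest_precomputed_date(), fmt="esri")
```

Validating the date locally is a design choice that avoids spending an API call on a request that cannot succeed.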

Ramp up your development environment

You need to activate your RapidAPI account. Please check out the RapidAPI Account Creation and Management Guide for more details.

Set up your Python-based development environment using your weapon of choice. Creating a virtual environment with venv is one simple approach.

python -m venv geoint

# Linux
source geoint/bin/activate

# Windows
geoint\Scripts\activate

The Python module requests offers easy and elegant access to HTTP functions. You need to install this module using pip.

pip install requests

Accessing the broadcasted news of 31st December 2022

The hotspot endpoint offers access to statistically significant named locations. The count of news mentioning these locations defines the degree of significance.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
import requests

url = 'https://geoprotests.p.rapidapi.com/hotspots'

querystring = {
  'date': '2022-12-31',
  'format': 'geojson'
}

# Authenticate: https://rapidapi.com/auth
api_key = '<SIGN-UP-FOR-KEY>'
headers = {
  'x-rapidapi-host': 'geoprotests.p.rapidapi.com',
  'x-rapidapi-key': api_key
}

geojson_response = requests.request('GET', url, headers=headers, params=querystring)
geojson_response.raise_for_status()
features = geojson_response.json()

The features dictionary represents the named locations in GeoJSON format.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": 41415,
      "geometry": {
        "type": "Point",
        "coordinates": [
          37.6156,
          55.7522
        ]
      },
      "properties": {
        "OBJECTID": 41415,
        "name": "Moscow, Moskva, Russia",
        "timestamp": "2022-12-31T00:00:00",
        "count": 77
      }
    }
  ]
}
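As a small follow-up, the returned dictionary can be turned into a ranked list of hotspots. This is a sketch under the assumption that every feature carries the `name` and `count` properties and a Point geometry as in the sample above. The helper `rank_hotspots` is a hypothetical name, and the `features` dictionary is re-created here from that sample so the snippet runs on its own.

```python
# The sample response shown above, reduced to the fields used below.
features = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": 41415,
            "geometry": {"type": "Point", "coordinates": [37.6156, 55.7522]},
            "properties": {
                "name": "Moscow, Moskva, Russia",
                "timestamp": "2022-12-31T00:00:00",
                "count": 77,
            },
        }
    ],
}


def rank_hotspots(feature_collection, top=5):
    """Return (name, count, longitude, latitude) tuples, highest count first."""
    rows = []
    for feature in feature_collection.get("features", []):
        properties = feature["properties"]
        # GeoJSON stores point coordinates as [longitude, latitude].
        longitude, latitude = feature["geometry"]["coordinates"]
        rows.append((properties["name"], properties["count"], longitude, latitude))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


for name, count, longitude, latitude in rank_hotspots(features):
    print(f"{name}: {count} mentions at ({latitude}, {longitude})")
```

Note the coordinate order: GeoJSON puts the longitude first, which is easy to mix up when passing the values to libraries expecting latitude first.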

Accessing the aggregated broadcasted news

The spatial aggregations are hexagonal geospatial features. Each one has a property representing the count of news mentioning a location within that hexagonal grid cell. Spatial analysts use mapping capabilities to visualize hot and cold spots of these spatial aggregations.

Let us use the powerful mapping capabilities of ArcGIS and visualize the aggregated broadcasted news as a feature set. You need to set the output format to esri.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
url = 'https://geoprotests.p.rapidapi.com/aggregate'

querystring = {
  'date': '2022-12-31',
  'format': 'esri'
}

headers = {
  'x-rapidapi-host': 'geoprotests.p.rapidapi.com',
  'x-rapidapi-key': api_key
}

esri_response = requests.request('GET', url, headers=headers, params=querystring)
esri_response.raise_for_status()
aggregated_features = esri_response.json()
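Before mapping the response, a quick sanity check can be useful. The following sketch assumes the response follows the usual Esri FeatureSet JSON layout, where each feature has an `attributes` dictionary, and that the grid cells expose the `count` field used for rendering later in this article. The helper `summarize_featureset` and the two-cell sample are hypothetical and do not reflect real API output.

```python
def summarize_featureset(featureset_dict, field="count"):
    """Summarize an Esri FeatureSet dictionary without any ArcGIS dependency."""
    features = featureset_dict.get("features", [])
    counts = [
        feature["attributes"][field]
        for feature in features
        if field in feature.get("attributes", {})
    ]
    return {
        "geometry_type": featureset_dict.get("geometryType"),
        "cells": len(features),
        "total": sum(counts),
        "max": max(counts, default=0),
    }


# Hypothetical sample with simplified rings (real cells would be hexagons).
sample = {
    "geometryType": "esriGeometryPolygon",
    "spatialReference": {"wkid": 4326},
    "features": [
        {
            "attributes": {"count": 77},
            "geometry": {"rings": [[[37.5, 55.7], [37.6, 55.8], [37.7, 55.7], [37.5, 55.7]]]},
        },
        {
            "attributes": {"count": 3},
            "geometry": {"rings": [[[2.2, 48.8], [2.3, 48.9], [2.4, 48.8], [2.2, 48.8]]]},
        },
    ],
}

summary = summarize_featureset(sample)
```

Calling `summarize_featureset(aggregated_features)` on the real response would then show at a glance whether any grid cells came back for the requested date.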

The aggregated features represent the corresponding hexagonal grid cells as a Python dictionary. You need to set up the arcgis Python module in your development environment. Its map widget offers powerful mapping capabilities for your Jupyter Notebook environment. Follow the Install and Setup Guide.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
from arcgis.gis import GIS
from arcgis.features import FeatureSet

# Create a FeatureSet from the features
aggregated_featureset = FeatureSet.from_dict(aggregated_features)

# Anonymously connect to ArcGIS Online
gis = GIS()

# Create a simple map view
map_view = gis.map('Europe')

# Add the FeatureSet as a layer
map_view.add_layer(aggregated_featureset)

You should tweak the rendering of the features using a dedicated renderer instance. The renderer is an optional parameter of the add_layer method. Alternatively, you can use a different approach for mapping the aggregated features: the spatially enabled DataFrame offers plotting capabilities in combination with the map view. This makes it easy to define a rendering based on natural class breaks.

# Create a simple map view
map_view = gis.map('Europe')

# Plot the spatially enabled DataFrame using natural class breaks
aggregated_featureset.sdf.spatial.plot(map_view, 
                                      renderer_type='c',
                                      method='esriClassifyNaturalBreaks',
                                      class_count=5, 
                                      col='count', 
                                      cmap='YlOrRd',
                                      alpha=0.35)

The resulting map shows the hotspot in red: Moscow, Russia, with a mention count of 77, as we expected.

Any feedback is welcome. Keep on mapping!

Feel free to try it out: geoprotests API.
