IBM Developer

Exploring Bias in Crime Data

Matt Hamilton ・2 min read

I was joined this week by my colleague Margriet Groenendijk to explore some crime data and see what biases, if any, we could find in it.

A 1080p version of this video is on Cinnamon

This is the recording from my ML for Everyone show that broadcasts every Tuesday at 2pm UK time on the IBM Developer Twitch channel.

We didn't really know what we were looking for, but wanted to see whether some of the commonly heard views about bias in policing were visible in the data. This is a hugely political and emotive area at the moment, so much so that IBM has just launched Call for Code for Racial Justice to encourage tech projects that combat racism.

Screenshot of Call for Code for Racial Justice website

The data we used was taken from the UK police open data website: https://data.police.uk/

Margriet has a Python notebook that we used for the session:

IBMDeveloperUK / Data-Science-Lunch-and-Learn

Resources for weekly Data Science Lunch & Learns

We were specifically looking at "Stop and Search" data reported by the police force in the area I live in, Avon and Somerset Police.

The first thing we found is that the data itself can be confusing. For example, ethnicity is recorded twice, once as 'self reported' and once as 'officer reported', and any difference between the two could itself be significant.
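One way to see how often the two ethnicity fields agree is to cross-tabulate them with pandas. This is only a minimal sketch, not the code from the session: the rows are made up, and the column names and category labels are assumptions about how a stop and search CSV might be laid out.

```python
import pandas as pd

# Made-up rows for illustration only. The column names and labels are
# assumptions, not values taken from the real dataset.
stops = pd.DataFrame({
    "Self-defined ethnicity": ["White", "Black", "Not stated", "Asian", None],
    "Officer-defined ethnicity": ["White", "Black", "White", "Asian", "Black"],
})

# Count every (self-defined, officer-defined) pair. Missing self-defined
# values are kept as their own category instead of being silently dropped.
agreement = pd.crosstab(
    stops["Self-defined ethnicity"].fillna("Not recorded"),
    stops["Officer-defined ethnicity"],
)
print(agreement)
```

Cells away from the matching labels show where the person stopped and the officer recorded different ethnicities, or where one of the two was not recorded at all.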

There are many different ways this data could be interpreted, and we'd need a lot more knowledge of the specific terms used in the reporting to draw any rigorous conclusions. But we still wanted to see what we could find.

One specific area we chose was the 'outcome' of a stop and search versus the officer reported race. That is, if you suspected a bias in policing, you might expect one race to be over-represented among the stops in which no action was subsequently taken.

We found that for people classed as Black or Asian, the probability of the outcome being no action was 25%, versus 30% for white people stopped. Of course, we have to be aware of correlation versus causation here, as there could be two plausible explanations for these numbers:

  1. That BAME people stopped are 'let off' with no action more often.
  2. That police officers are more likely to stop a BAME person for no offence.
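A comparison like this can be sketched as a row-normalised cross-tabulation, which gives the share of each outcome within each officer-reported group. This is an illustration under stated assumptions rather than the notebook's code: the rows, the column names and the outcome labels are all made up.

```python
import pandas as pd

# Made-up rows for illustration only. The column names and outcome
# labels are assumptions, not values taken from the real dataset.
stops = pd.DataFrame({
    "Officer-defined ethnicity": [
        "White", "White", "White", "White",
        "Black", "Black", "Black",
        "Asian",
    ],
    "Outcome": [
        "No action", "Arrest", "No action", "Caution",
        "No action", "Arrest", "Arrest",
        "Arrest",
    ],
})

# normalize="index" divides each row by its total, so every row sums to 1
# and each cell is the share of that outcome within one ethnicity group.
outcome_share = pd.crosstab(
    stops["Officer-defined ethnicity"],
    stops["Outcome"],
    normalize="index",
)

# Share of stops that ended in no action, per group.
no_action = outcome_share["No action"]
print(no_action.round(2))
```

Note that these shares only describe people who were already stopped. On their own they cannot separate the two explanations above, because they say nothing about how likely someone is to be stopped in the first place.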

This was just a very superficial look at the data, but hopefully it shows how you can use Python notebooks and the pandas library to explore and visualise it.

If you want to learn more, then please drop by the IBM Developer Europe Twitch stream on Tuesdays from 2-3pm UK time, or have a look at the Data Science Lunch and Learn series we run on the IBM Developer Europe Crowdcast channel.

