Enrique Uribe

Interested in Football Analytics?

I've recently started diving into football analytics and have written a sample Python program that scrapes single-game shot data from https://understat.com/.

This marks the beginning of my journey into data manipulation. I’m excited to dive deeper into this field and look forward to sharing more updates as I progress.

Repo:
https://github.com/UribeJr/football-data-scraper-to-csv-exporter

#!/usr/bin/env python
# coding: utf-8

# Import modules and packages
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Build the match URL from the match ID entered by the user
base_url = 'https://understat.com/match/'
match = input("Enter your match ID: ").strip()
url = base_url + match

# Download the match page and collect its <script> tags
res = requests.get(url)
res.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(res.content, 'lxml')  # the 'lxml' parser requires the lxml package
scripts = soup.find_all('script')

# This script reads the shot data from the second <script> tag
string = scripts[1].string

# Strip the surrounding symbols so that only the JSON data remains
index_start = string.index("('") + 2
index_end = string.index("')")

json_data = string[index_start:index_end]
json_data = json_data.encode('utf8').decode('unicode_escape')
data = json.loads(json_data)

# Shots are grouped by team: 'h' is the home team, 'a' is the away team
df_h = pd.DataFrame(data['h'])
df_a = pd.DataFrame(data['a'])
print("Home Team DataFrame:")
print(df_h.head())
print("Away Team DataFrame:")
print(df_a.head())

# Save each team's DataFrame to its own CSV file
df_h.to_csv('home_team_shots.csv', index=False)
df_a.to_csv('away_team_shots.csv', index=False)
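
To make the JSON-stripping step in the script above easier to follow, here is a minimal offline sketch of the same idea. The helper name `extract_embedded_json` and the shortened `sample_script` string (including the `shotsData` variable name and the player and xG values) are made up for illustration; the real page content may look different.

```python
import json


def extract_embedded_json(script_text: str) -> dict:
    """Return the JSON object wrapped in a ('...') call inside script_text."""
    # The payload sits between the first "('" and the "')" that follows it.
    start = script_text.index("('") + 2
    end = script_text.index("')", start)
    escaped = script_text[start:end]
    # The payload is hex-escaped (for example \x7B stands for "{"), so
    # unicode_escape is used to turn the escapes back into characters.
    decoded = escaped.encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


# Hypothetical, shortened stand-in for the text of the script tag.
sample_script = (
    r"var shotsData = JSON.parse('\x7B\x22h\x22\x3A\x5B\x7B\x22player\x22"
    r"\x3A\x22A\x22,\x22xG\x22\x3A\x220.12\x22\x7D\x5D,\x22a\x22\x3A\x5B\x5D\x7D')"
)

data = extract_embedded_json(sample_script)
print(data["h"])  # shots listed under the home team key
print(data["a"])  # shots listed under the away team key
```

Keeping this step in its own function makes it possible to check the parsing logic without sending a request to the site.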

How To

  • Install and import the required packages: requests, pandas, beautifulsoup4 (BeautifulSoup), and lxml (the parser the script uses)
  • Go to https://understat.com/ and open the match you want shot data for. The match URL should look like https://understat.com/match/{match-id}
  • Run data_scraping.py and enter the match-id when prompted
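
If you would rather paste the whole match URL than pick out the ID by hand, the input step could be made a little more forgiving. This is only a sketch: the helper name `match_url` is hypothetical, it assumes match IDs are purely numeric, and the ID `12345` is a made-up placeholder rather than a real match.

```python
import re

BASE_URL = "https://understat.com/match/"


def match_url(user_input: str) -> str:
    """Build a match URL from either a bare match ID or a full match URL."""
    text = user_input.strip()
    # Assumption: a match ID consists of digits only.
    found = re.fullmatch(r"(?:https?://understat\.com/match/)?(\d+)/?", text)
    if found is None:
        raise ValueError(f"Not a match ID or match URL: {user_input!r}")
    return BASE_URL + found.group(1)


# Both forms of input produce the same URL (12345 is a placeholder ID).
print(match_url("12345"))
print(match_url("https://understat.com/match/12345"))
```

Rejecting anything that is not an ID or a match URL means a typo is caught before a request is sent.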

Congratulations!

The program scrapes the shot data for the match and converts the home and away teams' data into separate DataFrames. The DataFrames are then exported as separate CSV files for reference.
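
The split into one DataFrame and one CSV file per team could be sketched roughly as follows. The sample records, the column names (`player`, `minute`, `xG`) and the helper names `shots_to_frames` and `export_frames` are assumptions made for illustration; only the `h` key and the `home_team_shots.csv` file name come from the script itself.

```python
import os

import pandas as pd

# Hypothetical parsed shot data; real columns and values may differ.
sample_data = {
    "h": [{"player": "Home Player", "minute": "12", "xG": "0.45"}],
    "a": [
        {"player": "Away Player", "minute": "30", "xG": "0.10"},
        {"player": "Away Player", "minute": "75", "xG": "0.30"},
    ],
}


def shots_to_frames(data: dict) -> dict:
    """Build one DataFrame per team, keyed by 'home' and 'away'."""
    return {
        "home": pd.DataFrame(data["h"]),
        "away": pd.DataFrame(data["a"]),
    }


def export_frames(frames: dict, directory: str = ".") -> list:
    """Write each DataFrame to <label>_team_shots.csv and return the paths."""
    paths = []
    for label, df in frames.items():
        path = os.path.join(directory, f"{label}_team_shots.csv")
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


frames = shots_to_frames(sample_data)
print(frames["home"].head())
print(frames["away"].head())
```

Calling `export_frames(frames)` would then write `home_team_shots.csv` and `away_team_shots.csv` to the current directory.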

DataFrame:

[Screenshot: DataFrame output]

CSV:

[Screenshot: exported CSV file]
