Costasgk

ScoreCast: A Tool for Predicting Football Game Outcomes in Minor Leagues

This web application provides predictions for informational purposes only; they should not be considered financial or betting advice. The accuracy of the predictions is not guaranteed, and we are not liable for any losses incurred from using the tool. It is intended to assist developers in the football industry.

ScoreCast is an open-source web application for predicting football game outcomes in six minor football leagues: Serie A Brazil, Serie B Brazil, Primera Division Argentina, J1 League Japan, Eliteserien Norway, and Veikkausliiga Finland. The tool helps football enthusiasts and bettors make informed decisions and also serves as a gateway for exploring football analytics. Inspired by a Dataquest video, ScoreCast sets itself apart by consolidating six leagues into a single platform and providing instant predictions, with simplicity and user-friendliness as its pillars.

The Goal Behind ScoreCast

Combining my interest in sports analytics and software development, I set out to create ScoreCast. I scraped data from six minor football leagues and built an open-source web application that predicts the outcomes of their games. Since few predictors are available for smaller leagues, ScoreCast offers insights that can help users make informed betting choices. ScoreCast covers the following six football leagues:

  1. Campeonato Brasileiro Série A (Brazil)

  2. Campeonato Brasileiro Série B (Brazil)

  3. Primera División de Argentina (Argentina)

  4. J1 League (Japan)

  5. Eliteserien (Norway)

  6. Veikkausliiga (Finland)

Web Scraping with Beautiful Soup

To gather the data for ScoreCast, I used the web scraping library Beautiful Soup to extract information from the FBref website. Using standard Python modules, I retrieved data for all six leagues. The result was a collection of six CSV files, each containing over 3000 rows of match details such as date, time, competition, round, venue, result, goals for (gf), goals against (ga), and much more. Below, I present the CSV data from Serie A Brazil organized into a structured dataframe.

Image1
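The scraping step described in this section can be sketched roughly as follows. This is a minimal illustration, not the code from the ScoreCast repository: the inline HTML, the table id `matchlogs`, and the function name `parse_match_table` are all assumptions made for the example, and the real FBref markup is more involved.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a match-log table.
# The real FBref page structure is not reproduced here.
SAMPLE_HTML = """
<table id="matchlogs">
  <thead>
    <tr><th>date</th><th>venue</th><th>result</th><th>gf</th><th>ga</th><th>opponent</th></tr>
  </thead>
  <tbody>
    <tr><td>2023-07-01</td><td>Home</td><td>W</td><td>2</td><td>0</td><td>Team B</td></tr>
    <tr><td>2023-07-08</td><td>Away</td><td>D</td><td>1</td><td>1</td><td>Team C</td></tr>
  </tbody>
</table>
"""


def parse_match_table(html, table_id="matchlogs"):
    """Return the rows of one HTML table as a list of dicts keyed by header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    # Column names come from the header row.
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(dict(zip(headers, cells)))
    return rows


matches = parse_match_table(SAMPLE_HTML)
```

Each dict in `matches` corresponds to one match row and could then be written to a CSV file, for example with `csv.DictWriter` or `pandas.DataFrame(matches).to_csv(...)`.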

Exploring and Cleaning the Data

During the exploratory phase of the project, I encountered missing values across multiple columns. For instance, in the Serie A Brazil CSV, the ‘date’, ‘time’, ‘comp’, ‘round’, ‘day’, ‘venue’, ‘result’, ‘gf’, ‘ga’, and ‘opponent’ columns had 158 missing values each. I handled these missing values with data cleaning and imputation.

In the data cleaning process, I applied a series of steps to refine the dataset and prepare it for analysis in ScoreCast. First, I converted the ‘date’ column into a datetime format and ensured the ‘time’ column was in string format. To handle missing values, I implemented several strategies. For instance, when encountering missing values in the ‘venue’ and ‘opponent’ columns, I filled them with the most frequent values found in the dataset. Similarly, I addressed missing values in the ‘formation’ column, either dropping it entirely if all values were missing or filling it with the most common formation.

Moving forward, I handled missing values in the ‘result’ column, filling them with the most frequent outcome recorded. To handle the ‘poss’ (possession) column, I either dropped it if all values were missing or filled the missing values with the mean possession value. Additionally, for the ‘gf’ (goals for) and ‘ga’ (goals against) columns, I converted the values to numeric data and filled any missing values with the respective mean goal counts.

Furthermore, I handled missing values in the ‘referee’ column, filling them with the most frequent referee’s name from the dataset. For columns related to shots, goals, and penalties, including ‘gls’, ‘sh’, ‘sot’, ‘sot%’, ‘g/sh’, ‘g/sot’, ‘pk’, and ‘pkatt’, I filled the missing values with their respective means, so that no gaps remained in these columns.

With the ‘gf’ and ‘ga’ columns numeric and the missing values handled, the dataset was ready for further analysis and modeling.
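The cleaning rules in this section (most frequent value for categorical columns, mean for numeric ones, dropping a column that is entirely empty) can be sketched with pandas as below. The sample rows and the function name `clean_matches` are made up for illustration, and only a few of the columns mentioned above are included.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample with the kinds of gaps described above.
raw = pd.DataFrame({
    "date": ["2023-07-01", "2023-07-08", "2023-07-15", "2023-07-22"],
    "time": ["16:00", "18:30", "16:00", "21:00"],
    "venue": ["Home", None, "Home", "Away"],
    "result": ["W", "D", None, "W"],
    "gf": ["2", "1", None, "3"],
    "ga": ["0", "1", "2", None],
    "poss": [55.0, np.nan, 48.0, 62.0],
    "formation": [None, None, None, None],
})


def clean_matches(df):
    """Apply the imputation rules from the text to a copy of df."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df["time"] = df["time"].astype(str)

    # Categorical columns: fill with the most frequent value.
    for col in ["venue", "result"]:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # 'formation': drop if entirely missing, otherwise fill with the mode.
    if df["formation"].isna().all():
        df = df.drop(columns="formation")
    else:
        df["formation"] = df["formation"].fillna(df["formation"].mode().iloc[0])

    # Numeric columns: convert to numbers, then fill with the column mean.
    for col in ["gf", "ga", "poss"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
        df[col] = df[col].fillna(df[col].mean())
    return df


cleaned = clean_matches(raw)
```

Mean imputation keeps every row but can blur real differences between matches, so it is worth checking how many values were filled in each column.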

Modeling and Extracting Predictions

Using the RandomForestClassifier algorithm, we aimed to build a model that forecasts match outcomes for the six minor football leagues.

With the help of the Python library pandas, we loaded the cleaned CSV files, and for each match, we engineered new features such as ‘venue_code’, ‘opp_code’, ‘hour’, and ‘day_code.’ These features were essential for training our model as they provided critical information about the teams, match venue, and time of the match.
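One plausible way to derive these four features with pandas is shown below. The encoding choices (category codes for venue and opponent, the hour taken from the kick-off time, the weekday number taken from the date) are assumptions for this sketch and may differ from the actual ScoreCast code.

```python
import pandas as pd

# Hypothetical cleaned rows for one team.
matches = pd.DataFrame({
    "date": pd.to_datetime(["2023-07-01", "2023-07-02", "2023-07-08"]),
    "time": ["16:00", "18:30", "21:00"],
    "venue": ["Home", "Away", "Home"],
    "opponent": ["Team B", "Team C", "Team B"],
})

# Encode text columns as integer category codes.
matches["venue_code"] = matches["venue"].astype("category").cat.codes
matches["opp_code"] = matches["opponent"].astype("category").cat.codes

# Kick-off hour: the part of 'HH:MM' before the colon.
matches["hour"] = matches["time"].str.split(":").str[0].astype(int)

# Day of the week as a number (Monday=0 ... Sunday=6).
matches["day_code"] = matches["date"].dt.dayofweek
```

Category codes follow the sorted order of the distinct values, so the same opponent can receive a different code in a different file unless the categories are fixed up front.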

One of the challenges we faced was handling the missing data present in the CSV files. For example, in the Serie A Brazil dataset, there were missing values for ‘date’, ‘time’, ‘comp’, ‘round’, ‘day’, ‘venue’, ‘result’, ‘gf’, ‘ga’, and ‘opponent’. To address this, we filled missing values with the most common ones and replaced remaining NaN values with rolling averages. We also mapped certain values using a custom mapping function.
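The rolling-average idea can be sketched as follows. This is an interpretation rather than the project's exact code: the window of 3 matches, the use of `closed="left"` (so a match's own result never enters its average), and the helper name `add_rolling_average` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical match log for two teams, with one missing 'gf' value.
matches = pd.DataFrame({
    "team": ["Team A"] * 5 + ["Team B"] * 2,
    "date": pd.to_datetime(
        ["2023-06-03", "2023-06-10", "2023-06-17", "2023-06-24", "2023-07-01",
         "2023-06-04", "2023-06-11"]
    ),
    "gf": [2, 0, np.nan, 3, 1, 1, 1],
})


def add_rolling_average(team_df, col, window=3):
    """Add '<col>_rolling' from previous matches and use it to fill NaNs in col."""
    team_df = team_df.sort_values("date").copy()
    rolling_col = f"{col}_rolling"
    # closed="left" excludes the current match from its own window.
    team_df[rolling_col] = (
        team_df[col].rolling(window, closed="left", min_periods=1).mean()
    )
    # Replace missing values with the average of the preceding matches.
    team_df[col] = team_df[col].fillna(team_df[rolling_col])
    return team_df


# Compute the averages per team so that teams do not leak into each other.
matches_rolling = pd.concat(
    [add_rolling_average(group, "gf") for _, group in matches.groupby("team")]
)
```

The first match of each team has no history, so its rolling value stays NaN; such rows need to be dropped or filled separately before training.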

Below, we showcase the processed dataframe resulting from the above process. Please note that certain columns, such as “npxg/sh,” are dropped during this process, as they do not contribute to the training process.

Image2

To make our predictions, we divided the dataset into training and testing sets. The training data spanned up to July 19, 2023, while the testing data included matches from July 23, 2023, and beyond. Using the RandomForestClassifier algorithm from the scikit-learn library, we trained our model on the training data and then made predictions on the testing data.

We evaluated the model’s performance using accuracy. After tuning and data preprocessing, our model achieved an accuracy of 73%.
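The date-based split, training, and evaluation steps can be sketched with scikit-learn as below. The data here is random and synthetic, so the resulting accuracy is meaningless; the hyperparameters (`n_estimators=50`, `min_samples_split=10`, `random_state=1`) and the use of the W/D/L result as the target are assumptions, not values taken from the ScoreCast repository.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the processed dataframe (one row per day).
rng = np.random.default_rng(0)
n = 120
matches = pd.DataFrame({
    "date": pd.date_range("2023-04-01", periods=n, freq="D"),
    "venue_code": rng.integers(0, 2, n),
    "opp_code": rng.integers(0, 20, n),
    "hour": rng.integers(12, 22, n),
    "result": rng.choice(["W", "D", "L"], n),
})
matches["day_code"] = matches["date"].dt.dayofweek

predictors = ["venue_code", "opp_code", "hour", "day_code"]

# Split by date, as in the text: train up to July 19, test from July 23.
train = matches[matches["date"] <= "2023-07-19"]
test = matches[matches["date"] >= "2023-07-23"]

model = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
model.fit(train[predictors], train["result"])

preds = model.predict(test[predictors])
accuracy = accuracy_score(test["result"], preds)
```

Splitting by date rather than at random keeps future matches out of the training data, which matters when the features include rolling statistics.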

Upon completing the modeling process, we generated predictions for each match in the six minor football leagues. The predictions were organized into a CSV file, with the columns ‘Date,’ ‘Team A,’ ‘Team B,’ ‘Prediction for Team A,’ and ‘Prediction for Team B.’ This concise format allowed us to present the match outcomes and the corresponding winning probabilities for each team.

Here are the results of the modeling and training presented in dataframe format:

Image3

However, our model may produce cases where both teams are predicted to win (W/W) or lose (L/L), or where a draw is predicted for both teams (D/D). We advise users to exercise caution with such predictions and to consider seeking professional advice before relying on them in betting.
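A simple way to surface the cases mentioned above is to flag rows where both teams receive the same label. The sketch below assumes the predictions file uses the column names listed earlier; the rows and the `same_label` column are made up for illustration.

```python
import pandas as pd

# Hypothetical predictions in the column layout described above.
predictions = pd.DataFrame({
    "Date": ["2023-07-30", "2023-07-30", "2023-08-05", "2023-08-06"],
    "Team A": ["Team 1", "Team 3", "Team 5", "Team 7"],
    "Team B": ["Team 2", "Team 4", "Team 6", "Team 8"],
    "Prediction for Team A": ["W", "W", "L", "D"],
    "Prediction for Team B": ["L", "W", "L", "D"],
})

# True where both teams got the same label (W/W, L/L or D/D).
predictions["same_label"] = (
    predictions["Prediction for Team A"] == predictions["Prediction for Team B"]
)

flagged = predictions[predictions["same_label"]]
```

The flagged rows could be shown with a warning in the web app or excluded from the output, depending on how the predictions are meant to be used.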

Developing the Web Application

I built the ScoreCast web application with the Flask framework and deployed it on the Heroku platform. The app lets users access predictions for football games from the six minor leagues listed above.

I set up the Flask app and created different routes to handle requests for each league’s predictions. The app follows a simple structure with an HTML template for the home page (‘index.html’) and separate templates for each league’s predictions: ‘br_a.html’ for Serie A Brazil, ‘br_b.html’ for Serie B Brazil, ‘arg.html’ for Primera Division Argentina, ‘jpn.html’ for J1 League Japan, ‘norw.html’ for Eliteserien Norway, and ‘fin.html’ for Veikkausliiga Finland.
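A stripped-down version of this structure might look like the sketch below. To keep it self-contained it uses an in-memory DataFrame and `render_template_string` instead of the CSV files and the `index.html` / `br_a.html` template files; the URL path `/br_a` and the sample rows are assumptions.

```python
import pandas as pd
from flask import Flask, render_template_string

app = Flask(__name__)

# In-memory stand-in for the Serie A Brazil predictions CSV (hypothetical rows).
BR_A_PREDICTIONS = pd.DataFrame({
    "Date": ["2023-07-30", "2023-08-05"],
    "Team A": ["Team 1", "Team 3"],
    "Team B": ["Team 2", "Team 4"],
    "Prediction for Team A": ["W", "D"],
    "Prediction for Team B": ["L", "D"],
})

# Inline template standing in for a file such as 'br_a.html'.
TABLE_TEMPLATE = """
<h1>{{ league }}</h1>
<table>
{% for row in rows %}
  <tr>
    <td>{{ row['Date'] }}</td>
    <td>{{ row['Team A'] }} ({{ row['Prediction for Team A'] }})</td>
    <td>{{ row['Team B'] }} ({{ row['Prediction for Team B'] }})</td>
  </tr>
{% endfor %}
</table>
"""


@app.route("/")
def index():
    # The real app renders 'index.html' here.
    return "ScoreCast"


@app.route("/br_a")
def br_a():
    # One route per league; the other five would follow the same pattern.
    rows = BR_A_PREDICTIONS.to_dict("records")
    return render_template_string(TABLE_TEMPLATE, league="Serie A Brazil", rows=rows)
```

Running `flask --app <module> run` (or calling `app.run()`) starts a local development server for trying out the routes.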

For each league’s route, I read the corresponding CSV file into a pandas DataFrame and performed some preprocessing. One key feature of the web app is the flexibility it offers developers: they can define the time frame for which predictions are extracted. By default, the app extracts predictions for matches from ‘2023-07-30’ to ‘2023-08-31’, but developers can set any other time range directly in the code.
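The time-frame filter could be written as a small helper like the one below. The function name `filter_by_date` and the sample rows are hypothetical; the default dates are the ones mentioned above, and both ends of the range are included.

```python
import pandas as pd

# Hypothetical predictions with dates inside and outside the default window.
predictions = pd.DataFrame({
    "Date": ["2023-07-29", "2023-07-30", "2023-08-15", "2023-08-31", "2023-09-01"],
    "Team A": ["Team 1", "Team 3", "Team 5", "Team 7", "Team 9"],
    "Team B": ["Team 2", "Team 4", "Team 6", "Team 8", "Team 10"],
})


def filter_by_date(df, start="2023-07-30", end="2023-08-31"):
    """Return the rows whose 'Date' lies in [start, end], both ends included."""
    dates = pd.to_datetime(df["Date"])
    mask = dates.between(pd.Timestamp(start), pd.Timestamp(end))
    return df[mask]


upcoming = filter_by_date(predictions)
august_only = filter_by_date(predictions, start="2023-08-01", end="2023-08-31")
```

Exposing `start` and `end` as function arguments keeps the defaults in one place and makes it easy to wire them to query parameters later.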

The web app has been deployed on Heroku and is accessible through the following domain: https://score-cast-3a6cb8fe5c50.herokuapp.com/. With this user-friendly web app, we can conveniently access and explore predictions for those six minor leagues.

Below we present the final web application deployed on Heroku.

ImageApp

Conclusions

In conclusion, ScoreCast is a comprehensive web application designed to provide football match predictions based on historical data from six minor football leagues: Serie A Brazil, Serie B Brazil, Primera Division Argentina, J1 League Japan, Eliteserien Norway, and Veikkausliiga Finland. Leveraging data gathering techniques such as web scraping and employing machine learning models like Random Forest Classifier, ScoreCast generates predictions for football matches. However, it is crucial to remember that these predictions are for informational purposes only and should not be used as financial or betting advice.

Heroku: https://score-cast-3a6cb8fe5c50.herokuapp.com/

GitHub: https://github.com/Costasgk/ScoreCast

Future Work

In the future, we plan to improve the prediction model’s accuracy through more advanced machine learning techniques and further tuning. We also aim to expand our data sources, make the data processing pipelines more efficient, and explore other prediction models. The user interface will be refined to offer a more seamless and intuitive experience. These initiatives are meant to keep the tool accurate and reliable for users in the football industry.

References

  1. ScoreCast GitHub Repository: https://github.com/Costasgk/ScoreCast
    The ScoreCast GitHub repository contains the source code and files for the football prediction web application. Users and developers can access and explore the codebase to understand the implementation details and contribute to the project.

  2. FBref: Football Data and Statistics: https://fbref.com/en/comps
    FBref is a reliable source for comprehensive football data and statistics, providing valuable information on various leagues and competitions. It served as a key data source for the ScoreCast application, enabling the extraction of essential match details and performance metrics.

  3. YouTube Video by Dataquest: https://www.youtube.com/watch?v=Nt7WJa2iu0s&ab_channel=Dataquest
    This video provided insightful guidance and inspiration during the development process of the ScoreCast application.
