A friend of mine asked me to build a small project with him. He is a fan of The Elder Scrolls V: Skyrim, but when we searched for an API for it, we didn't find any. Maybe that's because nobody plays it anymore, but I doubt it, so I made my own API for it by scraping a few things off the wiki!
For this project, our steps are:
- Access the wiki programmatically
- Scrape the data from there and store it
- Create a server and a few endpoints that show this data
So, for the first two steps, we are talking about web scraping! I had never built a substantial web scraping project before, so it was a good opportunity for me to learn as well.
First of all, let's get to the wiki!
Here you can see an example of a wiki page:
Skyrim - Factions
After looking through it a bit, you can see that there's a list of all factions on the page.
Looking at the page's source code, we find that all the tables with the content we need share a few attributes (such as the wikitable class), which we can use to filter out the junk text that won't be part of our API.
After that, it was just a matter of using bs4 to get and clean our data!
from bs4 import BeautifulSoup
import requests
import json
def getLinkData(link):
    # Download the page and return its raw HTML
    return requests.get(link).content
factions = getLinkData(
"https://elderscrolls.fandom.com/wiki/Factions_(Skyrim)")
data = []
soup = BeautifulSoup(factions, 'html.parser')
table = soup.find_all('table', attrs={'class': 'wikitable'})
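To see what the class filter does in isolation, here is a minimal sketch that runs on a made-up HTML snippet instead of the live wiki. The sample markup, the navbox class and the variable names are assumptions for illustration only; the real page is much larger.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a wiki page: one content table, one layout table.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Faction</th><th>Location</th></tr>
  <tr><td>Companions</td><td>Whiterun</td></tr>
</table>
<table class="navbox">
  <tr><td>Navigation junk</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any table whose class list contains "wikitable".
content_tables = soup.find_all("table", class_="wikitable")

# Collect the text of every data cell in the matching tables.
cells = [td.get_text(strip=True) for t in content_tables for td in t.find_all("td")]
print(len(content_tables), cells)
```

Only the first table is kept, so the navigation cell never reaches the output.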
After that, we go through every matching table on the page, scrape all of its rows and columns, and arrange them in a way that actually makes sense for our API.
Last but not least, we store the result in a dictionary so that we can pass it to our API later.
for wikiTable in table:
    table_body = wikiTable.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        # Get rid of empty values
        data.append([ele for ele in cols if ele])
# Drop rows that ended up empty (for example, header rows with no <td> cells)
cleanData = list(filter(lambda x: x != [], data))
# Store the rows under the key that the API reads later
skyrim_data = {}
skyrim_data["Factions"] = cleanData
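Lists of bare strings work, but an API is usually easier to consume when each row is an object with named fields. The sketch below shows one way to do that. The rows_to_records helper, the header names and the sample rows are all hypothetical, since the real column layout depends on the wiki table.

```python
import json


def rows_to_records(rows, headers):
    """Pair each row's cells with the given header names."""
    records = []
    for row in rows:
        # Skip rows that do not have exactly one cell per header.
        if len(row) != len(headers):
            continue
        records.append(dict(zip(headers, row)))
    return records


# Made-up rows in the same shape as cleanData (a list of lists of strings).
sample_rows = [
    ["Companions", "Whiterun"],
    ["College of Winterhold", "Winterhold"],
    ["Stray cell"],
]

records = rows_to_records(sample_rows, ["name", "location"])
print(json.dumps(records, indent=2))
```

Rows with the wrong number of cells are dropped rather than padded, which keeps every record in the response the same shape.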
Okay, with the scraping part done, we can move on to the next part: creating a server and exposing the scraped data through an API.
For that, I used FastAPI, which is a great choice for building an API quickly!
And it is as simple as it gets: first we import it, and then we set up our endpoints!
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from skyrim import skyrim_data
app = FastAPI()
@app.get("/", response_class=HTMLResponse)
def home():
    # Landing page that lists the available routes
    return """
    <html>
        <head>
            <title>Skyrim API</title>
        </head>
        <body>
            <h1>Skyrim API</h1>
            <h2>Available routes:</h2>
            <ul>
                <li><a href="/factions">/factions</a></li>
            </ul>
        </body>
    </html>
    """
@app.get("/factions")
def factions():
    return skyrim_data["Factions"]
I defined the / route to list all the available endpoints, and there we have it!
I deployed it to Heroku so you can see it in action here. It can be a little slow to open because Heroku needs to start the container before running the code.
I hope this has been enlightening, and that's all folks!