Top Universities Dataset

#datascience #python #database #bs4

Introduction

This dataset contains the CWUR World University Rankings for the past 8 years, along with parameters such as Quality of Faculty, Alumni Employment, Publications, and Citations. I used the Beautiful Soup parser to extract the data from the HTML source code, which I fetched with Python's urllib.request module.

The dataset folder is available on Kaggle: https://www.kaggle.com/saumitrajagdale/university-rankings

Scraping Data

I scraped this data from the official rankings page provided by CWUR.
URL: https://cwur.org/

The dependency packages I used were:

Pandas: For keeping data in dataframe format.
Beautiful Soup (bs4): For parsing HTML source code.
urllib.request (standard library): For obtaining the source code of a given URL.
NumPy: For basic array operations.

Code Snippet For Scraping:

# Dependencies
import pandas as pd
import bs4 
import urllib.request
import numpy as np

# Obtaining source code from the url
url = "https://cwur.org/2012.php"
url_contents = urllib.request.urlopen(url).read()

# Parsing the HTML source code 
soup = bs4.BeautifulSoup(url_contents, "html.parser")

# Extracting the data according to the HTML tags
rows = []
table_rows = soup.find_all("tr")
# Skip the first <tr>, which holds the column headers
for tr in table_rows[1:]:
    cells = tr.find_all("td")
    # get_text() keeps only the visible text of each cell, dropping the
    # <td> wrapper and any nested tags such as links
    row = [cell.get_text(strip=True) for cell in cells]
    print(row)
    rows.append(row)

# Converting data into a dataframe using pandas
df=pd.DataFrame(rows,columns=["World Rank","University","Location","National Rank", "Quality of Education", "Alumni Employment", "Quality of Faculty", "Publications", "Influence", "Citations", "Patents","Score"])
print(df)

# Creating csv file from the dataframe
df.to_csv("University_Ranks_2012.csv")
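The snippet above handles a single year. To cover all 8 years, the same steps could be wrapped in functions and repeated per year. This is only a rough sketch: parse_ranking_table and scrape_year are hypothetical helper names, and it assumes that every year's page follows the same URL pattern as 2012.php and has the same 12 columns, which may not hold for every year. The sample HTML at the end is hand-written, not real CWUR data, and is only there to check the parser offline.

```python
import urllib.request

import bs4
import pandas as pd

# Column names copied from the 2012 snippet
COLUMNS = ["World Rank", "University", "Location", "National Rank",
           "Quality of Education", "Alumni Employment", "Quality of Faculty",
           "Publications", "Influence", "Citations", "Patents", "Score"]


def parse_ranking_table(html):
    """Return one list of cell texts for every <tr> that contains <td> cells."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if cells:  # rows made only of <th> header cells are skipped
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def scrape_year(year):
    """Fetch one year's page and return it as a dataframe with a Year column."""
    # Assumption: every year follows the URL pattern of the 2012 page
    url = f"https://cwur.org/{year}.php"
    html = urllib.request.urlopen(url).read()
    # Assumption: every year's table has the same 12 columns as 2012;
    # pandas raises a ValueError here if a page has a different layout
    df = pd.DataFrame(parse_ranking_table(html), columns=COLUMNS)
    df["Year"] = year
    return df


# Offline check of the parser on a tiny hand-written table (not real data)
sample_html = """
<table>
  <tr><th>World Rank</th><th>University</th></tr>
  <tr><td>1</td><td><a href="#">Example University</a></td></tr>
  <tr><td>2</td><td>Sample Institute</td></tr>
</table>
"""
sample_rows = parse_ranking_table(sample_html)
```

With these helpers, the yearly dataframes could be stacked with pd.concat([scrape_year(y) for y in years], ignore_index=True), where years is whichever list of years the site actually publishes. Keeping the parser separate from the download step makes it possible to test the parsing on saved HTML without hitting the site.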

Scope of Analysis

This dataset can be used for the following analyses:

To find the most significant (most heavily weighted) parameter affecting university ranks.
To find the trend in rankings over the past 8 years, based on the parameters provided as columns in the dataset.
To visualise the rise and fall of a particular university's ranking, with rank on the y-axis and year on the x-axis (line graphs).
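As a starting point for these analyses, here is a minimal pandas sketch. It assumes the yearly files have been combined into one dataframe with an added Year column (that column is my assumption, it is not in the scraped table). The numbers below are made up purely for illustration. Correlation with the score is only a rough proxy for how heavily a parameter is weighted, not the actual CWUR weighting.

```python
import pandas as pd

# Made-up numbers purely for illustration; not taken from the real dataset
df = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2012, 2013, 2014],
    "University": ["Uni A", "Uni A", "Uni A", "Uni B", "Uni B", "Uni B"],
    "World Rank": [1, 2, 1, 2, 1, 3],
    "Publications": [1, 3, 2, 4, 2, 6],
    "Citations": [2, 2, 1, 3, 1, 5],
    "Score": [100.0, 97.5, 99.0, 96.0, 98.5, 93.0],
})

# Scraped cells arrive as strings, so convert them to numbers first;
# anything that cannot be parsed becomes NaN
numeric_cols = ["World Rank", "Publications", "Citations", "Score"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Correlation of each parameter column with the overall score
param_cols = ["Publications", "Citations"]
correlations = df[param_cols].corrwith(df["Score"]).sort_values()

# One row per year and one column per university, ready for a line graph
rank_by_year = df.pivot(index="Year", columns="University", values="World Rank")

print(correlations)
print(rank_by_year)
```

If matplotlib is installed, rank_by_year.plot() draws one line per university with years on the x-axis. Inverting the y-axis afterwards puts rank 1 at the top, which usually reads more naturally for rankings.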
