Web Scraping CoinMarketCap with Python: Selenium

#python

When it comes to web scraping in Python, people usually have two choices:

bs4 + requests
Selenium (the so-called WebDriver!)

Often the first approach (requests plus BeautifulSoup) is enough, and most websites can be scraped once you add a browser-like header to the request. For websites with strong anti-scraping measures, however, Selenium is a must in your toolkit.
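As a rough illustration of the "adding a header" idea, the sketch below builds a request that carries a browser-like User-Agent. It uses the standard library's urllib instead of requests so that it runs without extra packages; the helper name build_request and the User-Agent string are assumptions made up for this example, and nothing is sent over the network until the request is actually opened.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent; any realistic value could be used.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"


def build_request(url: str) -> Request:
    """Return a GET request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://coinmarketcap.com/currencies/bitcoin/historical-data/")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would perform the download, and the returned HTML could then be handed to BeautifulSoup. With requests, the same idea is usually written as requests.get(url, headers={...}).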

Today, we are going to look at an example: scraping the historical price of Bitcoin from CoinMarketCap.

We need the historical data for Bitcoin, but instead of copying and pasting it manually, can we automate the process? Wouldn't it be nice to have a scraper that collects everything we want each time we run it?

Sure, it would be very nice! But how do we do this?

First, import the necessary packages:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from pprint import pprint

Next, our target page is https://coinmarketcap.com/currencies/bitcoin/historical-data/. More generally, the URL has the format https://coinmarketcap.com/currencies/{exchange_name}/historical-data/, where exchange_name is the name of the coin, such as bitcoin or ethereum.
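The URL pattern can be wrapped in a small helper so that the same code serves any coin. This is only a sketch: BASE_URL and build_url are names invented for this example, and lowercasing the name mirrors what the full script in this article does rather than a documented rule of the site.

```python
BASE_URL = "https://coinmarketcap.com/currencies/{name}/historical-data/"


def build_url(exchange_name: str) -> str:
    """Fill the coin name into the historical-data URL pattern."""
    # Normalize the name the same way the article's script does (lowercase).
    return BASE_URL.format(name=exchange_name.strip().lower())


for name in ("bitcoin", "Ethereum"):
    print(build_url(name))
```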

We then open the URL with the webdriver:

exchange_name = "bitcoin"
url: str = f"https://coinmarketcap.com/currencies/{exchange_name}/historical-data/"
driver = webdriver.Firefox()
driver.get(url)

Next, we need to inspect the page manually (using the browser's developer tools) to see how we can select the specific element.

The data we are interested in lies in a table. The table has a parent element with the class "history", which should be pretty much enough to specify the elements we want.

elem = driver.find_element(By.CSS_SELECTOR, ".history tbody tr")

This selects the topmost row of the table, because find_element returns only the first match. To get the text inside elem, we just use elem.text.
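Because the page may still be loading when find_element runs, one option is to wait explicitly until the row exists. The sketch below uses Selenium's WebDriverWait together with expected_conditions.presence_of_element_located. The helper name first_row_text and the 10-second timeout are assumptions, and the Selenium imports sit inside the function only so that the snippet can be loaded on a machine without Selenium.

```python
ROW_SELECTOR = ".history tbody tr"  # selector taken from the article


def first_row_text(driver, timeout: float = 10.0) -> str:
    """Wait until the first table row is present, then return its text."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() polls the condition and raises TimeoutException on timeout.
    row = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ROW_SELECTOR))
    )
    return row.text
```

With a driver that has already opened the page, first_row_text(driver) should return the same string as elem.text.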

What is left is just extracting the target information with some string manipulation. Very straightforward.
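The string handling can be sketched on its own, without a browser. The helper below assumes the row text has the shape "date $open $high $low $close ...", which is what splitting on " $" in this article relies on. The name parse_row and the conversion to float and date values are additions for illustration, and the sample row is assembled from the prices in this article's example run.

```python
from datetime import datetime


def parse_row(text: str) -> dict:
    """Split one table row's text into a date and four float prices."""
    # Fields are separated by " $": the date first, then the dollar values.
    fields = text.split(" $")
    if len(fields) < 5:
        raise ValueError(f"unexpected row format: {text!r}")

    def to_float(value: str) -> float:
        # Drop thousands separators, e.g. "17,206.44" becomes 17206.44.
        return float(value.replace(",", ""))

    return {
        # %b expects English month abbreviations (the default C locale).
        "date": datetime.strptime(fields[0].strip(), "%b %d, %Y").date(),
        "open_price": to_float(fields[1]),
        "high_price": to_float(fields[2]),
        "low_price": to_float(fields[3]),
        "close_price": to_float(fields[4]),
    }


# Sample row assembled from the prices in this article's example run.
sample = "Dec 13, 2022 $17,206.44 $17,930.09 $17,111.76 $17,781.32"
print(parse_row(sample))
```

Any further columns in a real row end up in later fields and are simply ignored here. Keeping the prices as floats makes them easier to compare or plot than the raw strings.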

Below is the fully working code:


from pprint import pprint
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


def get_latest_price(exchange_name="bitcoin"):
    exchange_name = exchange_name.lower()
    url: str = f"https://coinmarketcap.com/currencies/{exchange_name}/historical-data/"
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # wait for the page to load
        time.sleep(2)
        # the first row of the table holds the latest date
        elem = driver.find_element(By.CSS_SELECTOR, ".history tbody tr")
        res = elem.text
    finally:
        # quit() ends the browser session even if scraping failed
        driver.quit()
    # the row text is the date followed by "$"-prefixed values
    res = res.split(" $")
    date = res[0]
    open_price = res[1]
    high_price = res[2]
    low_price = res[3]
    close_price = res[4]
    return {
        "exchange_name": exchange_name,
        "url": url,
        "date": date,
        "open_price": open_price,
        "high_price": high_price,
        "low_price": low_price,
        "close_price": close_price,
    }


if __name__ == "__main__":
    # print the result when the file is run as a script
    pprint(get_latest_price())

Let's try it out:

➜  python3 scraper_exchange.py 
{'close_price': '17,781.32',
 'date': 'Dec 13, 2022',
 'exchange_name': 'bitcoin',
 'high_price': '17,930.09',
 'low_price': '17,111.76',
 'open_price': '17,206.44',
 'url': 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'}

Yeah! We do get the latest real price.
