Today we are going to see how we can scrape currency data using Python and BeautifulSoup in a simple and elegant manner.
This article aims to get you started on solving a real-world problem while keeping things super simple, so you get familiar with the tools and get practical results as fast as possible.
The first thing we need is to make sure we have Python 3 installed. If you don't have it yet, install Python 3 before you proceed.
Then you can install Beautiful Soup by running pip3 install beautifulsoup4 in a terminal.
We will also need the libraries requests, lxml, and soupsieve to fetch the page, parse the HTML, and use CSS selectors. Install them by running pip3 install requests lxml soupsieve.
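If you want to double-check that everything installed correctly, a small sanity check like the one below can help. It is only a sketch: the helper names missing_packages and report are made up for this example, and the mapping simply pairs each import name with the pip package name (Beautiful Soup is imported as bs4).

```python
import importlib.util

# Import name -> pip package name for the libraries used in this tutorial.
# Note that the beautifulsoup4 package is imported as "bs4".
REQUIRED = {
    "bs4": "beautifulsoup4",
    "requests": "requests",
    "lxml": "lxml",
    "soupsieve": "soupsieve",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages that cannot be imported."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


def report(required=REQUIRED):
    """Build a short human-readable status message."""
    missing = missing_packages(required)
    if missing:
        return "Missing packages, try: pip3 install " + " ".join(missing)
    return "All required packages are installed."


print(report())
```

If anything is reported as missing, run the suggested pip3 command and try again.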
Once installed, open an editor and type in.
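A minimal starting point might look like the sketch below. The make_soup helper and its parser argument are our own additions for illustration; the default "lxml" matches the parser installed above, and "html.parser" (built into Python) can be passed instead if lxml is not available.

```python
from bs4 import BeautifulSoup
import requests  # used in the next step to download the page


def make_soup(html, parser="lxml"):
    """Parse an HTML string into a BeautifulSoup tree."""
    return BeautifulSoup(html, parser)
```

For a quick check, make_soup("<title>Hi</title>").title.get_text() should give you back the text inside the title tag.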
Now let's go to the Yahoo currency page and inspect the data we can get.
This is how it looks.
Back to our code now. Let's try and get this data by pretending we are a browser like this.
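Here is one way this step could look. Treat it as a sketch under a few assumptions: the URL is our guess at the Yahoo currency page, the User-Agent string is just an example of what a desktop Chrome browser sends, and the function names fetch_html and main are made up for this example.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: this is the Yahoo currency page the article refers to.
URL = "https://finance.yahoo.com/currencies"

# Illustrative browser-like headers; any recent browser User-Agent should do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url=URL, headers=HEADERS, timeout=10):
    """Download a page while sending browser-like headers."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def main():
    """Fetch the page, parse it, and print the whole HTML document."""
    soup = BeautifulSoup(fetch_html(), "lxml")
    print(soup.prettify())
```

Add a call to main() at the bottom of the file so that running the script fetches and prints the page.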
Save this as yahoo_currency_bs.py.
If you run it with python3 yahoo_currency_bs.py, you will see the whole HTML page.
Now, let's use CSS selectors to get to the data we want. To do that, let's go back to Chrome and open the inspect tool.
We notice that all the individual rows of data are contained in an element with a class that begins with 'data-row'. We can get BeautifulSoup to select that data like this.
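The selection could be sketched as follows. To keep the example runnable without hitting Yahoo, it works on a tiny made-up HTML snippet that only imitates the class naming described above; the real page has far more markup, and the tr and td tags here are our assumption. The attribute selector [class*="data-row"] matches any element whose class attribute contains "data-row", so it catches data-row0, data-row1, and so on.

```python
from bs4 import BeautifulSoup

# Made-up sample markup that imitates the class names on the page.
SAMPLE_HTML = """
<table>
  <tr class="data-row0"><td class="data-col0">EURUSD=X</td><td class="data-col1">EUR/USD</td></tr>
  <tr class="data-row1"><td class="data-col0">JPY=X</td><td class="data-col1">USD/JPY</td></tr>
</table>
"""


def select_rows(soup):
    """Return every element whose class attribute contains 'data-row'."""
    return soup.select('[class*="data-row"]')


def print_rows(html, parser="lxml"):
    """Print the text content of each data row on its own line."""
    soup = BeautifulSoup(html, parser)
    for row in select_rows(soup):
        print(row.get_text(" ", strip=True))
```

In the real script you would pass the downloaded page to print_rows instead of SAMPLE_HTML.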
This prints all the content in each of the rows.
We can now pick out the elements inside these rows that contain the data we want. The currencies are in elements whose class increments from data-col0 to data-col1, data-col2, and so on. This makes life easy for us.
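As a rough sketch, picking the first column out of every row might look like this. We are assuming here that the symbol sits in the data-col0 cell, and the sample markup is again made up so the example runs offline. The function name extract_symbols is ours.

```python
from bs4 import BeautifulSoup

# Made-up sample markup that imitates the class names on the page.
SAMPLE_HTML = """
<table>
  <tr class="data-row0"><td class="data-col0">EURUSD=X</td><td class="data-col1">EUR/USD</td></tr>
  <tr class="data-row1"><td class="data-col0">JPY=X</td><td class="data-col1">USD/JPY</td></tr>
</table>
"""


def extract_symbols(html, parser="lxml"):
    """Collect the text of the data-col0 cell from every data row."""
    soup = BeautifulSoup(html, parser)
    symbols = []
    for row in soup.select('[class*="data-row"]'):
        # Assumption: data-col0 holds the currency symbol.
        cell = row.select_one(".data-col0")
        if cell is not None:
            symbols.append(cell.get_text(strip=True))
    return symbols
```

Using the exact class selector .data-col0 (rather than a substring match) avoids accidentally matching a column such as data-col10.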
If you run it, it will print out all the currency symbols.
Bingo!! We got the currencies.
Using the same process, we can get the other data, such as the name, the last price, the change, the percentage change, and so on.
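Putting it together could look something like the sketch below. The mapping from field names to data-col classes is an assumption based on the order the fields are listed in above, so check it against the inspect tool before relying on it. The sample values are invented, and extract_records and print_records are names we made up.

```python
from bs4 import BeautifulSoup

# Assumed column layout; verify against the live page in the inspect tool.
COLUMNS = {
    "symbol": "data-col0",
    "name": "data-col1",
    "last_price": "data-col2",
    "change": "data-col3",
    "percent_change": "data-col4",
}

# Made-up sample markup and values, used only so the example runs offline.
SAMPLE_HTML = """
<table>
  <tr class="data-row0">
    <td class="data-col0">EURUSD=X</td><td class="data-col1">EUR/USD</td>
    <td class="data-col2">1.0850</td><td class="data-col3">+0.0012</td>
    <td class="data-col4">+0.11%</td>
  </tr>
  <tr class="data-row1">
    <td class="data-col0">JPY=X</td><td class="data-col1">USD/JPY</td>
    <td class="data-col2">150.20</td><td class="data-col3">-0.3500</td>
    <td class="data-col4">-0.23%</td>
  </tr>
</table>
"""


def extract_records(html, parser="lxml"):
    """Return one dict per data row, keyed by the field names in COLUMNS."""
    soup = BeautifulSoup(html, parser)
    records = []
    for row in soup.select('[class*="data-row"]'):
        record = {}
        for field, css_class in COLUMNS.items():
            cell = row.select_one("." + css_class)
            record[field] = cell.get_text(strip=True) if cell is not None else None
        records.append(record)
    return records


def print_records(records):
    """Print every field, with a separator line after each symbol."""
    for record in records:
        for field, value in record.items():
            print(f"{field}: {value}")
        print("-" * 30)
```

Cells that are missing from a row come back as None instead of raising an error, which is handy when the page layout shifts slightly.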
We even added a separator to show where the data for each symbol ends. You can now pass this data into an array or save it to CSV and do whatever you want.
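Saving to CSV needs nothing beyond Python's standard library. Here is a small sketch, assuming each row has already been turned into a dict; the field names and the sample record are hypothetical and only there for illustration.

```python
import csv
import io

# Hypothetical field names; use whatever keys your scraped records have.
FIELDS = ["symbol", "name", "last_price", "change", "percent_change"]


def write_csv(records, file_obj, fields=FIELDS):
    """Write a header row plus one line per record to an open file object."""
    writer = csv.DictWriter(file_obj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)


def records_to_csv_text(records, fields=FIELDS):
    """Return the CSV output as a string, handy for previews."""
    buffer = io.StringIO()
    write_csv(records, buffer, fields)
    return buffer.getvalue()


def save_csv(records, path, fields=FIELDS):
    """Save the records to a CSV file at the given path."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_csv(records, f, fields)
```

For example, save_csv(records, "currencies.csv") would write the scraped rows to a file you can open in any spreadsheet.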
If you want to use this in production and scale to thousands of links, you will find that Yahoo blocks your IP quickly. In this scenario, using a rotating proxy service to rotate IPs is almost a must.
Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.
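To make the idea of IP rotation concrete, here is a hand-rolled sketch that cycles through a list of proxies using the proxies argument of requests. This is not how any particular proxy service works. The proxy addresses are placeholders from a reserved documentation range and will not actually work, and the function names are made up.

```python
import itertools

import requests

# Placeholder proxy addresses (reserved documentation range); replace with real ones.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_cycle(proxies=PROXIES):
    """Return an endless iterator that walks through the proxies in order."""
    return itertools.cycle(proxies)


def fetch_via_proxy(url, proxy, headers=None, timeout=10):
    """Fetch one URL, routing both HTTP and HTTPS traffic through the proxy."""
    response = requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


def fetch_all(urls, proxies=PROXIES):
    """Fetch each URL through the next proxy in the rotation."""
    rotation = proxy_cycle(proxies)
    pages = {}
    for url in urls:
        pages[url] = fetch_via_proxy(url, next(rotation))
    return pages
```

A real setup would also need retries and a way to drop dead proxies, which is a large part of what a managed rotating proxy service handles for you.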
Our rotating proxy server, Proxies API, provides a simple API that can solve all IP blocking problems instantly.
With millions of high-speed rotating proxies located all over the world
With our automatic IP rotation
With our automatic CAPTCHA solving technology
With our automatic User-Agent-String rotation (which simulates requests from different, valid web browsers and web browser versions)
Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.
The whole thing can be accessed through a simple API, like below, from any programming language.