Throughout your journey as a data scientist, you will regularly work with data. Sometimes these data are readily available; at other times, you have to find and gather them yourself.
Your data can come from various sources, but more often than not, you will get them from the web.
Now imagine you have found a website with a huge amount of data that you find very useful. Unfortunately, the website offers no way to download its contents onto your device for analysis.
Manually collating the data from the website would take a great deal of time. Fortunately, you can import these data with Python packages such as urllib and requests.
Use the urlretrieve function from urllib.request to save the file locally. Pass two arguments to the function: the URL of the file (assigned here to the variable url) and the name you wish to save the file as.
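As a minimal sketch of this step, the snippet below wraps urlretrieve in a small helper. The helper name save_from_url and the URL in the commented usage line are assumptions made for illustration; substitute the address of the file you actually want.

```python
from urllib.request import urlretrieve


def save_from_url(url, filename):
    """Download the resource at `url` and save it locally as `filename`."""
    # urlretrieve returns a tuple: (path of the saved file, response headers)
    local_path, _headers = urlretrieve(url, filename)
    return local_path


# Example usage (hypothetical URL, replace with your own):
# url = "https://example.com/data.csv"
# save_from_url(url, "data.csv")
```

The helper returns the path of the saved file, so you can pass it straight to whatever tool you use to read the data.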
The data is now saved as a file on your device, which you can manage and wrangle as you wish.
To fully understand how this works, you need a basic understanding of HTTP requests. But don't worry: even if you are not familiar with them, you can follow the steps below and still import data from the web.
Import the functions urlopen and Request from the module urllib.request.
Specify the URL.
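The steps above, plus the remaining ones this approach would typically need (packaging the request, sending it, and reading the response), can be sketched as follows. The helper name fetch_html, the UTF-8 decoding, and the URL in the commented usage line are assumptions for illustration, not part of the original walkthrough.

```python
from urllib.request import urlopen, Request


def fetch_html(url):
    """Send a GET request to `url` and return the response body as text."""
    # Package the request
    request = Request(url)
    # Send the request; the `with` block closes the connection afterwards
    with urlopen(request) as response:
        raw_bytes = response.read()
    # Assumption: the page is UTF-8 encoded; undecodable bytes are replaced
    return raw_bytes.decode("utf-8", errors="replace")


# Example usage (hypothetical URL, replace with your own):
# url = "https://example.com"
# html = fetch_html(url)
# print(html[:200])
```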
Then comes the requests package. It is a simpler and widely recommended way of performing the same import done with urllib above.
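A minimal sketch of the same import with requests might look like this. It assumes requests is installed (it is a third-party package, typically installed with pip install requests); the helper name fetch_with_requests, the 10-second timeout, and the URL in the commented usage line are illustrative choices.

```python
import requests


def fetch_with_requests(url, timeout=10):
    """Send a GET request to `url` and return the response body as text."""
    # requests.get packages the request, sends it, and collects the response
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page
    response.raise_for_status()
    # .text holds the body decoded to a string
    return response.text


# Example usage (hypothetical URL, replace with your own):
# url = "https://example.com"
# html = fetch_with_requests(url)
# print(html[:200])
```

Compared with the urllib version, there is no separate Request object to build and no manual decoding of bytes, which is much of why requests is often preferred.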
Note that there are several other actions you could take with the packages used above, like interacting with an API. However, for the context of this article, we are only concerned with using them for importing data from a webpage.
Woohoo! You can now easily import data from the web with Python.
However, the imported data are raw HTML, complete with tags and attributes, so they are not quite ready for use or analysis.
To make them ready for use, you have to parse them using a package called BeautifulSoup. This will be discussed in a follow-up article.
Until then, keep importing data with these packages and doing wonders with Python.