So how do you use the HTTP protocol directly in Python? Here are some examples. As the name indicates, a GET request retrieves data from a website:
>>> import requests
>>> requests.get('https://youtube.com')
<Response [200]>
It returns a response. The HTTP protocol defines many status codes for responses:
- A response of 200 means the request succeeded and the data was returned.
- A response of 404 means the requested data was not found.
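The standard library lists these predefined codes in `http.HTTPStatus`. As a small sketch, you could look up what a numeric code means like this (`describe_status` is a hypothetical helper name, not part of any library):

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found'."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not one of the codes the standard library knows about
        return f"{code} (unknown status code)"
    return f"{status.value} {status.phrase}"


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```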
You can store the output of the get function like this:
>>> response = requests.get('https://youtube.com')
>>> response.status_code
200
To get the content, read the response's text attribute (or its content attribute if you want the raw bytes).
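A minimal sketch of reading the body with requests might look like the following. The helper name `fetch_page` and the 10 second timeout are assumptions made for illustration:

```python
import requests


def fetch_page(url: str) -> str:
    """Download a page and return its body as text."""
    response = requests.get(url, timeout=10)
    # Raise an exception for 4xx/5xx answers instead of returning an error page
    response.raise_for_status()
    # .text is the decoded body (str); .content holds the raw bytes
    return response.text
```

Calling `fetch_page('https://youtube.com')` would then return the HTML source of the page as a string.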
You can parse HTML in Python, but if you need a web browser there are pre-made components you can use.
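As one possible illustration of parsing HTML in Python, the sketch below uses the standard library's `html.parser` to collect links. The `LinkCollector` class and the sample markup are made up for this example; third-party parsers are also an option.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


html = '<p>See <a href="https://example.com">this</a> and <a href="/about">that</a>.</p>'
collector = LinkCollector()
collector.feed(html)
print(collector.links)  # ['https://example.com', '/about']
```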
Data on the web is not always HTML either; sometimes you'll load an image or a text file.
Sometimes the data is in JSON format; in that case you want to use the json module. JSON is often used on the web. It's a format that lets you define objects, and these objects can be converted to Python objects and from Python objects back to JSON.
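The round trip between JSON and Python objects can be sketched with the json module as follows. The sample data is invented for illustration:

```python
import json

# A JSON document as it might arrive in a response body (made-up data)
raw = '{"title": "Example video", "views": 1200, "tags": ["python", "http"]}'

# JSON text -> Python objects (dict, list, str, int, ...)
video = json.loads(raw)
print(video["title"])    # Example video
print(video["tags"][0])  # python

# Python objects -> JSON text
video["views"] += 1
encoded = json.dumps(video)
print(encoded)
```

If you fetched the data with requests, `response.json()` performs the decoding step for you.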
You can get more information from the response object itself. Its attributes give you many values, such as the headers (including the content type and content encoding), the cookies, and the character encoding.
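As a sketch of the kind of metadata meant here, the function below gathers a few attributes of a requests response into a dictionary. `response_metadata` and the dictionary keys are hypothetical names chosen for this example:

```python
import requests


def response_metadata(response: requests.Response) -> dict:
    """Collect a few commonly inspected attributes of a response."""
    return {
        "status": response.status_code,
        # Headers behave like a case-insensitive dictionary
        "content_type": response.headers.get("Content-Type"),
        "content_encoding": response.headers.get("Content-Encoding"),
        # Character encoding requests will use to decode .text
        "encoding": response.encoding,
        "cookies": response.cookies.get_dict(),
    }
```

You would pass it the object returned by `requests.get(...)`, for example `response_metadata(response)`.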
If you don't want to parse data, but just download it, you have several options.
You can use the requests module. The example below downloads an image file using requests.
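import shutil

import requests

image_url = "https://cdn.pixabay.com/photo/2020/01/31/19/21/ural-owl-4808774__340.jpg"

# stream=True avoids loading the whole file into memory at once
with requests.get(image_url, stream=True) as resp:
    resp.raise_for_status()
    # Undo any gzip/deflate encoding while the raw stream is read
    resp.raw.decode_content = True
    with open('image.jpg', 'wb') as local_file:
        shutil.copyfileobj(resp.raw, local_file)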
To download files, you can also use urllib2 (urllib.request in Python 3).
It's possible to use system tools too, such as wget or curl.
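A minimal sketch of a standard-library download, assuming Python 3, could look like this. `download_file` is a hypothetical helper name:

```python
import shutil
import urllib.request


def download_file(url: str, destination: str) -> None:
    """Save the resource at url to a local file."""
    # urlopen returns a file-like response; copy it to disk in chunks
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

For example, `download_file(image_url, 'image.jpg')` would save the image under the given name.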
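If you would rather hand the download to a system tool, one way is to call it from Python with subprocess. The sketch below assumes curl is installed. The function names are hypothetical and the chosen flags are just one reasonable set:

```python
import shutil
import subprocess


def curl_command(url: str, destination: str) -> list:
    """Build the argument list for a curl download."""
    # --fail: non-zero exit on HTTP errors, --location: follow redirects
    return ["curl", "--fail", "--location", "--silent", "--output", destination, url]


def download_with_curl(url: str, destination: str) -> None:
    """Run curl as a child process; raises if curl is missing or fails."""
    if shutil.which("curl") is None:
        raise FileNotFoundError("curl is not installed or not on PATH")
    subprocess.run(curl_command(url, destination), check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with unusual URLs or file names.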