When scraping a website, you may find that it returns shortened URLs pointing to resources on other sites.
Take this one, for example: https://upflix.pl/r/Qb64Ar
This link consists of a domain and a few random characters. A shortened link works by redirecting you to another page, so the status_code that our request returns is 302.
Sometimes we need the full URL instead. We can get it with a few lines of Python code and the requests library.
pip install requests
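We will use the head method for this. It is similar to get, with the difference that it does not return any content, only headers. In requests, head also does not follow redirects by default, so we receive the 302 response itself rather than the page it points to.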
response = requests.head(short_url)
After executing the query, we can check the headers that were returned.
They include information such as:
- the date
- the content type
- the character encoding
- the full link (in the location header), along with many other fields, as you can see below.
{'Date': 'Thu, 16 Nov 2023 00:43:13 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive', 'location': 'https://www.imdb.com/title/tt14060708/', 'vary': 'Origin', 'x-powered-by': 'PHP/7.3.33', 'x-frame-options': 'SAMEORIGIN', 'CF-Cache-Status': 'DYNAMIC', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=bgvCMcMQg1ZkjanlgqzemKUHHthalhb%2FAT72Q58O8a22eFmkeb%2FyeeIMfKkGFwt8WmkMB6dv28F1G2CdH134Kilk%2BcdQNweIZ3O%2FN9KlQf1A2VF%2Bm3yYT89rvjU%3D"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Strict-Transport-Security': 'max-age=15552000; includeSubDomains; preload', 'X-Content-Type-Options': 'nosniff', 'Server': 'cloudflare', 'CF-RAY': '826bb2ce597ebfda-WAW', 'alt-svc': 'h3=":443"; ma=86400'}
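To make the lookup concrete, here is a minimal sketch that picks the redirect target out of a status code and a headers mapping like the one above. The names redirect_target, REDIRECT_CODES and sample_headers are made up for this illustration. It also accepts 301, 303, 307 and 308, which are the other standard HTTP redirect codes a shortener might use, and that goes beyond the 302 case discussed here.

```python
from typing import Mapping, Optional

# Standard HTTP status codes for redirects that carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status_code: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the Location value of a redirect response, or None."""
    if status_code not in REDIRECT_CODES:
        return None
    # Plain dicts are case-sensitive, so compare header names in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None


# A trimmed copy of the headers shown above.
sample_headers = {
    "Date": "Thu, 16 Nov 2023 00:43:13 GMT",
    "Content-Type": "text/html; charset=UTF-8",
    "location": "https://www.imdb.com/title/tt14060708/",
    "Server": "cloudflare",
}

print(redirect_target(302, sample_headers))
# prints https://www.imdb.com/title/tt14060708/
```

The manual lower-case comparison is only needed for a plain dict. The response.headers object that requests returns is already case-insensitive, so headers["location"] and headers["Location"] both work there.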
Full code
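import requests
from typing import Optional


def get_full_url(short_url: str) -> Optional[str]:
    # head does not follow redirects by default, so the 302 response comes back as-is.
    response = requests.head(short_url)
    if response.status_code == 302:
        # The redirect target is carried in the Location header.
        return response.headers["location"]
    return None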
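The function above handles a single 302 hop and sets no timeout. As a rough sketch of one way to extend it, the variant below follows a chain of redirects, resolves relative Location values, and passes a timeout. The name resolve_url, the max_hops and timeout defaults, and the injectable head parameter are assumptions made for this illustration, not part of the original code.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Standard HTTP status codes for redirects that carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_url(
    short_url: str,
    max_hops: int = 5,
    timeout: float = 10.0,
    head: Optional[Callable] = None,
) -> Optional[str]:
    """Follow redirects hop by hop; return the last URL reached, or None if there was no redirect."""
    if head is None:
        # Imported here so the function can also be exercised with a stub.
        import requests
        head = requests.head

    url = short_url
    for _ in range(max_hops):
        response = head(url, allow_redirects=False, timeout=timeout)
        location = response.headers.get("location")
        if response.status_code not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url if url != short_url else None
```

Calling resolve_url("https://upflix.pl/r/Qb64Ar") would issue one HEAD request per hop. If you only need the final address, requests can also follow the chain itself: pass allow_redirects=True to head and read response.url.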
Top comments (2)
It will return either the full URL or an alternative string, depending on the shortener provider. Typically, this alternative string is the shortener service URL if the link is broken.
The original snippet was missing the Optional import from typing and the colon after Optional[str].
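The caveat in the first comment, that a broken link may redirect back to the shortener service itself, can be checked with a small helper. This is only a sketch under that assumption: the name points_back_to_shortener is hypothetical, and the comparison is a plain hostname match, so it would treat upflix.pl and www.upflix.pl as different hosts.

```python
from urllib.parse import urlsplit


def points_back_to_shortener(short_url: str, resolved_url: str) -> bool:
    """True when the resolved URL is on the same host as the short link."""
    short_host = urlsplit(short_url).hostname
    resolved_host = urlsplit(resolved_url).hostname
    # A relative Location has no host of its own, so it stays on the shortener.
    if resolved_host is None:
        return True
    return resolved_host == short_host


print(points_back_to_shortener(
    "https://upflix.pl/r/Qb64Ar",
    "https://www.imdb.com/title/tt14060708/",
))
# prints False
```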