I always thought getting worldwide postal codes by myself was an easy task because postal codes seem to be nothing more than a simple shortcode that is publicly available. I quickly realized this was not the case, because:
- There is no single source of truth
- Most sources were incomplete
- Data was often presented in an unstructured way
After doing some general research, I soon understood that the problems above have their origin in the history of postal codes. Each country has a different format, area granularity, and way of structuring postal codes as a whole.
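To make the format differences concrete, here is a minimal sketch of per-country validation. The pattern table and the helper name `looks_valid` are my own illustration, and the patterns are deliberately simplified (real formats have more exceptions than a short regex can capture).

```python
import re

# Illustrative, simplified patterns; real-world formats have more exceptions.
POSTAL_CODE_PATTERNS = {
    "AT": re.compile(r"\d{4}"),                    # Austria: 4 digits
    "DE": re.compile(r"\d{5}"),                    # Germany: 5 digits
    "US": re.compile(r"\d{5}(-\d{4})?"),           # USA: ZIP or ZIP+4
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),          # Netherlands: 4 digits + 2 letters
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # Canada: letter-digit pairs
}


def looks_valid(country: str, code: str) -> bool:
    """Check whether `code` matches the simplified pattern for `country`."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        raise ValueError(f"no pattern defined for country {country!r}")
    # fullmatch ensures the whole string fits the pattern, not just a prefix
    return pattern.fullmatch(code.strip().upper()) is not None
```

Even this tiny table needs a different rule per country, which hints at why a single global parser is hard to write.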
I first tried to scrape Wikipedia with the following code. For this post, I will use the example of Austria.
For this, I wrote a small Python script.
Before running it, make sure to install all dependencies:
pip3 install lxml
pip3 install requests
pip3 install bs4
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_in_Austria'

# fire GET request and fail early on HTTP errors
response = requests.get(url)
response.raise_for_status()

# parse content
content = BeautifulSoup(response.text, 'lxml')

# get postal codes: keep only list items that look like "code - place"
postcodes = [
    postcode.text
    for postcode in content.find_all('li')
    if ' - ' in postcode.text
]

# filter edge cases: keep entries with 3 or 4 whitespace-separated tokens
postcodes = [
    postcode.split()
    for postcode in postcodes
    if len(postcode.split()) in (3, 4)
]

# write output to file (each entry is a list of tokens, so join it first)
with open('at_postcodes.txt', 'w', encoding='utf-8') as f:
    for postcode in postcodes:
        f.write(' '.join(postcode) + '\n')
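One weakness of the token-count filter in the script is that it drops any place name longer than two words. As a hedged alternative, here is a small sketch that parses each list item with a regular expression instead. It assumes every entry has the shape "4-digit code, dash, place name"; the names `ENTRY_RE`, `parse_entry`, and `parse_entries` are hypothetical helpers, not part of the original script.

```python
import re

# Assumed entry shape: "<4-digit code> - <place name>", e.g. "1010 - Wien".
# The character class accepts a hyphen or an en dash as separator.
ENTRY_RE = re.compile(r"^\s*(\d{4})\s+[-\u2013]\s+(.+?)\s*$")


def parse_entry(text):
    """Return (code, place) for a matching list item, otherwise None."""
    match = ENTRY_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def parse_entries(texts):
    """Parse many list items and silently skip the ones that do not match."""
    parsed = (parse_entry(text) for text in texts)
    return [entry for entry in parsed if entry is not None]


print(parse_entry("1010 - Wien"))  # ('1010', 'Wien')
```

In the script, this could be fed with the `.text` of every `li` element. It still depends on how the article happens to be written, so it does not solve the general problem.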
The obtained data set and the related approach might be enough for some use cases, but since I wanted to get global postal code data, I was not satisfied.
I live in Austria and realized very quickly that the data I had just scraped was not complete (some postal codes were missing). Considering the time it took me to build the parser and the fact that I would have to adapt it for every single data source (adaptations are even needed across Wikipedia, since every article is written differently), I decided to give up.
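A quick way to spot such gaps is to compare the scraped codes against a handful of codes you already know. The sketch below uses a hand-picked sample of well-known Austrian postal codes as the reference; the helper `missing_codes` and both sample lists are illustrative, not real scraper output.

```python
def missing_codes(scraped, reference):
    """Return the reference codes that are absent from the scraped data, sorted."""
    return sorted(set(reference) - set(scraped))


# Hand-picked reference sample (an assumption for illustration, far from complete).
reference_sample = ["1010", "4020", "5020", "6020", "8010", "9020"]

# Stand-in for the codes the scraper produced.
scraped_sample = ["1010", "4020", "8010"]

print(missing_codes(scraped_sample, reference_sample))  # ['5020', '6020', '9020']
```

A check like this only shows that something is missing. It cannot tell you that a data set is complete.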
That was when I started to look for ready-to-use solutions instead:
I hope this article saves you some time if you are trying to achieve the same.