holger

Python: List all Current and Planned Azure Regions

The list of all current and planned Azure regions can be reviewed on the publicly available Azure Global Infrastructure website [1]. However, it is not a flat list that one could skim through or reuse in a document; it requires interaction by selecting the corresponding regions and countries.

Fortunately, Python and Pandas can be used to extract the various tables from the website and output the data as a convenient list on the console, to a file, or for further processing.

Below, I will go through a few lines of code that do this. However, be aware that this approach has a significant drawback: if the wording on the website or the structure of the tables changes, the script will stop working and will need to be modified.
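
To at least notice such breakage early, the result could be checked for plausibility before it is used. The following is only a minimal sketch: the function name check_regions, the threshold min_expected and the error messages are my own assumptions, and the dictionary keys are the ones used for the result further down in this post.

```python
# Hypothetical sanity check, not part of the original script.
REQUIRED_KEYS = {'az_display_name', 'az_short_name', 'az_location', 'az_state'}
VALID_STATES = {'active', 'planned'}


def check_regions(locations, min_expected=10):
    """Return locations unchanged, or raise ValueError if they look implausible."""
    # An almost empty result usually means that the page layout has changed.
    if len(locations) < min_expected:
        raise ValueError(
            f'only {len(locations)} regions found, expected at least {min_expected}'
        )
    for entry in locations:
        # Every entry should carry all four keys and a known state.
        missing = REQUIRED_KEYS.difference(entry)
        if missing:
            raise ValueError(f'{entry!r} lacks the keys {sorted(missing)}')
        if entry['az_state'] not in VALID_STATES:
            raise ValueError(f'{entry!r} has an unexpected state')
    return locations
```

The threshold of 10 is an arbitrary default; any value well below the number of regions one expects would serve the same purpose.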

I'm pretty sure that these small tasks can soon be handled within a few seconds by AI Companions, such as the new Bing - but doing it manually is still fun, certainly provides an opportunity to learn and you know where the information comes from and how it was processed.

Happy reading. :-)

What are Azure Regions?

Azure operates in multiple datacenters around the world. These datacenters are grouped in to geographic regions, giving [Azure Customers] flexibility in choosing where to build [their] applications.

Azure Documentation - What are Azure regions? [2]

How are new Regions announced?

New regions, as well as other Azure updates, are typically announced on the Azure Updates website. [3]
It can even be consumed as an RSS Feed.
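
As a rough illustration of how such a feed could be processed, the sketch below parses RSS 2.0 items with Python's standard library. The feed content is a made-up sample (the real feed and its entries are not reproduced here), and parse_feed and filter_by_keyword are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up sample in RSS 2.0 layout; a real feed would be downloaded first.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample updates feed</title>
    <item>
      <title>Sample entry: new region announced</title>
      <link>https://example.com/updates/1</link>
    </item>
    <item>
      <title>Sample entry: pricing change</title>
      <link>https://example.com/updates/2</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> with its title and link."""
    root = ET.fromstring(xml_text)
    return [
        {
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
        }
        for item in root.iter('item')
    ]


def filter_by_keyword(items, keyword):
    """Keep the items whose title contains the keyword, ignoring case."""
    return [item for item in items if keyword.lower() in item['title'].lower()]


for entry in filter_by_keyword(parse_feed(SAMPLE_FEED), 'region'):
    print(entry['title'], entry['link'])
```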

Using Python and Pandas to extract the various regions and their status

Every now and then I need a list of all current and planned Azure regions. Since navigating the Azure Global Infrastructure website [1] is a very manual task, I thought about retrieving the data using Python. Here are the steps I took.

First of all, a few libraries would be required:

  • requests for sending HTTP requests and retrieving data from the web [4]
  • pandas to filter and present the data [5]
  • lxml since the read_html() function of Pandas uses this library by default (does not need to be imported explicitly) [6]

The import statements would then look like this (search is actually required within the script to find certain keywords within the data):

import requests
import pandas as pd
from re import search

As a next step, the data from the Azure website could be retrieved and Pandas' read_html() function could be used to read the various tables into dataframes.

url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
html = requests.get(url).content
df_list = pd.read_html(html)

When looking at the first dataframe in the list, one could see that some further polishing might be needed and that it contains a lot of information that is not necessarily of interest for a basic list.

for df in df_list[:1]:
    print(df)

Output:

                       Regions  \
0                     Location   
1                  Year opened   
2  Availability Zones presence   
3                   Compliance   
4               Data residency   
5            Disaster Recovery   
6           Products by region   
7                 Available to   

                               East Asia  Start free  \
0                                          Hong Kong   
1                                               2010   
2                             Available with 3 zones   
3  Global Compliance  CIS Benchmark, CSA STAR Att...   
4  Stored at rest in the Asia Pacific region  Lea...   
5  Cross-region options:  Azure Site Recovery  Re...   
6                        See products in this region   
7                         All customers and partners   

                          Southeast Asia  Start free  
0                                          Singapore  
1                                               2010  
2                             Available with 3 zones  
3  Global Compliance  CIS Benchmark, CSA STAR Att...  
4  Stored at rest in the Asia Pacific region  Lea...  
5  Cross-region options:  Azure Site Recovery  Re...  
6                        See products in this region  
7                         All customers and partners 

From the above output, it is possible to derive some intermediate conclusions though:

  • The column with the header Regions only holds the row labels of the original table and is probably not of interest.
  • The header of every other column contains the region, and its first row contains the location.

This can be confirmed by looking at the column headers only:

for df in df_list[:1]:
    for dc in list(df):
        print(dc)

Output:

Regions
East Asia  Start free
Southeast Asia  Start free

One could check whether a column header contains Regions and, if so, simply ignore that column. If the header contains Coming soon, the state variable could be set to planned; otherwise, the region is considered active.

The location is contained in the first row of each column, which is why it can be read from df[dc][0]. The region name then needs some clean-up by removing strings that are not of interest, such as Start free, Get started or Coming soon.

regions_list = []
locations_list = []

for df in df_list:
    for dc in list(df):
        if search('Regions', dc):
            pass
        else:
            if search('Coming soon', dc):
                state = 'planned'
            else:
                state = 'active'

            az_location = df[dc][0]
            region = dc.removesuffix('  Start free')
            region = region.removesuffix('  Get started')
            region = region.removesuffix('  Coming soon')
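
The header handling above could also be wrapped into a small function, which makes it easy to try out without downloading the page. This is just a sketch: parse_header is a hypothetical name, the sample headers in the usage lines are assumed to look like the ones on the website, and str.removesuffix() needs Python 3.9 or later.

```python
# Suffixes that the website appends to the region name in the column header.
SUFFIXES = ('  Start free', '  Get started', '  Coming soon')


def parse_header(header):
    """Return (region, state) for a column header, or None for the label column."""
    # The column named "Regions" only holds the row labels.
    if 'Regions' in header:
        return None
    # Planned regions are marked with "Coming soon" in the header.
    state = 'planned' if 'Coming soon' in header else 'active'
    region = header
    for suffix in SUFFIXES:
        region = region.removesuffix(suffix)
    return region, state


print(parse_header('Regions'))
print(parse_header('East Asia  Start free'))
print(parse_header('Taiwan North  Coming soon'))
```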

The list of dataframes contains duplicates (which I assume come from the selection of nearby datacenters on the website, which is based on the currently selected region), so some of the columns need to be skipped. A list of all regions that have already been processed is useful for this.

Then, a dictionary can be created that holds the corresponding values and is finally appended to a list.

            # Only process regions that have not been seen before
            if region not in regions_list:
                regions_list.append(region)
                locations_list.append({
                    'az_display_name': region,
                    'az_short_name': region.replace(' ', '').lower(),
                    'az_location': az_location,
                    'az_state': state
                })
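
For a list of this size, the membership test on a list is perfectly fine. With larger inputs, a set would be the more common choice, because its lookups do not slow down as the collection grows. The following sketch shows that variant in isolation; unique_regions is a hypothetical helper and the sample entries are shortened to a single key.

```python
def unique_regions(entries):
    """Keep the first entry per display name and preserve the input order."""
    seen = set()
    unique = []
    for entry in entries:
        name = entry['az_display_name']
        if name in seen:
            # Already processed, for example via a "nearby datacenters" table.
            continue
        seen.add(name)
        unique.append(entry)
    return unique


entries = [
    {'az_display_name': 'East Asia'},
    {'az_display_name': 'Southeast Asia'},
    {'az_display_name': 'East Asia'},
]
print(unique_regions(entries))
```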

As a last step, the region details could be put into one or several dataframes. Since the list of regions is relatively long, Pandas needs to be configured to display all rows and columns when the output is primarily meant for the console.

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

Once this is done, the dataframes can be printed to the console.

df = pd.DataFrame(locations_list)

# Filter on the az_state column instead of comparing every cell of the dataframe
df_planned_regions = df[df['az_state'] == 'planned']
df_active_regions = df[df['az_state'] == 'active']

# Print only active regions
print(df_active_regions)

# Print only planned regions
print('\n') # Newline
print(df_planned_regions)

# Print all regions, regardless of their status
print('\n') # Newline
print(df)

The list of planned regions would look something like this:

         az_display_name       az_short_name      az_location az_state
13     Indonesia Central    indonesiacentral          Jakarta  planned
17         Malaysia West        malaysiawest     Kuala Lumpur  planned
18     New Zealand North     newzealandnorth         Auckland  planned
19  Saudi Arabia Central  saudiarabiacentral     Saudi Arabia  planned
20          Taiwan North         taiwannorth           Taipei  planned
21          Austria East         austriaeast           Vienna  planned
23          Denmark East         denmarkeast       Copenhagen  planned
29        Greece Central       greececentral           Athens  planned
30           Italy North          italynorth            Milan  planned
32        Poland Central       polandcentral           Warsaw  planned
33         Spain Central        spaincentral           Madrid  planned
41         Chile Central        chilecentral         Santiago  planned
42        Mexico Central       mexicocentral  Querétaro State  planned
46             East US 3             eastus3          Georgia  planned
60   US Sec West Central    ussecwestcentral      Undisclosed  planned
62        Israel Central       israelcentral           Israel  planned

With the to_markdown() function, the table can even be generated in Markdown format right away.

print(df_planned_regions.to_markdown())

Output:

|    | az_display_name      | az_short_name      | az_location     | az_state   |
|---:|:---------------------|:-------------------|:----------------|:-----------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned    |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned    |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned    |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned    |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned    |
| 21 | Austria East         | austriaeast        | Vienna          | planned    |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned    |
| 29 | Greece Central       | greececentral      | Athens          | planned    |
| 30 | Italy North          | italynorth         | Milan           | planned    |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned    |
| 33 | Spain Central        | spaincentral       | Madrid          | planned    |
| 41 | Chile Central        | chilecentral       | Santiago        | planned    |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned    |
| 46 | East US 3            | eastus3            | Georgia         | planned    |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned    |
| 62 | Israel Central       | israelcentral      | Israel          | planned    |

The table renders nicely in markdown.

|    | az_display_name      | az_short_name      | az_location     | az_state   |
|---:|:---------------------|:-------------------|:----------------|:-----------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned    |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned    |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned    |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned    |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned    |
| 21 | Austria East         | austriaeast        | Vienna          | planned    |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned    |
| 29 | Greece Central       | greececentral      | Athens          | planned    |
| 30 | Italy North          | italynorth         | Milan           | planned    |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned    |
| 33 | Spain Central        | spaincentral       | Madrid          | planned    |
| 41 | Chile Central        | chilecentral       | Santiago        | planned    |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned    |
| 46 | East US 3            | eastus3            | Georgia         | planned    |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned    |
| 62 | Israel Central       | israelcentral      | Israel          | planned    |
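
Note that to_markdown() relies on the optional tabulate package. If installing it is not an option, a simple table can also be assembled by hand. The sketch below is my own simplified alternative and not what Pandas does internally; to_markdown_table is a hypothetical name, and it expects a list of dictionaries that all share the same keys.

```python
def to_markdown_table(rows):
    """Render a list of dicts with identical keys as a Markdown table."""
    if not rows:
        return ''
    columns = list(rows[0])
    # Each column is as wide as its longest cell, and at least three characters.
    widths = {
        col: max(3, len(col), *(len(str(row[col])) for row in rows))
        for col in columns
    }

    def format_row(values):
        cells = (str(value).ljust(widths[col]) for col, value in zip(columns, values))
        return '| ' + ' | '.join(cells) + ' |'

    lines = [
        format_row(columns),
        # Separator row between the header and the data rows
        format_row('-' * widths[col] for col in columns),
    ]
    lines.extend(format_row(row[col] for col in columns) for row in rows)
    return '\n'.join(lines)


planned = [
    {'az_display_name': 'Taiwan North', 'az_location': 'Taipei', 'az_state': 'planned'},
    {'az_display_name': 'Austria East', 'az_location': 'Vienna', 'az_state': 'planned'},
]
print(to_markdown_table(planned))
```

Unlike to_markdown(), this version does not print the dataframe index and left-aligns every column.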

The complete script is shown below. It may not be very elegant, but it shows how tables from websites could be extracted using Python and Pandas.

import requests
import pandas as pd
from re import search

def list_azure_regions():
    url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
    html = requests.get(url).content
    df_list = pd.read_html(html)

    regions_list = []
    locations_list = []

    for df in df_list:
        for dc in list(df):
            if search('Regions', dc):
                pass
            else:
                if search('Coming soon', dc):
                    state = 'planned'
                else:
                    state = 'active'

                az_location = df[dc][0]
                region = dc.removesuffix('  Start free')
                region = region.removesuffix('  Get started')
                region = region.removesuffix('  Coming soon')

                # Only process regions that have not been seen before
                if region not in regions_list:
                    regions_list.append(region)
                    locations_list.append({
                        'az_display_name': region,
                        'az_short_name': region.replace(' ', '').lower(),
                        'az_location': az_location,
                        'az_state': state
                    })

    return locations_list

if __name__ == '__main__':
    azure_regions = list_azure_regions()

    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)

    df = pd.DataFrame(azure_regions)

    # Filter on the az_state column instead of comparing every cell of the dataframe
    df_planned_regions = df[df['az_state'] == 'planned']
    df_active_regions = df[df['az_state'] == 'active']

    # Print only active regions
    print(df_active_regions)

    # Print only planned regions
    print('\n') # Newline
    print(df_planned_regions)

    # Print all regions, regardless of their status
    print('\n') # Newline
    print(df)
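
Since list_azure_regions() returns a plain list of dictionaries, the result could also be written to a file without going through a dataframe. The following is a minimal sketch using only the standard library; regions_to_csv and regions_to_json are hypothetical helpers, and the two sample rows are copied from the output shown earlier so that the example runs without network access.

```python
import csv
import io
import json

FIELDNAMES = ['az_display_name', 'az_short_name', 'az_location', 'az_state']


def regions_to_csv(regions, stream):
    """Write the region dictionaries as CSV to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(regions)


def regions_to_json(regions):
    """Return the region dictionaries as JSON, keeping non-ASCII characters."""
    return json.dumps(regions, indent=2, ensure_ascii=False)


# Two rows from the sample output, standing in for list_azure_regions()
sample = [
    {'az_display_name': 'Taiwan North', 'az_short_name': 'taiwannorth',
     'az_location': 'Taipei', 'az_state': 'planned'},
    {'az_display_name': 'Mexico Central', 'az_short_name': 'mexicocentral',
     'az_location': 'Querétaro State', 'az_state': 'planned'},
]

buffer = io.StringIO()
regions_to_csv(sample, buffer)
print(buffer.getvalue())
print(regions_to_json(sample))
```

To write an actual file, the in-memory buffer could be replaced with something like open('azure_regions.csv', 'w', newline='', encoding='utf-8'), where the file name is only an example.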

References
