Olivier Miossec

Azure Express Route in-depth

Azure Express Route is an essential service when you want to connect your on-premises network to your Azure infrastructure. It is more robust and consistent than a VPN solution and provides higher bandwidth, from 50 Mbps to 100 Gbps.
Express Route can connect your network to Office 365 and Microsoft services, but this post only discusses Azure resources.

Azure Express Route is a direct link to the Microsoft network backbone from your on-premises network. Let’s see how it works.

First, there are two ways to connect to the Microsoft backbone: Azure Express Route or Azure Express Route Direct.

Connection to Azure

With Azure Express Route, you delegate the connectivity to an operator, like Equinix (Cloud Exchange) or BT (connectivity partner), which links your network to one of the Azure Points of Presence (PoP). These PoPs are not necessarily in an Azure Region; they are simply locations where the Microsoft backbone is available. You will need two routers (if possible, in two different places) supporting BGP sessions, two IPv4 /30 subnets for the peering, an AS number for your network (16 or 32-bit, public or private), a VLAN ID to establish the peering, and one or more prefixes to announce to the Azure side. You also have to choose the bandwidth (from 50 Mbps to 10 Gbps).

For Express Route Direct, there is no operator to help you. You will need to set up a PoP in the same location as Azure. You need two 10 Gbps or two 100 Gbps ports using Single Mode LR Fiber to connect to the Microsoft backbone. Your switch or router must support 802.1Q tagging and BGP sessions per port or device. As with the first option, you will need two IPv4 /30 subnets for the peering, an AS number for your network (16 or 32-bit, public or private), a VLAN ID to establish the peering, and one or more prefixes to announce to the Azure side.
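Both options rely on two /30 subnets for the peering. A /30 holds four addresses, of which only two are usable; by Azure convention your router takes the first usable address and Microsoft takes the second. Here is a minimal POSIX shell sketch of that split. The function name `peering_ips` and the sample subnets are only illustrative.

```shell
# Print the two usable addresses of an IPv4 /30 peering subnet.
# Assumption: the argument is the network address of a /30, such as 192.168.1.0/30.
peering_ips() (
  cidr=$1
  base=${cidr%/*}      # address part, e.g. 192.168.1.0
  prefix=${cidr#*/}    # prefix length, e.g. 30
  if [ "$prefix" != "30" ]; then
    echo "expected a /30 subnet, got: $cidr" >&2
    return 1
  fi
  head=${base%.*}      # first three octets
  last=${base##*.}     # last octet
  if [ $((last % 4)) -ne 0 ]; then
    echo "$base is not the network address of a /30" >&2
    return 1
  fi
  # A /30 holds four addresses: network, two usable hosts, broadcast.
  echo "customer=$head.$((last + 1)) microsoft=$head.$((last + 2))"
)
```

For example, `peering_ips 192.168.1.0/30` prints the address to configure on your router and the one used on the Microsoft side.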

Now that the connectivity is in place, you need to create an Express Route Circuit. The process differs between Express Route and Express Route Direct. With Express Route Direct, you will need to create an additional resource, an Express Route Port, and then create the Express Route Circuit on this port. With an operator, you will receive a service key that will be used when enabling the Express Route Circuit.

Creating the Express Route circuit

Before creating the circuit, you need to evaluate several elements:

  • The bandwidth: you will have to choose between 50 Mbps and 10 Gbps. For Express Route Direct, the bandwidth is between 1 Gbps and 10 Gbps on the 10 Gbps link and between 5 Gbps and 100 Gbps on the 100 Gbps link.

Then you have to choose the SKU tier:

  • The Local SKU allows you to use only VNETs in the same Azure Region as the circuit peering. For example, if you peer your network in Paris, you will be allowed to use VNETs deployed in the France Central region, but not in the West Europe region. This SKU allows your on-premises network to advertise up to 4000 routes.
  • The Standard SKU allows you to use only VNETs in the same geopolitical region (see https://learn.microsoft.com/en-us/azure/expressroute/expressroute-locations?tabs=america%2Ca-c%2Cus-government-cloud%2Ca-C). If your peering point is in Paris, you will be able to use VNETs in France Central and West Europe (same geopolitical region, Europe), but not in East US. As with the Local SKU, your on-premises network can advertise up to 4000 routes.
  • The Premium SKU (or Premium add-on) allows you to connect VNETs in any Azure region. If your peering point is in Paris, you will be able to use VNETs in France Central and West Europe (same geopolitical region, Europe), and in East US (North America region). With this SKU, your on-premises network can advertise up to 10000 routes.

The last thing to decide is the billing model: metered or unmetered. Only traffic from Azure to on-premises is invoiced, so you will need to estimate the volume of data leaving Azure for your on-premises network to see which option is better for you (you can use the Azure Pricing Calculator).

While you can raise the bandwidth of your Express Route Circuit (the opposite is not true), the SKU and the billing model cannot be changed after the creation of the Circuit. For the billing, you will need to do some estimation and calculation to pick the right option. For the SKU, the choice is more delicate: you will need to estimate the number of routes that will be advertised by your on-premises network, and from which Azure regions VNETs will connect to the Express Route Circuit.
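The SKU decision can be reduced to two inputs: the number of routes advertised from on-premises and where the VNETs live. Below is a small POSIX shell sketch using only the limits quoted in this post. The function name and the scope labels (`same-region`, `same-geo`, `global`) are hypothetical, and the sketch ignores pricing and SKU availability.

```shell
# Suggest the smallest SKU tier that fits, using the limits quoted in this post.
# Arguments: number of on-premises routes, then the VNET scope:
#   same-region | same-geo | global   (hypothetical labels for this sketch)
pick_sku_tier() (
  routes=$1
  scope=$2
  case $scope in
    same-region|same-geo|global) ;;
    *) echo "unknown scope: $scope" >&2; return 1 ;;
  esac
  if [ "$routes" -gt 10000 ]; then
    echo "more than 10000 routes: aggregate your prefixes first" >&2
    return 1
  fi
  # More than 4000 routes or VNETs in another geopolitical region: Premium only.
  if [ "$routes" -gt 4000 ] || [ "$scope" = "global" ]; then
    echo "Premium"
  elif [ "$scope" = "same-geo" ]; then
    echo "Standard"
  else
    echo "Local"
  fi
)
```

With 3000 routes and VNETs in France Central and West Europe, `pick_sku_tier 3000 same-geo` suggests Standard; with 6000 routes the answer becomes Premium whatever the scope.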

The Express Route Circuit is just the materialization of your connection in Azure. It can be seen as access to the Microsoft Enterprise Edge (MSEE) router, a link between your network and the Microsoft backbone. This routing device will use your AS Number and the prefixes that you provide to route traffic in Azure. These prefixes should be chosen wisely. You can announce 0.0.0.0/0, for example, but then all traffic from Azure will end up in your network. You can also announce a single prefix covering all your Azure resources, such as 172.16.0.0/12. But in this case, if you have two different Express Route Circuits, traffic will be allowed between them through your on-premises routers.
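To make these choices concrete, here is a hedged sketch of how a circuit and its private peering could be created with the Azure CLI through an operator. The wrapper functions, the `DRY_RUN` switch, and every name and value passed to them are placeholders for illustration; only the `az network express-route create` and `az network express-route peering create` commands and their flags come from the Azure CLI.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a circuit through a connectivity provider.
# Arguments: resource group, circuit name, Azure location, peering location,
# provider, bandwidth in Mbps, SKU tier, SKU family (billing model).
create_circuit() {
  run az network express-route create \
    --resource-group "$1" \
    --name "$2" \
    --location "$3" \
    --peering-location "$4" \
    --provider "$5" \
    --bandwidth "$6" \
    --sku-tier "$7" \
    --sku-family "$8"
}

# Add the private peering.
# Arguments: resource group, circuit name, on-premises AS number, VLAN ID,
# primary /30 subnet, secondary /30 subnet.
create_private_peering() {
  run az network express-route peering create \
    --resource-group "$1" \
    --circuit-name "$2" \
    --peering-type AzurePrivatePeering \
    --peer-asn "$3" \
    --vlan-id "$4" \
    --primary-peer-subnet "$5" \
    --secondary-peer-subnet "$6"
}
```

With `DRY_RUN=1` set, a call such as `create_circuit RG-demo ER-demo westeurope Amsterdam2 Equinix 1000 Standard MeteredData` only prints the command, which is a convenient way to review the SKU tier and billing model before committing to them. Once the circuit exists, its service key appears in the output of `az network express-route show`.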

The Express Route Gateway

Having a functional Express Route Circuit doesn’t mean that you can connect your workload to your on-premises network. An Azure Express Route circuit is just the representation of your connection to a Microsoft Enterprise Edge Router. This router is not in Azure; it can be outside any Azure region.

You need an Express Route Gateway to connect Azure resources to the Express Route circuit. This gateway is a router: just as a VPN gateway connects a VNET to a VPN tunnel, the Express Route Gateway connects a VNET to the Express Route Circuit and your on-premises network via secure links.

There are several SKUs for the Express Route Gateway. The old series is Standard, High Performance, and Ultra Performance. The new series is ERGw1Az, ERGw2Az, ERGw3Az, and ERGwScale (currently in preview; it is the default one for Azure Virtual WAN). The new series is deployed in Availability Zones to ensure availability in case of zone failure. This is the recommended series.

An Express Route Gateway is based on VMs (or on a VMSS if you choose ERGwScale or use Azure Virtual WAN). The gateway capacity depends on the underlying VMs (except for ERGwScale, where capacity depends on the number of instances). But all gateways, including ERGwScale, have a limit of 500 routes to advertise to the Express Route Circuit and the on-premises network. There is also a limit on the number of routes learned from the Express Route Circuit, from 4500 to 9500 (the limit is higher for ERGwScale, depending on the number of instances). It is important to keep these limits in mind and to set up alerting to avoid incidents.

In terms of monitoring, you should pay attention to three other metrics: CPU usage, bandwidth, and the number of active flows. An Express Route Gateway is backed by VMs, so high CPU utilization can lead to packet contention and performance degradation. During maintenance, one instance will be offline and the remaining instances will have to handle the same load, which can result in higher latency and sometimes packet loss. The bandwidth (bits received per second) should stay below the limit of the SKU (1 Gbps for ERGw1Az, 2 Gbps for ERGw2Az, and 10 Gbps for ERGw3Az); otherwise you will experience higher latency and higher CPU usage. The number of active flows is the number of connections currently tracked by the gateway. The maximum, like the bandwidth, depends on the SKU of the gateway (200k for ERGw1Az, 400k for ERGw2Az, and 1000k for ERGw3Az).
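As an illustration of the alerting logic, the sketch below compares an observed value with the SKU limits quoted above. The function name, the metric labels, and the 80% warning threshold are assumptions made for this example; the observed values would come from your own monitoring.

```shell
# Compare an observed value with the gateway SKU limit quoted in this post.
# Usage: check_gateway_limit <sku> <bandwidth|flows> <observed value>
#   bandwidth is in Mbps, flows is a plain count.
# The 80% warning threshold is an arbitrary choice for this sketch.
check_gateway_limit() (
  sku=$1
  metric=$2
  value=$3
  case "$sku:$metric" in
    ERGw1Az:bandwidth) limit=1000 ;;
    ERGw2Az:bandwidth) limit=2000 ;;
    ERGw3Az:bandwidth) limit=10000 ;;
    ERGw1Az:flows) limit=200000 ;;
    ERGw2Az:flows) limit=400000 ;;
    ERGw3Az:flows) limit=1000000 ;;
    *) echo "unknown SKU or metric: $sku $metric" >&2; return 2 ;;
  esac
  pct=$((value * 100 / limit))   # integer percentage, rounded down
  if [ "$pct" -ge 80 ]; then
    echo "WARN $metric at ${pct}% of the $sku limit"
    return 1
  fi
  echo "OK $metric at ${pct}% of the $sku limit"
)
```

The non-zero exit status on a warning makes the function easy to plug into a scheduled script that raises an alert.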

What do you do if you reach a limit? You can start by investigating what is responsible for the surge. For that, you can use Virtual Network Flow logs, which, queried with KQL, show which networks are driving the load on the gateway.
But most of the time, it is simply the result of your teams making heavier use of the network infrastructure. You then need to scale the infrastructure, using one of two strategies: migrate the virtual network gateway to a higher SKU, or create a new Express Route Gateway and move some VNET peerings to the new gateway.

The Express Route Gateway is a real router. It connects the Azure world to the MSEE router in the peering location, exchanging routing information with it and routing traffic to and from it. It uses BGP, which you can observe with the Azure CLI.

az network vnet-gateway list-bgp-peer-status -n <GateWayName> -g <RG Name> -o table
Neighbor     ASN    State      ConnectedDuration   RoutesReceived  MessagesSent  MessagesReceived
-----------  -----  ---------  ------------------  --------------  ------------  ----------------
xx.xx.xx.68  12076  Connected  3.21:43:08.0670805  2               6422          6195
xx.xx.xx.69  12076  Connected  3.21:43:08.0514564  4               6427          6193

This command lists the BGP neighbors of the gateway, which are the MSEE routers. The IP addresses shown here belong to the GatewaySubnet.

az network vnet-gateway list-bgp-peer-status -n VNG-express-route-test -g RG-neu-vnet-hub-test --query 'value[].{LocalAddress:localAddress, Neighbor:neighbor, ASN:asn, State:state, RoutesReceived:routesReceived}' -o table
LocalAddress  Neighbor     ASN    State      RoutesReceived
------------  -----------  -----  ---------  --------------
xx.xx.xx.76   xx.xx.xx.68  12076  Connected  2
xx.xx.xx.76   xx.xx.xx.69  12076  Connected  4

You can see that the MSEE is connected to the gateway subnet and uses AS Number 12076 to communicate with the gateway.

Now let's list the learned routes.

az network vnet-gateway list-learned-routes -n VNG-express-route-test  -g RG-neu-vnet-hub-test -o table
Network          NextHop      Origin   SourcePeer   AsPath       Weight
---------------  -----------  -------  -----------  -----------  ------
xx.xx.xx.0/24                 Network  xx.xx.xx.76               32768
xx.xx.zz.0/24                 Network  xx.xx.xx.76               32768
xx.xx.xx.128/27  xx.xx.xx.68  EBgp     xx.xx.xx.68  12076-12076  32869
xx.xx.xx.128/27  xx.xx.xx.69  EBgp     xx.xx.xx.69  12076-12076  32869
xx.xx.xx.0/24    xx.xx.xx.68  EBgp     xx.xx.xx.68  12076-12076  32869
xx.xx.xx.0/24    xx.xx.xx.69  EBgp     xx.xx.xx.69  12076-12076  32869
172.16.0.0/16    xx.xx.xx.69  EBgp     xx.xx.xx.69  12076-4xxx   32869


There are two origins. Network covers the local VNET and peered VNETs; in this case there is no next hop. EBgp (External BGP) lists the next hop, the source peer, and the AsPath: 12076-12076 for internal traffic and 12076-4xxx for traffic going to or from on-premises. The 4xxx represents the AS number of the on-premises network.
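The reading of the AsPath column can be sketched as a small POSIX shell function. It assumes, as in the output above, that 12076 is the Microsoft AS number and treats any other AS number at the end of the path as the on-premises network; the function name and the sample value 64512 are hypothetical.

```shell
# Classify a learned route from its AsPath column.
# Assumption: 12076 is the Microsoft ASN seen in the output above; any other
# ASN at the end of the path is treated as the on-premises network.
classify_route() (
  as_path=$1
  if [ -z "$as_path" ]; then
    echo "local"              # Origin "Network": the VNET itself or a peered VNET
    return 0
  fi
  origin_as=${as_path##*-}    # the last ASN in the path originated the prefix
  if [ "$origin_as" = "12076" ]; then
    echo "azure"
  else
    echo "on-premises"
  fi
)
```

For instance, `classify_route 12076-64512` reports an on-premises route, while an empty AsPath is reported as local.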

On the MSEE side, that is the Express Route Circuit, the following command shows the communication between the MSEE router and the on-premises router.

az network express-route list-route-tables-summary -g RG-neu-vnet-hub-test -n ER-ams2-equinix-ccoe-test --path primary --peering-name AzurePrivatePeering --query value -o table
As     Neighbor     StatePfxRcd  UpDown  V
-----  -----------  -----------  ------  ---
4xxxx  x.z.xx.241   0            7w6d    4
65515  xx.xx.xx.76  4            4d00h   4
65515  xx.xx.xx.77  4            4d00h   4

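Here, 4xxxx is the AS Number of the on-premises network and 65515 is the AS Number used by the Express Route Gateway, whose instances (xx.xx.xx.76 and xx.xx.xx.77) sit in the GatewaySubnet. The MSEE router itself uses AS Number 12076, as seen earlier.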

Dealing with more than 10 Gbps

But what if you need more than 10 Gbps of bandwidth? This scenario is far more common than you might imagine. For example, if you need to encode and broadcast live video in Azure, you need more than 10 Gbps to avoid any delay in the video stream. First, you will need to use Express Route Direct with more than 10 Gbps of bandwidth. Then you have two options: using Fast Path or using the ERGwScale Express Route Gateway.
Fast Path is an Express Route feature that allows packets to bypass the gateway and go directly to a VM's virtual network interface. You need to use the ERGw3Az SKU (a gateway is still needed to use Fast Path), and there is a limit on the number of virtual NICs you can use with Fast Path. The VM can be in the gateway VNET or in a peered VNET (in the hub and spoke model), but the peered VNET should be in the same Azure Region as the gateway. There are some other limitations, such as limited support for private links and a limited set of available Azure regions.

The second option is to use the ERGwScale gateway SKU, currently in preview. ERGwScale is based on a Virtual Machine Scale Set instead of individual VMs, so you can scale out the capacity to aggregate more throughput.
