DEV Community

Kai Walter
Azure Functions on Service Fabric

This post describes the steps and learnings involved in getting Functions running inside Service Fabric, inside a virtual network connected over ExpressRoute to our enterprise back-end systems, which I talked about in Build Mission Critical .NET microservices - BRK3047 (video) / (session description).

Chapter 1 - Azure Functions / WebHost v1

Why?

I'm working on a project which is a platform for integrating applications that exist outside our corporate network with our corporate back-end systems. A good share of this integration platform is built in the Microsoft Azure cloud environment.

Though it is "only" intended as an integration layer between the digital and the classical world, it still keeps some business data and logic worth protecting and not having out in the "open".

Most of the services we use in this scenario - API Management, Cosmos DB, Service Bus, Service Fabric, Storage - are available with virtual network endpoints, so access to these services can be restricted to the boundaries of the virtual network, which limits the attack surface of the overall environment.

One major element of the integration story is the processing logic. This is mostly implemented using the simple stateless input-processing-output notion which Azure Functions (and other function & serverless platforms) deliver. In fact, together with API Management these functions are the very DNA of the system, while the other services "just" fulfill their single intended purpose (storing, queueing, ...).

The challenge

Now. How to get these functions also inside the boundaries of the virtual network ...

  • with seamless access to connected back-end resources
  • without any public exposure

The VNET peering capability of App Service was not sufficient for me to fulfill these requirements back then.

App Service Environment (aka ASE): back in spring/summer 2017, instance deployment times here in West Europe ranged between 4 and 6h and the monthly costs were beyond reasonable justification - my approach is described here. Additionally, ASE spun up some VMs / resources which would not be properly utilized in a Functions-only scenario. Maybe it has improved with v2 in the meantime - I did not check.

But we already had Service Fabric running inside the virtual network - hosting stateful services and the stateless services we could not easily bridge to back-end resources over API Management. So why not just squeeze Azure Functions somehow into Service Fabric?

Motivation

Bits and pieces I figured out along the journey I already put into several Q&A style articles on Stack Overflow. However, I wanted to piece them together into a more comprehensive story to give people out there a chance to follow along and maybe adapt a few things for themselves.

WebHost v1 as Service Fabric application

I invested some time into this before containers were available on Service Fabric. Forking the WebJobs Scripts SDK / Functions v1 I tried to adapt the code so that it can run as a native Service Fabric application. I abandoned this approach: too much work and lacking knowledge on my end to succeed.

On the way I managed to get a Functions console host running as a guest executable hosted in Service Fabric, but that did not help - I really needed the WebHost.

WebHost v1 as container in Service Fabric

Fortunately around fall 2017 containers got supported in SF and I was able to bake the v1 WebHost into a Windows container.

Finally with this approach Functions hosted in this pattern could exist as first class citizens in the virtual network and we were able to migrate stateless Service Fabric applications mentioned above into Functions to achieve one common programming model. Also a lot of "jump" APIs (bridging from Functions in public Consumption or App Service Plan into the private virtual network) we hosted in API Management could be removed.


Scalability

Before I go into single elements of the approach, a word on scalability - which seems to be an obvious issue; thanks Paco for pointing it out.

When we started with our platform we assumed that it needed to be scalable like hell: thousands of messages per second coming in from all directions which need to be passed on immediately at the same pace. In hindsight: not so much. Some producers may generate loads of messages (e.g. on a mass data change in a business system), but most of the consumers - including our database, Cosmos DB - need a balanced way of getting these messages delivered. Hence the platform is acting more like a sandwich - allowing fast unloading from the producers and applying throttled forwarding towards the consumers.

Based on our initial assumption we started fronting our unloading points with API Management, which passed on the requests to Azure Functions in the Consumption Plan. That setup fulfilled the scalability requirements pretty well - increasing incoming traffic made the Functions scale up, decreasing traffic scaled them back down again. What we did not consider - and back then just didn't know - were the limitations of the Function Consumption Plan sandbox environments. The HTTP triggered Functions picked up the incoming traffic and distributed it to several target message queues, databases and/or target consumers' HTTP endpoints. The high message load combined with too many processing or forwarding steps resulted in massive port exhaustion situations. To circumvent this we decided to let API Management take the initial load, putting message metadata directly into Service Bus queues and message payloads into Blob storage. With that, Functions can work on this message traffic at a controlled pace.

Hence there is no further need for the Consumption Plan and its flexible scaling of the Function App instances. Today we exercise semi-automated scaling of the containers depending on the backlog we have in certain Service Bus queues.
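The backlog-driven scaling decision can be sketched in a few lines of Python. This is only an illustration, not our actual tooling: the function name `desired_instances`, the 500 messages per container and the instance limits are assumptions made up for this example.

```python
import math


def desired_instances(backlog: int, messages_per_instance: int = 500,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Map a queue backlog to a container count, clamped to a fixed range.

    All default values are hypothetical and need tuning per queue.
    """
    if backlog < 0:
        raise ValueError("backlog must not be negative")
    # one container per started block of `messages_per_instance` messages
    needed = math.ceil(backlog / messages_per_instance)
    return max(min_instances, min(max_instances, needed))
```

A backlog of 1200 messages would map to 3 containers with these defaults; the result could then be applied to the Service Fabric service manually or by a script, which is what makes the approach "semi-automated".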


key elements and gotchas

building the Function host

When I started it was possible to download the pre-built Functions host directly with the Dockerfile:

...
ADD https://github.com/Azure/azure-functions-host/releases/download/1.0.11559/Functions.Private.1.0.11559.zip C:\\WebHost.zip

RUN Expand-Archive C:\WebHost.zip ; Remove-Item WebHost.zip
...

At some point the team stopped providing these precanned versions. This required us to adjust our CI/CD process for the Functions host base image: the source code is now downloaded and built (e.g. in Azure DevOps aka VSTS), and the resulting archive is loaded into the base image:

...
ADD Functions.Private.zip C:\\WebHost.zip

RUN Expand-Archive C:\WebHost.zip ; Remove-Item WebHost.zip
...

managing master key / secrets

To control the master key the Function host uses on startup - instead of having it generate random keys - we prepared our own host_secrets.json file

{
   "masterKey": {
      "name": "master",
      "value": "asGmO6TCW/t42krL9CljNod3uG9aji4mJsQ7==",
      "encrypted": false
   },
   "functionKeys": [
      {
         "name": "default",
         "value": "asGmO6TCW/t42krL9CljNod3uG9aji4mJsQ7==",
         "encrypted": false
      }
   ]
}

and then fed this file into the designated secrets folder of the Function host (Dockerfile):

...
ADD host_secrets.json C:\\WebHost\\SiteExtensions\\Functions\\App_Data\\Secrets\\host.json
...
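If the secrets file should be produced by the build pipeline rather than maintained by hand, a small generator could look like the following Python sketch. It only reproduces the JSON layout shown above; the helper names `new_key` and `build_host_secrets`, the key length and the base64 encoding are assumptions for illustration, not something the Function host prescribes.

```python
import base64
import json
import secrets


def new_key(num_bytes: int = 30) -> str:
    """Return a random key as base64 text (length and encoding are assumptions)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def build_host_secrets(master_key: str, function_key: str) -> str:
    """Render the secrets document in the layout shown above."""
    document = {
        "masterKey": {"name": "master", "value": master_key, "encrypted": False},
        "functionKeys": [
            {"name": "default", "value": function_key, "encrypted": False}
        ],
    }
    return json.dumps(document, indent=3)


if __name__ == "__main__":
    # write the file that the Dockerfile ADDs into the secrets folder
    with open("host_secrets.json", "w", encoding="utf-8") as handle:
        handle.write(build_host_secrets(new_key(), new_key()))
```

Generating the keys once and storing them outside the image history (instead of committing them) would be the obvious next step, but that is beyond what this post covers.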

auto starting web site

The Dockerfile included this configuration to get the default web site autostarted and pointing to the Functions WebHost:

...
RUN Import-Module WebAdministration; \
    Set-ItemProperty 'IIS:\Sites\Default Web Site\' -name physicalPath -value 'C:\WebHost\SiteExtensions\Functions'; \
    Set-ItemProperty 'IIS:\Sites\Default Web Site\' -name serverAutoStart -value 'true'; \
    Set-ItemProperty 'IIS:\AppPools\DefaultAppPool\' -name autoStart -value 'true';
...

Always On / Keep Alive

I also tested this setup with background Service Bus queue processing. Though I set the autostart properties for the web site, the background processing only started once the WebHost was initiated by an HTTP trigger. For that reason I have at least one HTTP triggered function (in the sample below GetServiceInfo) which I query in the HTTP health probe of Service Fabric's load balancer. That keeps the WebHost up and running for background processing.

From the Service Fabric ARM template:

...
        "loadBalancingRules": [
          {
            "name": "Service28000LBRule",
            "properties": {
              "backendAddressPool": {
                "id": "[variables('lbPoolID0')]"
              },
              "backendPort": 28000,
              "enableFloatingIP": false,
              "frontendIPConfiguration": {
                "id": "[variables('lbIPConfig0')]"
              },
              "frontendPort": 28000,
              "idleTimeoutInMinutes": 5,
              "probe": {
                "id": "[concat(variables('lbID0'),'/probes/Service28000Probe')]"
              },
              "protocol": "Tcp"
            }
          },
...
        "probes": [
...
          {
            "name": "Service28000Probe",
            "properties": {
              "protocol": "Http",
              "port": 28000,
              "requestPath": "/api/GetServiceInfo",
              "intervalInSeconds": 60,
              "numberOfProbes": 2
            }
          },
...
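To check from another machine in the virtual network that the probe target really answers, the same request the load balancer sends can be issued manually. The following Python sketch assumes the port and path from the ARM template above; the helper names `probe_url` and `is_alive` are made up for this example.

```python
from urllib.request import urlopen


def probe_url(host: str, port: int = 28000,
              path: str = "/api/GetServiceInfo") -> str:
    """Build the URL the health probe requests (port/path as in the template)."""
    return f"http://{host}:{port}{path}"


def is_alive(url: str, timeout: float = 5.0, opener=urlopen) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (connection refused, timeouts, 4xx/5xx)
        # are subclasses of OSError
        return False
```

Calling `is_alive(probe_url("10.0.0.4"))` with a node address of the cluster (the address here is a placeholder) tells whether the WebHost would be considered healthy by a probe at that moment.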

loading own set of certificates

The Dockerfile can be used to load certificates into the container, to be used by the Function App:

...
ADD Certificates\\mycompany.org-cert1.cer C:\\certs\\mycompany.org-cert1.cer
ADD Certificates\\mycompany.org-cert2.cer C:\\certs\\mycompany.org-cert2.cer

RUN Set-Location -Path cert:\LocalMachine\Root;\
    Import-Certificate -Filepath "C:\\certs\\mycompany.org-cert1.cer";\
    Import-Certificate -Filepath "C:\\certs\\mycompany.org-cert2.cer";\
    Get-ChildItem;
...

extending the startup

To add more processing to the app container's startup (which we will need later), the ENTRYPOINT passed down from the microsoft/aspnet:4.7.x image

...
ENTRYPOINT ["C:\\ServiceMonitor.exe", "w3svc"]

can be replaced with an alternate entry script

...
    Set-ItemProperty 'IIS:\AppPools\DefaultAppPool\' -name autoStart -value 'true';

EXPOSE 80

ENTRYPOINT ["powershell.exe","C:\\entry.PS1"]

which executes steps at start of the container:

...
# this is where the magic happens
...
C:\ServiceMonitor.exe w3svc

wrapping it up

This is what a base image Dockerfile looked like:

FROM microsoft/aspnet:4.7.1
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

ADD Functions.Private.zip C:\\WebHost.zip

RUN Expand-Archive C:\WebHost.zip ; Remove-Item WebHost.zip

ADD host_secrets.json C:\\WebHost\\SiteExtensions\\Functions\\App_Data\\Secrets\\host.json

ADD entry.PS1 C:\\entry.PS1

ADD Certificates\\mycompany.org-cert1.cer C:\\certs\\mycompany.org-cert1.cer
ADD Certificates\\mycompany.org-cert2.cer C:\\certs\\mycompany.org-cert2.cer

RUN Set-Location -Path cert:\LocalMachine\Root;\
    Import-Certificate -Filepath "C:\\certs\\mycompany.org-cert1.cer";\
    Import-Certificate -Filepath "C:\\certs\\mycompany.org-cert2.cer";\
    Get-ChildItem;

RUN Import-Module WebAdministration;                                                        \
    $websitePath = 'C:\WebHost\SiteExtensions\Functions';                                   \
    Set-ItemProperty 'IIS:\Sites\Default Web Site\' -name physicalPath -value $websitePath; \
    Set-ItemProperty 'IIS:\Sites\Default Web Site\' -name serverAutoStart -value 'true';    \
    Set-ItemProperty 'IIS:\AppPools\DefaultAppPool\' -name autoStart -value 'true';

EXPOSE 80

which can be referenced by an app-specific Dockerfile like:

FROM mycompanycr.azurecr.io/functions.webhost:1.0.11612
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

COPY App.zip App.zip

RUN Expand-Archive App.zip ; \
    Remove-Item App.zip

SHELL ["cmd", "/S", "/C"]
ENV AzureWebJobsScriptRoot='C:\App'

WORKDIR App

Each Azure Function host release is loaded into the container registry with a corresponding release tag. This allowed for operating Function apps with different (proven or preliminary) versions of the Function host.

MSI = managed service identity

Functions operated in App Service can use a managed service identity to access secrets from KeyVault.

To achieve the same in our environment we first had to add a managed service identity to the Service Fabric cluster's VM scale sets.

Now the entry.PS1 startup script introduced above can be used to add the route to the MSI endpoint and check it on container startup:

Write-Host "adding route for Managed Service Identity"
$gateway = (Get-NetRoute | Where-Object {$_.DestinationPrefix -eq '0.0.0.0/0'}).NextHop
$arguments = 'add','169.254.169.0','mask','255.255.255.0',$gateway
&'route' $arguments

$response = Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net%2F' -Method GET -Headers @{Metadata="true"} -UseBasicParsing
Write-Host "MSI StatusCode :" $response.StatusCode

C:\ServiceMonitor.exe w3svc
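The token request in the script above is just an HTTP GET with two query parameters and a `Metadata: true` header. As a minimal sketch of how that request is put together, here is the same construction in Python; the function name `token_request` is an assumption for this example, while the endpoint, API version and resource are the ones used in the script.

```python
from urllib.parse import urlencode

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def token_request(resource: str, api_version: str = "2018-02-01"):
    """Return (url, headers) for a managed identity token request."""
    # urlencode percent-encodes the resource URI, e.g. ':' -> %3A, '/' -> %2F
    query = urlencode({"api-version": api_version, "resource": resource})
    return f"{IMDS_TOKEN_ENDPOINT}?{query}", {"Metadata": "true"}


url, headers = token_request("https://vault.azure.net/")
print(url)
```

The printed URL is identical to the hard-coded one in entry.PS1. Building it from parts only makes it easier to request tokens for other resources than KeyVault.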

Chapter 1 - learnings

OK, problem solved.

But:

  • Windows Server Core images are ~6GB in size - hence Service Fabric nodes need an awful amount of time to load new versions of these
  • time goes on

With Azure Functions v2 and .NET Core it is possible to have images dramatically reduced in size and host those on Linux.

Chapter 2 - Azure Functions / WebHost v2

The previous chapter explains why and how to get the now outdated Azure Functions v1 WebHost working inside containers in Service Fabric.

For v2 there was more support for containerizing Azure Functions. There are already images on Docker Hub and base image samples on GitHub for Windows and Linux which provide what I had to figure out in v1 on my own.

But:

  • Nanoserver smalldisk images are ... too small
  • no PowerShell in the Nanoserver microsoft/dotnet:2.1-aspnetcore-runtime-nanoserver-1803 base image - which I needed for the tweaks I implemented in v1
  • PowerShell Core did not have the Certificate cmdlets yet. So how can I load my corporate certificates into the image?

solving the basic problems

increase Nanoserver smalldisk OS disk

The solution to this problem is described on Stack Overflow.

adding PowerShell Core

Take the sample Dockerfile mentioned above and implement a multi-stage build which then allows running the entry.PS1 script in PowerShell Core:

# escape=`

# --------------------------------------------------------------------------------
# PowerShell
FROM mcr.microsoft.com/powershell:nanoserver as ps
...
# --------------------------------------------------------------------------------
# Runtime image
FROM microsoft/dotnet:2.2-aspnetcore-runtime-nanoserver-1803

COPY --from=installer-env ["C:\\runtime", "C:\\runtime"]

COPY --from=ps ["C:\\Program Files\\PowerShell", "C:\\PowerShell"]
...
USER ContainerAdministrator
CMD ["C:\\PowerShell\\pwsh.exe","C:\\entry.PS1"]

adding certoc.exe to install certificates

The same approach works for the certificate installation: just borrow the tool from another image:

...
# --------------------------------------------------------------------------------
# Certificate Tool image
FROM microsoft/nanoserver:sac2016 as tool
...
ADD Certificates\\mycompany.org-cert1.cer C:\\certs\\mycompany.org-cert1.cer
ADD Certificates\\mycompany.org-cert2.cer C:\\certs\\mycompany.org-cert2.cer
ADD host_secret.json C:\\runtime\\Secrets\\host.json
ADD entry.PS1 C:\\entry.PS1

USER ContainerAdministrator
RUN icacls "c:\runtime\secrets" /t /grant Users:M
RUN certoc.exe -addstore root C:\\certs\\mycompany.org-cert1.cer
RUN certoc.exe -addstore root C:\\certs\\mycompany.org-cert2.cer
USER ContainerUser
...

Significant changes introduced with the Windows Server Core 1803 and Nanoserver 1803 base images also required switching the user context for importing certificates.

handling secrets

As the Dockerfile sample above suggests, the directory ACL also needed modification so that the host running in user context is able to write into the secrets folder.

Check out Stack Overflow: System.UnauthorizedAccessException : Access to the path 'C:\runtime\Secrets\host.json' is denied in Azure Functions Windows container

MSI

The MSI part also needed some tweaking after switching to Nanoserver and PowerShell Core. As long as the Get-NetRoute cmdlet is not available in PowerShell Core, this strange string pipelining exercise is required to extract the default gateway.

Write-Host "adding route for Managed Service Identity"
$gateway = (route print | ? {$_ -like "*0.0.0.0*0.0.0.0*"} | % {$_ -split " "} | ? {$_.trim() -ne "" } | ? {$_ -ne "0.0.0.0" })[0]
$arguments = 'add', '169.254.169.0', 'mask', '255.255.255.0', $gateway
&'route' $arguments

# --------------------------------------------------------------------------------
# test MSI access
$response = Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net%2F' -Method GET -Headers @{Metadata = "true"} -UseBasicParsing
Write-Host "MSI StatusCode :" $response.StatusCode

# --------------------------------------------------------------------------------
# start Function Host
dotnet.exe C:\runtime\Microsoft.Azure.WebJobs.Script.WebHost.dll
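What the string pipeline does is easier to see when written out step by step: find the route table line for destination 0.0.0.0 with netmask 0.0.0.0 and take the next column as the gateway. Here is that logic as a Python sketch; the function name `default_gateway` and the sample output (including all addresses in it) are made up for illustration.

```python
from typing import Optional


def default_gateway(route_print_output: str) -> Optional[str]:
    """Extract the gateway of the default route from `route print` text."""
    for line in route_print_output.splitlines():
        fields = line.split()  # splits on any run of whitespace
        # default route: destination 0.0.0.0, netmask 0.0.0.0, then gateway
        if len(fields) >= 3 and fields[0] == "0.0.0.0" and fields[1] == "0.0.0.0":
            return fields[2]
    return None


# hypothetical excerpt of `route print` output inside a container
SAMPLE = """\
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0       172.17.0.1      172.17.0.5   5256
        127.0.0.0        255.0.0.0         On-link        127.0.0.1    331
"""

print(default_gateway(SAMPLE))
```

Unlike the wildcard match in the PowerShell one-liner, this variant checks the first two columns explicitly, so a line that merely contains 0.0.0.0 twice somewhere else would not be picked up.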

adding more functionality

@Microsoft.KeyVault(SecretUri=...) application settings

Azure Functions in App Service support the @Microsoft.KeyVault() syntax in application settings. To achieve the same with environment variables inside containers this script extension does the transformation:

...
# --------------------------------------------------------------------------------
# replace Environment Variables holding KeyVault URIs with actual values
$msiEndpoint = 'http://169.254.169.254/metadata/identity/oauth2/token'
$vaultTokenURI = 'https://vault.azure.net&api-version=2018-02-01'
$authenticationResult = Invoke-RestMethod -Method Get -Headers @{Metadata = "true"} -Uri ($msiEndpoint + '?resource=' + $vaultTokenURI)

if ($authenticationResult) {
    $requestHeader = @{Authorization = "Bearer $($authenticationResult.access_token)"}

    $regExpr = "^@Microsoft.KeyVault\(SecretUri=(.*)\)$"

    Get-ChildItem "ENV:*" |
        Where-Object {$_.Value -match $regExpr} |
        ForEach-Object {
        Write-Host "fetching secret for" $_.Key
        $kvUri = [Regex]::Match($_.Value, $regExpr).Groups[1].Value
        if (!$kvUri.Contains("?api-version")) {
            $kvUri += "?api-version=2016-10-01"
        }
        $creds = Invoke-RestMethod -Method GET -Uri $kvUri -ContentType 'application/json' -Headers $requestHeader
        if ($creds) {
            Write-Host "setting secret for" $_.Key
            [Environment]::SetEnvironmentVariable($_.Key, $creds.value, "Process")
        }
    }
}

# --------------------------------------------------------------------------------
# start Function Host
dotnet.exe C:\runtime\Microsoft.Azure.WebJobs.Script.WebHost.dll
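The matching and URI handling of the script above can be tested in isolation, without a KeyVault at hand. The following Python sketch mirrors only that part (no HTTP calls); the names `secret_uri` and `find_references` as well as the vault and setting names in the usage example are hypothetical.

```python
import re
from typing import Dict, Mapping, Optional

# same pattern as in entry.PS1, with the dots escaped
REFERENCE = re.compile(r"^@Microsoft\.KeyVault\(SecretUri=(.*)\)$")


def secret_uri(value: str, default_api_version: str = "2016-10-01") -> Optional[str]:
    """Return the secret URI of a KeyVault reference, or None for plain values."""
    match = REFERENCE.match(value)
    if match is None:
        return None
    uri = match.group(1)
    # append an API version only if the reference does not carry one
    if "?api-version" not in uri:
        uri += f"?api-version={default_api_version}"
    return uri


def find_references(environ: Mapping[str, str]) -> Dict[str, str]:
    """Map each variable holding a KeyVault reference to its secret URI."""
    found = {}
    for name, value in environ.items():
        uri = secret_uri(value)
        if uri is not None:
            found[name] = uri
    return found
```

For example, `find_references({"DbPassword": "@Microsoft.KeyVault(SecretUri=https://myvault.vault.azure.net/secrets/db)", "Mode": "prod"})` returns only the `DbPassword` entry, with `?api-version=2016-10-01` appended to its URI.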

Chapter 3 - Solve the singleton mystery

While incrementally migrating Function Apps from v1 to v2 I realized that all of a sudden the singleton execution of a timer triggered function did not work anymore with v2. In v1 you could just put the host id common for all instances of the same function app executed across multiple containers into host.json:

{
  "id": "4c45009422854e56a8a70567cd7219fe",
...

WebHost instances then lock or synchronize singleton executions over the shared storage (referenced by AzureWebJobsStorage).

The function - migrated to v2 - suddenly executed multiple times (in exactly the number of containers of the same function app) at the same interval which was definitely not the intended behavior. id specified in host.json did not seem to be relevant anymore.

Checking ScriptHostIdProvider in the v2 host I learned that the id can be set in an environment variable:

...
AzureFunctionsWebHost:hostid=4c45009422854e56a8a70567cd7219fe
...

Usually the platform (Azure Functions / App Service) takes care of setting this unique id. But when hosting the Functions runtime in multiple instances oneself, one has to take care of this.

Still, the makers of the Functions runtime do not favor setting an explicit hostid and for that reason issue a warning when the host starts up:

warn: Host.Startup[0]
      Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior.
info: Host.Startup[0]
      Starting Host (HostId=4c45009422854e56a8a70567cd7219fe, InstanceId=181fb9ee-be21-4c7e-bcf1-c325fce7532b, Version=2.0.12353.0, ProcessId=5804, AppDomainId=1, InDebugMode=False, InDiagnosticMode=False, FunctionsExtensionVersion=)

Important: when sharing the same storage (referenced by AzureWebJobsStorage) among multiple function apps, the hostid has to be unique for each function app. Otherwise the function apps' internal locking and durable functions can get messed up.
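One way to get an id that is identical across all containers of one function app but different between apps is to derive it from the app name. The following Python sketch is an assumption of mine, not something the Functions runtime offers: it hashes a (hypothetical) app name into 32 hex characters, the same shape as the id shown above.

```python
import uuid


def host_id(app_name: str) -> str:
    """Derive a stable 32-character hex host id from a function app name."""
    # uuid5 is deterministic: the same name always yields the same id
    return uuid.uuid5(uuid.NAMESPACE_DNS, app_name.lower()).hex


if __name__ == "__main__":
    # hypothetical app name; the output can be used as the hostid setting
    print(host_id("orders-processing"))
```

The result would be passed to every container of that app via the environment variable shown above, so that all of them keep sharing the same singleton locks.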

Chapter 4 - ServiceBus NamespaceManager over MSI

As NamespaceManager is not supported anymore with the .NET Core compatible package Microsoft.Azure.ServiceBus (which is a dependency of Microsoft.Azure.WebJobs.Extensions.ServiceBus when using Service Bus within WebJobs or Functions), the package Microsoft.Azure.Management.ServiceBus.Fluent and its related packages have to be used.

This package supports MSI-based authentication, so I can leverage the availability of MSI (which I described in chapter 2):

...
    // some magic that determines subscriptionId, resourceGroupName & sbNamespaceName
...
    var credentials = SdkContext.AzureCredentialsFactory.FromMSI(new MSILoginInformation(MSIResourceType.VirtualMachine), AzureEnvironment.AzureGlobalCloud);
    var azure = Azure
            .Configure()
            .WithLogLevel(HttpLoggingDelegatingHandler.Level.Basic)
            .Authenticate(credentials)
            .WithSubscription(subscriptionId);

    var sbNamespace = azure.ServiceBusNamespaces.GetByResourceGroup(resourceGroupName, sbNamespaceName);
    var queues = sbNamespace.Queues.List();
...

The only thing left is to authorize the MSI created in the AAD for the cluster's VM Scale Set on the Service Bus resource - e.g. granting a Reader role when, as in my case, only queue message counts need to be retrieved.
