DEV Community

Cover image for Investigating az-cli performance on the hosted Azure Pipelines and GitHub Runners
Jesse Houwing for Xebia Microsoft Services

Posted on • Originally published at jessehouwing.net on

Investigating az-cli performance on the hosted Azure Pipelines and GitHub Runners

I've been building a few more workflows and pipelines over the past few days and had been experimenting with the az-cli. And I've been running into all kinds of performance issues.

Azure CLI is a great nifty tool to chat to Azure as well as Azure DevOps and there's a AzureCLI@v2 task in Azure DevOps that preconfigures your Azure subscription and all.

While testing I got increasingly frustrated by how slow az is on GitHub Actions and Azure Pipelines hosted runners. But only on Windows. So, I've been digging into why.

Over the past week I've found quite a few issues, files a few issues and submitted a couple of pull requests. I'll walk you through my findings, they may help improve the performance of your workflows too!

Case matters

It turns out that the casing of your commands matters when it comes to az. You wouldn't expect it, az version will take about 12 seconds to run on the hosted runner including the overhead of starting a fresh powershell session.

But az Version will take about one minute. Why? Well, it turns out that az caches all the commands that are available from its many plugins, and that it looks up the command by key to see if it recognises what you've passed in. It performs that lookup case sensitive.

Issue filed:

command index lookup is case sensitive #27497

Describe the bug

When you run

az vErSiOn

a command index rebuild is triggered.

az version

does not trigger a command index rebuild

any command

Errors

cli.azure.cli.core: No module found from index for '['VeRSIoN', '--debug']'
cli.azure.cli.core: Loading all modules and extensions

Issue script & Debug output

Command

C:\Users\JesseHouwing>az VeRSIoN --debug

Output:

cli.knack.cli: Command arguments: ['VeRSIoN', '--debug']
cli.knack.cli: __init__ debug log:
Enable color in terminal.
Enable VT mode.
cli.knack.cli: Event: Cli.PreExecute []
cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x01C3A4F0>, <function OutputProducer.on_global_arguments at 0x01E3B730>, <function CLIQuery.on_global_arguments at 0x01E58388>]
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
cli.azure.cli.core: No module found from index for '['VeRSIoN', '--debug']'
cli.azure.cli.core: Loading all modules and extensions
cli.azure.cli.core: Discovered command modules: ['acr', 'acs', 'advisor', 'ams', 'apim', 'appconfig', 'appservice', 'aro', 'backup', 'batch', 'batchai', 'billing', 'botservice', 'cdn', 'cloud', 'cognitiveservices', 'config', 'configure', 'consumption', 'container', 'containerapp', 'cosmosdb', 'databoxedge', 'dla', 'dls', 'dms', 'eventgrid', 'eventhubs', 'extension', 'feedback', 'find', 'hdinsight', 'identity', 'interactive', 'iot', 'keyvault', 'kusto', 'lab', 'managedservices', 'maps', 'marketplaceordering', 'monitor', 'mysql', 'netappfiles', 'network', 'policyinsights', 'privatedns', 'profile', 'rdbms', 'redis', 'relay', 'resource', 'role', 'search', 'security', 'servicebus', 'serviceconnector', 'servicefabric', 'signalr', 'sql', 'sqlvm', 'storage', 'synapse', 'util', 'vm']
cli.azure.cli.core: Loading command modules:
cli.azure.cli.core: Name                  Load Time    Groups  Commands
cli.azure.cli.core: acr                       0.142        34       144
cli.azure.cli.core: acs                       0.027         7        54
cli.azure.cli.core: advisor                   0.003         3         6
cli.azure.cli.core: ams                       0.019        22       100
cli.azure.cli.core: apim                      0.008        14        68
cli.azure.cli.core: appconfig                 0.004         9        47
cli.azure.cli.core: appservice                0.079        73       260
cli.azure.cli.core: aro                       0.014         1        10
cli.azure.cli.core: backup                    0.004        16        59
cli.azure.cli.core: batch                     0.032        34       102
cli.azure.cli.core: batchai                   0.004        10        30
cli.azure.cli.core: billing                   0.009        19        52
cli.azure.cli.core: botservice                0.004        12        42
cli.azure.cli.core: cdn                       0.006        39       133
cli.azure.cli.core: cloud                     0.002         1         7
cli.azure.cli.core: cognitiveservices         0.003        10        33
cli.azure.cli.core: config                    0.003         2         7
cli.azure.cli.core: configure                 0.002         2         5
cli.azure.cli.core: consumption               0.026         8         9
cli.azure.cli.core: container                 0.014         1        11
cli.azure.cli.core: containerapp              0.130        36       115
cli.azure.cli.core: cosmosdb                  0.016        58       192
cli.azure.cli.core: databoxedge               0.007         5        27
cli.azure.cli.core: dla                       0.004        23        62
cli.azure.cli.core: dls                       0.004         7        41
cli.azure.cli.core: dms                       0.003         3        22
cli.azure.cli.core: eventgrid                 0.005        25        96
cli.azure.cli.core: eventhubs                 0.018        13        20
cli.azure.cli.core: extension                 0.002         1         7
cli.azure.cli.core: feedback                  0.002         1         2
cli.azure.cli.core: find                      0.002         1         1
cli.azure.cli.core: hdinsight                 0.009         8        39
cli.azure.cli.core: identity                  0.004         2        11
cli.azure.cli.core: interactive               0.001         1         1
cli.azure.cli.core: iot                       0.096        19        82
cli.azure.cli.core: keyvault                  0.009        22       133
cli.azure.cli.core: kusto                     0.004         3        14
cli.azure.cli.core: lab                       0.004        11        34
cli.azure.cli.core: managedservices           0.003         3         8
cli.azure.cli.core: maps                      0.002         5        13
cli.azure.cli.core: marketplaceordering       0.006         1         2
cli.azure.cli.core: monitor                   0.307        21        67
cli.azure.cli.core: mysql                     0.104        14        49
cli.azure.cli.core: netappfiles               0.006        17        96
cli.azure.cli.core: network                   0.130       103       337
cli.azure.cli.core: policyinsights            0.016         9        17
cli.azure.cli.core: privatedns                0.023        14        60
cli.azure.cli.core: profile                   0.002         2         8
cli.azure.cli.core: rdbms                     0.024        44       185
cli.azure.cli.core: redis                     0.003         5        27
cli.azure.cli.core: relay                     0.034         7         8
cli.azure.cli.core: resource                  0.015        51       227
cli.azure.cli.core: role                      0.003        17        61
cli.azure.cli.core: search                    0.003         7        22
cli.azure.cli.core: security                  0.006        48       104
cli.azure.cli.core: servicebus                0.019        12        17
cli.azure.cli.core: serviceconnector          0.054        16       235
cli.azure.cli.core: servicefabric             0.018        27        76
cli.azure.cli.core: signalr                   0.003         8        30
cli.azure.cli.core: sql                       0.017        56       215
cli.azure.cli.core: sqlvm                     0.024         4        20
cli.azure.cli.core: storage                   0.039        58       272
cli.azure.cli.core: synapse                   0.015        54       246
cli.azure.cli.core: util                      0.002         3         7
cli.azure.cli.core: vm                        0.049        57       265
cli.azure.cli.core: Total (65)                1.653      1219      4752
cli.azure.cli.core: Loading extensions:
cli.azure.cli.core: Name                  Load Time    Groups  Commands  Directory
cli.azure.cli.core: azure-devops              0.150        60       192  C:\Users\JesseHouwing\.azure\cliextensions\azure-devops
cli.azure.cli.core: containerapp              0.015        31        95  C:\Users\JesseHouwing\.azure\cliextensions\containerapp
cli.azure.cli.core: init                      0.003         1         1  C:\Users\JesseHouwing\.azure\cliextensions\init
cli.azure.cli.core: Total (3)                 0.169        92       288
cli.azure.cli.core: Loaded 1266 groups, 4947 commands.
cli.azure.cli.core: Updated command index in 0.005 seconds.
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x040095C8>]
cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to 'C:\Users\JesseHouwing\.azure\commands\2023-09-28.20-42-39.VeRSIoN.11404.log'.
az_command_data_logger: command args: version --debug
cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.<locals>.add_subscription_parameter at 0x04036730>]
cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad []
cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument.<locals>.add_ids_arguments at 0x040446A0>, <function register_cache_arguments.<locals>.add_cache_arguments at 0x04044898>]
cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded []
cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [<function _documentdb_deprecate at 0x05179F58>]
cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x01E3B778>, <function CLIQuery.handle_query_parameter at 0x01E583D0>, <function register_ids_argument.<locals>.parse_ids_arguments at 0x04044850>, <function handler at 0x052DA4F0>, <function DevCommandsLoader.post_parse_args at 0x07439D60>]
cli.knack.cli: Event: CommandInvoker.OnTransformResult [<function _resource_group_transform at 0x04032DF0>, <function _x509_from_base64_to_hex_transform at 0x04032E38>]
cli.knack.cli: Event: CommandInvoker.OnFilterResult []
{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "azure-devops": "0.26.0",
    "containerapp": "0.3.29",
    "init": "0.1.0"
  }
}
cli.knack.cli: Event: Cli.SuccessfulExecute []
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x040096E8>]
az_command_data_logger: exit code: 0
cli.__main__: Command ran in 2.197 seconds (init: 0.315, invoke: 1.882)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 3292 in cache
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\telemetry\__init__.pyc C:\Users\JesseHouwing\.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.

Expected behavior

The command is compared to the index case insensitive and the command index is not invalidated.

Environment Summary

C:\Users\JesseHouwing>az version { "azure-cli": "2.53.0", "azure-cli-core": "2.53.0", "azure-cli-telemetry": "1.1.0", "extensions": { "azure-devops": "0.26.0", "containerapp": "0.3.29", "init": "0.1.0" } }

Additional context

No response

For now, make sure you call az devops and not az DevOps.

Darn you autocorrect for correcting devops to DevOps all the time!

az devops login vs AZURE_DEVOPS_EXT_PAT

To authenticate to Azure DevOps, you have a few options to chose from. You can use AzureCLI@2 with an Azure service connection or add az devops login to your script or pass a token through the AZURE_DEVOPS_EXT_PAT environment variable.

Configuring your Azure Service Connection is the slowest of the bunch. Followed by az devops login where as the environment variable should be the fastest of the bunch.

But it turns out to be much more complex. That's because az devops login will try to install python's keyringpackage the first time it runs, and on the Hosted runners, the command has never been called before, thus adding 10-20s to the first call.

Collaborated on a pull request to force keyring installation during image creation:

[windows] warmup "az devops" for the first run #8410

as Jesse Houwing has found "az devops" warms up during first run which takes ~40 sec.

warm up is done by running "az devops --help" with config redirected to persistent "C:" location, also keyring installation requires "fake" login

original PR: https://github.com/actions/runner-images/pull/8294

Description

warming up "az devops" first invocation. due to security reasons we cannot accept original PR: https://github.com/actions/runner-images/pull/8294

https://github.com/actions/runner-images/issues/8296

Check list

  • [x] Related issue / work item is attached
  • [ ] Tests are written (if applicable)
  • [ ] Documentation is updated (if applicable)
  • [ ] Changes are tested and related VM images are successfully generated

And filed an issue to make sure this package is included in az devops by default:

[Feature Request] Install keyring by default or offer a way to install keyring without causing error messages #1367

Is your feature request related to a problem? Please describe.

When calling az devops login for the very first time, it installs pythons keyring package. For some reason that's not included in the default setup. This adds needless time to a first invocation of az devops login on the GitHub Hosted Runners and Azure Pipelines.

You can force the installation of keyring using

echo "dummy" | az devops login
az devops logout

On the GitHub/Azure Pipelines hosted runner the provisioning of the image happens on a non-interactive session and az devops login fails with an error along those lines.

ERROR: (1312, 'CredRead', 'A specified logon session does not exist. It may already have been terminated')

Unfortunatly, az devops doesn't provide a clean way to install keyring during installation or through a command other than catching the failed login.

Describe the solution you'd like Either azdevops should preinstall keyring during normal installation of the extension. It's unclear why it's not included in the first place.

Or there should be a clean way to run az devops install-dependencies or something to have it download its dependencies in a provisioning scenario.

I grabbed the following lines from the code:

$pythonPath = resolve-path (join-path ([System.IO.Path]::GetDirectoryName((Get-Command az).Path)) "..\python.exe")
Invoke-ValidateCommand -Command "& '$pythonPath' -m pip install keyring~=17.1.1 --target '$env:AZURE_EXTENSION_DIR\azure-devops' -vv --disable-pip-version-check --no-cache-dir"  

and it works. But since there is no way to grab the correct version of keyring from anywhere other than the sourcecode of az devops, it's a very brittle contract.

For now we're looking into just swallowing any message printed by az devops login. But that's of course also a brittle contract.

cli.azext_devops.dev.common.pip_helper: installing keyring~=17.1.1
cli.azure.cli.core.extension.operations: Running: ['C:\\Program Files\\Microsoft SDKs\\Azure\\CLI2\\python.exe', '-m', 'pip', 'install', 'keyring~=17.1.1', '--target', 'C:\\Users\\WDAGUtilityAccount\\.azure\\cliextensions\\azure-devops', '-vv', '--disable-pip-version-check', '--no-cache-dir']

Additional context

I know I could also set an environment variable with an access token instead of calling az devops login, unfortunately that also adds a 10 second delay on the GitHub Hosted Runner / Azure Pipelines runner as it tries to run az login, sees it fail and only then reverts to using the token supplied on the commandline.

So you might think that setting AZURE_DEVOPS_EXT_PAT will be the fastest, but what happens under the hood might be surprising: az devops will first try to run az login and az devops login before attempting to use the environment variable, even if you explicitly pass it in. There should be a way to force the configuration used.

Filed an issue:

az cli seems to spend 10 seconds looking for account config, even if it doesn't need them #27439

Describe the bug

While looking into what causes the az-cli to load slowly on the hosted agents of GitHub Actions and Azure pipelines I noticed that az tries to look for accounts and throws a couple of exceptions even if the account isn't needed.

In this case the access token to access Azure DevOps is supplied through the environment, yet az is trying to load accounts and spends unneeded time there.

It's hard to tell from the logs, bet between the last logged line and the error handling there is a jump of about 10 seconds in the Azure pipelines log. Would be lovely if that time waste could be prevented.

Az DevOps pipelines show ...

Errors

2023-09-15T18:37:12.7002991Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x0000025BD8AEC430>] 2023-09-15T18:37:12.7003914Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument..add_subscription_parameter at 0x0000025BD8AEF0A0>] 2023-09-15T18:37:22.0011676Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] 2023-09-15T18:37:22.0012102Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument..add_ids_arguments at 0x0000025BD8B51090>, <function register_cache_arguments..add_cache_arguments at 0x0000025BD8B511B0>] 2023-09-15T18:37:22.0026227Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] 2023-09-15T18:37:22.0026678Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [] 2023-09-15T18:37:22.3327775Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000025BD63AFF40>, <function CLIQuery.handle_query_parameter at 0x0000025BD63F92D0>, <function register_ids_argument..parse_ids_arguments at 0x0000025BD8B51120>, <function DevCommandsLoader.post_parse_args at 0x0000025BD8B72B90>] 2023-09-15T18:37:22.3348603Z DEBUG: cli.azext_devops.dev.common.services: PAT is present which can be used against this instance 2023-09-15T18:37:22.3375557Z DEBUG: cli.azure.cli.core: Current cloud config: 2023-09-15T18:37:22.3375819Z AzureCloud 2023-09-15T18:37:22.3376906Z DEBUG: cli.azext_devops.dev.common.services: az login is not present 2023-09-15T18:37:22.3382543Z DEBUG: cli.azext_devops.dev.common.services: There are no active accounts. 2023-09-15T18:37:22.3382797Z Traceback (most recent call last): 2023-09-15T18:37:22.3383541Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 537, in get_current_account_user 2023-09-15T18:37:22.3384147Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 546, in get_subscription 2023-09-15T18:37:22.3384535Z knack.util.CLIError: Please run 'az login' to setup account. 2023-09-15T18:37:22.3384768Z 2023-09-15T18:37:22.3385009Z During handling of the above exception, another exception occurred: 2023-09-15T18:37:22.3385200Z 2023-09-15T18:37:22.3385466Z Traceback (most recent call last): 2023-09-15T18:37:22.3385884Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 61, in _get_credentials 2023-09-15T18:37:22.3386261Z token_from_az_login = get_token_from_az_logins(organization, pat_token_present) 2023-09-15T18:37:22.3386725Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 118, in get_token_from_az_logins 2023-09-15T18:37:22.3387095Z dummy_user = profile.get_current_account_user() # noqa: F841 2023-09-15T18:37:22.3387524Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 539, in get_current_account_user 2023-09-15T18:37:22.3387878Z knack.util.CLIError: There are no active accounts. 2023-09-15T18:37:22.3388195Z INFO: cli.azext_devops.dev.common.services: received PAT from environment variable 2023-09-15T18:37:22.3388587Z INFO: cli.azext_devops.dev.common.services: Creating connection with personal access token.

Issue script & Debug output

2023-09-15T18:37:12.7002991Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x0000025BD8AEC430>] 2023-09-15T18:37:12.7003914Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument..add_subscription_parameter at 0x0000025BD8AEF0A0>] 2023-09-15T18:37:22.0011676Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] 2023-09-15T18:37:22.0012102Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument..add_ids_arguments at 0x0000025BD8B51090>, <function register_cache_arguments..add_cache_arguments at 0x0000025BD8B511B0>] 2023-09-15T18:37:22.0026227Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] 2023-09-15T18:37:22.0026678Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [] 2023-09-15T18:37:22.3327775Z DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000025BD63AFF40>, <function CLIQuery.handle_query_parameter at 0x0000025BD63F92D0>, <function register_ids_argument..parse_ids_arguments at 0x0000025BD8B51120>, <function DevCommandsLoader.post_parse_args at 0x0000025BD8B72B90>] 2023-09-15T18:37:22.3348603Z DEBUG: cli.azext_devops.dev.common.services: PAT is present which can be used against this instance 2023-09-15T18:37:22.3375557Z DEBUG: cli.azure.cli.core: Current cloud config: 2023-09-15T18:37:22.3375819Z AzureCloud 2023-09-15T18:37:22.3376906Z DEBUG: cli.azext_devops.dev.common.services: az login is not present 2023-09-15T18:37:22.3382543Z DEBUG: cli.azext_devops.dev.common.services: There are no active accounts. 2023-09-15T18:37:22.3382797Z Traceback (most recent call last): 2023-09-15T18:37:22.3383541Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 537, in get_current_account_user 2023-09-15T18:37:22.3384147Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 546, in get_subscription 2023-09-15T18:37:22.3384535Z knack.util.CLIError: Please run 'az login' to setup account. 2023-09-15T18:37:22.3384768Z 2023-09-15T18:37:22.3385009Z During handling of the above exception, another exception occurred: 2023-09-15T18:37:22.3385200Z 2023-09-15T18:37:22.3385466Z Traceback (most recent call last): 2023-09-15T18:37:22.3385884Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 61, in _get_credentials 2023-09-15T18:37:22.3386261Z token_from_az_login = get_token_from_az_logins(organization, pat_token_present) 2023-09-15T18:37:22.3386725Z File "C:\Program Files\Common Files\AzureCliExtensionDirectory\azure-devops\azext_devops\dev\common\services.py", line 118, in get_token_from_az_logins 2023-09-15T18:37:22.3387095Z dummy_user = profile.get_current_account_user() # noqa: F841 2023-09-15T18:37:22.3387524Z File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/_profile.py", line 539, in get_current_account_user 2023-09-15T18:37:22.3387878Z knack.util.CLIError: There are no active accounts. 2023-09-15T18:37:22.3388195Z INFO: cli.azext_devops.dev.common.services: received PAT from environment variable 2023-09-15T18:37:22.3388587Z INFO: cli.azext_devops.dev.common.services: Creating connection with personal access token.

Expected behavior

Account info is only loaded when needed. Lack of account data doesn't result in a swallowed error.

Environment Summary

azure-cli 2.52.0

core 2.52.0 telemetry 1.1.0

Extensions: azure-devops 0.26.0

Dependencies: msal 1.24.0b1 azure-mgmt-resource 23.1.0b2

Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe' Extensions directory 'C:\Program Files\Common Files\AzureCliExtensionDirectory'

Python (Windows) 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]

Additional context

https://github.com/actions/runner-images/pull/8294

First run of az on the hosted runner always causes a rebuild of the command index

Similar to the case matters issue above, the first time az is invoked on the hosted runners a command index rebuild is triggered. Why? Because the command index doesn't exist in the user profile of the user that runs your workflow. These files are normally stored in your profile directory under ~/.azure and ~/.azure-devops, but since your workflow runs in a pristine user profile, these cached files are gone.

Submitted a pull request to move these folders to a persistent location:

Runs az and az-devops during warmup #8294

Description

Speed up performance of az and az devops. In my own experiments savings are sometimes more than 60 seconds.

  • This is done by changing the default settings of az based on the recommended settings from az init for automation workflows:

    • Enables the settings recommended by az init for automation workflows.
    • Disables auto-upgrade
    • Disables colors
    • Disables surveys
    • Disables progress bars
    • Disables recommendations
  • It invokes az and az devops during the image build to ensure Python pre-caches its objects.

  • It moves the .azure and .azure-devops and .azure-devops/cache folders into the CommonFiles directory similarly to how the runner already moved installed extensions to the CommonFiles directory.

First call performance, especially of az devops commands will still be slower than subsequent calls due to the fact that az devops fetches and caches information about the Azure DevOps Organization that it connects to. These calls still add about a 15s overhead. Users that want to speed up az devops further could consider caching the $env:AZURE_DEVOPS_CACHE_DIR.

Links to sources used to make this pull request:

Technically changing any default configuration settings can be breaking. My personal view on these settings:

  • az config set auto-upgrade.enable=false - should not break anyone. The hosted image is updated regularly enough that it should not be needed to auto-upgrade az-cli during a workflow/pipeline run.
  • az config set core.error_recommendation=off - should not break anyone. Worst case it makes debugging scripts a little harder, though running the same script locally should still provide recommendations.
  • az config set core.disable_progress_bar=true - progress bars don't work in the hosted runner anyway.
  • az config set core.no_color=true - will make the logs less pretty, but also fixes issues where certain ascii colors aren't coming through correctly. Should not really break anyone.
  • az config set core.survey_message=false - This setting suppresses the survey links that the CLI sometimes outputs, they have no place in a pipeline or workflow anyway.

I had already removed the settings from the suggestions made by az init and left them on default:

  • az config set logging.enable_log_file=no - I personally don't rely on the log file, I suppose someone could. This might be the most breaking change in the list.. There is anecdotal evidence in other github issues that suggests file logging slows down az-cli. I'd prefer to make it opt in. The less the agent does by default the better.
  • az config set only_show_errors=true - technically not a breaking change and making the log less chatty. I removed this cause I feel it makes it harder to see what the cli is and isn't doing. Ideally this flag would be set based on the diagnostic settings of the pipeline itself.
  • az config set collect_telemetry=false - though there is a lot of anecdotal evidence that telemetry slows down az or some of its extensions, I imagine the cli team would like to get telemetry from the hosted runners.

People who want to save a little more time can opt to add these settings to their pipeline/workflow.

Fixes: #8296?

Check list

  • [x] Related issue / work item is attached
  • [ ] Tests are written (if applicable)
  • [ ] Documentation is updated (if applicable)
  • [x] Changes are tested and related VM images are successfully generated

First run of az on hosted runner causes python compilation

Because the image generation code never actually runs az and az devops and because the ~/.azure directory and ~/.azure-devops aren't made available to the user that runs the workflow; the first run compiles several python modules and caches the result.

This is also fixed by setting environment variables and calling az --help and az devops --help during image generation.

Submitted a pull request to move these folders to a persistent location and warm-up az and az devops:

%{ embed https://github.com/actions/runner-images/pull/8294 %}

Some of the data that az devops needs isn't stored under ~/.azure, but has its own config path under ~/.azure-devops. While trying to redirect that folder and the underlying ./python/cache folder I found out that not all the ways you can redirect these folders have the exact same effect, because the cache folder location is computed in different ways.

So, I filed another issue:

Bug: the azure-devops cache folder location is computed differently in different places #1366

The Azure DevOps CLI has a way to redirect the config and the cache directory to a different location.

In most code paths it's enough to only redirect the CONFIG directory and in that case the cache folder will automatically be placed underneath the config folder.

But there are a few places in which the cache folder is placed under the home directory instead.

Under home https://github.com/Azure/azure-devops-cli-extension/blob/bd34a6fd0658a15dadf6c09c7f6217ca5ffa662b/azure-devops/azext_devops/devops_sdk/_file_cache.py#L114

Under config https://github.com/Azure/azure-devops-cli-extension/blob/bd34a6fd0658a15dadf6c09c7f6217ca5ffa662b/azure-devops/azext_devops/dev/common/file_cache.py#L15

I'd expect a single consistent behavior.

https://github.com/search?q=repo%3AAzure%2Fazure-devops-cli-extension+os.getenv&type=code

Default configuration of az

The default configuration of az leaves several settings enabled or unconfigured:

  • Telemetry
  • Disk logging
  • Progress bars
  • Console colors
  • Survey prompts
  • Auto-upgrade

There is a pretty unknown az init command which will prompt whether you want to run in an interactive or an automation environment and it forces a number of these settings to off.

So, I filed a pull request to do the same thing on the Hosted Runner:

Runs az and az-devops during warmup #8294

Description

Speed up performance of az and az devops. In my own experiments savings are sometimes more than 60 seconds.

  • This is done by changing the default settings of az based on the recommended settings from az init for automation workflows:

    • Enables the settings recommended by az init for automation workflows.
    • Disables auto-upgrade
    • Disables colors
    • Disables surveys
    • Disables progress bars
    • Disables recommendations
  • It invokes az and az devops during the image build to ensure Python pre-caches its objects.

  • It moves the .azure and .azure-devops and .azure-devops/cache folders into the CommonFiles directory similarly to how the runner already moved installed extensions to the CommonFiles directory.

First call performance, especially of az devops commands will still be slower than subsequent calls due to the fact that az devops fetches and caches information about the Azure DevOps Organization that it connects to. These calls still add about a 15s overhead. Users that want to speed up az devops further could consider caching the $env:AZURE_DEVOPS_CACHE_DIR.

Links to sources used to make this pull request:

Technically changing any default configuration settings can be breaking. My personal view on these settings:

  • az config set auto-upgrade.enable=false - should not break anyone. The hosted image is updated regularly enough that it should not be needed to auto-upgrade az-cli during a workflow/pipeline run.
  • az config set core.error_recommendation=off - should not break anyone. Worst case it makes debugging scripts a little harder, though running the same script locally should still provide recommendations.
  • az config set core.disable_progress_bar=true - progress bars don't work in the hosted runner anyway.
  • az config set core.no_color=true - will make the logs less pretty, but also fixes issues where certain ascii colors aren't coming through correctly. Should not really break anyone.
  • az config set core.survey_message=false - This setting suppresses the survey links that the CLI sometimes outputs, they have no place in a pipeline or workflow anyway.

I had already removed the settings from the suggestions made by az init and left them on default:

  • az config set logging.enable_log_file=no - I personally don't rely on the log file, I suppose someone could. This might be the most breaking change in the list.. There is anecdotal evidence in other github issues that suggests file logging slows down az-cli. I'd prefer to make it opt in. The less the agent does by default the better.
  • az config set only_show_errors=true - technically not a breaking change and making the log less chatty. I removed this cause I feel it makes it harder to see what the cli is and isn't doing. Ideally this flag would be set based on the diagnostic settings of the pipeline itself.
  • az config set collect_telemetry=false - though there is a lot of anecdotal evidence that telemetry slows down az or some of its extensions, I imagine the cli team would like to get telemetry from the hosted runners.

People who want to save a little more time can opt to add these settings to their pipeline/workflow.

Fixes: #8296?

Check list

  • [x] Related issue / work item is attached
  • [ ] Tests are written (if applicable)
  • [ ] Documentation is updated (if applicable)
  • [x] Changes are tested and related VM images are successfully generated

These changes are currently under consideration as they don't have the same impact as the other changes mentioned above ( some could be breaking ). You could set these in your own workflows if you'd like.

And filed a pull request to turn off auto-upgrade in az init as well:

configures survey messages and telemetry #6793

This checklist is used to make sure that common guidelines for a pull request are followed.

Adds settings for [core.survey_message](https://github.com/Azure/azure-cli/issues/25242) and core.collect_telemetry to az init. Adds the following defaults:

  • Manual Mode
    • Survey=on
    • Telemetry=on
    • Auto-update=on
  • Automation Mode
    • Survey=off
    • Telemetry=off
    • Auto-update=off

General Guidelines

  • [ ] Have you run azdev style <YOUR_EXT> locally? (pip install azdev required)
  • [x] Have you run python scripts/ci/test_index.py -q locally?

For new extensions:

About Extension Publish

There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update src/index.json automatically.
You only need to update the version information in file setup.py and historical information in file HISTORY.rst in your PR but do not modify src/index.json.

AzureCLI@2 does a version update check

When you use the AzureCLI@2 task, the first thing it does is running

az --version
Enter fullscreen mode Exit fullscreen mode

Which in turn performs and update check and caches the results, but as you can guess by now, that cached result isn't available in the pristine profile of the user running your workflow.

I feel that AzureCLI@2 has no business doing an update check anyway and it turns out that there is a second command az version (note the lack of --). It mostly does the same thing but skips the update check.

Filed another pull request:

Replaces `az --version` with `az version` #19039

az version doesn't perform an update check and is faster.

Fixes: #19006

Task name: AzureCLI

Description: Replaces the call to az --version with az version.

Documentation changes required: N

Added unit tests: N

Attached related issue: #19006

Checklist:

  • [x] Task version was bumped - please check instruction how to do it
  • [x] Checked that applied changes work as expected

And while it's nice that the alternative command exists, it would seem better to me to make the version-check configurable:

`az --version` performs an update check #27437

Describe the bug

When running az --version an update check is performed against az-cli and the extensions.

Azure Pipelines' AzureCLI task runs az --version to capture the version details in the logs.

It would save quite a bit of CPU if az-cli would not perform an update check in a CI/CD scenario.

There already is an core.auto-upgrade=false option, would it not make sense to piggy back on that or to add another config setting to turn off update checking.

az --version

Errors

N/a

Issue script & Debug output

N/a

Expected behavior

No update check is performed in CI.

Environment Summary

2.52.0

Additional context

No response

Auto-detecting your Azure DevOps collection and Project is slower

Many az devops commands need to know the location of your Azure DevOps organization and the name of the project you're connecting to. If you don't explicitly pass these in, az devops will try to run git remote --verbose to look up the details. This adds another little bit of overhead.

Since Azure Pipelines already knows the organization and the project and sets environment variables for you, you might as well pass them in explicitly:

- pwsh: |
    az pipelines runs show --id $env:BUILD_BUILDID --query "definition.id" `
      --organization $env:SYSTEM_COLLECTIONURI `
      --project $env:SYSTEM_TEAMPROJECT
Enter fullscreen mode Exit fullscreen mode

It would be nice if auto-detect would use these environment variables if they are available. Haven't filed an issue for that yet.

On Azure Pipelines everything runs inside PowerShell 5.1

In Azure Pipelines everything except Node based tasks run inside PowerShell 5.1. Even a Windows Shell task (CmdLine@2) runs inside of PowerShell 5.1:

Starting: CmdLine
==============================================================================
Task : Command line
Description : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version : 2.212.0
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
##[debug]VstsTaskSdk 0.9.0 commit 6c48b16164b9a1c9548776ad2062dad5cd543352
##[debug]Entering D:\a\_tasks\CmdLine_d9bafed4-0b18-4f58-968d-86655b4d2ce9\2.212.0\cmdline.ps1.
========================== Starting Command Output ===========================
##[debug]Entering Invoke-VstsTool.
##[debug] Arguments: '/D /E:ON /V:OFF /S /C "CALL "D:\a\_temp\5cf768ae-b47d-41b4-a176-02a9957e0e28.cmd""'
##[debug] FileName: 'C:\Windows\system32\cmd.exe'
##[debug] WorkingDirectory: 'D:\a\1\s'
Enter fullscreen mode Exit fullscreen mode

Unlike GitHub Actions, even pwsh runs inside of PowerShell 5.1 in Azure Pipelines. This makes script: and pwsh: and a tiny bit slower on Azure Pipelines than using powershell:

But under certain conditions (like the one below), this tiny bit slower can add a quite noticeable 6s to 10s additional delay.

I requested that a proper PowerShell Core Execution handler is added to the Azure Pipelines Agent.

[enhancement]: Add PowerShell Core 7 execution handler #4455

Describe your feature request here

All tasks that use the PowerShell and PowerShell3 execution handler rely on PowerShell 5.1 as the main execution engine of the task. Any task that relies on PowerShell Core must call PowerShell Core either from PowerShell 5.1 or from Node.

Back in 2015 when the agent was first introduced and PowerShell Core was mostly a gimmick, it made sense, maintaining 2 different but mostly equal execution handlers only add maintenance overhead.

Fast-forward 8 years and PowerShell Core is now the gold standard. It's faster, cross platform and most new features come to it, including fancy language features. Most of all its cross platform and it doesn't depend on 4MB of random node modules.

It's time to introduce a proper PowerShellCore execution handler that doesn't rely on PowerShell 5.1, that has its own VstsTaskSdk and that runs cross-platform.

Why?

  • It's faster. In the past days I've been looking at hosted agent performance and pwsh: as well as AzureCLI and AzurePowerShell and PowerShell tasks configured to use PowerShell Core add a few seconds overhead to each task execution.
  • It's cross platform. I'm no star in Bash nor a hero in Typescript, but I know my way around the CLR and have written quite a bit of PowerShell, let me take my skills to Linux and Mac as well.
  • It's more secure than Node based Tasks. Node based tasks easily carry 100s of additional Node Packages to do things that are built-in to PowerShell Core. XML parsing? Json parsing? Zipping archives, XSLT, JsonPath, REST API calls, you name it. In my research I can't find any PowerShell based extension with security vulns, yet at the moment there is no Node based tasks that doesn't trigger at least 2 critical warnings.
  • It's where the current state of the market is at. Even PowerShell itself tries to tell me every time I accidentally open the wrong shell: Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows.
  • GitHub Actions already has support for PowerShell core as default script handler and you can use PowerShell Core in composite actions to build your tasks. Having the same option in Azure Pipelines would make it easier to port tasks between the wo.

Windows might be trying to be helpful

While chatting with an engineer on the actions/runner-images team, we talked about a few issues that have been plaguing the hosted runner recently. It turns out that Windows Server has added a few more ways to ensure that Windows Update always runs and to run a few jobs to keep Windows performing at its best.

The hosted runner images are updated very frequently and when the image runs, it only needs to run one job before it's deleted. As such, these services are disabled during the generation of the Windows runner images.

Until recently that is.

A change in Windows has caused the "Storage Service" and the "Windows Update Service" to be re-enabled and restarted as soon as the image boots up. And if you're unlucky they slow down the hosted runner while doing their intended job.

A new image will be rolling out shortly, one that again turns off these services , this time a little more forceful:

Agents very slow #8380

Description

Since September 21 agents are very slow. Even simple tasks that usually take 1-2 seconds now take 25-30 seconds. A comparison of some tasks from two runs can be seen below: image On the left side we have a run before September 21 (simple tasks taking 1-2 seconds to complete), on the right side we have the same tasks running on September 25 (same tasks taking 25-30 seconds to complete.

Platforms affected

  • [X] Azure DevOps
  • [ ] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [ ] Ubuntu 20.04
  • [ ] Ubuntu 22.04
  • [ ] macOS 11
  • [ ] macOS 12
  • [ ] macOS 13
  • [X] Windows Server 2019
  • [X] Windows Server 2022

Issue started from version 20230918.1.0

Is it regression?

20230912.1.0

Expected behavior

Tasks are fast to execute

Actual behavior

Task execution is very slow

Repro steps

[windows] disable StorSvc for runners #8388

Description

StorSvc service wakes up in 700-800 sec and scans all files, which affects runner a lot. Let us try to disable it

Check list

  • [ ] Related issue / work item is attached
  • [ ] Tests are written (if applicable)
  • [ ] Documentation is updated (if applicable)
  • [ ] Changes are tested and related VM images are successfully generated

In the meantime, you can use this snippet in your workflow to turn these services back off:

Azure Pipelines:

# 🔁 Azure Pipelines
steps: 
  - pwsh: |
      $services = @("WaasMedicSvc", "StorSvc", "wuauserv")

      $services | %{
        $action = "''/30000/''/30000/''/30000"
        $output = sc.exe failure $_ actions=$action reset=4000

        stop-service $_ -force
        if ($_ -ne "WaasMedicSvc")
        {
          Set-Service -StartupType Disabled $_
        }

        if ((get-service $_).Status -ne "Stopped")
        {
          $id = Get-CimInstance -Class Win32_Service -Filter "Name LIKE '$_'" | 
          Select-Object -ExpandProperty ProcessId
          $process = Get-Process -Id $id
          $process.Kill()
        }
      }

      $services | %{
        write-host $_ $((get-service $_).Status)
      }

  - checkout: self
Enter fullscreen mode Exit fullscreen mode

GitHub Actions:

# ▶️ GitHub Actions
steps: 
  - script: |
      $services = @("WaasMedicSvc", "StorSvc", "wuauserv")

      $services | %{
        $action = "''/30000/''/30000/''/30000"
        $output = sc.exe failure $_ actions=$action reset=4000

        stop-service $_ -force
        if ($_ -ne "WaasMedicSvc")
        {
          Set-Service -StartupType Disabled $_
        }

        if ((get-service $_).Status -ne "Stopped")
        {
          $id = Get-CimInstance -Class Win32_Service -Filter "Name LIKE '$_'" | 
          Select-Object -ExpandProperty ProcessId
          $process = Get-Process -Id $id
          $process.Kill()
        }
      }

      $services | %{
        write-host $_ $((get-service $_).Status)
      }

  - uses: actions/checkout@v4
Enter fullscreen mode Exit fullscreen mode

Some people might blame Microsoft in general for these issues, but of course this is an interaction caused by changes made by 2 teams with hugely different goals. Where the Windows team is trying to make Windows as secure and up to date as possible, the Runners team is trying to create an image of Windows that is as lean as it can be, up-to-date, secure, yet can run 100s of different tools ranging from .NET 4 to .NET 8 and from Python to Rust as well as Node and PHP. Strange interactions are almost impossible to prevent.

az devops caches instance level data

Whenever you connect to an Azure DevOps Organization, az devops will make 2 calls:

OPTIONS /{{organization}/_apis
GET /{organization}/_apis/ResourceAreas
Enter fullscreen mode Exit fullscreen mode

These calls return all the endpoints and their calling convention. But these are unlikely to change often. I've seen these calls adding 10-30s to the first call to az devops that make an API call. So, I went looking for ways to cache the data.

I haven't found the ideal solution yet, but this is a working prototype that will make az devops reuse the data from a previous workflow run:

# ⚠️ WARNING work in progress
steps:
    - task: Cache@2
      inputs:
        key: '".az-devops"'
        path: 'C:\Users\VssAdministrator\.azure-devops'
      displayName: "Restore .azure-devops"

    - pwsh: |
        (dir C:\Users\VssAdministrator\.azure-devops\python-sdk\cache\*.json) | %{$_.LastWriteTime = Get-Date} 
      displayName: "Reset last modified on .azure-devops/cache"
Enter fullscreen mode Exit fullscreen mode

Unfortunately, it doesn't work under all circumstances yet as the files may not always be there. And the pull request that will change the location of the .azure-devops folder, will change the paths in the snippet above.

It might be worth creating a custom task that caches these files. Unfortunately, custom tasks can't seem to access the cache in Azure Pipelines (or at least not through a mechanism in the agent or task-lib)

I've filed an issue to allow tasks to cache data:

[enhancement]: Add support for caching from an arbitrary task #4440

Describe your feature request here

For my azure pipelines extension, I'd like to be able to cache a few files to speed up future executions.

Right now, that's only possible if the user adds a Cache@2 task to their pipeline where needed preceded by a script step to calculate cache keys and decide whether to run the step or not.

This logic is MUCH easier to do inside my own task, as I'd be able to handle everything from there.

My proposal is to add a tasklib function similar to the ones used by GitHub Action's setup-node step.

Ideally it would use the same efficient mechanism as the current Cache steps, but would allow me to have influence over whether to overwrite the cache at the end of the job or not.

I'd use this to also cache files that can't compute the cache key based on the contents of the repo. Instead, I'd use the return value of an API call or check whether the files to cache have changed since the last run.

Conclusion

You can see results of all these changes below:

Investigating az-cli performance on the hosted Azure Pipelines and GitHub Runners
_Compare the execution time with the cached files (-on) and without them (-off). _

The combined set of changes reduces the time it takes to run az for the first time in a workflow by about 1 minute. Even with the added overhead of setting up the user profile the whole workflow is faster.

Some of these issues aren't exclusive to Windows either. The case sensitivity issue, keyring installation, Azure CLI's default settings and auto-detection of Azure DevOps Organization apply to Ubuntu and MacOS runners too.

If you want to tinker a bit with these things yourself, here is an Azure Pipelines file that I used to test most of these scenarios.

The Hosted Runner is a complex thing. It's built for convenience and has 100s of tools preinstalled on it. Many of these tools have funny interactions with each other. All of those are running an operating system that is also constantly changing.

Many of the tools we rely on to develop and deploy our code are flexible, extensible, and configurable and aren't always designed to run on a temporary server that executes the tool just once.

At the same time, az must run thousands, if not millions of times each day on the Windows Hosted runners and the issues above cause it to regularly run more than a minute longer than needed. I'm hoping that the work I've put in this week will not only speed up your workflows, but will save us all money. And - above all - it will hopefully reduce carbon emissions and water usage.

This investigation dug mostly into a single tool and its extensions. There may be an unknown number of similar issues lurking in the set of tools you rely on. Please do take the time to investigate and file issues, submit pull requests, and help improve these systems for all of us.

A huge shoutout to ilia-shipitsin for looking into these issues and testing them on actual hosted runners.

If these fixes are making your pipelines a little faster, consider sponsoring me to allow me to do these kinds of things more often.

Top comments (0)