
Continuous benchmarking with Go and GitHub Actions

Viacheslav Poturaev
・4 min read

Keeping an eye on code performance is a good practice that helps you move in the right (greener) direction. Writing and running benchmarks in Go is as easy as writing and running unit tests.
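For illustration, here is what a minimal benchmark looks like (the function and names are hypothetical, not from the project benchmarked below). It lives in a `_test.go` file next to the code, takes a `*testing.B`, and loops `b.N` times:

```go
package main

import (
	"strings"
	"testing"
)

// concatIDs is a hypothetical function under test.
func concatIDs(ids []string) string {
	return strings.Join(ids, ",")
}

// Benchmark functions start with "Benchmark" and run with `go test -bench=.`.
func BenchmarkConcatIDs(b *testing.B) {
	ids := []string{"alpha", "beta", "gamma"}
	b.ReportAllocs() // also report alloc/op and allocs/op
	for i := 0; i < b.N; i++ {
		_ = concatIDs(ids)
	}
}
```

`b.ReportAllocs()` makes the memory columns show up without passing `-benchmem` on the command line.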

Getting reliable results from benchmarks is not so easy, though: performance varies with the load on the host environment. If you run benchmarks and Slack on the same machine, you will likely see fluctuating latency in your results. This is less of a problem for memory usage results, since they are less affected by sporadic delays in processing.

Statistical analysis can help estimate the quality of results by calculating the deviation across multiple samples. One popular tool for that is benchstat (on modern Go you can install it with `go install golang.org/x/perf/cmd/benchstat@latest`). In the command below, `-count=5` collects five samples per benchmark, and `-run=^a` skips regular tests, since no test name matches that pattern.

go test -bench=. -count=5 -run=^a  ./... >bench.txt
benchstat bench.txt
name                                    time/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-4                        1.50µs ±40%
DecoderFunc_Decode-4                    4.09µs ± 3%
Decoder_Decode_json-4                   50.3µs ± 7%
Decoder_Decode_queryObject-4            10.5µs ± 7%
DecoderFactory_SetDecoderFunc-4         3.27µs ± 3%

name                                    alloc/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-4                          440B ± 0%
DecoderFunc_Decode-4                    1.51kB ± 0%
Decoder_Decode_json-4                   12.3kB ± 0%
Decoder_Decode_queryObject-4            2.00kB ± 0%
DecoderFactory_SetDecoderFunc-4         1.02kB ± 0%

name                                    allocs/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-4                          4.00 ± 0%
DecoderFunc_Decode-4                      12.0 ± 0%
Decoder_Decode_json-4                      169 ± 0%
Decoder_Decode_queryObject-4              36.0 ± 0%
DecoderFactory_SetDecoderFunc-4           16.0 ± 0%

You can see that alloc/op and allocs/op are stable, while time/op jumps to an unacceptable ±40% on a busy developer machine. The same problem often affects busy CI servers running many jobs concurrently. Fortunately, that does not seem to be the case for GitHub Actions runners (at the time of writing).

Here is an example GitHub Actions result.

name                                    time/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-2                         880ns ± 3%
DecoderFunc_Decode-2                    2.46µs ± 3%
Decoder_Decode_json-2                   27.8µs ± 1%
Decoder_Decode_queryObject-2            6.30µs ± 2%
DecoderFactory_SetDecoderFunc-2         1.99µs ± 2%

name                                    alloc/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-2                          448B ± 0%
DecoderFunc_Decode-2                    1.51kB ± 0%
Decoder_Decode_json-2                   12.4kB ± 0%
Decoder_Decode_queryObject-2            2.00kB ± 0%
DecoderFactory_SetDecoderFunc-2         1.02kB ± 0%

name                                    allocs/op
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-2                          4.00 ± 0%
DecoderFunc_Decode-2                      12.0 ± 0%
Decoder_Decode_json-2                      169 ± 0%
Decoder_Decode_queryObject-2              36.0 ± 0%
DecoderFactory_SetDecoderFunc-2           16.0 ± 0%

Such stability makes GitHub Actions a great environment to implement continuous benchmarking.

When making changes to a code base, it is important that those changes do not introduce unexpected performance regressions. To track regressions, we can compare the performance of the new code against that of the original code with benchstat.

A Makefile target to run benchmarks and compare them with the original (master) results:

BENCH_COUNT ?= 5
REF_NAME ?= $(shell git symbolic-ref HEAD --short | tr / - 2>/dev/null)

## Run benchmark, iterations count controlled by BENCH_COUNT, default 5.
bench:
    @$(GO) test -bench=. -count=$(BENCH_COUNT) -run=^a  ./... >bench-$(REF_NAME).txt
    @test -s $(GOPATH)/bin/benchstat || GO111MODULE=off GOFLAGS= GOBIN=$(GOPATH)/bin $(GO) get -u golang.org/x/perf/cmd/benchstat
    @test -e bench-master.txt && benchstat bench-master.txt bench-$(REF_NAME).txt || benchstat bench-$(REF_NAME).txt

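The last line of the target relies on plain shell chaining: compare against `bench-master.txt` when it exists, otherwise just summarize the new run. A stripped-down sketch of that logic, with `echo` standing in for benchstat and hypothetical file names:

```shell
rm -f bench-master.txt        # start from a clean state for the demo
touch bench-feature.txt       # stand-in for fresh benchmark results
test -e bench-master.txt \
  && echo "compare: benchstat bench-master.txt bench-feature.txt" \
  || echo "report:  benchstat bench-feature.txt"
```

Note that `A && B || C` also runs `C` when `B` itself fails — acceptable here, but a common shell pitfall to keep in mind.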

Now we need to configure a GitHub Actions workflow that runs benchmarks and stores (with actions/cache) master's results for future comparisons.

.github/workflows/bench.yml
name: bench
on:
  push:
    tags:
      - v*
    branches:
      - master
  pull_request:
env:
  GO111MODULE: "on"
jobs:
  bench:
    strategy:
      matrix:
        go-version: [ 1.15.x ]
    runs-on: ubuntu-latest
    steps:
      - name: Install Go
        uses: actions/setup-go@v2
        with:
          go-version: ${{ matrix.go-version }}
      - name: Checkout code
        uses: actions/checkout@v2
      - uses: actions/cache@v2
        with:
          path: ~/go/pkg
          key: ${{ runner.os }}-go-pkg-${{ hashFiles('**/go.mod') }}
      - uses: actions/cache@v2
        with:
          path: ~/go/bin/benchstat
          key: ${{ runner.os }}-benchstat
      - uses: actions/cache@v2
        with:
          path: |
            bench-master.txt
          # Using base sha for PR or new commit hash for master/main push in benchmark result key.
          key: ${{ runner.os }}-bench-${{ (github.event.pull_request.base.sha != github.event.after) && github.event.pull_request.base.sha || github.event.after }}
      - name: Benchmark
        run: REF_NAME=${GITHUB_REF##*/} make bench
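`${GITHUB_REF##*/}` in the last step is plain shell parameter expansion: it strips everything up to and including the last `/`, so `refs/heads/master` becomes `master`. On `pull_request` events, `GITHUB_REF` looks like `refs/pull/42/merge`, so `REF_NAME` becomes `merge` — which is fine here, since PR runs only need to compare against the cached `bench-master.txt`.

```shell
# "##*/" greedily removes the longest prefix ending in "/".
GITHUB_REF="refs/heads/master"
echo "${GITHUB_REF##*/}"    # prints "master"
```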

Then it will be possible to examine the performance difference in the action logs. For example, this PR makes a minor performance improvement in a few cases while keeping the same performance in the others:

name                                    old time/op    new time/op    delta
pkg:github.com/swaggest/rest/jsonschema goos:linux goarch:amd64
RequestValidator_ValidateRequestData-2    1.74µs ± 4%    1.64µs ± 1%  -5.97%  (p=0.008 n=5+5)
pkg:github.com/swaggest/rest/request goos:linux goarch:amd64
Decoder_Decode-2                           929ns ± 3%     910ns ± 1%    ~     (p=0.095 n=5+5)
DecoderFunc_Decode-2                      2.66µs ± 3%    2.58µs ± 2%    ~     (p=0.095 n=5+5)
Decoder_Decode_json-2                     30.2µs ± 4%    31.2µs ± 4%    ~     (p=0.095 n=5+5)
Decoder_Decode_queryObject-2              6.42µs ± 2%    6.50µs ± 1%    ~     (p=0.310 n=5+5)
DecoderFactory_SetDecoderFunc-2           2.14µs ± 2%    2.12µs ± 1%    ~     (p=0.151 n=5+5)
pkg:github.com/swaggest/rest/response/gzip goos:linux goarch:amd64
Middleware-2                               278µs ± 7%     260µs ± 1%  -6.53%  (p=0.008 n=5+5)
Middleware_control-2                      4.43µs ± 3%    4.31µs ± 2%    ~     (p=0.095 n=5+5)

Of course, benchmark results should always be taken with a grain of salt: if there is a surprisingly drastic difference, it is worth running the benchmarks a few more times to confirm it before jumping into fixing a performance issue.

An example setup of continuous benchmarking can be found here.
