Originally published on abhijithota.me
The background
I stumbled across a situation while writing a Golang project where I needed to loop over a slice returned by a function.
The function being strings.Split
. No big deal. I wrote the following:
for _, id := range strings.Split(path, "/") {
// Something()
}
There's another way of writing the same thing:
splitPath := strings.Split(path, "/")
for _, id := range splitPath {
// Something()
}
To the novice-eyed, like me, it would seem the latter was an optimized version as the function is only called once before the loop and hence more performant.
I did not care about the negligible performance gain I would get as the path
variable passed to the split function would be limited in length by application constraints.
But what if it wasn't?
tldr; It doesn't matter how you write it. They are both the same.
From the Golang spec,The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.
This seems obvious once you know about it because it's such a low-hanging fruit for the compiler. It's not even compiler optimization. It's compiler duty.
From here, I'm going to write about how I nerd-sniped myself into finding the answer to this and naturally, spent around 4 hours benchmarking, trying to figure out assembly, checking the compiler optimization wiki, some trivial experimentation that confirmed my theory and one line in the spec that made it all come together.
Before we proceed into this, have some context about the original scenario. The function should split a separated string called path
with /
as delimiter and extract all integers into an array like this:
func ExtractInts(path string) []int {
pathIds := []int{}
for _, v := range strings.Split(path, "/") {
if v != "" {
val, _ := strconv.Atoi(v)
pathIds = append(pathIds, val)
}
}
return pathIds
}
Benchmarking the two functions
I put together a quick small project to bench the thing consisting of main.go
and main_test.go
:
main.go
package main
import (
"strconv"
"strings"
)
func WithOptim(path string) int {
pathIds := []int{}
pathIdsStr := strings.Split(path, "/")
for _, v := range pathIdsStr {
if v != "" {
val, _ := strconv.Atoi(v)
pathIds = append(pathIds, val)
}
}
return len(pathIds)
}
func WithoutOptim(path string) int {
pathIds := []int{}
for _, v := range strings.Split(path, "/") {
if v != "" {
val, _ := strconv.Atoi(v)
pathIds = append(pathIds, val)
}
}
return len(pathIds)
}
main_test.go
package main
import (
"strings"
"testing"
)
var path = strings.Repeat("734/956/831/811/", 100_000)
var resultWithOptim int
func BenchmarkWithOptim(b *testing.B) {
var r int
for n := 0; n < b.N; n++ {
r = WithOptim(path)
}
resultWithOptim = r
}
var resultWithoutOptim int
func BenchmarkWithoutOptim(b *testing.B) {
var r int
for n := 0; n < b.N; n++ {
r = WithoutOptim(path)
}
resultWithoutOptim = r
}
The variables
r
,resultWithOptim
andresultWithoutOptim
are needed to escape compiler optimizations and get reliable benchmarks.Check out this section from the awesome benchmarking article by Dave Cheney.
Results
I ran the benchmark a couple of times.1
In most cases, it was neck-and-neck:
BenchmarkWithoutOptim-8 63 18067614 ns/op
BenchmarkWithOptim-8 64 17833483 ns/op
3.282s
BenchmarkWithoutOptim-8 64 17879142 ns/op
BenchmarkWithOptim-8 66 17943536 ns/op
3.287s
BenchmarkWithoutOptim-8 66 17915000 ns/op
BenchmarkWithOptim-8 66 17567501 ns/op
3.292s
BenchmarkWithoutOptim-8 56 18344452 ns/op
BenchmarkWithOptim-8 57 17649186 ns/op
2.090s
In some cases the one with optimizations performed better:
BenchmarkWithoutOptim-8 64 18446009 ns/op
BenchmarkWithOptim-8 73 18054308 ns/op
3.548s
BenchmarkWithoutOptim-8 63 18466071 ns/op
BenchmarkWithOptim-8 70 18600002 ns/op
3.422s
BenchmarkWithOptim-8 64 18917323 ns/op
BenchmarkWithoutOptim-8 58 18092279 ns/op
3.215s
But in some cases the one without any optimizations performed faster!
BenchmarkWithOptim-8 55 18848131 ns/op
BenchmarkWithoutOptim-8 68 19239865 ns/op
2.395s
BenchmarkWithOptim-8 57 18323290 ns/op
BenchmarkWithoutOptim-8 63 17598465 ns/op
2.206s
Digging into the why
The first resource I came across was the Compiler And Runtime Optimizations wiki which stated nothing about such behaviour.
I also tried converting the code into Assembly using Compiler Explorer but couldn't understand any of it.
Was the compiler calling the function only once? Are the 2 ways of writing same? I had no way of knowing. Then I thought of something so trivial I felt dumb.
The trivial experiment
Consider the following code:
func Nums() []int {
print("Nums called\n")
return []int{1, 2, 3, 4, 5}
}
func Loop() {
for _, v := range Nums() {
print(v)
}
}
func main() {
Loop()
}
And surely enough, Nums called
is only printed once to STDOUT
.
This felt so obvious once I realised it.
Eureka + Facepalm moment
Googling "go range internals" gave me this article by Robbie V which sent me to the Go language specification where I found this:
The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.
Conclusion
There's nothing much to conclude but there's a lesson to learn. In Robbie V's article, the Step 1 was to RTFM.Unlike that, I dove right into benchmarking which was like comparing two sum functions which returned a + b
vs. b + a
.
I discovered a lot but this shouldn't have taken so much of my time. It's a reminder on how we can take some slightly interesting things, like the one I talked about here, for granted and never realise the machinery of it.
As to which function is better to write, that would be an opinion.
Other languages
I tried the same thing in a few other languages I've coded in because I never realised this:
JavaScript
function nums() {
console.debug('Nums called');
return [1, 2, 3, 4, 5];
}
function loop() {
for (const v of nums()) {
console.debug(v);
}
}
loop();
Python
def nums():
print("Nums called")
return [1,2,3,4,5]
def loop():
for l in nums():
print(l)
loop()
-
Using
go test -bench=. -run=xxx
↩Specs:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHzOrder of function execution was changed a few times.
Top comments (0)