tl;dr for-loops are not necessarily way more performant than `stream()`
When `stream()` was introduced to the Java Collections in JDK 8, the immediate reaction was: "they are sooooo much slower than a normal for-loop".
I tested it, and indeed it was slower. By a lot. (I remember that at some point Oracle admitted that streams are slower, but said the focus was on functionality and that performance would probably improve in the future. Alas, I cannot find a source for my memory, so let's assume it is wrong.)
What definitely exists are lots of articles about how slow streams are compared to for-loops, e.g. by nipafx, who showed it with JMH, and by Angelika, with the compelling argument that the compiler's optimizations for loops are too good to be beaten by streams.
Some developers took this fact and kept it stored in their brain forever. But streams were introduced in 2014, and 8 years have passed. How does it look today? Are streams really as slow as some repeatedly declare? Let's find out.
I wrote a set of benchmarks that (to my knowledge) follow the correct JMH procedure:
- create the data in a `@State` object
- sink the result into a `Blackhole`, so the JIT cannot eliminate the work as dead code
I let the benchmarks process lists of 10, 10_000 and 10_000_000 entries. These are the results:
10 entries
```text
Benchmark                                               Mode  Cnt       Score       Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25  691000.985 ±  5338.170  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25  687244.094 ±  2287.375  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25  620127.959 ± 11149.611  ops/s
JmhStreamPerformanceMeasurement.collectFor             thrpt   25  601047.148 ±  6901.828  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25  593137.918 ±  7027.976  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25  583345.516 ±  2706.945  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  752205.384 ±  3155.479  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  753751.877 ±  2618.748  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  732847.868 ±  1268.374  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  725538.827 ±   859.032  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  725200.238 ±   825.300  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  723650.793 ±  1007.079  ops/s
```
10_000 entries
```text
Benchmark                                               Mode  Cnt      Score     Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   4700.019 ±  13.206  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   4613.177 ±  52.664  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   4718.937 ± 232.897  ops/s
JmhStreamPerformanceMeasurement.collectFor             thrpt   25   1369.088 ±  10.711  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   1337.578 ±  10.015  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   1383.158 ±  49.265  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39043.233 ± 708.907  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  42027.702 ±  91.457  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40108.355 ± 123.484  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25   9309.883 ±  13.252  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14033.988 ±  13.011  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  13440.062 ±  98.916  ops/s
```
10_000_000 entries
```text
Benchmark                                               Mode  Cnt   Score   Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   1.256 ± 0.044  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   1.240 ± 0.038  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   1.182 ± 0.052  ops/s
JmhStreamPerformanceMeasurement.collectFor             thrpt   25   0.321 ± 0.006  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   0.324 ± 0.005  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   0.322 ± 0.006  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39.874 ± 0.326  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  40.546 ± 0.356  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40.263 ± 0.374  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  14.993 ± 0.083  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14.795 ± 0.091  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  14.746 ± 0.076  ops/s
```
Conclusion
The benchmarks ran for 6 hours, which ironed out most of the peaks.
The results are in operations per second, so bigger is better.
In case you're too lazy to look up what the benchmarks mean:
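Since the scores are throughputs, the reciprocal gives the average time per operation, which can make the large-list numbers easier to picture. A worked example with the collectStream score for 10_000_000 entries (rounded):

$$
\frac{1}{0.322\ \text{ops/s}} \approx 3.11\ \text{s per operation}, \qquad \frac{3.11\ \text{s}}{10^{7}\ \text{entries}} \approx 311\ \text{ns per entry}
$$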
- `For` is a normal modern for-each loop, `for (X x : xs)`, that uses an `Iterator` to run over the entries.
- `ForGet` is an old-school `for (int i = 0; i < xs.size(); i++)` that then calls `get(i)` on an `ArrayList`.
- `Stream` is the modern `stream()` variant.
- `Collect` adds all entries of the list to a set.
- `CollectFiltered` adds only selected values to the set.
- `EasyTask` sums up all the entries.
- `HeavyTask` does a bit more if/else and math stuff with the entries.
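Since the benchmark code itself is not shown here, the following is a hypothetical reconstruction of what the three loop styles look like for easyTask and collectFiltered. The class `LoopVariants`, its method names and the even-number filter in `selected` are assumptions for illustration, not the original benchmark code; heavyTask is left out because its exact logic isn't described.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical reconstruction of the loop styles; not the original benchmark code.
class LoopVariants {

    // easyTask + For: enhanced for-loop, backed by the list's Iterator
    static long sumFor(List<Integer> xs) {
        long sum = 0;
        for (Integer x : xs) {
            sum += x;
        }
        return sum;
    }

    // easyTask + ForGet: index-based loop calling get(i) on an ArrayList
    static long sumForGet(ArrayList<Integer> xs) {
        long sum = 0;
        for (int i = 0; i < xs.size(); i++) {
            sum += xs.get(i);
        }
        return sum;
    }

    // easyTask + Stream: unbox to long and let the stream sum
    static long sumStream(List<Integer> xs) {
        return xs.stream().mapToLong(Integer::longValue).sum();
    }

    // Hypothetical selection rule for collectFiltered: keep even values only
    static boolean selected(int x) {
        return x % 2 == 0;
    }

    static Set<Integer> collectFilteredFor(List<Integer> xs) {
        Set<Integer> result = new HashSet<>();
        for (Integer x : xs) {
            if (selected(x)) {
                result.add(x);
            }
        }
        return result;
    }

    static Set<Integer> collectFilteredForGet(ArrayList<Integer> xs) {
        Set<Integer> result = new HashSet<>();
        for (int i = 0; i < xs.size(); i++) {
            Integer x = xs.get(i);
            if (selected(x)) {
                result.add(x);
            }
        }
        return result;
    }

    static Set<Integer> collectFilteredStream(List<Integer> xs) {
        return xs.stream().filter(LoopVariants::selected).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        ArrayList<Integer> xs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            xs.add(i);
        }
        // prints "45 45 45"
        System.out.println(sumFor(xs) + " " + sumForGet(xs) + " " + sumStream(xs));
        // prints "true"
        System.out.println(collectFilteredStream(xs).equals(collectFilteredFor(xs)));
    }
}
```

The plain `Collect` variants would be the same loops without the `selected` check.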
My expectations
I would guess that the `ForGet` benchmarks will be the fastest of the three, because no `Iterator` is created and `get(i)` on an `ArrayList` is basically just a wrapped array access.
I would also assume that `For` is faster than `Stream`, because it only creates one extra `Iterator` instance, while `stream()` creates a bunch of instances to process the data.
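To make the "one Iterator vs. a bunch of instances" difference concrete, this sketch prints the runtime classes set up by each style. The class names it reports are OpenJDK implementation details (for example `java.util.ArrayList$Itr` and `java.util.stream.ReferencePipeline$Head`) and may differ between JDK versions or vendors; `SetupObjects` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Peeks at the runtime classes a for-each loop and a stream pipeline set up.
// Class names are JDK implementation details, not part of the public API.
class SetupObjects {

    static List<String> forEachSetup(List<Integer> xs) {
        // this is what `for (Integer x : xs)` obtains behind the scenes
        Iterator<Integer> it = xs.iterator();
        return List.of(it.getClass().getName());
    }

    static List<String> streamSetup(List<Integer> xs) {
        List<String> names = new ArrayList<>();
        Stream<Integer> source = xs.stream();                      // source stage
        Stream<Integer> filtered = source.filter(x -> x % 2 == 0); // intermediate stage
        Stream<Integer> mapped = filtered.map(x -> x * 10);        // another stage
        names.add(source.getClass().getName());
        names.add(filtered.getClass().getName());
        names.add(mapped.getClass().getName());
        // The terminal operation typically creates further internal objects,
        // e.g. a Spliterator for the source and a chain of sinks, one per stage.
        mapped.forEach(x -> { });
        return names;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        System.out.println("for-each: " + forEachSetup(xs));
        System.out.println("stream:   " + streamSetup(xs));
    }
}
```

Keep in mind that in hot code HotSpot's escape analysis can sometimes remove such short-lived allocations entirely, so counting objects is only a rough proxy for cost.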
I also assume that this overhead will vanish with longer loops: one instance vs. 10 instances is negligible over 10 million iterations.
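A back-of-the-envelope version of that argument, taking the guessed 1 vs. 10 setup objects at face value (neither number is measured):

$$
\frac{10 - 1}{10^{7}\ \text{entries}} = 9 \times 10^{-7}\ \text{extra objects per entry}, \qquad \frac{10 - 1}{10\ \text{entries}} = 0.9\ \text{extra objects per entry}
$$

So if the setup cost matters at all, it should show up in the 10-entry runs rather than the 10 million ones.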
The result
The data looks almost as expected, except that `stream()` is by no means always the slowest. Feel free to check my benchmark code; maybe I made a mistake.
With short loops (few entries), the for-loop reaches up to ~11% higher throughput than the stream, but it depends very much on what you execute: at 10 entries, collectFiltered shows by far the biggest gap, easyTask and collect are about 3% apart, and heavyTask is practically even.
But this changes already at 10_000 entries: there, collectFilteredStream (and collectStream) has the highest score, although within the error margin.
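The percentages above and the "within the error margin" remark can be checked directly from the tables. A small sketch, assuming the usual reading of JMH's Error column as the half-width of a confidence interval (99.9% by default); `ScoreCompare` and its methods are hypothetical helpers, and overlapping intervals are only a rough hint of noise, not a significance test.

```java
import java.util.Locale;

// Helpers to compare two JMH throughput results (score ± error, in ops/s).
class ScoreCompare {

    // How much higher a's throughput is than b's, in percent.
    static double percentFaster(double a, double b) {
        return (a / b - 1.0) * 100.0;
    }

    // Rough check: do [score - error, score + error] intervals overlap?
    static boolean intervalsOverlap(double scoreA, double errorA, double scoreB, double errorB) {
        return scoreA - errorA <= scoreB + errorB && scoreB - errorB <= scoreA + errorA;
    }

    public static void main(String[] args) {
        // 10 entries: collectFilteredFor vs. collectFilteredStream
        // prints "for-loop faster by 11.43%"
        System.out.println(String.format(Locale.ROOT, "for-loop faster by %.2f%%",
                percentFaster(691000.985, 620127.959)));
        // 10_000 entries: collectFilteredStream vs. collectFilteredFor, prints "true"
        System.out.println(intervalsOverlap(4718.937, 232.897, 4700.019, 13.206));
    }
}
```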
SOOOO, I think the difference between the three measured loop styles is irrelevant.
I don't think any of them is "way faster".
All three work very differently: some have more overhead but might be smarter, yet none of them is likely to be the bottleneck.
Some numbers seemed odd, so I ran the benchmark twice to rule out background processes interfering with the results.
Rule of thumb, maybe:
- short, simple tasks: probably better as a for-loop
- long, complex tasks: probably better as a `stream()`
As soon as you have to handle checked exceptions, the for-loop is better anyway, because that is terrible in `stream()`.
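A sketch of why checked exceptions are awkward in streams: the functional interface used by `map` (`java.util.function.Function`) declares no checked exceptions, so a lambda that calls a throwing method has to catch and wrap the exception itself. `new URI(String)` is used only as a convenient JDK constructor that throws the checked `URISyntaxException`; `CheckedExceptions` is a hypothetical class name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CheckedExceptions {

    // for-loop: the checked exception simply propagates to the caller
    static List<URI> parseFor(List<String> raw) throws URISyntaxException {
        List<URI> result = new ArrayList<>();
        for (String s : raw) {
            result.add(new URI(s)); // may throw URISyntaxException (checked)
        }
        return result;
    }

    // stream: Function.apply cannot throw checked exceptions,
    // so the lambda must catch it and rethrow it as an unchecked one
    static List<URI> parseStream(List<String> raw) {
        return raw.stream()
                .map(s -> {
                    try {
                        return new URI(s);
                    } catch (URISyntaxException e) {
                        throw new IllegalArgumentException(e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws URISyntaxException {
        List<String> raw = List.of("https://example.com", "mailto:someone@example.com");
        // prints "true"
        System.out.println(parseFor(raw).equals(parseStream(raw)));
    }
}
```

For this particular case `URI.create` already does that wrapping, but many APIs with checked exceptions offer no such unchecked variant.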
There is one benchmark that looks suspicious: heavyTaskFor with 10_000 entries. I will repeat it and comment on it. I assume my machine did something weird at that time, so please ignore it for now.
Cheers, thanks for reading.
Interestingly enough, the result was reproduced.
Does anyone have an idea why the `for` approach fails so badly for 10_000 entries? Edit: I assume the JIT somehow fails to optimize the loop?
Third time, all of them parameterized.