Discussion on: ⚡ ️Blazing Python 🐍 Scripts with Concurrency ⚡️️

Very nice post, easy to understand, and I learned lots of new stuff. I was wondering: isn't timing the run with a time library directly inside the program tied to how the machine itself performs, so the result could differ on other machines that are slower or faster?

Also, I don't know why, but most of the links are broken, and it's a shame because I really wanted to read that supplementary material.

CED • May 28 '19

Sorry for the inconvenience @jim . The links are working fine now.

Ruined1 • May 29 '19 • Edited

The completion time will 100% have to do with the machine it is run on. This is true of all intensive tasks. There are a lot of factors too, and I don't just mean clock speed, cache size, etc. For example, if network tasks are involved (such as the requests in this article), your bandwidth, network saturation, latency, and other variables come into play. If a lot of disk I/O is involved, two systems could have almost identical hardware, but if one uses a standard hard drive and the other a decent SSD, those times will again be different.

The multi-threaded test the author posted above reported "Downloaded 90 in 4.992406606674194 seconds" on my Ryzen 5 2600 over gigabit Ethernet, but the same test on my Raspberry Pi 3B over 2.4 GHz 802.11n Wi-Fi reported "Downloaded 90 in 7.817208528518677 seconds".

Multiple factors came into play there: a slower processor, slower memory, and greater network latency (the Pi 3B is on Wi-Fi, about 25+ feet away, with lots of other 2.4 GHz networks in the building).

To give a greater understanding however, let me run 900 instead of 90 (sorry Google, Facebook, and Twitter, it's for science!):

Ryzen:
Downloaded 900 in 46.64533185958862 seconds
Raspberry Pi 3B:
Downloaded 900 in 71.28835225105286 seconds

The faster system, with a better network connection, took about 47 seconds, while the slower, Wi-Fi-crippled system took 1 minute and 11 seconds!
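
Since a single timing is sensitive to whatever else the machine and network are doing at that moment, one option is to repeat the measurement and look at the minimum and median. Here is a minimal sketch of that idea; `time_repeated` and `busy_work` are hypothetical names of mine for illustration, not anything from the article:

```python
import statistics
import time


def time_repeated(task, repeats=5):
    """Run task() several times and return each run's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        task()
        durations.append(time.perf_counter() - start)
    return durations


def busy_work():
    # Hypothetical stand-in for whatever code is being measured.
    return sum(i * i for i in range(100_000))


runs = time_repeated(busy_work)
print(f"min {min(runs):.4f}s | median {statistics.median(runs):.4f}s")
```

The numbers are still specific to the machine they ran on, but the spread between runs gives a feel for how much of a difference is just noise.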

If I took strictly computational tasks (cutting out the network and waiting on responses to requests), it would look more like this:

Synchronous:

import random
import time

start_time = time.time()

# Pure arithmetic on random integers; no network or disk access involved.
for x in range(1, 90000):
    y = random.randrange(x, x*2) * x
    z = random.randrange(y, y*2) * random.randrange(x, x*2) * y * x
    print(f"{x} | {y} | {z}")

elapsed = time.time() - start_time
print(f"Finished computations in {elapsed} seconds.")

Ryzen:
Finished computations in 7.4013049602508545 seconds.

Raspberry Pi 3B:
Finished computations in 7.918483018875122 seconds.

Multi-threaded:

import concurrent.futures
import random
import time

start_time = time.time()

def compute(x):
    y = random.randrange(x, x*2) * x
    z = random.randrange(y, y*2) * random.randrange(x, x*2) * y * x
    print(f"{x} | {y} | {z}")

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Pass the function and the iterable to map(); writing compute(x) here
    # would run the call in the main thread instead of in the pool.
    executor.map(compute, range(1, 90000))

elapsed = time.time() - start_time
print(f"Finished computations in {elapsed} seconds.")

Ryzen:
Finished computations in 6.342863082885742 seconds.

Raspberry Pi 3B:
Finished computations in 8.66378116607666 seconds.

BUT WAIT, THE RASPBERRY PI WENT UP IN TIME?!

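That's right. My 6-core, 12-thread Ryzen has hardware threads to spare, so spreading the work across the pool's 5 workers cost it little and it came out slightly ahead. My 4-core, 4-thread ARM processor in the Pi actually took a hit from breaking up the work in a non-beneficial way, and it was of course slower to begin with, due to its lower performance and lower number of threads. Keep in mind that CPython's global interpreter lock lets only one thread execute Python bytecode at a time, so threads mostly help with I/O-bound work such as the downloads above; for pure computation, the thread-management overhead can outweigh any gain.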
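
To take printing and randomness out of the picture, here is a minimal sketch that runs the same CPU-only function sequentially and through a thread pool, then checks that both give identical results. The `compute` body, the input size of 20,000, and the helper names are my own assumptions for illustration; this is not the code that produced the timings above.

```python
import concurrent.futures
import time


def compute(x):
    # Deterministic, CPU-only stand-in for the random arithmetic (assumption).
    y = x * x + 1
    return (y * y + x) % 1_000_003


def run_sequential(n):
    return [compute(x) for x in range(1, n)]


def run_threaded(n, workers=5):
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # map() takes the function and an iterable; results keep input order.
        return list(executor.map(compute, range(1, n)))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


seq_result, seq_seconds = timed(run_sequential, 20_000)
thr_result, thr_seconds = timed(run_threaded, 20_000)
print(f"sequential: {seq_seconds:.3f}s | threaded: {thr_seconds:.3f}s")
print("same results:", seq_result == thr_result)
```

Expect the two timings to be close on a standard CPython build, and to vary from machine to machine.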

I hope that clears some stuff up! :)

Jean-Michel Plourde • May 29 '19

Thanks for this awesome experiment. That's what I was suspecting. That's why big O notation is often (though not always) a more reliable way to compare code performance than wall-clock timings.
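
As a rough illustration of that machine-independent view, this sketch counts loop iterations instead of seconds for a linear scan versus a binary search over sorted data. The function names and input sizes are assumptions chosen for the example:

```python
def linear_search_steps(items, target):
    """Count how many elements a left-to-right scan inspects."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count how many midpoints a binary search over sorted items inspects."""
    steps = 0
    low, high = 0, len(items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for n in (1_000, 1_000_000):
    data = range(n)  # already sorted; range supports len() and indexing
    target = n - 1   # worst case for the linear scan
    print(n, linear_search_steps(data, target), binary_search_steps(data, target))
```

The step counts come out the same on a Ryzen or a Raspberry Pi; only the time spent per step differs.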