Adityaberi

Posted on May 4, 2020

Python's sum() or NumPy's np.sum()? I found a big difference in time!

#discuss #python #machinelearning #help

In my opinion:
Use Python's built-in functions (sum()) on Python data types, and use NumPy's functions (np.sum()) on NumPy arrays.

import numpy as np
massive_array = np.random.random(100000)
massive_array.size

100000

massive_array

array([0.81947279, 0.24254041, 0.76437261, ..., 0.15969415, 0.34502387,
       0.15858268])

%timeit sum(massive_array) #Python sum
%timeit np.sum(massive_array) #Numpy sum

16 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
50.6 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That's a massive difference: np.sum() is roughly 300 times faster here!
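Outside IPython, the same comparison can be reproduced with the standard-library timeit module. This is only a minimal sketch: the helper name time_call and the repeat counts are my own choices, not part of the original post, and the absolute numbers will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

massive_array = np.random.random(100_000)


def time_call(func, number=20):
    """Return the best per-call time in seconds over 3 repeats (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


# Built-in sum() iterates the array element by element from Python.
builtin_time = time_call(lambda: sum(massive_array))
# np.sum() performs the whole reduction inside NumPy.
numpy_time = time_call(lambda: np.sum(massive_array))

print(f"sum():    {builtin_time * 1e6:10.1f} µs per call")
print(f"np.sum(): {numpy_time * 1e6:10.1f} µs per call")
print(f"ratio:    {builtin_time / numpy_time:10.1f}x")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; the mean reported by %timeit is an equally reasonable choice.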

What do you guys think???

Top comments (4)

rhymes • May 4 '20

It is to be expected that there's a difference:

Python lists are dynamic in size (which means you can append or remove items after you have defined the list)
Python lists are not optimized for numeric computations as they can hold any Python object
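A small sketch of the second point, assuming NumPy is imported as np; the variable names are only illustrative. It also shows what the built-in sum() receives when it iterates over a NumPy array.

```python
import numpy as np

# A list can mix arbitrary Python objects, so sum() has to handle each item's type.
mixed = [1, 2.5, True]
print([type(x).__name__ for x in mixed])   # ['int', 'float', 'bool']

# A NumPy array has a single dtype shared by every element.
np_ar = np.array([0.5, 1.5, 2.0])
print(np_ar.dtype)                          # float64

# Iterating the array from Python wraps each value in a NumPy scalar object,
# which is per-element work that the built-in sum() has to pay for.
first = next(iter(np_ar))
print(type(first).__name__)                 # float64
```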

I had slightly different timings though.

Given:

import random
import array
import numpy

ar = [random.random() for i in range(100000)]
np_ar = numpy.random.random(100000)
tuple_ar = tuple(ar)
array_ar = array.array('f', ar)

these are the timings:

In [26]: %timeit sum(ar)
738 µs ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [27]: %timeit sum(np_ar)
24.8 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [28]: %timeit numpy.sum(np_ar)
57.2 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [32]: %timeit sum(tuple_ar)
667 µs ± 78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [33]: %timeit numpy.sum(tuple_ar)
5.92 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [34]: %timeit numpy.sum(ar)
5.21 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [38]: %timeit sum(array_ar)
2.1 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [39]: %timeit numpy.sum(array_ar)
70.7 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [40]: %timeit sum(np_ar)
25.2 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using Python 3.8
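One plausible reading of the numpy.sum(ar) and numpy.sum(tuple_ar) timings is that NumPy has to convert the list or tuple into an array on every call, while the array.array object can be read through the buffer protocol. The sketch below assumes that explanation and that the 'f' typecode is a 4-byte C float on the platform; it converts the list once and views the array.array without copying.

```python
import array
import random

import numpy as np

ar = [random.random() for _ in range(100_000)]
array_ar = array.array('f', ar)          # 'f' stores C floats (single precision)

# Convert the list once; later reductions then work on the NumPy array directly.
converted = np.asarray(ar)               # float64 copy of the list's values
print(converted.dtype, np.sum(converted))

# array.array exposes its memory through the buffer protocol, so NumPy can
# view it without copying the elements (assumes array_ar.itemsize == 4).
view = np.frombuffer(array_ar, dtype=np.float32)
print(view.dtype, view.shape)

array_ar[0] = 2.0                        # a write through array_ar...
print(view[0])                           # ...is visible through the view: 2.0
```

If the data is summed more than once, keeping it in a NumPy array from the start avoids paying the conversion on each call.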

Adityaberi • May 4 '20

Also, NumPy is written in C and executes very quickly as a result. By comparison, Python is a dynamic language: source code is compiled to bytecode, which the Python interpreter then executes. So compiled C code is generally going to be faster. ... Python loops are slower than C loops.
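A hedged way to check the loop claim on your own machine, using only the standard library: time an explicit Python-level for loop against the built-in sum() on the same list. The function name loop_sum is hypothetical, and no particular ratio is implied; it depends on the Python version.

```python
import random
import timeit

ar = [random.random() for _ in range(100_000)]


def loop_sum(values):
    """Add the values with an explicit Python-level for loop."""
    total = 0.0
    for value in values:
        total += value
    return total


# Best per-call time over 3 repeats of 20 calls each.
loop_time = min(timeit.repeat(lambda: loop_sum(ar), number=20, repeat=3)) / 20
builtin_time = min(timeit.repeat(lambda: sum(ar), number=20, repeat=3)) / 20

print(f"for loop:      {loop_time * 1e6:8.1f} µs per call")
print(f"built-in sum:  {builtin_time * 1e6:8.1f} µs per call")
```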

rhymes • May 4 '20

Python lists are written in C as well: github.com/python/cpython/blob/mas...

The iterator protocol is in C too: github.com/python/cpython/blob/mas...

The sum function, like most builtins, is written in C as well: github.com/python/cpython/blob/c00...

;-)

Adityaberi • May 4 '20

Also, I think an array is a collection of elements of a single data type stored in contiguous memory locations, whereas a Python list is a collection of references to objects of possibly different types, and those objects are stored in non-contiguous memory locations.
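The layout difference can be inspected directly. This is a rough sketch assuming CPython and NumPy's default float64 dtype; the object sizes reported by sys.getsizeof are implementation details and may differ between builds.

```python
import sys

import numpy as np

values = [0.5, 1.5, 2.5, 3.5]
np_ar = np.array(values)

# The array stores raw 8-byte doubles back to back in one buffer.
print(np_ar.itemsize, np_ar.nbytes)       # 8 32
print(np_ar.flags['C_CONTIGUOUS'])        # True

# The list stores references; each float is a separate Python object
# with its own header, allocated wherever the interpreter placed it.
print(sys.getsizeof(values[0]))           # size of one float object, in bytes
print(len({id(v) for v in values}))       # 4 distinct objects
```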
