DEV Community

Adityaberi

Python's sum() or NumPy's np.sum()? I found a big difference in time!

In my view: use Python's built-in functions (sum()) on Python data types, and NumPy's functions (np.sum()) on NumPy arrays.

import numpy as np
massive_array = np.random.random(100000)
massive_array.size

100000

massive_array

array([0.81947279, 0.24254041, 0.76437261, ..., 0.15969415, 0.34502387,
0.15858268])

%timeit sum(massive_array) #Python sum
%timeit np.sum(massive_array) #Numpy sum

16 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
50.6 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That's a massive difference: roughly 300x on this array!
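For anyone without IPython, here is a minimal sketch of the same comparison using the standard-library timeit module. The array size matches the one above; the helper name time_call and the repeat count are my own choices for illustration, and the absolute numbers will vary by machine.

```python
import timeit

import numpy as np

massive_array = np.random.random(100000)


def time_call(func, number=20):
    """Average seconds per call of func() over `number` calls (hypothetical helper)."""
    return timeit.timeit(func, number=number) / number


builtin_time = time_call(lambda: sum(massive_array))   # Python's built-in sum
numpy_time = time_call(lambda: np.sum(massive_array))  # NumPy's sum

print(f"sum():    {builtin_time * 1e6:.1f} µs per call")
print(f"np.sum(): {numpy_time * 1e6:.1f} µs per call")

# Both approaches should agree on the result up to floating-point rounding.
same_result = np.isclose(sum(massive_array), np.sum(massive_array))
print("Same result:", same_result)
```

The isclose check is there because the two functions add the elements in a different order, so the last few digits may differ.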

What do you think?

Top comments (4)

rhymes

It is to be expected that there's a difference:

  • Python lists are dynamic in size (which means you can append or remove items after you have defined the list)
  • Python lists are not optimized for numeric computations as they can hold any Python object
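The second point can be sketched in a few lines. A related detail, which is likely part of why the built-in sum() is slow on a NumPy array: sum() walks its argument through the iterator protocol, so each array element is wrapped in a NumPy scalar object before it is added. The variable names below are illustrative only.

```python
import numpy as np

np_ar = np.random.random(5)
py_list = np_ar.tolist()  # plain Python floats

# Iterating an ndarray produces NumPy scalar objects, one per element.
first_from_array = next(iter(np_ar))
first_from_list = next(iter(py_list))

print(type(first_from_array))  # a NumPy scalar type
print(type(first_from_list))   # the built-in float

# A list can hold any mix of Python objects.
mixed = [1, "two", 3.0, None]
mixed_types = [type(item).__name__ for item in mixed]
print(mixed_types)
```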

I had slightly different timings though.

Given:

import random
import array
import numpy

ar = [random.random() for i in range(100000)]
np_ar = numpy.random.random(100000)
tuple_ar = tuple(ar)
array_ar = array.array('f', ar)

these are the timings:

In [26]: %timeit sum(ar)
738 µs ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [27]: %timeit sum(np_ar)
24.8 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [28]: %timeit numpy.sum(np_ar)
57.2 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [32]: %timeit sum(tuple_ar)
667 µs ± 78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [33]: %timeit numpy.sum(tuple_ar)
5.92 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [34]: %timeit numpy.sum(ar)
5.21 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [38]: %timeit sum(array_ar)
2.1 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [39]: %timeit numpy.sum(array_ar)
70.7 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [40]: %timeit sum(np_ar)
25.2 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using Python 3.8
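A rough way to see why numpy.sum is slower on a list or tuple than on an array: NumPy has to build an array from the Python sequence before it can sum it. The sketch below times that conversion on its own with the timeit module; numpy.asarray stands in here for the implicit conversion, and the repeat count is an arbitrary choice of mine.

```python
import random
import timeit

import numpy

ar = [random.random() for _ in range(100000)]
np_ar = numpy.asarray(ar)  # convert once, up front

number = 20  # calls per measurement (arbitrary)
convert_time = timeit.timeit(lambda: numpy.asarray(ar), number=number) / number
sum_list_time = timeit.timeit(lambda: numpy.sum(ar), number=number) / number
sum_array_time = timeit.timeit(lambda: numpy.sum(np_ar), number=number) / number

print(f"numpy.asarray(list): {convert_time * 1e6:.1f} µs")
print(f"numpy.sum(list):     {sum_list_time * 1e6:.1f} µs")
print(f"numpy.sum(ndarray):  {sum_array_time * 1e6:.1f} µs")
```

If the first two numbers come out close to each other, the conversion accounts for most of the cost of calling numpy.sum on a list.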

Adityaberi

Also, NumPy is written in C and executes very quickly as a result. By comparison, Python is a dynamic language: the source is compiled to bytecode, which the Python interpreter then executes. So compiled C code is usually going to be faster. ... Python loops are slower than C loops.

rhymes

Python lists are written in C as well: github.com/python/cpython/blob/mas...

The iterator protocol is in C too: github.com/python/cpython/blob/mas...

The sum function, like most builtins, is written in C as well: github.com/python/cpython/blob/c00...

;-)
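A quick way to check this from the interpreter, assuming CPython: inspect.isbuiltin reports whether a callable is a built-in function, meaning one with no Python-level source. The function py_sum below is a hypothetical pure-Python equivalent, added only for contrast.

```python
import inspect


def py_sum(values):
    """Pure-Python equivalent of sum(), for comparison only (hypothetical)."""
    total = 0
    for value in values:
        total += value
    return total


sum_is_builtin = inspect.isbuiltin(sum)        # built-in, implemented in C on CPython
py_sum_is_builtin = inspect.isbuiltin(py_sum)  # defined in Python

print(sum_is_builtin, py_sum_is_builtin)
print(py_sum([1, 2, 3]) == sum([1, 2, 3]))
```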

Adityaberi

Also, I think an array is a collection of homogeneous data types stored in contiguous memory locations. A list in Python, on the other hand, is a collection of heterogeneous data types: it holds references to objects, and those objects are stored in non-contiguous memory locations.
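Here is a small sketch of that difference, using attributes that NumPy arrays expose about their own layout. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)

print(arr.dtype)                  # one dtype shared by every element
print(arr.flags["C_CONTIGUOUS"])  # elements sit in one contiguous block
print(arr.itemsize, arr.strides)  # bytes per element, byte step between neighbours
print(arr.nbytes)                 # total bytes of element data

# A list stores references: the same object can sit in two slots.
obj = 1.5
refs = [obj, obj]
same_object = refs[0] is refs[1]
print(same_object)
```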