DEV Community

Khaled Nassar
Khaled Nassar

Posted on • Updated on

Don't use Python List Everywhere

the common thing that if you wanna add some items to someplace may thinking to use List, that's great python list has a lot of features that you need like

  • mutable: you can add/delete what you want from the list
  • ordered: the items have a defined order
  • allowed duplicates: lists can have items with the same value

so you will ask me a question if lists have something like that so why should I avoid it then?
well the answer key is that performance is the key, YES Lists are not the best choice if you're caring about your program's performance

before beginning we can take this as an example

l = range(100)
Enter fullscreen mode Exit fullscreen mode

this a simple code that returns a int list from 0 to 99, so let's talk more about python integers

what are Python Integers actually

we all know that the python interpreter is written in C, so the python objects are C structures but it not contain the its value only, nope it contains extra information like

  • variable type
  • size of the data members
  • reference count
  • the actual value the variable
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
Enter fullscreen mode Exit fullscreen mode

All this extra information is what gives python more flexibility to lets you code freely
Not just integers but all the data types in Python comes with this cost that becomes significant in structures that combine many of these objects like List

you understand what's python types are, so let's talk about

Why List is not the best choice

Python list should be like a C array both are mutable but the difference is C array should be homogeneous not like Python list, yes this makes python list more flexible but it is more costly to be heterogeneous because each of the elements of the list must contain its extra information that I mentioned above
in short, each item is a complete Python object in the List
so when all list items are of the same type with this extra information that becomes redundant

Also if you tried to add more data to your python list you will do something very costly because python creates a new list with more extra space than the original list and push the new data and the old one into the new space
for example

>>> l = [1,2]
>>> l.append(3)
# CREATE ANOTHER ONE len(x) == 3
# APPEND number 3
Enter fullscreen mode Exit fullscreen mode

digram

imagine using something like that with the list that contains 100000 items, we will copy 100000 items in every append process

# m = 100000
for i in range(100):
    m.append(i)
Enter fullscreen mode Exit fullscreen mode

Alternatives

Tuples

Tuples are immutable so it has a fixed size, Also it's allow duplicates and Ordered

>>> %time l = [0,1,2,3,4,5,6,7,8,9]
1000000 loops, best of 3: 285 ns per loop
>>> %timeit t = (0,1,2,3,4,5,6,7,8,9)
10000000 loops, best of 3: 55.7 ns per loop
Enter fullscreen mode Exit fullscreen mode

Sets

Sets like Tuples but Unordered and Unchangeable it doesn't allow the duplicates items

let's take this example code

def list_unique_names(phonebook):
    unique_names = []
    for name, phonenumber in phonebook: 1
        first_name, last_name = name.split(" ", 1)
        for unique in unique_names: 
            if unique == first_name:
                break
        else:
            unique_names.append(first_name)
    return len(unique_names)

def set_unique_names(phonebook):
    unique_names = set()
    for name, phonenumber in phonebook: 
        first_name, last_name = name.split(" ", 1)
        unique_names.add(first_name) 
    return len(unique_names)

phonebook = [
    ("John Doe", "555-555-5555"),
    ("Albert Einstein", "212-555-5555"),
    ("John Murphey", "202-555-5555"),
    ("Albert Rutherford", "647-555-5555"),
    ("Guido van Rossum", "301-555-5555"),
]

print("Number of unique names from set method:", set_unique_names(phonebook))
print("Number of unique names from list method:", list_unique_names(phonebook))
Enter fullscreen mode Exit fullscreen mode

we have two functions here, One used list, the other sets
and after calling them this the performance report

>>> %timeit list_unique_names(large_phonebook)
1 loops, best of 3: 2.56 s per loop
>>> %timeit set_unique_names(large_phonebook)
100 loops, best of 3: 9.57 ms per loop
Enter fullscreen mode Exit fullscreen mode

Numpy Arrays

numpy arrays are almost certainly a better choice if you are doing anything heavily nu‐
meric, as you get more datatype options and many specialized and fast functions. You
might choose to avoid numpy if you want fewer dependencies for your project, though
Cython and Pythran work equally well with array and numpy arrays; Numba works with
numpy arrays only

References

thanks for reading
bye :D

BYE

Discussion (3)

Collapse
mellen profile image
Matt Ellen • Edited on

Intersting piece, especially the performance evaluations.

Just a couple of points:

l = range(100)
Enter fullscreen mode Exit fullscreen mode

This does not create a list, it creates a range object

type(l)
<class 'range'>
Enter fullscreen mode Exit fullscreen mode

Which means you cannot append to l. you need to turn it into a list:

l = list(range(100))
Enter fullscreen mode Exit fullscreen mode

The second point is very minor, but not all python implementations are written in C. IronPython (C#) and Jython (Java) are two that come to mind.

Collapse
fen1499 profile image
Fen

Really cool to see how python objects are represented on C, never thought about that. This article implies some hard truth that I struggle to convince people that is if performance is needed either go for pandas/numpy or use a different language.

Any sort of workaround might prove itself rather complicated in the context of a real application which kind of kills the whole purpose of using python

Collapse
erickvh_ profile image
Erick Ventura

Nice article, it would be great to see how C structures are implemented for set and tuples on Python