DEV Community


RAM consumption in Python

jemaloQiu
・6 min read

Yesterday I was helping a young engineer at our startup solve a technical problem: strangely high RAM consumption. He had recently implemented an API in our back-end project for uploading data from JSON/JSONL files to MongoDB. I spent some time digging into how Python consumes RAM, and I would like to share what I've learnt about this subject.

Check RAM status on Linux

First, I want to note down how I check RAM usage on our server running Ubuntu 18.04:

free -h

Output in my case:
[Screenshot: output of free -h]

Here is some explanation of the output of the free command, from this page:

Free memory is the amount of memory which is currently not used for anything. This number should be small, because memory which is not used is simply wasted.

Available memory is the amount of memory which is available for allocation to a new process or to existing processes.
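The same two numbers can also be read from a program. On Linux, free takes its figures from /proc/meminfo: its "free" column corresponds to the MemFree field and its "available" column to MemAvailable, both listed there in kB. The sketch below assumes a Linux host; parse_meminfo and read_meminfo are hypothetical helper names, not part of any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict: field name -> integer value.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # ignore anything that is not a "Key: value" line
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # drop the optional "kB" unit
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory statistics (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, read_meminfo()["MemAvailable"] // 1024 gives the available memory in MiB, which should roughly match the "available" column of free -m.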

Memory allocation in C++

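First, I'd like to recall how C++ allocates memory for its variables. In C++, every variable has a type that is fixed at compile time (before C++11 it always had to be written out; since C++11 auto can deduce it, but still at compile time). Thus the compiler knows the size of every variable, and the memory area where it lives (stack, heap or static area) follows directly from the code: globals live in the static area, local variables on the stack, and objects created with new on the heap. See this example I wrote yesterday as file testMem.cpp (I have a 64-bit CPU and the compiler is g++ for x86_64):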

#include <iostream>
/* I set the alignment to 1 byte here.
   Remove this line to see the RAM consumption with the default alignment setting. */
#pragma pack(1) 

using namespace std;
int g1 = 4;

class c1
{

};

class c2
{        
    int x = 1;        
    int y = 2;
    char z = 12;
    char* name;
};

int main()
{
    cout << " ================= " << endl;
    cout << " Sizeof g1: " << sizeof(g1) << endl;
    cout << " Address of g1: " << &(g1) << endl;
    cout << " ================= " << endl;  
    int a = 100;
    double b = 20.0;

    c1* myC1 = new c1();  // heap

    c2* myC2 = new c2();  // heap

    char c = 55;
    short d = 122;

    cout << " Sizeof a: " << sizeof(a) << endl;
    cout << " Address of a: " << &(a) << endl;


    cout << " Sizeof b: " << sizeof(b) << endl;
    cout << " Address of b: " << &(b) << endl;

    cout << " Sizeof c: " << sizeof(c) << endl;
    cout << " Address of c: " << static_cast<void *>(&c) << endl;

    cout << " Sizeof d: " << sizeof(d) << endl;
    cout << " Address of d: " << static_cast<void *>(&d) << endl;

    cout << " ================= " << endl;
    cout << " Sizeof c1: " << sizeof(c1) << endl;
    cout << " Sizeof c2: " << sizeof(c2) << endl;  
    cout << " Sizeof myC1: " << sizeof(myC1) << endl;
    cout << " Sizeof myC2: " << sizeof(myC2) << endl;

    cout << " ================= " << endl;
    cout << " Address of ptr myC1: " << static_cast<void *>(&myC1) << endl;
    cout << " Address of ptr myC2: " << static_cast<void *>(&myC2) << endl;

    cout << " Address value of myC1: " << static_cast<void *>(myC1) << endl; // heap
    cout << " Address value of myC2: " << static_cast<void *>(myC2) << endl; // heap

    cout << " ================= " << endl;
    int arr[10] = {1};    
    cout << " Sizeof arr: " << sizeof(arr) << endl; // array of 10 integers
    cout << " Address of arr: " << arr << endl;

    delete myC1;  // release the heap objects created with new
    delete myC2;
}


Compile this file and execute it:

> g++ testMem.cpp -o testMem
> ./testMem

Below is the output:
[Screenshot: output of ./testMem]

In C++, it is quite easy to predict in which memory area (stack/heap/static) a variable is stored just by reading the code. The size of a simple variable is exactly the number of bytes its data occupies, and the size of a compound type is also straightforward to calculate. With #pragma pack(1), as in this example, the size of a class/struct is the sum of the sizes of its non-static data members: 4 + 4 + 1 + 8 = 17 bytes for c2 on x86_64. With the default alignment, the compiler inserts padding so that each member is suitably aligned (c2 then takes 24 bytes), and an empty class such as c1 still occupies 1 byte.
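The padding rule can also be observed from Python, without a C++ compiler. This sketch uses the standard ctypes module, which lays out Structure fields according to the platform's C ABI, to mirror class c2 with the default alignment. It assumes that ctypes and g++ agree on the layout of these plain members, which holds on mainstream x86_64 Linux; C2 and packed_size are hypothetical names.

```python
import ctypes


class C2(ctypes.Structure):
    """Mirror of class c2 from testMem.cpp, laid out with the platform's default alignment."""
    _fields_ = [
        ("x", ctypes.c_int),
        ("y", ctypes.c_int),
        ("z", ctypes.c_char),
        ("name", ctypes.c_char_p),
    ]


def packed_size(struct_cls):
    """Sum of the member sizes, i.e. the size the struct would have under #pragma pack(1)."""
    return sum(ctypes.sizeof(ctype) for _, ctype in struct_cls._fields_)


# Print where each member starts; the gap after z is alignment padding.
for field_name, _ in C2._fields_:
    field = getattr(C2, field_name)
    print("{:5} offset {:2}  size {}".format(field_name, field.offset, field.size))

print("default (aligned) size:", ctypes.sizeof(C2))
print("packed size (sum of members):", packed_size(C2))
```

On x86_64 Linux this shows name starting at offset 16 rather than 9, which is where the 24 versus 17 bytes difference comes from.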

Memory allocation in Python

Now let's do some tests in Python. We can use sys.getsizeof() and id() to get the size and the address of an object (in CPython, id() returns the object's memory address). However, calculating the real RAM consumption in Python is a little more complicated than one might expect.

import sys

def testMem():
    a1, a2 = 1, 1.0
    print("++ a1 has size: {}, address: {}".format( sys.getsizeof(a1),  id(a1) ))
    print("-- a2 has size: {}, address: {}".format( sys.getsizeof(a2),  id(a2) ))

    b1, b2 = 256, 257
    print("++ b1 has size: {}, address: {}".format( sys.getsizeof(b1),  id(b1) ))
    print("-- b2 has size: {}, address: {}".format( sys.getsizeof(b2),  id(b2) ))   

    c1, c2 = -5, -6
    print("++ c1 has size: {}, address: {}".format( sys.getsizeof(c1),  id(c1) ))
    print("-- c2 has size: {}, address: {}".format( sys.getsizeof(c2),  id(c2) ))   


    d1 = {"x":12}
    d2 = {"x1":100000, "x2":"abcdefg", "x3":-100000000000, "x4":0.00000005, "x5": 'v'}

    print("++ d1 has size: {}, address: {}".format( sys.getsizeof(d1),  id(d1) ))
    print("-- d2 has size: {}, address: {}".format( sys.getsizeof(d2),  id(d2) ))   


    e1 = (1, 2, 3)
    e2 = [1, 2, 3]
    print("++ e1 has size: {}, address: {}".format( sys.getsizeof(e1),  id(e1) ))
    print("-- e2 has size: {}, address: {}".format( sys.getsizeof(e2),  id(e2) ))   


if __name__ =="__main__":
    testMem()

Execution output:
[Screenshot: output of testMem()]

As the output above shows, Python objects are much larger than the corresponding C++ variables. The reason is that everything in Python is an object (i.e. an instance of a class), and in CPython every object carries bookkeeping data such as a reference count and a pointer to its type. Some interesting facts seen in this example:

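  • the addresses of integers in [-5, 256] are far away from those of other integers (-6 and 257 in this example), because CPython preallocates these small integers once at startup and reuses them
  • a short dict has the same size as a longer one (here one item versus five), because CPython allocates dict storage with a minimum capacity
  • the size of a tuple/list is not the sum of the sizes of its items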
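The first fact comes from a CPython implementation detail: the integers from -5 to 256 are preallocated and shared, while every other int value is created as a new object. A minimal sketch to observe this; make_int is a hypothetical helper, and other implementations such as PyPy may behave differently.

```python
def make_int(n):
    """Create an int at run time (by parsing a string) so no compile-time constant is shared."""
    return int(str(n))


# CPython keeps one preallocated object for each integer in [-5, 256], so two
# requests for such a value return the very same object; outside that range
# each call builds a separate object.
for n in (-6, -5, 256, 257):
    print(n, make_int(n) is make_int(n))
```

On CPython the identity check is True only for -5 and 256, matching the address gap seen in the output.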

To understand the details of memory management in Python, one can refer to this article. Here I want to emphasize two important things:

  • Memory management in Python is quite different from C++: every Python object carries a large fixed overhead compared with C++ (for example, sys.getsizeof(1) returns 28 bytes on 64-bit CPython, while an int in the C++ example takes 4 bytes).
  • For Python containers, sys.getsizeof() does not include the objects they contain; it returns only the memory consumed by the container itself, including its pointers to those objects.
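The second point can be checked directly: two lists with the same number of items report the same sys.getsizeof() value, no matter how big the items are. A minimal sketch; build is a hypothetical helper that creates both lists the same way, so their internal pointer arrays have the same capacity.

```python
import sys


def build(n_chars):
    """Build a 3-item list of strings, each n_chars long, the same way every time."""
    return [ch * n_chars for ch in "abc"]


small = build(1)
large = build(100_000)

# getsizeof() counts only the list object and its pointer array,
# so both lists report the same size...
print(sys.getsizeof(small), sys.getsizeof(large))

# ...while the strings they point to differ by roughly 300 kB in total.
print(sum(map(sys.getsizeof, small)), sum(map(sys.getsizeof, large)))
```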

Below is an example of calculating the "real size" of a container. The function total_size() recursively goes over all items in the container and sums up their sizes, counting each object only once (tracked by its id()), to give an approximate total size of the container and its contents. See the code:

from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

import sys 

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.
    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:
        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}
    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


def testMemory(): 
    a = {"x":12}
    b = {"x1":1, "x2":"hello", "x3":1.2, "x4":-3, "x5":2000000}
    print("memory of a: {}".format( total_size(a) ))
    print("memory of b: {}".format( total_size(b) ))
    print('Done!')

if __name__ == '__main__':
    testMemory()

Execution output:
[Screenshot: output of testMemory()]

Our issue

To analyze our technical problem, I used memory_profiler, a very handy tool that reports the memory consumption of a function line by line. Here is my test code (it needs the function total_size() from above, which is not repeated here):

from pymongo import MongoClient
from memory_profiler import profile
import sys 

@profile
def testMemory():


    client = MongoClient("mongodb://xxxx:tttttt@ourserver:8917")

    db = client["suppliers"]
    col = db.get_collection("companies_pool")

    re_query_filter = {"domain": {'$regex': "科技"} }  # "科技" means "technology"
    docs = col.find(re_query_filter)
    print(type(docs))
    docs = docs[10:]
    l = list(docs)

    print("memory of l: {}".format( total_size(l) ))


    # A large file in my test; on the server, only json/jsonl files are loaded.
    with open("D:\\concour_s2\\Train\\dd.zip", "rb") as f:
        s = f.read()

    print("memory of s: ", sys.getsizeof(s))


    del l
    del s

    print('Done!')


if __name__ == '__main__':
    testMemory()


Below is the execution output:
[Screenshot: memory_profiler output of testMemory()]
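memory_profiler is a third-party package. If only the standard library is available, tracemalloc can give a rough comparison of peak Python allocations. The sketch below contrasts building a full list of items, as list(docs) does in the test code, with processing items one at a time; peak_bytes, materialize and stream are hypothetical names. Note that tracemalloc only sees memory allocated through Python's allocators, so its numbers will not match the process-level figures from free or memory_profiler.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Call func(*args); return (result, peak bytes traced by tracemalloc during the call)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def materialize(n):
    # Keeps all n strings alive at the same time, like l = list(docs).
    items = [str(i) * 10 for i in range(n)]
    return sum(len(s) for s in items)


def stream(n):
    # Builds one string at a time; each is released before the next one is created.
    total = 0
    for i in range(n):
        item = str(i) * 10
        total += len(item)
    return total


n = 100_000
print("list():   ", peak_bytes(materialize, n))
print("streaming:", peak_bytes(stream, n))
```

Both functions compute the same result, but the peak traced memory of the list version grows with n while the streaming version stays small.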

The code in this part is not what our engineer wrote in his project, but his API method performs similar operations. With this simple example, he understood why his method sometimes consumed surprisingly much RAM: the test keeps the entire query result (list(docs)) and the entire file content (f.read()) in memory at the same time. The problem was then solved quickly.
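For an upload API like ours, one common way to bound memory is to read a JSONL file line by line and pass documents on in fixed-size batches, so only one batch is held in memory at a time. This is a minimal sketch of that idea, not necessarily the fix our engineer applied; iter_jsonl_batches and the default batch size are hypothetical choices, and with pymongo each batch could be handed to Collection.insert_many().

```python
import io
import json
from itertools import islice


def iter_jsonl_batches(lines, batch_size=1000):
    """Yield lists of at most batch_size parsed documents from an iterable of JSONL lines."""
    # Generator expression: lines are parsed lazily, blank lines are skipped.
    docs = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return
        yield batch


# Usage with an in-memory file; a file object from open(path) iterates lines the same way.
sample = io.StringIO('{"name": "a"}\n{"name": "b"}\n\n{"name": "c"}\n')
for batch in iter_jsonl_batches(sample, batch_size=2):
    print(batch)  # with pymongo, this could be: collection.insert_many(batch)
```

This only helps for JSONL input: a single large JSON document still has to be parsed in one go with the standard json module.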
