Andrew Garcia

Tensor performance benchmarks for Python and C++

What are tensors in computing?

Perhaps this requires a longer discussion, but in computing, tensors are a mathematical abstraction. The typical definition of a tensor is a "multidimensional array", which may be interpreted as a collection of arrays within an array (I refer to this form as nested vectors in the table below). At the lowest level of code, however, tensors are stored as vectorized ("flattened") arrays, and they are typically fastest in this primitive form.
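To make the nested versus flattened distinction concrete, here is a minimal sketch assuming a cubic tensor stored in row-major order. The helper name `flat_index` is hypothetical and used only for illustration.

```python
L = 4  # side length; 4 gives a 64-element tensor

# Flattened tensor: one contiguous list holding the values 1..L^3
flat = list(range(1, L**3 + 1))


def flat_index(i, j, k, L):
    """Row-major offset of coordinate (i, j, k) inside the flat list."""
    return (i * L + j) * L + k


# Nested tensor: lists within lists, holding the same values
nested = [
    [[flat[flat_index(i, j, k, L)] for k in range(L)] for j in range(L)]
    for i in range(L)
]

# Both representations address the same element
assert nested[1][2][3] == flat[flat_index(1, 2, 3, L)]
```

The nested form needs three separate lookups per element, while the flat form needs one lookup plus a little integer arithmetic.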

Tensor performance: axes swaps

The image on the cover is a graphical representation of a tensor with 64 elements from 1 to 64. A tensor of any side length L can easily be built as a vector in Python and C++:

# Python 
vector = list(range(1, L*L*L + 1))  # elements 1 to L^3, matching the C++ loop

# C++
std::vector<int> vector;
for (int i=1; i<= L*L*L; i++) vector.push_back(i);


For the benchmarks the tensor is a 100 x 100 x 100 tensor with 1 million elements from 1 to 1 million. This is a huge tensor; making a dictionary with all its coordinates results in a 12 MB dictionary file!

Here's a graphical zoomed-in representation of this tensor to compare with the significantly smaller one in the cover photo:

[Image: zoomed-in view of the 100 x 100 x 100 tensor]

The operation chosen for the benchmark is swapping this huge tensor's 0th axis with its 2nd axis, with reversal of all indices (that is, the first element becomes the last, the second the penultimate, and so on). This operation was chosen because the transposition has a high time complexity of $\mathcal{O}(L^3)$.
The graphical representation of this transposed tensor is presented below. It's also zoomed-in because the data set is large and hard to move around (even with WebGL, which uses the GPU):

[Image: zoomed-in view of the transposed tensor]
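As a rough illustration of this operation on a flattened tensor, here is a minimal pure-Python sketch. It assumes one reading of the description: axes 0 and 2 are swapped and every index `n` along an axis is mapped to `L - 1 - n`. The function name `swap_axes_0_2_reversed` is hypothetical and this is not necessarily the exact code used for the benchmarks.

```python
def swap_axes_0_2_reversed(flat, L):
    """Swap axes 0 and 2 of a flattened L x L x L tensor and reverse
    every index (assumed interpretation of the benchmarked operation)."""
    out = [0] * (L ** 3)
    for i in range(L):
        for j in range(L):
            for k in range(L):
                src = (i * L + j) * L + k  # row-major offset of (i, j, k)
                # (i, j, k) -> (k, j, i) after the swap, then each index is reversed
                dst = ((L - 1 - k) * L + (L - 1 - j)) * L + (L - 1 - i)
                out[dst] = flat[src]
    return out


L = 4
tensor = list(range(1, L**3 + 1))
swapped = swap_axes_0_2_reversed(tensor, L)

# The first element ends up in the last position
assert swapped[-1] == 1
# Applying the operation twice restores the original tensor
assert swap_axes_0_2_reversed(swapped, L) == tensor
```

The three nested loops visit each of the L^3 elements exactly once, which is where the cubic cost comes from.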

The benchmarked operation was run with different computing methodologies and tensor representations. As expected, vectorized tensors outperform nested vectors. Less obviously, this trend reverses in Python when a JIT compiler is used.

| Language | Tensor Init. | Container | JIT Compiling | Operation time | Timer |
| --- | --- | --- | --- | --- | --- |
| Python | numpy.zeros((N,N,N)); | Nested Vectors | - | 2,520 ms ± 229 ns | inline %timeit |
| Python | numpy.zeros(N*N*N); | Vectorized | - | 613 ms ± 68.4 ms | inline %timeit |
| Python | numpy.zeros((N,N,N)); | Nested Vectors | Numba | 1.38 ms ± 63,600 ms | inline %timeit |
| Python | numpy.zeros(N*N*N); | Vectorized | Numba | 5.33 ms ± 186,000 ms | inline %timeit |
| C++ | vector<vector<vector<int>>>; | Nested Vectors | - | 10.5501 ms | chrono::high_resolution_clock |
| C++ | vector<int>; | Vectorized | - | 0.015105 ms | chrono::high_resolution_clock |
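For readers who want to try a similar comparison without a notebook, here is a small timing sketch using only the standard library. It is an assumption-laden stand-in for the `%timeit` runs above: it uses plain Python lists instead of NumPy arrays, a much smaller side length, and the same assumed reading of the operation (swap axes 0 and 2, reverse every index). The names `swap_nested`, `swap_flat` and `report` are hypothetical, and the timings will differ from the table.

```python
import statistics
import timeit

L = 20  # small side length so the sketch runs quickly; the benchmarks use 100


def swap_nested(t, L):
    """Axis 0/2 swap with index reversal on a nested-list tensor."""
    out = [[[0] * L for _ in range(L)] for _ in range(L)]
    for i in range(L):
        for j in range(L):
            for k in range(L):
                out[L - 1 - k][L - 1 - j][L - 1 - i] = t[i][j][k]
    return out


def swap_flat(v, L):
    """Same operation on a flattened (row-major) tensor."""
    out = [0] * (L ** 3)
    for i in range(L):
        for j in range(L):
            for k in range(L):
                dst = ((L - 1 - k) * L + (L - 1 - j)) * L + (L - 1 - i)
                out[dst] = v[(i * L + j) * L + k]
    return out


flat = list(range(1, L**3 + 1))
nested = [
    [[flat[(i * L + j) * L + k] for k in range(L)] for j in range(L)]
    for i in range(L)
]


def report(label, func, arg, repeat=5):
    """Time func(arg, L) several times and print mean ± std in ms."""
    runs = timeit.repeat(lambda: func(arg, L), number=1, repeat=repeat)
    mean_ms = statistics.mean(runs) * 1e3
    std_ms = statistics.stdev(runs) * 1e3
    print(f"{label}: {mean_ms:.3f} ms ± {std_ms:.3f} ms")
    return mean_ms, std_ms


report("nested lists", swap_nested, nested)
report("flat list", swap_flat, flat)
```

Reporting a mean with a standard deviation over repeated runs mirrors the format `%timeit` prints, which makes the spread between runs visible alongside the average.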

The operations under JIT compilers show a large spread in operation time because they initially take some time to initialize and compile the scripts into machine code. Interestingly enough, this suggests there may be a more succinct way to generate machine code for multidimensional arrays than for vectorized tensors when programming in Python.

Finally, vectorized C++ beats all. It beats JIT-compiled Python code by about 2 orders of magnitude (i.e. 100x faster) and conventional Python executions by 4 to 5 orders of magnitude.

-Andrew
