Tensor performance benchmarks for Python and C++

#cpp #python #testing #performance

What are tensors in computing?

This perhaps deserves a longer discussion, but in computing, tensors are a mathematical abstraction. The typical definition of a tensor is a "multidimensional array", which may be interpreted as a collection of arrays within an array (I will refer to this form as nested vectors in the table below). At the lowest level of code, however, tensors are stored as vectorized ("flattened") arrays, and they are typically fastest in this primitive form.
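To make the link between the two forms concrete, here is a minimal sketch of how a nested tensor and its flattened counterpart address the same element. It assumes a row-major layout (the last index varies fastest), which the post does not state explicitly; `flat_index` is a helper name made up for this illustration.

```python
L = 4  # side length; 4 gives a 64-element tensor like the one on the cover

# Nested representation: a list of lists of lists, filled with 1..L**3.
nested = [[[(i * L + j) * L + k + 1 for k in range(L)]
           for j in range(L)]
          for i in range(L)]

# Flattened representation: the same values in one contiguous list.
flat = list(range(1, L**3 + 1))

def flat_index(i, j, k, side=L):
    """Map 3-D coordinates to a position in the flat list (row-major, assumed)."""
    return (i * side + j) * side + k

# Both representations agree on every element.
assert all(nested[i][j][k] == flat[flat_index(i, j, k)]
           for i in range(L) for j in range(L) for k in range(L))

print(nested[1][2][3], flat[flat_index(1, 2, 3)])  # prints "28 28"
```

The nested form needs three lookups per element, while the flat form needs one lookup plus a little integer arithmetic.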

Tensor performance: axes swaps

The image on the cover is a graphical representation of a tensor with 64 elements from 1 to 64. A tensor of any side length $L$ can easily be built as a vector in Python and C++:

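# Python
vector = list(range(1, L*L*L + 1))  # elements 1 to L^3, matching the C++ version

# C++ (requires #include <vector>)
std::vector<int> vector;
vector.reserve(L*L*L);  // avoid repeated reallocation while filling
for (int i = 1; i <= L*L*L; i++) vector.push_back(i);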

For the benchmarks, the tensor is 100 x 100 x 100, with 1 million elements from 1 to 1 million. This is a huge tensor; making a dictionary with all its coordinates results in a 12 MB dictionary file!

Here's a graphical zoomed-in representation of this tensor to compare with the significantly smaller one in the cover photo:

The operation chosen for the benchmark swaps this huge tensor's 0th axis with its 2nd axis and reverses all indices (that is, the first element becomes the last, the second the penultimate, and so on). It was chosen because transposition has a high time complexity of $\mathcal{O}(L^3)$: every one of the $L^3$ elements has to be moved.
The graphical representation of the transposed tensor is presented below. It is also zoomed in, because the data are large and hard to move (even with WebGL, which uses the GPU).
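For readers who want to see what an axis swap looks like on a flattened tensor, here is a rough sketch in plain Python. It is not the benchmarked code: `swap_axes_0_2` is a hypothetical helper, a row-major layout is assumed, and the index-reversal step is left out because the post does not spell out its details.

```python
def swap_axes_0_2(flat, side):
    """Return a new flat list with axes 0 and 2 exchanged (row-major, assumed)."""
    out = [0] * len(flat)
    for i in range(side):
        for j in range(side):
            for k in range(side):
                # Element (i, j, k) of the input lands at (k, j, i) in the output.
                out[(k * side + j) * side + i] = flat[(i * side + j) * side + k]
    return out

L = 4
tensor = list(range(1, L**3 + 1))
swapped = swap_axes_0_2(tensor, L)

# Swapping the same pair of axes twice restores the original tensor.
assert swap_axes_0_2(swapped, L) == tensor
```

The three nested loops visit each of the $L^3$ elements exactly once, which is where the cubic cost comes from.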

The benchmarked operation was run with different computing methodologies and tensor representations. As expected, vectorized tensors outperform nested vectors. Less obviously, this trend reverses in Python when a JIT compiler is used.

Language	Tensor init. and container	JIT compiler	Operation time	Timer
Python	numpy.zeros((N,N,N)); Nested Vectors	-	2,520 ms ± 229 ns	inline %timeit
Python	numpy.zeros(N*N*N); Vectorized	-	613 ms ± 68.4 ms	inline %timeit
Python	numpy.zeros((N,N,N)); Nested Vectors	Numba	1.38 ms ± 63,600 ms	inline %timeit
Python	numpy.zeros(N*N*N); Vectorized	Numba	5.33 ms ± 186,000 ms	inline %timeit
C++	vector<vector<vector<int>>>; Nested Vectors	-	10.5501 ms	chrono::high_resolution_clock
C++	vector<int>; Vectorized	-	0.015105 ms	chrono::high_resolution_clock

The JIT-compiled runs show a large spread in operation time because the first call takes some time to initialize and compile the code into machine code. Interestingly enough, this suggests there may be a more succinct way to generate machine code for multidimensional arrays than for vectorized tensors when programming in Python.
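One way to keep compilation time out of the reported numbers is to time the first call separately from the following ones. The sketch below shows the idea with the standard library only; `time_calls` and `reverse_copy` are hypothetical names, and `reverse_copy` is just a stand-in workload, not the function that was benchmarked here.

```python
import time

def time_calls(func, *args, repeats=5):
    """Time the first call separately from the calls that follow it."""
    start = time.perf_counter()
    func(*args)
    first = time.perf_counter() - start  # would include JIT compilation, if any

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        later.append(time.perf_counter() - start)
    return first, later

def reverse_copy(values):
    # Stand-in workload (hypothetical); swap in the JIT-compiled function here.
    return values[::-1]

first, later = time_calls(reverse_copy, list(range(1_000_000)))
print(f"first call:          {first * 1e3:.3f} ms")
print(f"best of later calls: {min(later) * 1e3:.3f} ms")
```

With a JIT-compiled function, the first call would be expected to be much slower than the later ones; with a plain Python function like the stand-in, the two numbers should be close.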

Finally, C++ vectorized beats all. It beats JIT-compiled Python code by about two orders of magnitude (roughly 100x faster) and conventional Python execution by four to five orders of magnitude.
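The orders of magnitude quoted above can be checked directly from the table. The short sketch below takes the mean operation times as listed (in milliseconds, ignoring the reported spreads) and compares each one to the vectorized C++ result; the row labels are shorthand chosen for this example.

```python
import math

# Mean operation times in milliseconds, copied from the table above.
times_ms = {
    "Python nested": 2520.0,
    "Python vectorized": 613.0,
    "Python nested + Numba": 1.38,
    "Python vectorized + Numba": 5.33,
    "C++ nested": 10.5501,
    "C++ vectorized": 0.015105,
}

baseline = times_ms["C++ vectorized"]

# log10 of the slowdown gives the number of orders of magnitude.
orders = {name: math.log10(t / baseline) for name, t in times_ms.items()}

for name, t in times_ms.items():
    print(f"{name}: {t / baseline:,.0f}x slower, "
          f"{orders[name]:.1f} orders of magnitude")
```

The Numba rows come out at roughly 2 to 2.5 orders of magnitude and the plain Python rows at roughly 4.6 to 5.2.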

-Andrew

DEV Community
