In this post I'll cover some parts of numpy that are good to have handy when you're using it. It's a miscellaneous toolbox of concepts that I haven't sorted into a neat narrative, but which keep proving their value when you need them.
A few more indexing things
You can also reshape an array by setting its shape property to a new size tuple, like x.shape = (2, 5).
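A quick sketch of that (x here is just a fresh example array, not something from earlier in the post):
>>> x = np.arange(10)
>>> x.shape = (2, 5)  # reshapes in place by assigning to the shape property
>>> x
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])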
Say you have this large array: z = np.arange(81).reshape(3,3,3,3). Indexing an array using lists of indices is different from using tuples of indices.
>>> z[[1,1,1,1]] # Indexes along the first axis: grabs z[1] four times and stacks the copies into a new array
array([[[[27, 28, 29],
[30, 31, 32], ...
>>> z[(1,1,1,1)] # Gets the element in the corresponding array coordinate/position
40
Ellipsis can be used as a shorthand for full slices over the intermediate dimensions. Explicit slice objects work as well:
>>> z[1, Ellipsis, 1] # same as [1,...,1], itself same as [1, :, :, 1]
array([[28, 31, 34],
[37, 40, 43],
[46, 49, 52]])
>>> z[1,1,1,slice(0,2)] # same as [1,1,1,0:2]
array([39, 40])
Enclosing tuples don't affect indexing at all: z[(1,1)] is exactly the same as z[1,1], because Python already passes the comma-separated indices to numpy as a tuple. You can even create an indexing variable without parentheses, courtesy of Python's tuple syntax.
>>> indices = 1, Ellipsis, 1 # Same as (1, Ellipsis, 1)
>>> z[indices]
array([[28, 31, 34],
[37, 40, 43],
[46, 49, 52]])
Again, it's worth repeating from Part 1, since this is a very common mistake: augmented assignment through a fancy index with repeated entries does not accumulate; it is only applied once per distinct index:
>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1 # Doesn't accumulate: x[1] only gets +1 once, not three times
>>> x
array([ 0, 11, 20, 31, 40])
Do something like this instead (starting from a fresh x):
>>> x = np.arange(0, 50, 10)
>>> x[np.array([3])] += 1
>>> x[np.array([1])] += 3
>>> x
array([ 0, 13, 20, 31, 40]) # This is probably what you meant to do
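If what you actually want is for repeated indices to accumulate, np.add.at does an unbuffered in-place add, so every occurrence of an index counts:
>>> x = np.arange(0, 50, 10)
>>> np.add.at(x, [1, 1, 3, 1], 1)  # index 1 really gets incremented three times, index 3 once
>>> x
array([ 0, 13, 20, 31, 40])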
Data types
Numpy has several data types that correspond to C types, and they can be used anywhere a dtype keyword argument is accepted. There are platform-defined types that correspond to the types you're used to in procedural languages like C and C++, and also fixed-width types that always hold a specific number of bits. The following are scalar types; they are not instances of dtype objects, but they can be used in place of a dtype object in any function that takes one.
Numpy scalar type | C type | Description |
---|---|---|
np.bool_ | bool | Boolean (True or False) stored as a byte |
np.byte | signed char | Platform-defined |
np.ubyte | unsigned char | Platform-defined |
np.short | short | Platform-defined |
np.ushort | unsigned short | Platform-defined |
np.intc | int | Platform-defined |
np.uintc | unsigned int | Platform-defined |
np.int_ | long | Platform-defined |
np.uint | unsigned long | Platform-defined |
np.longlong | long long | Platform-defined |
np.ulonglong | unsigned long long | Platform-defined |
np.half / np.float16 | | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
np.single | float | Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa |
np.float / np.double | double | Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa. |
np.longdouble | long double | Platform-defined extended-precision float |
np.csingle | float _Complex | Complex number, represented by two single-precision floats (real and imaginary components) |
np.cfloat / np.cdouble | double _Complex | Complex number, represented by two double-precision floats (real and imaginary components). |
np.clongdouble | long double _Complex | Complex number, represented by two extended-precision floats (real and imaginary components). |
np.int8 | int8_t | Byte (-128 to 127) |
np.int16 | int16_t | Integer (-32768 to 32767) |
np.int32 | int32_t | Integer (-2147483648 to 2147483647) |
np.int64 | int64_t | Integer (-9223372036854775808 to 9223372036854775807) |
np.uint8 | uint8_t | Unsigned integer (0 to 255) |
np.uint16 | uint16_t | Unsigned integer (0 to 65535) |
np.uint32 | uint32_t | Unsigned integer (0 to 4294967295) |
np.uint64 | uint64_t | Unsigned integer (0 to 18446744073709551615) |
np.intp | intptr_t | Integer used for indexing, typically the same as ssize_t |
np.uintp | uintptr_t | Integer large enough to hold a pointer |
np.float32 | float | |
np.float64 / np.float_ | double | Note that this matches the precision of the builtin python float. |
np.complex64 | float _Complex | Complex number, represented by two 32-bit floats (real and imaginary components) |
np.complex128 / np.complex_ | double _Complex | Note that this matches the precision of the builtin python complex. |
Look at the table carefully. Some numpy types don't have the same size as their similarly named C counterparts. In particular:
- np.uint is unsigned long, not unsigned int.
- np.float and np.cfloat are the same size as C's double and double _Complex, respectively.
(Reading through tables like this is good, as I didn't know C and C++ had a float/double complex type until now.)
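If you want to check what these map to on your machine, dtype objects report their size in bytes. The exact numbers below are what you'd typically see on a 64-bit Linux or macOS build; Windows gives 4 for np.uint because unsigned long is 32 bits there:
In [1]: np.dtype(np.uint).itemsize # unsigned long: typically 8 on 64-bit Linux/macOS, 4 on Windows
Out[1]: 8
In [2]: np.dtype(np.uintc).itemsize # unsigned int: typically 4
Out[2]: 4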
Calling np.<whatever-type>() as a function on whatever numbers, lists or ndarrays you want to cast will cast them into an ndarray (or a scalar, for a single number) of that type. These functions behave like C's (int) or C++'s static_cast<int>. An example:
In [1]: np.uint8([1,2,3])
Out[1]: array([1, 2, 3], dtype=uint8)
# This also works:
In [2]: np.array([1,2,3]).astype(np.int8)
Out[2]: array([1, 2, 3], dtype=int8)
Numpy also has some "meta-types": generic types that describe whole classes of types, such as integers or floating point numbers. One of their uses is figuring out which category an element type belongs to:
>>> d = np.dtype(int)
>>> d
dtype('int32')
>>> np.issubdtype(d, np.integer)
True
>>> np.issubdtype(d, np.floating)
False
The generic types are: number, inexact, floating, complexfloating, integer, signedinteger, unsignedinteger, character, generic, flexible, and void. I have created a script which classifies each dtype according to its subtype. It prints whether each numpy dtype is a subtype of each generic type. I thought about annotating the table above with these results, but then it would get too cluttered.
In [1]: np_types = ["bool_", "byte", "ubyte", "short", "ushort", "intc", "uintc
...: ", "int_", "uint", "longlong", "ulonglong", "half", "single", "float16"
...: , "float", "double", "longdouble", "csingle", "cfloat", "cdouble", "clo
...: ngdouble", "int8", "int16", "int32", "int64", "uint8", "uint16", "uint3
...: 2", "uint64", "intp", "uintp", "float32", "float64", "float_", "complex
...: 64", "complex128", "complex_"]
In [2]: np_generics = [np.number, np.inexact, np.floating, np.complexfloating,
...: np.integer, np.signedinteger, np.unsignedinteger, np.character, np.gene
...: ric, np.flexible]
In [3]: for dt in np_types:
...: for gn in np_generics:
...: print("{0} is {1}: {2}".format(dt, gn.__name__, np.issubdtype(dt, gn)))
bool_ is number: False
bool_ is inexact: False
bool_ is floating: False
bool_ is complexfloating: False
bool_ is integer: False
bool_ is signedinteger: False
bool_ is unsignedinteger: False
bool_ is character: False
bool_ is generic: True
bool_ is flexible: False
byte is number: True
byte is inexact: False
byte is floating: False
...
More data types
A dtype can describe more than just integers and floating point numbers. It can also carry byte order, the field names and types inside the type if it's a structure, the shape and datatype of elements if the dtype is an array, and even a void type (for raw data).
This allows dtypes to emulate the full gamut of C and C++ types without enforcing the strict typing present in those languages.
A dtype string is composed of the following characters, in order:
- > for big endian, < for little endian, and = for the hardware default
- a character signifying the type. It could be ? for boolean, b for signed byte, B for unsigned byte, i for signed integer, u for unsigned integer, f for floating point, c for complex floating point, m for a timedelta python object, M for a datetime object, O for some other python object, U for unicode string, and V for void, or raw data (void is rarely needed)
- the number of bytes in the type (for unicode strings it's the number of characters instead)
>>> dt = np.dtype('i4') # 32-bit signed integer
>>> dt = np.dtype('f8') # 64-bit floating-point number
>>> dt = np.dtype('c16') # 128-bit complex floating-point number
>>> dt = np.dtype('a25') # 25-length zero-terminated bytes
>>> dt = np.dtype('U25') # 25-character string
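The byte-order character works the same way. A quick check (tobytes() shows the raw memory layout, so you can see the endianness):
>>> np.dtype('>i4').byteorder # big-endian 32-bit signed integer
'>'
>>> np.array([1], dtype='>i4').tobytes()
b'\x00\x00\x00\x01'
>>> np.array([1], dtype='<i4').tobytes()
b'\x01\x00\x00\x00'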
At the beginning of a dtype string you can insert dimensions like (3,2), which makes it a 3x2 array of that type.
You can put multiple dtypes in the same string separated by commas to represent a struct. Structs can be indexed by strings 'f0', 'f1', etc. unless you give the struct different field names. Here's what all this would look like:
# Array dtype
In [1]: dt = np.dtype(('(2,3)U4'))
In [2]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)
Out[2]:
array([[['fffe', '4', '5'],
['b', 'b', 'b']],
[['a', 'a', 'a'],
['s', 's', 's']],
[['d', 'd', 'd'],
['w', 'w', 'w']]], dtype='<U4')
# This fails because arrays aren't structs
In [3]: np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-143-9d57896acaf5> in <module>
----> 1 np.array([(('fffeee', '4', '5'), 'b'), 'as', 'dw'], dt)['f0']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
# Structure dtype
In [4]: dt = np.dtype(('i4, U4, f4')) # fields default to 'f0', 'f1', 'f2'
In [5]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)
Out[5]:
array([(1, 'a', 1.23), (2, 'b', 2.4 ), (3, 'c', 2.3 )],
dtype=[('f0', '<i4'), ('f1', '<U4'), ('f2', '<f4')])
In [6]: np.array([(1, 'a', 1.23), (2, 'b', 2.4), (3, 'c', 2.3)], dt)['f0']
Out[6]: array([1, 2, 3], dtype=int32)
Now that we've got those out of the way, there are other parameter formats np.dtype() can take:
- dtype(type) where type is any string in np.sctypeDict.keys(). It's worth elaborating that np.sctypeDict contains the mappings from all the numpy dtype names to fixed-width scalar types.
- dtype(flexible_dtype, itemsize) where flexible_dtype is a variable-width numpy type (unicode string, bytes or void) and itemsize is the number of bytes (or, for unicode, characters) it will take. Other types won't honor itemsize. One caveat: the numpy reference spells this form as a tuple, np.dtype((flexible_dtype, itemsize)); as far as I can tell, passing itemsize as a second positional argument puts it in np.dtype's align parameter instead, which is why the 10 below has no visible effect. See the sketch after the example for the tuple form.
In [1]: np.dtype(np.int, 10)
Out[1]: dtype('int64')
In [2]: np.dtype('U', 10)
Out[2]: dtype('<U')
In [3]: np.array('1234567890', np.dtype('U', 10))
Out[3]: array('1234567890', dtype='<U10')
In [4]: np.array('12345678901', np.dtype('U', 10))
Out[4]: array('12345678901', dtype='<U11')
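For comparison, here's the tuple form from the numpy reference, which does produce a fixed-width dtype (a minimal sketch; check the behavior on your numpy version):
In [1]: np.dtype(('U', 10)) # tuple form: itemsize is honored
Out[1]: dtype('<U10')
In [2]: np.array('1234567890', np.dtype(('U', 10)))
Out[2]: array('1234567890', dtype='<U10')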
- dtype((fixed_dtype, shape)), the same tuple form as above, but with a tuple of array dimensions instead of an itemsize; this makes a sub-array dtype of the given base type (see the sketch below).
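A minimal sketch of the sub-array form (when used as an array's dtype, the sub-array dimensions get appended to the array's shape):
In [1]: np.dtype((np.int32, (2, 2))) # a 2x2 sub-array of int32
Out[1]: dtype(('<i4', (2, 2)))
In [2]: np.zeros(3, dtype=np.dtype((np.int32, (2, 2)))).shape # sub-array dims folded into the result shape
Out[2]: (3, 2, 2)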
- dtype([(field_name, field_dtype, shape), ...]) makes a structured type. Each entry gives the name of a field, its dtype (a scalar type or a type string as formulated above), and an optional shape, so individual fields can themselves be arrays. Note: a dtype string can also make arrays.
In [1]: dt = np.dtype([('R','u1'), ('G','u1'), ('B','u1'), ('A','u1')])
In [2]: np.array([(1,1,1,1), (2,2,22,2)], dt)
Out[2]:
array([(1, 1, 1, 1), (2, 2, 22, 2)],
dtype=[('R', 'u1'), ('G', 'u1'), ('B', 'u1'), ('A', 'u1')])
In [3]: np.array([(1,1,1,1), (2,2,22,2)], dt)['G']
Out[3]: array([1, 2], dtype=uint8)
- dtype({'names': <list>, 'formats': <list>, 'offsets': <list>, 'titles': <list>, 'itemsize': <int>}) also makes a struct, where names is a list of field names, formats is a list of dtype strings or numpy scalar types, offsets is an optional list of byte offsets of each field within the structure (like member offsets in a C struct, which also lets you add padding), titles is a list of textual descriptions of each field, and itemsize is the total size in bytes the dtype should be padded to, which you can make larger than the actual size of the struct if you want to e.g. align it to a power of 2. Note: A dtype string can also make structs.
In [1]: dt = np.dtype({'names': ['r','g','b','a'],
...: 'formats': [np.uint8, np.uint8, np.uint8, np.uint8]})
In [2]: np.array([(1,2,3,4),(5,256,265,255)],dt)
Out[2]:
array([(1, 2, 3, 4), (5, 0, 9, 255)],
dtype=[('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])
In [3]: np.array([(1,2,3,4),(5,256,265,255)],dt)['r']
Out[3]: array([1, 5], dtype=uint8)
- dtype((base_dtype, new_dtype)) creates a dtype that is stored and displayed as base_dtype, but whose memory can also be viewed through the fields of new_dtype, a bit like a C union.
In [1]: dt = np.dtype((np.int32,{'real':(np.int16, 0),'imag':(np.int16, 2)}))
In [2]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)
Out[2]:
array([[6291462, 5242885],
[4194308, 3145731]],
dtype=(numpy.int32, [('real', '<i2'), ('imag', '<i2')]))
In [3]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['real']
Out[3]:
array([[6, 5],
[4, 3]], dtype=int16)
In [4]: np.array([(6+(6<<20),5+(5<<20)),(4+(4<<20), 3+(3<<20))],dt)['imag']
Out[4]:
array([[96, 80],
[64, 48]], dtype=int16)
This is a rough hack that uses the bottom bits of an int32 as the real part and the top bits as the imaginary part. (Note that the test values were shifted by 20 bits rather than 16, which is why 6 shows up as 96, i.e. 6 << 4, in the imag field.)
In [1]: dt = np.dtype(('u4', [('r','u1'),('g','u1'),('b','u1'),('a','u1')]))
In [2]: np.array([-1,-2,-3,-4], dt)
Out[2]:
array([4294967295, 4294967294, 4294967293, 4294967292],
dtype=(numpy.uint32, [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
In [3]: np.array([-1,-2,-3,-4], dt)['r']
Out[3]: array([255, 254, 253, 252], dtype=uint8)
In [4]: np.array([-1,-2,-3,-4], dt)['g']
Out[4]: array([255, 255, 255, 255], dtype=uint8)
This method of dtype formulation would also let you access the red, green, blue and alpha parts of a color by indexing, without having to make an array of structures or do bitwise operations; instead, it layers the byte-sized fields on top of a single uint32.
Useful dtype members
The following members are mostly things I've gone over in the sections above, but they're convenient to access programmatically (there's a small sketch after these lists):
- <dtype-obj>.type : Underlying scalar type used by numpy for this dtype
- <dtype-obj>.kind : One of the character codes above that identifies the type
- <dtype-obj>.char and <dtype-obj>.num : More identifiers of the type
- <dtype-obj>.str : The dtype string, more formally known as the array-protocol typestring
- <dtype-obj>.name : Name of the dtype
- <dtype-obj>.itemsize : Number of bytes this dtype takes
- <dtype-obj>.byteorder : A character describing the endian order of this dtype
For structs you also have access to:
- <dtype-obj>.fields : Dictionary mapping field names to their dtypes and byte offsets. This is None if there are no fields.
- <dtype-obj>.names : Tuple of the field names. This is None if there are no fields.
For arrays you also have access to:
- <dtype-obj>.subdtype : Tuple of the element dtype and the sub-array shape. This is None if it's not an array dtype.
- <dtype-obj>.shape : The array dtype's shape tuple. This is () if it's not an array dtype.
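Putting a few of these together (a quick sketch; attribute names as in the numpy docs, outputs from a little-endian machine):
In [1]: dt = np.dtype([('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')])
In [2]: dt.kind, dt.itemsize, dt.names
Out[2]: ('V', 4, ('r', 'g', 'b', 'a'))
In [3]: dt.fields['g'] # the field's dtype and its byte offset into the struct
Out[3]: (dtype('uint8'), 1)
In [4]: sub = np.dtype(('u1', (2, 2)))
In [5]: sub.subdtype, sub.shape
Out[5]: ((dtype('uint8'), (2, 2)), (2, 2))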
Array scalars
A single element pulled out of a numpy array is also a numpy type, called an array scalar. This consistency between elements and arrays gives elements most of the members that arrays have: dtype, size, reshape() and transpose(), among many others.
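A quick sketch of what that looks like (the reshape call should hand back a small array rather than modify the scalar):
>>> el = np.arange(5, dtype=np.int16)[0]
>>> type(el) # an array scalar, not a plain Python int
<class 'numpy.int16'>
>>> el.dtype, el.size
(dtype('int16'), 1)
>>> el.reshape(1, 1) # array methods are available too
array([[0]], dtype=int16)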
Numeric limits
Because they map to fixed-size C types, numpy data types are not overflow-resistant like Python's numeric types. You can check the minimum and maximum values of each type with np.iinfo() for integer types and np.finfo() for floating point types.
In [1]: np.iinfo(np.int64)
Out[1]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
In [2]: np.finfo(np.float64)
Out[2]: finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
Now, a word about floating point precision, since integer precision is well known to be limited to the number of bits it has:
Python uses 64-bit floats (a C double, in other words) to represent its floats. Numpy has its own float types. In particular, it has the long double-like 80-bit extended-precision type you saw above (np.longdouble), unless you are on Windows, where long double is only a 64-bit float because of the MSVC compiler. np.longdouble is the most precise floating point type numpy offers; if you really need more, or unlimited, precision, there is support for symbolic variables and rational numbers in the sympy package, but that won't be covered in this post.
Outside of numpy, any extra precision offered by np.longdouble will be lost when the value is converted to Python's 64-bit float type, for example when you format it into a string or pass it to any python (not numpy) function that expects a python float.
np.finfo(np.longdouble).eps is not the smallest representable long double; it's the spacing between 1 and the next representable long double. If correct values with np.longdouble matter to you, check that passing 1 and 1 + np.finfo(np.longdouble).eps to your functions doesn't squash them into the same number. This is especially easy to get wrong when long doubles are 80 bits on your system, since converting eps to a python float makes 1 + eps indistinguishable from 1.
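A minimal check you could run (the outputs shown are what you'd expect on a typical Linux/macOS build where np.longdouble is 80 bits; on Windows, where it's just a 64-bit float, both lines should print True):
In [1]: eps = np.finfo(np.longdouble).eps
In [2]: np.longdouble(1) + eps != np.longdouble(1) # longdouble keeps the difference
Out[2]: True
In [3]: 1.0 + float(eps) != 1.0 # converted to python floats, the difference vanishes
Out[3]: False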
And we're done
Much material was sourced from the numpy manual.
If you see anything incorrect here please let me know so I can fix it.
Image by Arek Socha from Pixabay