Imaculate

Posted on Nov 24, 2020

Arrays and pointers

#c #array #pointer

In C/C++ it is common to assume that arrays are identical to pointers. The assumption holds true for most practical scenarios: we can assign arrays to pointers, substitute pointers for arrays, and even traverse arrays with pointers. For this reason, code similar to Listing 1 below is valid and very common.

#include <iostream>
using namespace std;

void print_arr(int *p, int arr_size)
{
    cout << "Printing array of size " << arr_size << endl;
    for (int *iter = p; iter != (p + arr_size); ++iter)
        cout << *iter << endl;
}

int main()
{
    int arr[] = {5, 6, 7, 8, 9};
    int *p = arr;
    cout << "P points to " << *p << endl;
    int arr_size = sizeof(arr) / sizeof(int);
    print_arr(arr, arr_size);
}

Listing 1: Arrays and pointers used interchangeably

Output:

Looking at the output, it is reasonable to conclude that an array is equivalent to the pointer to its first element. Does this assumption hold for array functions and multidimensional arrays? Let's find out in Listing 2.

#include <iostream>
using namespace std;

int main()
{
    int arr[] = {5, 6, 7, 8, 9};
    int *p = arr;
    cout << "Size of arr: " << sizeof(arr) << endl;
    cout << "Size of p: " << sizeof(p) << endl;

    int arr2d[2][2] = {{5, 6}, {7, 8}};
    int **p2 = arr2d;
}

Listing 2: Arrays and pointers not interchangeable

Before all else, the listing doesn't compile. The compiler doesn't like the assignment of double pointer p2 to a 2D array.

On commenting out the problematic assignment, we see somewhat unexpected results from sizeof(). We expected the size of the array to be a multiple of the number of elements, and that is true for the array arr but not for the equivalent pointer p.

Why is that? Even though it appears so, the assumption we made is not entirely true. What's true is that there is a subset of cases where arrays are equivalent to pointers but otherwise they are entirely different types. In this article, we'll examine all the various cases, starting from the basics.

Pointers

A pointer is a variable that holds the address of another variable. The pointed-to variable can be of any type, including functions and other pointers. Pointers can be obtained using the address-of (&) operator or from dynamically allocated memory with new/malloc. The dereference operator (*) is used to get the value at the pointer address. Pointers can be assigned to other pointers, and the dynamic memory they point to can be resized. Listing 3 shows different ways pointers can be manipulated.

int num = 4;
int *p1 = &num;
int *p2 = (int *)malloc(sizeof(int));
p1 = p2;
p2 = (int *)realloc((void *)p2, 2 * sizeof(int));

Listing 3: Pointers

In the listing, p1 is declared and assigned the address of the integer num, and later assigned the address held in p2; for that reason dereferencing it won't return 4. As demonstrated in the figure below, p1 points to the same location as p2, which is then resized to double its initial size. Note that realloc is allowed to move the block; if it does, only p2 holds the new address and p1 must not be dereferenced.

Arrays

An array is a data structure that holds a fixed number of elements of the same type that can be accessed by indexing. Unlike pointers, arrays can't be initialized with or assigned to other arrays, nor can they be resized. They sacrifice flexibility for runtime performance. Listing 4 shows some operations that would be valid on pointers but do not compile with arrays; these operations have been commented out.

#include <iostream>
using namespace std;

int main()
{
    int arr1[3] = {0, 1, 2};
    // int arr2[] = arr1;
    int arr3[3] = {5, 6, 7};
    //arr3 = arr1;
    cout << "First array: " << arr1 << endl;
    cout << "Third array: " << arr3 << endl;
}

Listing 4: Arrays

On running the listing, we observe interesting output on printing the arrays.

Instead of array elements, we got pointers. We have hit one of the cases where pointers and arrays are equivalent. When does the equivalence rule apply? Let's find out.

Arrays can be pointers

With a few exceptions, whenever an array appears in an expression, the compiler substitutes it with a pointer to its first element. The exceptions are the address-of (&) operator, sizeof() and string literals used to initialize character arrays. This explains why in Listing 2 sizeof() returned the size of the whole array for arr but only the size of a pointer for p, and why the arrays in Listing 4 were displayed as pointers. Let's look at different scenarios where the equivalence rule may or may not apply.

1. Address-of operator

Listing 5 demonstrates how the rule works with this operator.

int arr[4] = {0, 1, 2, 3};
int *p = arr;
//int *p2 = &arr;
int (*pa)[4] = &arr;

Listing 5: Address-of operator

arr is an array variable which, when referenced in the second statement, decays to a pointer to its first element, making it valid to assign to the int pointer p. arr decays into &arr[0], which is different from &arr, a pointer to the whole array. This distinction is better represented visually below. While pa points to the whole array, p points to the first element.

Uncommenting the statement that assigns &arr to p2 will cause a compile-time error due to the type mismatch.

2. Subscript operator

The subscript ([]) operator commonly used on arrays also works on pointers. On a pointer, it dereferences the pointer that is index steps from the input pointer; that is, p[index] is equivalent to *(p + index). When used on an array, the equivalence rule applies, resulting in pointer subscripting. Incrementing a pointer moves it in steps of the size of the type the pointer points to. For instance, in the print_arr() loop in Listing 1 above, ++iter advances iter by the size of one int since it is of type int*. Likewise, incrementing pa in Listing 5 will advance it by one int array of size 4, skipping over arr into undefined territory. Caution must be exercised when subscripting pointers to ensure they are not accessing undefined memory. Buffer overflow bugs are tricky to debug since they surface as unexpected results at runtime.

3. String literals

Since strings are collections of characters, it comes as no surprise that string literals can be used to initialize character arrays. With this initialization, the array has one extra character for null termination. A noteworthy fact about string literals is that unless they are used to initialize a character array, they become unnamed read-only arrays of characters. Therefore, although it is possible to initialize a character pointer with a string literal, the contents at that location cannot be modified. These concepts are illustrated in Listing 6 below.

#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    char arr[] = "Coasts";
    cout << "Null terminating char leads to length: " << sizeof(arr) << endl;

    char *p1 = arr; // decay
    cout << "p1 can modify arr since it wasn't initialized with literals" << endl;
    p1[0] = 'T';
    printf("Updated array: %s\n", arr);

    char *p2 = "Coasts";
    cout << "p2 can't modify array since it was initialized with literals" << endl;
    p2[0] = 'T';
    printf("Updated array: %s\n", p2);
}

Listing 6: Character arrays

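As expected, the output shows a segmentation fault when we modify the literal through p2. Strictly speaking the behavior is undefined, but a crash is the typical result.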

4. Passing arrays to functions

Since arrays decay to pointers when passed to functions, they are effectively passed by reference. As a result, arrays can be passed to functions where the expected parameter is an array or pointer of element type. This is demonstrated in Listing 7 below.

#include <iostream>
using namespace std;

void func1(char a[])
{
    cout << "The size of array is: " << sizeof(a) << endl;
}

void func2(char *a)
{
    cout << "First character is: " << *a << endl;
}

int main()
{
    char a[] = "A sentence";
    func1(a);
    func2(a);
}

Listing 7: Arrays in functions

Array a can be passed to func2(), which expects a char pointer. The listing compiles and runs, but with a relevant warning about func1(), which displays an unexpected result for the array size.

The compiler assumes the array parameter is a pointer because that is how arrays are passed. sizeof() returns size of the pointer which can be confusing since we expected the size of the array. As a result, array size has to be passed separately to array functions.

5. Multidimensional array

Knowing what we know, it's fair to assume that multidimensional arrays decay into the respective multilevel pointers (e.g. a 2D array to a double pointer), but that is not the case. The equivalence rule is not recursive; the outer array decays to a pointer to its first element, which is another array. Therefore a multidimensional array decays to a pointer to an array with one fewer dimension. In Listing 8 below, arr2d can be passed to func3(), which expects a pointer to an int array. Extra conversion is required to pass it to func4(), which takes a double pointer.

#include <iostream>
using namespace std;

void func3(int (*a)[3])
{
    cout << "Size of outer array: " << sizeof(a) << endl;
    cout << "Size of inner array: " << sizeof(*a) << endl;
    cout << "First array: " << *a << endl;
}

void func4(int **p)
{
    cout << "Dereferenced element: " << **p << endl;
}

int main()
{
    int arr2d[2][3] = {{1, 2, 3}, {4, 5, 6}};
    func3(arr2d);

    int *p_to_last = &arr2d[1][2];
    func4(&p_to_last);
}

Listing 8: 2-dimensional array

The output from func3() reveals that even though the outer array became a pointer, the inner array was preserved. When the pointer is dereferenced, an int array is returned which is displayed as int pointer.

Although the equivalence is not recursive, multilevel pointers are used to dynamically allocate multidimensional arrays. Dynamic memory is allocated at runtime when the required size cannot be determined at compile time. The catch is that the programmer has to be disciplined enough to deallocate it when it is no longer needed. In C++, this is done with the new and delete keywords, or with the C functions malloc and free. Below we will explore different ways of allocating dynamic multidimensional arrays, modeled with 2D arrays; these methods can be translated to more dimensions with more levels of indirection.

5.1 Multi level pointers

Through arrays of pointers, multilevel pointers can create multidimensional arrays. An array of pointers is not to be confused with pointer to array. They are different types with different implications. Listing 9 highlights the syntax difference in declaration and assignment.

const int LEN = 4;
int arr[LEN] = {0, 1, 2, 3};
int (*pa)[LEN] = &arr; //pointer to array of ints
int *ap[LEN];          // array of pointers to int

for (int i = 0; i < LEN; i++)
{
    ap[i] = &arr[i]; /* assign the address of integer. */
}
int **pp = ap;

Listing 9: Arrays and Pointers

More illustration in the figure below.

In a 2D array built this way, the pointers in the array point to the first elements of the respective rows. Due to the equivalence rule, an array of pointers decays to a pointer to pointer, i.e. a multilevel pointer. Similar to the single pointers outlined in the Subscript operator section, multilevel pointers can also be accessed with subscript operators, one per dimension. Multidimensional arrays can be allocated in a single contiguous block or in non-contiguous blocks.

5.1.1 Non-contiguous block

With this method of allocation, each row can be on a different block in memory. This is advantageous when memory is limited and it's hard to get space for all rows and columns at once. Listing 10 shows how they are allocated, manipulated and deleted. On deletion, each of the rows has to be deallocated before freeing the double pointer.

const int rows = 3;
const int cols = 4;
// array of pointers
int **arr1 = new int *[rows];
for (size_t i = 0; i < rows; i++)
    arr1[i] = new int[cols](); // value-initialize so the printed elements are defined
cout << "Array 1:" << endl;
for (size_t i = 0; i < rows; i++)
{
    for (size_t j = 0; j < cols; j++)
        cout << arr1[i][j] << " ";
    cout << endl;
}

// memory from new[] must be released with delete[]
for (size_t i = 0; i < rows; i++)
    delete[] arr1[i];
delete[] arr1;

Listing 10: Non-contiguous 2D array

arr1 can be visualized as follows:

5.1.2 Contiguous block

As the title suggests, here all the array elements are allocated in one contiguous block. Since the array is allocated in one go, it has the advantage of easier cleanup, but it may not be possible to find a contiguous block big enough for the whole array. Listing 11 shows how such an array is manipulated. Elements can be accessed by subscripting, and deletion is simply done by deallocating the block, then the array of pointers.

const int rows = 3;
const int cols = 4;
// array of pointers, contiguous
int **arr2 = new int *[rows];
arr2[0] = new int[rows * cols](); // one value-initialized block for all elements
for(size_t i = 1; i < rows; i++)
     arr2[i] = arr2[0] + i * cols;

cout << "Array 2:" << endl;
for (size_t i = 0; i < rows; i++)
{
   for (size_t j = 0; j< cols; j++)
      cout << arr2[i][j] << " ";
   cout << endl;
}
delete[] arr2[0]; // memory from new[] must be released with delete[]
delete[] arr2;

Listing 11: Contiguous 2D array

arr2 can be visualized as follows.

Multilevel pointers provide an intuitive way of thinking about multidimensional arrays, but they can be hard to visualize with higher dimensions. In such cases, it is recommended to avoid three-star programming by using type aliases.

5.2 Pointers to arrays

Since multidimensional arrays decay to pointers to arrays, they can be declared as such. They can be declared as pointers to outer or inner arrays. Similar to multilevel pointers, elements can be accessed with subscript operators.

5.2.1 Pointer to inner array

In Listing 12, a 2D array is allocated and assigned to a pointer to the inner array (a row of cols elements).

const int rows = 3;
const int cols = 4;
int (*arr3)[cols] = new int[rows][cols](); // value-initialize so the printed elements are defined
cout << "Array 3:" << endl;
for (size_t i = 0; i < rows; i++)
{
    for (size_t j = 0; j < cols; j++)
        cout << arr3[i][j] << " ";
    cout << endl;
}
delete[] arr3; // memory from new[] must be released with delete[]

Listing 12: Pointer to inner array

arr3 can be visualized as follows.

There are rows pointer positions starting from arr3 (arr3, arr3 + 1 and so on), each pointing to an array. Note that these pointers point to arrays, not int elements. With more dimensions, the pointers will point to higher-dimensional arrays, though these are harder to visualize. The disadvantage of this method is that the length of the outer array (rows in the above case) cannot be derived from the pointer; it has to be stored separately.

5.2.2 Pointer to outer array

This method is used if we need to preserve the sizes of all dimensions in the pointer type. It is achieved by adding one dimension of size 1 to the desired array. The additional dimension requires a dereference or one more subscript operator when accessing array elements. arr4 shows one such example in Listing 13.

const int rows = 3;
const int cols = 4;
int (*arr4)[rows][cols] = new int[1][rows][cols](); // value-initialize so the printed elements are defined
cout << "Array 4:" << endl;
for (size_t i = 0; i < rows; i++)
{
    for (size_t j = 0; j < cols; j++)
        cout << (*arr4)[i][j] << " "; // or arr4[0][i][j]
    cout << endl;
}

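// sizeof yields bytes, so divide by the size of one row (or one element) to get counts
cout << "Number of rows: " << sizeof(*arr4) / sizeof((*arr4)[0]) << endl;
cout << "Number of columns: " << sizeof((*arr4)[0]) / sizeof((*arr4)[0][0]) << endl;
delete[] arr4;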

Listing 13: Pointer to outer array

arr4 can be visualized as follows:

5.3 Single level pointer

Although a single-level pointer allocates a one-dimensional array, it can simulate a multidimensional array. Such arrays are also known as flattened arrays. They have the advantage of being easier to visualize and deallocate. On the other hand, since it is a 1D array, multidimensional subscript syntax cannot be used to access the elements; the index of each element has to be calculated manually. Listing 14 illustrates how that is done. In addition, it requires a contiguous block large enough to fit everything, and the pointer doesn't preserve dimension size information.

const int rows = 3;
const int cols = 4;
int *arr5 = new int[rows * cols](); // value-initialize so the printed elements are defined
cout << "Array 5:" << endl;
for (size_t i = 0; i < rows; i++)
{
    for (size_t j = 0; j < cols; j++)
        cout << arr5[i * cols + j] << " ";
    cout << endl;
}
delete[] arr5; // memory from new[] must be released with delete[]

Listing 14: Flattened array

arr5 can be visualized as follows:

Pointers are not arrays

Having seen these scenarios, we can confidently conclude that although arrays and pointers can be equivalent, they are very different. Arrays can be pointers but pointers are not necessarily arrays. And that, friends, is the knotty relationship between arrays and pointers.
