Usually, when you start your programming journey, the first data types you come up against are arrays and lists. Arrays are a fundamental data type, many more complex data types are built on top of arrays. Let's have a look at arrays:
var arr = new int[4];
arr[0] = 10;
arr[1] = 20;
We have just created an array with a length of 4. Arrays use the contiguous memory allocation technique for memory allocation. This means that all the fields must be continuous next to each other:
There were not four consecutive memory blocks free in the first row, so the required memory for the array was allocated in the second row.
So far so good. Now let's imagine we want to resize our array because we want to store e.g. 6 values. What we would have to do is to create a new array with a length of 6 and copy the values of the previous array into the new array:
var arr = new int []{ 10, 20, 30, 40 };
var biggerArr = new int[6];
arr.CopyTo(biggerArr, 0);
If we now think of lists, lists have a key characteristic: you don't have to specify a fixed length in advance, lists can dynamically allocate more memory. For lists, however, you can specify a capacity value; if no value is specified, a default value is used. This default value varies depending on the programming language. Lists also use arrays in the background, so the given capacity size represents the size of the initial array. For example:
var list = new List<int>(2);
for (var i = 10; i <= 90; i+=10)
{
list.Add(i);
Console.WriteLine($"Added {i} to list. Current list capacity is {list.Capacity}");
}
The output looks like this:
Added 10 to list. Current list capacity is 2
Added 20 to list. Current list capacity is 2
Added 30 to list. Current list capacity is 4
Added 40 to list. Current list capacity is 4
Added 50 to list. Current list capacity is 8
Added 60 to list. Current list capacity is 8
Added 70 to list. Current list capacity is 8
Added 80 to list. Current list capacity is 8
Added 90 to list. Current list capacity is 16
The initial capacity is 2. As soon as this was exceeded, the capacity doubled, i.e. it was 4. As soon as the capacity of 4 was exceeded, it doubled again, and so on.
When we resized our array in our previous example, we basically did something similar than the list. With one exception. If we remember, the occupied memory of an array must always be contiguous. But this is not true for lists, here the allocated memory is not contiguous and may looks like this:
The list consists of many small, not directly contiguous memory blocks. What is immediately noticeable is that this allows the list to make better use of the available memory. If, as in the chosen example, we need to allocate memory for an array with a length of 16, a memory block with an appropriate length must first be found.
But don't get me wrong, this doesn't mean that arrays are inefficient data structures, quite the opposite! Arrays are one of the most efficient data structures ever, especially when we access values via indexes.
I know this looks a little bit messy, but stay with me! The two diagrams show how our memory could be distributed. Our program or our process probably does not have one large continuous block, but many small blocks, in between other processes or programs also occupy memory. On the left side we see in blue our array and on the right side in purple our list.
If we now imagine we want to access any element of our array. Because arrays always allocate a continuous block of memory, we always know exactly where an array begins and ends. This makes it extremely easy to access elements using indexes.
This is different for lists. Elements of a list are also accessed via indexes, but since the elements are not on a continuous memory block, such accesses can be very costly. Of course, this depends primarily on the size of the list, but with large amounts of data, the array will always win.
Does this mean that arrays are better than lists? As always, it depends. If we want to increase our array size, this is an extremely expensive task. As shown before, we need to create a new array with the desired length and copy all the values of the original array.
Lists, on the other hand, simply allocate new memory dynamically. Even if elements are removed, the list can free memory dynamically. So for many add and delete operations, the list clearly has the upper hand. In addition, an array always occupies a fixed amount of memory, regardless of whether all fields are assigned values or not.
Summary:
Array | List | |
---|---|---|
Length | Arrays always have a fixed length. The length of an array cannot be changed, a new array must always be created with a copy of the previous values. | Lists can be resized dynamically. |
Memory consumption | Arrays have a fixed length and the memory is contiguous. Generally speaking, arrays are more memory efficient than lists, unless the array is oversized. | Lists have a non-contiguous memory space, therefore the internal implementation of the list is more complex and needs additional structures that consume more memory. |
Usage | Arrays are pointed if the number of elements is known. In addition, due to the contiguous memory, elements can be quickly accessed via indexes. | Because lists can resize themselves automatically, they are particularly suitable for frequent add and remove operations and can thus also achieve a better memory performance than arrays under certain circumstances. |
Conclusion
Finally, I would like to emphasize that there is no clear winner here, it simply depends on the use-case. In most cases the list is the better choice, most of the time you don't know how big or small the amount of data is and lists can allocate the needed size dynamically.
I hope I could bring you a little bit closer to how arrays and lists work in the background. I tried to stay as general as possible without going into a specific programming language, in essence this should work similarly for all programming languages.
I hope you enjoyed my blogpost, thanks for reading! If you have any suggestions or comments, let me know!
Top comments (0)