Glad you enjoyed the post :)
You are right there are many details I did not talk about but they are equally important. Would you mind to elaborate a bit on the spatial locality and how arrays are stored in memory? I think that'd be super valuable for the discussion!
Thank you for commenting!
Simply put, spatial locality states that the data elements near an element that was just accessed have a high probability of being accessed as well. As for arrays, single dimension ones are stored in memory in a contiguous fashion (for arrays with more dimensions, you need to know the size at compile time, i.e. they have to be fixed size. If you don't know, the rows' elements are still stored contiguously, but there's no guarantee that the rows themselves will be close to each other. There are some tricks you can do with pointers to bypass this, though).
So, the caching process takes both of these ideas into consideration, being that contiguous array elements are usually cached together. This is why mult2 had a better performance - You were traversing the columns as if they were rows (which, as said above, are single dimension arrays). In other words, you were accessing contiguous elements in memory, and not jumping who knows how many memory addresses to access each element like in mult1. Hence the cache hits and cache misses on both examples.
Knowing this, some people even choose to represent their matrices through single dimension arrays. For instance, with a 2D matrix, you would have a single dimension matrix. You would access each element through matrix[width * row + col], and thus, due to spatial locality and contiguous storage, take full advantage of your cache.
Hope that was clear :)
We're a place where coders share, stay up-to-date and grow their careers.
We strive for transparency and don't collect excess data.