I encountered a weird phenomenon regarding ruby's compact
method, and I thought I'd share my findings with you.
When working on "The Grid" dashboard here at monday.com, I got to a point where I had an array of hashes, and some of the hashes were empty.
I needed to get rid of those.
Bottom line first
To get the result I wanted I just did a.reject { |v| v.blank? }
instead of a.compact
.
Expectation VS Reality.
When I used compact
on the array I was a bit surprised by the result.
a = [{ key: 'value' }, { key: 'value2' }, {}]
puts a.compact
#[{ key: 'value' }, { key: 'value2' }, {}]
I expected the compact
method to just go over each element in the array, check if it's present?
and if it is return it.
Something like this:
def compact
select { |value| value.present? }
end
So when I had an array with empty hashes that wouldn't go away I had to find the explanation.
The implementations behind the scenes
As usual, I dug deeper for the implementations.
From past research I know ruby's Hash also has a compact
method and I know it's pretty straightforward, for each key in the hash we check if it's nil
, if it's not we return it.
Super easy. Similar to what I suggested above, easy to understand, although it wouldn't solve the issue (because we use nil?
instead of blank?
or !present?
)*
activesupport/lib/active_support/core_ext/hash/compact.rb, line 12
def compact
select { |_, value| !value.nil? }
end
In arrays however the implementation is more complex. It is implemented purely in C.
static VALUE
rb_ary_compact(VALUE ary)
{
ary = rb_ary_dup(ary);
rb_ary_compact_bang(ary);
return ary;
}
C is love, C is life
If you don't have any background in C, this next part might be less interesting to you. I'll try to make it as understandable as possible, but I highly recommend learning C if you have the time!
Pointers background
In case you don't know, the way we access our program's memory (the place where the variables are stored) is by using pointers.
These pointers are basically just an address in the memory where we can find the beginning of our variable (each variable takes a different amount of memory, we just get the required number of bytes starting at the pointer's position)
Arrays work similarly (actually, an array is literally a pointer), let's say we have an array of 3 characters, each character consists of one byte, all 3 bytes are stored one after the other in memory.
We can go to the first address in the array (or the pointer to that array), and take 1 byte, that's the first character. Then we take another, that's the second, and then the third.
char arr[3] = {'m', 'i', 'p'}
/*
Behind the scenes it's stored like so:
... 6d 69 70 ...
^ arr[0] == *arr
... 6d 69 70 ...
^ arr[1] == *(arr + 1)
... 6d 69 70 ...
^ arr[2] == *(arr + 2)
*/
Another thing to note: in C, if we have a pointer stored in a variable, let's say ptr
, this variable will contain the address to the value. To get to the value itself we have to use the "dereference" operator (an asterisk before the variable name) like so *ptr
.
Back to code
Above we saw the C implementation of ruby's array compact
, we can see a function that receives an array, duplicates it (using the rb_ary_dup
function), compact
s it using the rb_ary_compact_bang
function ("bang" is the same as the "!" symbol, so it basically runs compact!
), and then returns the duplicated, compacted array.
To find that rb_ary_compact_bang
function I went to ruby's source code:
ruby/array.c, line 5022
static VALUE
rb_ary_compact_bang(VALUE ary)
{
VALUE *p, *t, *end;
long n;
rb_ary_modify(ary);
p = t = (VALUE *)RARRAY_CONST_PTR_TRANSIENT(ary); /* WB: no new reference */
end = p + RARRAY_LEN(ary);
while (t < end) {
if (NIL_P(*t)) t++;
else *p++ = *t++;
}
n = p - RARRAY_CONST_PTR_TRANSIENT(ary);
if (RARRAY_LEN(ary) == n) {
return Qnil;
}
ary_resize_smaller(ary, n);
return ary;
}
Don't close the post! This code isn't friendly for people with no experience in C, but most of it is memory management, some games of allocating memory correctly, resizing arrays to prevent memory leaks etc. I can do a deeper dive in a seperate post if anyone wants.
The interesting part is this one:
while (t < end) {
if (NIL_P(*t)) t++;
else *p++ = *t++;
}
Line by line
As I explained above, an array is just a set of variables stored one after the other in our program's memory, so we can get a pointer to the first variable, and then just increase it by 1 every time to get to the next one (t++
increases t
by 1)
What we have here is a loop that goes from the beginning of the array, with our current position in the t
variable (a pointer to the array), and we iterate over every element until we reach the end
of the array (the address of the beginning of the array + the length of the array)
Every time, we check whether the current value is nil
(using the NIL_P
macro defined in include/ruby/ruby.h line 482)
If it is nil
we just increase the pointer, advancing one spot in our array.
If it's not nil
we put the current value from the original t
(remember, the *
dereferences the pointer) into another array we keep stored in the pointer p
, and then advance the pointer one spot.
Bottom line
In a nutshell, it's the same as doing a.reject { |v| v.nil? }
but more memory aware and efficient.
Just to show the difference in efficiency I benchmarked both ways against an array of 20,000,000 elements with alternating integers and nil
s.
As you can see, array's compact
method using C null pointer elimination is 6.6 times faster than the reject
method!
I hope you enjoyed learning a bit about ruby's internal mechanisms!
* This is the activesupport implementation of Hash compact
, if you'd like to see the C implementation you can go to ruby/hash.c line 4110, but it's pretty similar!
Top comments (0)