Storing C data in an MRuby class

Rory O'Connell ・ 4 min read

Quick Friday hit for the 0 people following along with my project and the dozen people discovering this years later from Google.

There aren't any good examples of creating a Ruby class with mruby that encapsulates data the VM doesn't manage itself. You need this when you're using mruby to interface with a C library that provides its own data types: you want to call methods in Ruby that run C code calling out to the external library. Examples include database systems, graphics APIs and interacting with the operating system.

The process is different from mainline Ruby, though simpler. C Ruby requires a separate allocate function that runs before initialize. With mruby we can create and store the C data entirely within the initialize method.

We'll use a contrived Foo_data struct.

typedef struct Foo_data {
  uint32_t first[16];
  uint32_t second[16];
} Foo_data;

For demonstration we'll create a constructor for a Foo class that takes one integer parameter. The parameter determines how to fill out the first and second members of the Foo_data struct that belongs to each instance of Foo. Start by building a Ruby Foo class in C as usual:

mrb_state *state = mrb_open();
struct RClass *Foo = mrb_define_class(state, "Foo", state->object_class);
MRB_SET_INSTANCE_TT(Foo, MRB_TT_DATA);

mrb_define_method(state, Foo, "initialize", Foo_initialize, MRB_ARGS_REQ(1));
mrb_define_method(state, Foo, "check", Foo_check, MRB_ARGS_NONE());

There's one highly important new thing here. MRB_SET_INSTANCE_TT sets the type tag (tt) that the VM assigns to instances of the class, and here we set it to the special MRB_TT_DATA. That makes every Foo instance an RData object, which has slots for a C pointer and its mrb_data_type; without it, instances are ordinary objects and DATA_PTR ends up reading memory that isn't part of the object. I neglected this step at first and got erratic, difficult-to-debug behavior: the program would follow pointers to nowhere and quit outright, without the Access Violation error normally seen when following wild pointers.

Next we create an mrb_data_type. This struct tells the mruby VM what kind of data an instance holds and which function to call when the GC collects the object. It has two members. The first, struct_name, is a char* name for the data type; the name of the class, "Foo", is fine. mruby uses this name in type error messages, while its type checks compare the address of the mrb_data_type itself, so define it once as a static. The second, dfree, is the function pointer the GC calls when it destroys the object. If you're not doing anything special you can use the built-in mrb_free, provided the data was allocated with mrb_malloc.

static const mrb_data_type Foo_type = {
  "Foo", mrb_free
};

Now fill out the initialize method. We'll allocate a new Foo_data and attach it to the instance on initialize.

mrb_value
Foo_initialize(mrb_state* state, mrb_value self) {
  mrb_int n;
  mrb_get_args(state, "i", &n);
  /* free data left over from an earlier call to initialize */
  Foo_data *foo = (Foo_data *)DATA_PTR(self);
  if(foo) { mrb_free(state, foo); }
  mrb_data_init(self, NULL, &Foo_type);
  /* mrb_malloc pairs with the mrb_free registered in Foo_type */
  foo = (Foo_data *)mrb_malloc(state, sizeof(Foo_data));

  for(uint32_t i = 0; i < 16; ++i) {
    foo->first[i] = (uint32_t)(i * n);
    foo->second[i] = (uint32_t)(i * n * n);
  }

  mrb_data_init(self, foo, &Foo_type);
  return self;
}

A couple of things need explaining here. DATA_PTR pulls the void * out of the object specified, in this case self, and we cast it to what we want, a Foo_data *. As recommended in this discussion and done in the mruby time gem, we check whether a pointer is already associated with the instance and free it if so. A brand-new instance starts with a NULL pointer, but Ruby allows initialize to run again on an existing object (for example obj.send(:initialize, 5)), and without this check the first Foo_data would leak.

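We then call mrb_data_init with NULL so the instance no longer refers to the freed memory; if anything after this point raises, for instance mrb_malloc failing with NoMemoryError, the GC won't try to free that pointer a second time. Next comes the heap allocation, done with mrb_malloc so that it pairs with the mrb_free registered in Foo_type, followed by filling out the data from the integer passed into the constructor. Finally, calling mrb_data_init with the populated Foo_data * attaches the data to the instance.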

We fetch the data back out in check:

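mrb_value
Foo_check(mrb_state* state, mrb_value self) {
  Foo_data *foo;
  /* raises TypeError if self does not carry Foo_type data */
  Data_Get_Struct(state, self, &Foo_type, foo);
  mrb_assert(foo != NULL);  /* compiled out unless MRB_DEBUG is defined */
  return mrb_nil_value();
}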

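Data_Get_Struct is a macro that pulls out the void * saved on the instance and casts it to the target type. It also checks that the instance carries the expected mrb_data_type and raises a TypeError if it doesn't, which makes it safer than using DATA_PTR directly. All of these macros are declared in mruby/data.h, and the functions behind them are short enough to read in the mruby source.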

Now all that's left is to create instances of Foo with different data and confirm in a debugger that the data is what we expect.

mrb_load_string(state, "a = Foo.new(10); a.check; b = Foo.new(50); b.check");

And that's it! With so little information available, figuring out how to build an mruby class that stores arbitrary C data seems daunting at first. I hope this makes the process clear and helps anyone wanting to do the same.

The full program

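/* Assumes mruby's include/ directory is on the compiler's include path */
#include <stdint.h>
#include <stdio.h>
#include <mruby.h>
#include <mruby/class.h>    /* MRB_SET_INSTANCE_TT */
#include <mruby/compile.h>  /* mrb_load_string */
#include <mruby/data.h>     /* DATA_PTR, mrb_data_init, Data_Get_Struct */

/* Plain C data that the mruby VM does not manage itself */
typedef struct Foo_data {
  uint32_t first[16];
  uint32_t second[16];
} Foo_data;

/* Type descriptor: a name for error messages plus the GC's free function */
static const mrb_data_type Foo_type = {
  "Foo", mrb_free
};

/* Foo#initialize(n): allocate a Foo_data and attach it to self */
static mrb_value
Foo_initialize(mrb_state* state, mrb_value self) {
  mrb_int n;
  mrb_get_args(state, "i", &n);

  /* free data left over from an earlier call to initialize */
  Foo_data *foo = (Foo_data *)DATA_PTR(self);
  if(foo) { mrb_free(state, foo); }
  mrb_data_init(self, NULL, &Foo_type);

  /* mrb_malloc pairs with the mrb_free registered in Foo_type */
  foo = (Foo_data *)mrb_malloc(state, sizeof(Foo_data));
  for(uint32_t i = 0; i < 16; ++i) {
    foo->first[i] = (uint32_t)(i * n);
    foo->second[i] = (uint32_t)(i * n * n);
  }

  mrb_data_init(self, foo, &Foo_type);
  return self;
}

/* Foo#check: fetch the attached data back out (type-checked) */
static mrb_value
Foo_check(mrb_state* state, mrb_value self) {
  Foo_data *foo;
  Data_Get_Struct(state, self, &Foo_type, foo);
  mrb_assert(foo != NULL);  /* compiled out unless MRB_DEBUG is defined */
  return mrb_nil_value();
}

int main(void) {
  mrb_state *state = mrb_open();
  if (state == NULL) {
    fputs("mrb_open failed\n", stderr);
    return 1;
  }

  struct RClass *Foo = mrb_define_class(state, "Foo", state->object_class);
  MRB_SET_INSTANCE_TT(Foo, MRB_TT_DATA);

  mrb_define_method(state, Foo, "initialize", Foo_initialize, MRB_ARGS_REQ(1));
  mrb_define_method(state, Foo, "check", Foo_check, MRB_ARGS_NONE());

  mrb_load_string(state, "a = Foo.new(10); a.check; b = Foo.new(50); b.check");
  if (state->exc) { mrb_print_error(state); }

  mrb_close(state);  /* frees remaining objects, calling Foo_type's mrb_free */
  return 0;
}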


Discussion


That's a pretty useful tutorial!

One question though: here you are calling mrb_data_init on the mrb_value self in the Foo constructor (Foo_initialize), so I suppose this mrb_value is already set up to receive a data pointer. How do you initialize an mrb_value with custom C data outside of a constructor, e.g. from within a normal function (doing the equivalent of a Foo.new inside a C function)?

I have tried this:

mystruct* val = ... ; // allocation
mrb_value v;
mrb_data_init(v, (void*)val, &mydatatype);

but mrb_data_init produces a segfault. I suspect mrb_data_object_alloc needs to be called first, but that function returns an RData*, not an mrb_value, and I haven't found any function to convert an RData* into an mrb_value.

 

I think you're running into confusion about what an mrb_value is, which is understandable. There is little documentation about mruby, and the API is inconsistent about whether it takes an mrb_value or a raw pointer. An mrb_value is temporary: a convenience wrapper for interpreting a region of allocated memory. Its tt member tells you, the programmer, how to proceed with it. For instance, if tt is MRB_TT_FIXNUM then you read the value.i member directly. For other types you cast the value.p void * according to tt: if tt is MRB_TT_CLASS you'd do RClass *Foo = (RClass *)obj.value.p, and likewise for MRB_TT_DATA the value.p is an RData * and should be cast as such, RData *foo = (RData *)obj.value.p. That field layout belongs to the no-boxing configuration; with word or NaN boxing the fields differ, so the accessor macros mrb_type(), mrb_fixnum(), mrb_class_ptr() and RDATA() are the portable way to do the same thing. (Note: strings are really weird, with lots of hidden optimizations underneath. Never work with an mruby string directly, even if it looks like a normal char *.)

In the few months I've been working with mruby I've never constructed an mrb_value myself. I let the API or convenience macros build them and do the right thing. Most API functions return a pre-populated mrb_value, and when I need an mrb_value based on C data I use a provided API function or C macro. boxing_no.h contains C macros for making an mrb_value from C data, value.h provides helpers such as mrb_fixnum_value() and mrb_obj_value(), and type-specific headers like string.h provide their own.

As to your question when defining an mruby method in C the calling signature is (mrb_state*, mrb_value). The mruby VM boxes up the current self object into an mrb_value and passes it along as the second parameter of the C function. In the case of the initialize method the mruby VM already created and allocated the object and passed it along as the self giving you a chance to work with it more.

With that in mind, here's what's going wrong. You cannot call mrb_data_init on an unpopulated mrb_value. In your example it doesn't point to anything useful and has the classic C problem of uninitialized garbage memory, so mrb_data_init follows v.value.p to some random location and crashes.

Instead, use the self value that the VM populates and passes into every method. If you set the class type using MRB_SET_INSTANCE_TT, then self is a properly constructed mrb_value wrapping an RData *. I hope this helps your understanding and gets you a little farther.

One bit of note. Calling a method other than new on an object and having that object allocate memory for itself is surprising to me. In pure Ruby, calling methods on a constructed object only constructs other objects, whose constructors allocate memory, and optionally saves references to them as instance variables. I'd never expect something like foo = Foo.new; foo.some_operation to allocate more memory for foo; I'd expect Foo.new to do all the allocations required for an instance of Foo. Something to consider that may simplify your designs and help your understanding.

 

Thanks for your answer! I also asked the question as an issue on the mruby repo and got an answer about using the C macros. My code works now.

My use-case for allocating an object in a function that isn't a constructor was not to allocate more memory for "self", but to allocate a new object. Imagine the following code:

class Foo
  ...
end

class Bar
  ...
  def do_something
    ...
    return Foo.new
  end
end

Now imagine Foo and Bar are defined in C rather than Ruby, and do_something is a C function: you'd need to create an instance of Foo from inside that C function.