Remo Dentato


What's in a C (NaN)box?

Most modern scripting languages use dynamic types: values carry a type (an integer, a decimal number, a string, ...) but variables don't. So you can write code like:

   x = 2.3
   x = "my string"

This is especially useful when you have an array whose elements can be of different types:

   arr[0] = 2.3
   arr[1] = "my string"

Alas, we C programmers do not have this luxury. Variables are typed and if, for example, your array needs to contain values that can be integers, doubles, or pointers, you are on your own.

Typically, this is solved by squeezing multiple values into a union:

  typedef union {
     double d;   // 8 bytes
     int    i;   // 4 bytes (most likely)
     void  *p;   // 8 bytes on 64-bits arch
  } myval_t;     // The entire union will occupy 8 bytes

  myval_t x;
  x.d = 2.3;
  x.i = 9; // The previous value is overwritten

The union is big enough to hold its largest member; in the example above it will (most likely) be 8 bytes. I say "most likely" because the C standard guarantees that the members of a union overlap, but it does not fix the sizes of int, double, or pointers: those depend on the platform.

To set or get a value, you access the proper field: x.d for the double, x.p for the pointer, and so on.
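
To make the overlap concrete, here is a minimal sketch. The helper name last_written_int is made up for illustration; the compile-time checks only state what the C standard already guarantees for any union.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

typedef union {
    double d;
    int    i;
    void  *p;
} myval_t;

/* Every member of a union starts at offset 0, so they share the same bytes. */
static_assert(offsetof(myval_t, d) == 0, "d starts at offset 0");
static_assert(offsetof(myval_t, i) == 0, "i starts at offset 0");
static_assert(offsetof(myval_t, p) == 0, "p starts at offset 0");

/* The union is at least as large as its largest member. */
static_assert(sizeof(myval_t) >= sizeof(double), "room for a double");
static_assert(sizeof(myval_t) >= sizeof(void *), "room for a pointer");

/* Writes a double, then an int, and returns the member written last. */
int last_written_int(void)
{
    myval_t x;
    x.d = 2.3;
    x.i = 9;      /* reuses the bytes that held 2.3 */
    return x.i;   /* x.d no longer holds 2.3 at this point */
}
```

Only the member written last can be read back reliably, which is exactly why the type has to be tracked somewhere.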

The problem is: "How can I know the type of the value?" The answer is simple: "You can't!"

If you need to know it, the type is usually stored explicitly together with the value:

  #define VALTYPE_DOUBLE  1
  #define VALTYPE_INTEGER 2
  #define VALTYPE_POINTER 3

  typedef struct {
     int t;  // type
     union {
       double d;
       int    i;
       void  *p;
     } v;   // value
  } myval_t;

  myval_t x;
  x.v.d = 2.3;
  x.t = VALTYPE_DOUBLE;
  ...

The issue with this approach, besides having to update multiple fields, is that it's way too wasteful: an additional integer (plus any alignment padding) for each value you want to store!
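
A quick way to see the cost is to compare the sizes of the two layouts. In this sketch the names rawval_t, taggedval_t and tag_overhead are hypothetical, chosen only so that both versions can live in one file.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* The bare union from the first example. */
typedef union {
    double d;
    int    i;
    void  *p;
} rawval_t;

/* The same union with an explicit type tag in front of it. */
typedef struct {
    int t;
    union {
        double d;
        int    i;
        void  *p;
    } v;
} taggedval_t;

/* The tag can never be free: struct members do not overlap. */
static_assert(sizeof(taggedval_t) > sizeof(rawval_t), "the tag costs space");

/* Extra bytes paid per value for the tag, including alignment padding. */
size_t tag_overhead(void)
{
    return sizeof(taggedval_t) - sizeof(rawval_t);
}
```

On a typical 64-bit platform such as x86-64 Linux the struct takes 16 bytes against 8 for the bare union, because the 4-byte tag is padded so that the union stays 8-byte aligned.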

In fact, this is not the way many modern scripting languages do it. They use a neat trick called NaN boxing which, to my shame, I was completely ignorant of until recently!
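
To give a rough idea of the trick: an IEEE 754 double whose exponent bits are all ones and whose mantissa is non-zero is a NaN, and there are far more NaN bit patterns than any program needs. The spare ones can carry a small type tag plus a payload. The following is only a sketch under stated assumptions, not the code of any particular library: the name box_t, the tag values 0xFFF9 and 0xFFFA, the 48-bit payload, and the assumption that pointers fit in 48 bits are all illustrative choices.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>   /* memcpy */

typedef uint64_t box_t;   /* hypothetical name for a boxed value */

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes 64-bit IEEE 754 doubles");

/* Assumed layout: the top 16 bits hold the tag, the low 48 bits the payload. */
#define BOX_TAG_INT  UINT64_C(0xFFF9)
#define BOX_TAG_PTR  UINT64_C(0xFFFA)
#define BOX_PAYLOAD  UINT64_C(0x0000FFFFFFFFFFFF)
#define BOX_NAN      UINT64_C(0x7FF8000000000000)

box_t box_double(double d)
{
    box_t b;
    if (d != d) return BOX_NAN;   /* canonical NaN, so real NaNs never look like tags */
    memcpy(&b, &d, sizeof b);     /* copy the bits without breaking aliasing rules */
    return b;
}

box_t box_int(int32_t i)
{
    uint32_t u;
    memcpy(&u, &i, sizeof u);
    return (BOX_TAG_INT << 48) | u;
}

box_t box_ptr(void *p)
{
    /* Assumption: the pointer fits in 48 bits (common for user-space addresses). */
    return (BOX_TAG_PTR << 48) | ((uint64_t)(uintptr_t)p & BOX_PAYLOAD);
}

/* Anything whose top 16 bits are below the first tag is an ordinary double. */
int box_is_double(box_t b) { return (b >> 48) <  BOX_TAG_INT; }
int box_is_int(box_t b)    { return (b >> 48) == BOX_TAG_INT; }
int box_is_ptr(box_t b)    { return (b >> 48) == BOX_TAG_PTR; }

double unbox_double(box_t b) { double d; memcpy(&d, &b, sizeof d); return d; }

int32_t unbox_int(box_t b)
{
    uint32_t u = (uint32_t)(b & UINT64_C(0xFFFFFFFF));
    int32_t i;
    memcpy(&i, &u, sizeof i);
    return i;
}

void *unbox_ptr(box_t b) { return (void *)(uintptr_t)(b & BOX_PAYLOAD); }
```

Every value, whatever its type, occupies exactly 8 bytes, and the type check is a shift and a comparison.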

Once I got to know it, I had to implement a small header library in C (available on GitHub).
It allows writing code like this:

  val_t A[10]; 

  A[0] = val(2.3);
  A[1] = val("my string"); // of course just the pointer, this is C!
  A[2] = val(35);

And later:

  if (valisdouble(A[k]))  {
    // do some double-y thing
  }
  else if (valisinteger(A[k])) {
    // do some int-ey thing
  }

I'll describe here how the library is used; the implementation details will come in a future post (hopefully).

The val_t type

After including the header:

  #include "val.h"

you will have a new data type, val_t, that can store a value of any of these types:

  • Signed or unsigned integers (up to 48 bits)
  • Double-precision floating-point numbers
  • Booleans. There are two constants defined: valtrue and valfalse which are different from any integer.
  • Nil. A constant different from any integer or boolean.
  • Generic pointers
  • Pointers to strings (char *)
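
As an illustration of what "up to 48 bits" means for signed integers, the sketch below packs a value into a 48-bit payload and sign-extends it on the way out. The names (I48_MIN, I48_MAX, fits_in_48_bits, pack48, unpack48) are hypothetical, and the library may well handle this differently.

```c
#include <assert.h>
#include <stdint.h>

#define PAYLOAD_MASK  ((UINT64_C(1) << 48) - 1)

/* Range of a signed 48-bit integer. */
#define I48_MAX  (((int64_t)1 << 47) - 1)   /*  140737488355327 */
#define I48_MIN  (-((int64_t)1 << 47))      /* -140737488355328 */

int fits_in_48_bits(int64_t n)
{
    return n >= I48_MIN && n <= I48_MAX;
}

/* Keep only the low 48 bits; the caller must stay within the range. */
uint64_t pack48(int64_t n)
{
    assert(fits_in_48_bits(n));
    return (uint64_t)n & PAYLOAD_MASK;
}

/* Sign-extend bit 47 back into a full 64-bit integer. */
int64_t unpack48(uint64_t payload)
{
    const uint64_t sign = UINT64_C(1) << 47;
    payload &= PAYLOAD_MASK;
    /* Flipping the sign bit and subtracting it again maps
       [2^47, 2^48) to the negative range without overflow. */
    return (int64_t)(payload ^ sign) - (int64_t)sign;
}
```

Values outside that range do not fit in the payload and would have to be stored some other way, for example as a double.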

The library provides a val() function to store values into a val_t variable:

  val_t f = val(3.2);          // Stores a double value
  val_t s = val("a string");   // Stores a pointer to a string

All the details of storing the type, etc., are handled by val().
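
How can a single function-like name accept a double, an integer, or a string? One way to get that behavior in C11 is a _Generic selection, which picks a function based on the static type of its argument. The sketch below shows the idea with a plain tagged struct instead of NaN boxing, to keep the focus on the dispatch. The names anyval_t, anyval() and the from_* functions are hypothetical, and whether the val library itself relies on _Generic is an assumption.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

typedef enum { KIND_DOUBLE, KIND_INTEGER, KIND_STRING } anykind;

typedef struct {
    anykind kind;
    union {
        double      d;
        long        i;
        const char *s;
    } as;
} anyval_t;

anyval_t from_double(double d)
{
    anyval_t v = { .kind = KIND_DOUBLE, .as.d = d };
    return v;
}

anyval_t from_long(long i)
{
    anyval_t v = { .kind = KIND_INTEGER, .as.i = i };
    return v;
}

anyval_t from_string(const char *s)
{
    assert(s != NULL);   /* only the pointer is stored, not a copy */
    anyval_t v = { .kind = KIND_STRING, .as.s = s };
    return v;
}

/* Select the constructor from the static type of x, then call it. */
#define anyval(x) _Generic((x),          \
        float:        from_double,       \
        double:       from_double,       \
        int:          from_long,         \
        long:         from_long,         \
        char *:       from_string,       \
        const char *: from_string)(x)
```

With this in place, anyval(3.2), anyval(35) and anyval("a string") all compile to a direct call to the matching constructor; a type that is not listed is rejected at compile time.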

Retrieve values

Once your data is safely stored inside a val_t variable, you can retrieve it using the type-specific conversion functions:

  • void * valtopointer(val_t v);
  • double valtodouble(val_t v);
  • float valtofloat(val_t v);
  • _Bool valtobool(val_t v);
  • long valtointeger(val_t v);
  • char * valtostring(val_t v);

and cast it as needed. For example:

  float a = 3.14f;
  float b = 0.0f;
  val_t x;

  x = val(a);                 // stored a float
  b = (float)valtodouble(x);  // retrieved a float

Constants for Common Scenarios

The val library defines certain constants of type val_t to handle common scenarios and default values gracefully:

  • valfalse
  • valtrue
  • valnil
  • valnilpointer
  • valnilstr

Identifying the Stored Type

To determine the type of the data stored in a val_t variable, use the valtype() function:

  int valtype(val_t);

It returns one of these constants, each identifying the kind of data stored:

  • VALDOUBLE
  • VALINTEGER
  • VALBOOL
  • VALNIL
  • VALPOINTER
  • VALSTRING

For more specific checks, you can use a set of helper functions:

  • int valisnil(val_t v);
  • int valisinteger(val_t v);
  • int valissigned(val_t v);
  • int valisbool(val_t v);
  • int valisdouble(val_t v);
  • int valispointer(val_t v);
  • int valisstring(val_t v);

Conclusion

The val library opens the door to a world where C programmers can enjoy the flexibility that is common in dynamically typed languages like JavaScript.

By using the val library, developers can store and manage values of different types without complex, memory-hungry data structures, achieving more with less.

The next post will dive deep into the NaN boxing details.
