Remo Dentato

Posted on Oct 1, 2023 • Edited on Oct 7, 2023

What's in a C (NaN)box?

#c #nanboxing

Most modern scripting languages use dynamic types: values carry a type (an integer, a decimal number, a string, ...) but variables don't. So you can write code like:

   x = 2.3
   x = "my string"

This is especially useful when you have an array whose elements can be of different types:

   arr[0] = 2.3
   arr[1] = "my string"

Alas, we C programmers do not have this luxury. Variables are typed and if, for example, your array needs to contain values that can be integers, doubles, or pointers, you are on your own.

Typically, this is solved by squeezing multiple values into a union:

  typedef union {
     double d;   // 8 bytes
     int    i;   // 4 bytes (most likely)
     void  *p;   // 8 bytes on a 64-bit arch
  } myval_t;     // The entire union will occupy 8 bytes

  myval_t x;
  x.d = 2.3;
  x.i = 9; // The previous value is overwritten

The union will be big enough to hold its largest member; in the example above the union will be (most likely) 8 bytes. I say "most likely" because the C standard does not fix the sizes of int and of pointers, and the compiler is free to add padding. The members themselves always overlap: they all start at the same address.
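If you want to check the size and the overlap on your own platform, here is a minimal sketch. The helper names `largest_member_size` and `members_overlap` are made up for this illustration; only the union itself comes from the example above.

```c
#include <stddef.h>

/* Same union as in the text. */
typedef union {
    double d;
    int    i;
    void  *p;
} myval_t;

/* Size of the largest member, computed by hand for comparison. */
size_t largest_member_size(void)
{
    size_t n = sizeof(double);
    if (sizeof(int) > n)    n = sizeof(int);
    if (sizeof(void *) > n) n = sizeof(void *);
    return n;
}

/* Returns 1 if every member starts at the same address as the union. */
int members_overlap(void)
{
    myval_t x;
    return (void *)&x.d == (void *)&x
        && (void *)&x.i == (void *)&x
        && (void *)&x.p == (void *)&x;
}
```

On a typical 64-bit platform `sizeof(myval_t)` and `largest_member_size()` should both be 8, but the only thing you can rely on everywhere is that the union is at least as big as its largest member.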

To set or get a value, you access the proper field: x.d for the double, x.p for the pointer, and so on.

The problem is: "How can I know the type of the value?" The answer is simple: "You can't!"

If you need to know it, the type is usually stored explicitly together with the value:

  #define VALTYPE_DOUBLE  1
  #define VALTYPE_INTEGER 2
  #define VALTYPE_POINTER 3

  typedef struct {
     int t;  // type
     union {
       double d;
       int    i;
       void  *p;
     } v;   // value
  } myval_t;

  myval_t x;
  x.v.d = 2.3;
  x.t = VALTYPE_DOUBLE;
  ...

The issue with this approach, besides having to update multiple fields, is that it's way too wasteful! An additional integer for each value you want to store! And on a typical 64-bit platform, alignment padding means the struct grows from 8 to 16 bytes.
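To put a number on the waste, and to show what "checking the type" looks like with this layout, here is a small sketch. The names `bare_val_t`, `tagged_val_t`, `tag_overhead` and `type_name` are invented for the example; the struct has the same members as the one above.

```c
#include <stddef.h>

#define VALTYPE_DOUBLE  1
#define VALTYPE_INTEGER 2
#define VALTYPE_POINTER 3

/* The bare union: no way to tell which member is valid. */
typedef union {
    double d;
    int    i;
    void  *p;
} bare_val_t;

/* The tagged version: same members as the struct in the text. */
typedef struct {
    int        t;   /* type  */
    bare_val_t v;   /* value */
} tagged_val_t;

/* Extra bytes paid per value for the tag, padding included. */
size_t tag_overhead(void)
{
    return sizeof(tagged_val_t) - sizeof(bare_val_t);
}

/* Dispatch on the stored tag. */
const char *type_name(const tagged_val_t *x)
{
    switch (x->t) {
    case VALTYPE_DOUBLE:  return "double";
    case VALTYPE_INTEGER: return "integer";
    case VALTYPE_POINTER: return "pointer";
    default:              return "unknown";
    }
}
```

The overhead is never less than `sizeof(int)`, and alignment usually rounds it up to the size of the union's widest member.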

In fact, this is not the way modern scripting languages do it. They use a neat trick called NaNBoxing which, to my shame, I was completely ignorant of until recently!
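For the impatient, here is a rough sketch of the general idea. It is not the layout used by the library. An IEEE 754 double has a huge number of bit patterns that all mean "not a number", so some of them can be reserved as type tags, with the low 48 bits left free for a payload. Everything here (`box_t`, the `BOX_*` constants, the tag values) is hypothetical, and the sketch assumes 64-bit IEEE 754 doubles and pointers that fit in 48 bits, as on common x86-64 and AArch64 systems.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t box_t;   /* hypothetical name, not the library's val_t */

/* Patterns whose top 16 bits are 0xFFF9 or above are quiet NaNs
   that this sketch reserves as type tags. */
#define BOX_TAG_MASK UINT64_C(0xFFFF000000000000)
#define BOX_PAYLOAD  UINT64_C(0x0000FFFFFFFFFFFF)
#define BOX_TAG_INT  UINT64_C(0xFFF9000000000000)
#define BOX_TAG_PTR  UINT64_C(0xFFFA000000000000)
#define BOX_QNAN     UINT64_C(0x7FF8000000000000)  /* canonical NaN for real doubles */

box_t box_double(double d)
{
    box_t b;
    if (d != d) return BOX_QNAN;   /* collapse real NaNs so they never look like a tag */
    memcpy(&b, &d, sizeof b);
    return b;
}

double unbox_double(box_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

box_t box_int(int64_t i)           /* keeps only the low 48 bits */
{
    return BOX_TAG_INT | ((uint64_t)i & BOX_PAYLOAD);
}

int64_t unbox_int(box_t b)
{
    int64_t x = (int64_t)(b & BOX_PAYLOAD);
    int64_t m = INT64_C(1) << 47;
    return (x ^ m) - m;            /* sign-extend from 48 bits */
}

box_t box_ptr(void *p)             /* assumes the address fits in 48 bits */
{
    return BOX_TAG_PTR | ((uint64_t)(uintptr_t)p & BOX_PAYLOAD);
}

void *unbox_ptr(box_t b)
{
    return (void *)(uintptr_t)(b & BOX_PAYLOAD);
}

int box_is_int(box_t b)    { return (b & BOX_TAG_MASK) == BOX_TAG_INT; }
int box_is_ptr(box_t b)    { return (b & BOX_TAG_MASK) == BOX_TAG_PTR; }
int box_is_double(box_t b) { return (b & BOX_TAG_MASK) <  BOX_TAG_INT; }
```

The point is that every value, whatever its type, fits in the same 8 bytes as a plain double, and the type can be recovered from the bits alone.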

Once I got to know it, I had to implement a small C header library (available on GitHub).
It allows writing code like this:

  val_t A[10]; 

  A[0] = val(2.3);
  A[1] = val("my string"); // of course just the pointer, this is C!
  A[2] = val(35);

And later:

  if (valisdouble(A[k]))  {
    // do some double-y thing
  }
  else if (valisinteger(A[k])) {
    // do some int-ey thing
  }

I'll describe here how the library is used; the implementation details will come in a future post (hopefully).

The `val_t` type

After including the header:

  #include "val.h"

you will have a new data type, val_t, that can store values of different types:

Signed or unsigned integers (up to 48 bits)
Double-precision floating-point numbers
Booleans. There are two constants defined: valtrue and valfalse which are different from any integer.
Nil. A constant different from any integer or boolean.
Generic pointers
Pointers to strings (char *)

The library provides a val() function to store values into a val_t variable:

val_t f = val(3.2);          // Stores a double value
val_t s = val("a string");   // Stores a pointer to a string

All the details of storing the type, etc., are handled by val().
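If you wonder how a single function-like name can accept a double, an integer or a string, one possibility in C11 is `_Generic`, which selects an expression based on the type of its argument. I am not claiming this is how val.h does it. The sketch below is a guess at the general mechanism, built on a plain tagged struct, and every name in it (`tval_t`, `tval`, the `TV_*` constants) is invented for the example.

```c
#define TV_DOUBLE  1
#define TV_INTEGER 2
#define TV_STRING  3

/* Hypothetical tagged value, used only to make the dispatch visible. */
typedef struct {
    int t;
    union {
        double      d;
        long        i;
        const char *s;
    } v;
} tval_t;

tval_t tval_from_double(double d)
{
    tval_t x;
    x.t = TV_DOUBLE;
    x.v.d = d;
    return x;
}

tval_t tval_from_long(long i)
{
    tval_t x;
    x.t = TV_INTEGER;
    x.v.i = i;
    return x;
}

tval_t tval_from_str(const char *s)
{
    tval_t x;
    x.t = TV_STRING;
    x.v.s = s;   /* just the pointer, the characters are not copied */
    return x;
}

/* Pick the constructor from the static type of the argument (C11). */
#define tval(x) _Generic((x),            \
        float:        tval_from_double,  \
        double:       tval_from_double,  \
        int:          tval_from_long,    \
        long:         tval_from_long,    \
        char *:       tval_from_str,     \
        const char *: tval_from_str)(x)
```

The selection happens at compile time, so there is no runtime cost for the dispatch; a type that is not listed is a compile error rather than a silent misinterpretation.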

Retrieve values

Once your data is safely stored inside a val_t variable, you can retrieve it with the type-specific conversion functions:

void * valtopointer(val_t v);
double valtodouble(val_t v);
float valtofloat(val_t v);
_Bool valtobool(val_t v);
long valtointeger(val_t v);
char * valtostring(val_t v);

and cast it as needed. For example:

  float a = 3.14f;
  float b = 0.0f;
  val_t x;

  x = val(a);                // stored a float
  b = (float)valtodouble(x); // retrieved a float

Constants for Common Scenarios

The val library defines certain constants of type val_t to handle common scenarios and default values gracefully:

valfalse
valtrue
valnil
valnilpointer
valnilstr

Identifying the Stored Type

Determining the type of data stored within a val_t variable can be done using the valtype() function:

int valtype(val_t);

It returns one of these constants, each indicative of the nature of the stored data:

VALDOUBLE
VALINTEGER
VALBOOL
VALNIL
VALPOINTER
VALSTRING

For more specific checks, you can use these helper functions:

int valisnil(val_t v);
int valisinteger(val_t v);
int valissigned(val_t v);
int valisbool(val_t v);
int valisdouble(val_t v);
int valispointer(val_t v);
int valisstring(val_t v);

Conclusion

The val library gives C programmers some of the flexibility that is common in dynamically typed languages like JavaScript.

With it, you can store and manage values of different types without carrying a separate type field alongside each value.

The next post will dive deep into the NaNboxing details.
