Most modern scripting languages use dynamic types: values carry a type (an integer, a decimal number, a string, ...) but variables don't. So you can write code like:
x = 2.3
x = "my string"
This is especially useful when you have an array whose elements can be of different types:
arr[0] = 2.3
arr[1] = "my string"
Alas, we C programmers do not have this luxury. Variables are typed and if, for example, your array needs to contain values that can be integers, doubles or pointers, you are on your own.
Typically, this is solved by squeezing multiple values into a union:
typedef union {
double d; // 8 bytes
int i; // 4 bytes (most likely)
void *p; // 8 bytes on 64-bit architectures
} myval_t; // The entire union will occupy 8 bytes
myval_t x;
x.d = 2.3;
x.i = 9; // The previous value is overwritten
The union will be big enough to hold its largest member; in the example above it will (most likely) be 8 bytes. I say "most likely" because the C standard leaves the sizes of double, int and pointers to the implementation and allows padding at the end of a union. What the standard does guarantee is that all the members overlap, starting at the same address, which is why writing one member overwrites the others.
To set or get a value, you access the proper field: x.d for the double, x.p for the pointer, and so on.
The problem is: "How can I know the type of the value?" The answer is simple: "You can't!"
If you need to know it, the type is usually stored explicitly together with the value:
#define VALTYPE_DOUBLE 1
#define VALTYPE_INTEGER 2
#define VALTYPE_POINTER 3
typedef struct {
int t; // type
union {
double d;
int i;
void *p;
} v; // value
} myval_t;
myval_t x;
x.v.d = 2.3;
x.t = VALTYPE_DOUBLE;
...
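The issue with this approach, besides having to update multiple fields, is that it's way too wasteful! An additional integer for each value you want to store! And on a typical 64-bit platform, alignment padding means each value ends up taking 16 bytes instead of 8.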
In fact, this is not the way many modern scripting languages do it. Several implementations (LuaJIT and some JavaScript engines, for example) use a neat trick called NaN boxing which, to my shame, I was completely ignorant of until recently!
Once I got to know it, I had to implement a small header library in C (available on GitHub).
It allows writing code like this:
val_t A[10];
A[0] = val(2.3);
A[1] = val("my string"); // of course just the pointer, this is C!
A[2] = val(35);
And later:
if (valisdouble(A[k])) {
// do some double-y thing
}
else if (valisinteger(A[k])) {
// do some int-ey thing
}
I'll describe here how the library is used; the implementation details will come in a future post (hopefully).
The val_t type
After including the header:
#include "val.h"
you will have a new data type, val_t, that can store values of different types:
- Signed or unsigned integers (up to 48 bits)
- Double-precision floating-point numbers
- Booleans. There are two constants defined, valtrue and valfalse, which are different from any integer.
- Nil. A constant different from any integer or boolean.
- Generic pointers
- Pointers to strings (char *)
The library provides a val() function to store values into a val_t variable:
val_t f = val(3.2); // Stores a double value
val_t s = val("a string"); // Stores a pointer to a string
All the details of storing the type, etc. are handled by val().
Retrieve values
Once your data is safely stored inside a val_t variable, you can retrieve it using type-specific conversion functions:
void * valtopointer(val_t v);
double valtodouble(val_t v);
float valtofloat(val_t v);
_Bool valtobool(val_t v);
long valtointeger(val_t v);
char * valtostring(val_t v);
and cast it as needed. For example:
float a = 3.14f;
float b = 0.0f;
val_t x;
x = val(a); // stored a float
b = (float)valtodouble(x); // retrieved a float
Constants for Common Scenarios
The val library defines a few constants of type val_t to handle common scenarios and default values gracefully:
valfalse
valtrue
valnil
valnilpointer
valnilstr
Identifying the Stored Type
The type of the data stored in a val_t variable can be determined using the valtype() function:
int valtype(val_t);
It returns one of these constants, each indicating the kind of data stored:
VALDOUBLE
VALINTEGER
VALBOOL
VALNIL
VALPOINTER
VALSTRING
For more specific checks, there is a set of helper functions:
int valisnil(val_t v);
int valisinteger(val_t v);
int valissigned(val_t v);
int valisbool(val_t v);
int valisdouble(val_t v);
int valispointer(val_t v);
int valisstring(val_t v);
Conclusion
The val library opens the door to a world where C programmers can enjoy the flexibility typical of dynamically typed languages like JavaScript.
By using the val library, developers can store and manage values of different types without the need for complex, memory-consuming data structures, thereby achieving more with less.
The next post will dive deep into the NaN boxing details.