DEV Community ๐Ÿ‘ฉโ€๐Ÿ’ป๐Ÿ‘จโ€๐Ÿ’ป

Cover image for Compile-Time Hash in Plain C (Not Only C++) is Now Possible!
Athaariq Ardhiansyah
Athaariq Ardhiansyah

Posted on

Compile-Time Hash in Plain C (Not Only C++) is Now Possible!

Hello folks, I'm back! After months of disappearance, restless night, rollercoaster-mood, finally I have something to share for you ๐Ÿ˜ Let's dig down to the rabbit hole again ๐Ÿ•ณ๏ธ๐Ÿ‡

a Little Background (I guess...)

Imagine a case where you're trying to identify multiple sections of data with labels like "Acceleration X" then use them for data transfer between devices. Normally, programmers rather create a convention of integers like table below.

ID Label
1 Acceleration X
2 Acceleration Y
3 Orientation
4 Temperature
... ...

While transferring the data, all devices use the ID instead of label because of hardware constraint (we're talking about Arduino/ESP32/AVR for now). This approach will work, but what if you're working with large group of teams or with community here at the internet? Well... That's tough

Instead of defining an integer for each label manually, why don't we try to automate it? Sure we can, but here's the catch:

  • IDs must different if labels different, otherwise they collide
  • Accept diverse length of labels but ID size (in bits) must consistent
  • Fast as possible

Okay but what's the solution?

In Computer Science, we learn what hash function do and that solves the problems mentioned above. It works as below.

CRC32 converts vary length of text to 32 bits integer

This approach is a common practice in computing world. Usually for error checking. But have you ever wondered how it actually works behind the scene? It's... complicated for me to say. So complicated that nobody ever bother to create its implementation that run on compilation time with Macro in C language!

Why bother to run it on compilation time?

Previously, I did a project which involves AVR microcontrollers (Arduino Uno to be exact) and an Orange Pi board, and stumbled with that case. I stupidly asked microcontrollers to calculate CRC32 for ยฑ40 strings on runtime then they went mayhem. That approach ate 2/3 of SRAM and took multiple seconds to finish the setup() function. I started to think what if I can do that in compile-time instead of runtime. That would saves a lot of resource on microcontroller, right? Well... not that easy.

Trials and Failures

C++ Constant Expression

By the power of C++, we can explicitly tell the compiler to compute CRC32 at compile-time with constexpr keyword! Somebody at StackOverflow gave us a snippet of compile-time CRC32. It works as expected when I simulate it on my Linux laptop. Except... I forgot that my OrangePi board uses C instead of C++.

Facepalm

Anyway it's not a big deal, that board has multiple CPU with gigahertz speed. Nothing went wrong, I guess...

Memory Usage is Bigger than Before ๐Ÿ’€

...Until I realized Arduino IDE failed to compile:

Sketch uses 4448 bytes (13%) of program storage space. Maximum is 32256 bytes.
Global variables use 2290 bytes (111%) of dynamic memory, leaving -242 bytes for local variables. Maximum is 2048 bytes.
Not enough memory; see https://support.arduino.cc/hc/en-us/articles/360013825179 for tips on reducing your footprint.
Error compiling for board Arduino Uno.
Enter fullscreen mode Exit fullscreen mode

no, noo, please god, noo!!

After hours of troubleshooting, I realized that:

  1. The CRC32 table was uselessly stored in flash by compiler. I realized there was a table after saw this gist and for some reason the compiler store it into the flash.
  2. Why on earth the strings still stored in flash by compiler?!

Turns out I can't simply use constexpr for this purpose. Even though I followed this advice, the constant expression still not behaves as "constant expression".

Now what?

Let's use an alternative: Macro

CRC32 is Too Complex for Macro

Now this is the most interesting thing, how do we write CRC32 with Macro? The answer is "it can't". Just imagine put an elephant into a fridge alive. That won't work even if you chop the elephant to pieces!

Interesting thing? You said "it can't" as interesting??

Well that's not the "interesting thing" I meant, sorry for the exaggeration. What I meant interesting is when I figured out an alternative, yet still working hash algorithm. Introducing my homebrewed simple XOR hash function:

f(s,i,x) = (s[i] ^ x) * 17

Where:
    s = Input string
    i = Current character index
    x = Hash result of previous character, if i == 0 then x = 11
Enter fullscreen mode Exit fullscreen mode

That function is being executed for each character of a string, from left to right (first to last).

I call this algorithm SIX Hash because that function's parameter looks funny. The future seems bright, isn't it? Now let's write the darn macro!

Attempting to Use Boost Preprocessor

For those who didn't know what is Boost, it's a C++ library that helps to prevent re-inventing the wheel while trying to program something quite complex as example looping only with macro, Boost Preprocessor. Fortunately, Boost Preprocessor Repeat also works with plain C, not only C++. So, my OrangePi board can calculate hash at compile-time. Unfortunately, my SIX Hash algorithm requires sizeof(input) and Boost... won't... work... with it. Hours of workarounds, no luck.

Visible Confusion

Now what? Cries??

Well no, I think I have to re-invent the wheel again :)

Loop The C Macro with PHP (seriously)

What if I tell you, PHP can be Macro of C Macro? Confused? Let me explain.

a Diagram where PHP file being read and outputs C file

The goal of using PHP is automatically copy-paste lines of #define because that's what Boost Preprocessor do. If you ever heard PHP supposed to preprocess HTML, think again, think outside the box. PHP can conveniently convert from this...

#include <limits.h>

#define __IDENTITY_HASH_FUNC(s, n, i, x) (s[i < n ? n - 1 - i : 0] ^ x) * 17ull

<?php
    $max_iter = getenv("MAX_ITERATION");
    $last_iter = $max_iter - 1;

    echo "#define __IDENTITY_ITER_$last_iter(s, x, n) x\n";

    for ($i = $last_iter - 1; $i >= 0; $i--)
    {
        $next_iter = $i + 1;
        echo "#define __IDENTITY_ITER_$i(s, x, n) ($i < n ? __IDENTITY_HASH_FUNC(s, n, $i, __IDENTITY_ITER_$next_iter(s, x, n)) : x)\n";
    }
?>

#define IDENTITY(text) ((identity_t) /* */ __IDENTITY_ITER_0(text, 11ull, (sizeof text - 1)))
Enter fullscreen mode Exit fullscreen mode

...to this

#include <limits.h>

#define __IDENTITY_HASH_FUNC(s, n, i, x) (s[i < n ? n - 1 - i : 0] ^ x) * 17ull

#define __IDENTITY_ITER_63(s, x, n) x
#define __IDENTITY_ITER_62(s, x, n) (62 < n ? __IDENTITY_HASH_FUNC(s, n, 62, __IDENTITY_ITER_63(s, x, n)) : x)
#define __IDENTITY_ITER_61(s, x, n) (61 < n ? __IDENTITY_HASH_FUNC(s, n, 61, __IDENTITY_ITER_62(s, x, n)) : x)
#define __IDENTITY_ITER_60(s, x, n) (60 < n ? __IDENTITY_HASH_FUNC(s, n, 60, __IDENTITY_ITER_61(s, x, n)) : x)
#define __IDENTITY_ITER_59(s, x, n) (59 < n ? __IDENTITY_HASH_FUNC(s, n, 59, __IDENTITY_ITER_60(s, x, n)) : x)
#define __IDENTITY_ITER_58(s, x, n) (58 < n ? __IDENTITY_HASH_FUNC(s, n, 58, __IDENTITY_ITER_59(s, x, n)) : x)
#define __IDENTITY_ITER_57(s, x, n) (57 < n ? __IDENTITY_HASH_FUNC(s, n, 57, __IDENTITY_ITER_58(s, x, n)) : x)
/* 55 Lines collapsed, imagine how long the code if this expanded! */
#define __IDENTITY_ITER_1(s, x, n) (1 < n ? __IDENTITY_HASH_FUNC(s, n, 1, __IDENTITY_ITER_2(s, x, n)) : x)
#define __IDENTITY_ITER_0(s, x, n) (0 < n ? __IDENTITY_HASH_FUNC(s, n, 0, __IDENTITY_ITER_1(s, x, n)) : x)

#define IDENTITY(text) ((identity_t) /* */ __IDENTITY_ITER_0(text, 11ull, (sizeof text - 1)))
Enter fullscreen mode Exit fullscreen mode

It works! Finally!!

Hooray!

Also nice to mention that anyone can set the maximum input length by set the MAX_ITERATION environment variable. Customizable, no excessive copy-paste, easy revision.

Conclusion

That was a thrilling journey to be honest, just to make compile-time hash work. Anyway, you don't have to re-invent the wheel like what I did because I created a library:

identity.h

It's still not perfect unfortunately. If you can make it perfect, pull a request or open an issue. That would helps. Thank you for joining my journey down to the rabbit hole. See ya later ๐Ÿ‘‹

Top comments (1)

Collapse
 
seedyrom profile image
Zack Kollar

This is hilariously awesome. Great example of thinking out of the box.

Here is a post you might want to check out:

Regex for lazy developers

regex for lazy devs

Sorry for the callout ๐Ÿ˜†