
Reinhart Previano K.


Can you regex this? A 10MB+ regex file of the entire Indonesian internet blocklist!

A while ago we decided to publish our new gigantic regular expression files containing the whole Indonesian internet blocklist with 99.99% accuracy (as tested against the official one).

Well, why? We’re bored and just want to experiment with graphs. Back to the old days of C, oh wait, it's another Go program!

Our experiments on an M1 MacBook Air show that even Go’s default regexp library can’t handle a pattern this big, so we had to switch to regexp2. Even that only works for the ~10MB regex-reversed.txt, not the larger regex.txt.
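To get a feel for this kind of experiment, here is a minimal sketch that builds one large alternation and times how long Go's standard regexp package takes to compile it. The synthetic `blocked-N.example` domains and the helpers `buildAlternation` and `compileTimed` are assumptions made for illustration. They are not our actual generator or the real blocklist, and a pattern this small will not hit the limits we ran into.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// buildAlternation joins literal domains into one anchored alternation,
// e.g. ^(?:a\.com|b\.org)$. Hypothetical helper, not the real generator.
func buildAlternation(domains []string) string {
	quoted := make([]string, len(domains))
	for i, d := range domains {
		quoted[i] = regexp.QuoteMeta(d) // escape dots and other metacharacters
	}
	return "^(?:" + strings.Join(quoted, "|") + ")$"
}

// compileTimed compiles pattern with the standard library and reports
// how long compilation took.
func compileTimed(pattern string) (*regexp.Regexp, time.Duration, error) {
	start := time.Now()
	re, err := regexp.Compile(pattern)
	return re, time.Since(start), err
}

func main() {
	// Synthetic stand-in for the blocklist (assumption: 10,000 fake domains).
	domains := make([]string, 0, 10000)
	for i := 0; i < 10000; i++ {
		domains = append(domains, fmt.Sprintf("blocked-%d.example", i))
	}

	pattern := buildAlternation(domains)
	re, elapsed, err := compileTimed(pattern)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	fmt.Printf("pattern: %d bytes, compiled in %v\n", len(pattern), elapsed)
	fmt.Println("blocked-42.example matches:", re.MatchString("blocked-42.example"))
	fmt.Println("allowed.example matches:", re.MatchString("allowed.example"))
}
```

Raising the domain count is a cheap way to watch compile time and memory grow before pointing the same code at the real files.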

So now, we have a challenge for you: can your favorite regular expression library handle this, 14+ MB of pure regex? I’m personally interested in Intel’s Hyperscan engine (optimized for their x86 platform, of course) to see whether it can handle something this big.

Because who knows, maybe we accidentally made a regex performance benchmark tool. (#_ )
