DEV Community


Improving the regex in route patterns (Teeny PHP route system)

Guilherme Nascimento
・4 min read

As I have already presented in a previous article (Teeny, a route system for PHP), Teeny is my proposal for an extremely light and small route system for PHP.

However, a colleague here on the site suggested improving the system by grouping the regular expressions, mainly to reduce the number of regex operations.

It wasn't that simple. I had to change the approach and keep the route patterns and their callbacks in separate lists, because in this case working with indexes seemed easier.

Another complication is that the groups are named by the user of the route system rather than generated automatically, so I had to be careful: to prevent groups with duplicate names from conflicting, I used the (?J) modifier in the regex. An example:

^((?J)(/foo/(?P<user>.*?)/(?P<test>\d+))|(/bar/(?P<user>.*?)/(?P<test>\d+))|(/baz/(?P<name>.*?)/(?P<id>\d+)))$

In this example, the regex matches:

  • /foo/username/10
  • /bar/username/10
  • /baz/username/10
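
A quick way to check this behaviour is to run the pattern through preg_match() and keep only the named, non-empty captures, which is the same filtering the benchmark scripts below apply. This is a minimal sketch; the helper name matchDuplicateNames is hypothetical and only used for illustration.

```php
<?php

// The pattern from the example above: (?J) allows the duplicate group names "user" and "test"
$pattern = '#^((?J)(/foo/(?P<user>.*?)/(?P<test>\d+))|(/bar/(?P<user>.*?)/(?P<test>\d+))|(/baz/(?P<name>.*?)/(?P<id>\d+)))$#';

// Hypothetical helper: returns the named, non-empty captures, or null when nothing matches
function matchDuplicateNames(string $pattern, string $path): ?array
{
    if (!preg_match($pattern, $path, $matches)) {
        return null;
    }

    // Numeric keys are positional groups; empty strings come from alternatives that did not match
    return array_filter(
        $matches,
        function ($value, $key) {
            return is_string($key) && $value !== '';
        },
        ARRAY_FILTER_USE_BOTH
    );
}

foreach (array('/foo/username/10', '/bar/username/10', '/baz/username/10', '/qux/username/10') as $path) {
    echo $path, ' => ', json_encode(matchDuplicateNames($pattern, $path)), "\n";
}
```

The last path is included to show that anything outside the three alternatives is rejected.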

For this test I created three very simple scripts; object orientation and more advanced methodologies seemed like overkill for this case:

generate-routes.php

<?php

$patterns = array(
    'alnum' => '[\da-zA-Z]+',
    'alpha' => '[a-zA-Z]+',
    'decimal' => '\d+\.\d+',
    'num' => '\d+',
    'noslash' => '[^\/]+',
    'nospace' => '\S+',
    'uuid' => '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}',
    'version' => '\d+\.\d+(\.\d+(-[\da-zA-Z]+(\.[\da-zA-Z]+)*(\+[\da-zA-Z]+(\.[\da-zA-Z]+)*)?)?)?'
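);

// Placeholder matcher used by the other two scripts (assumed definition): <name> or <name:type>
$getParams = '#<([A-Za-z]\w*)(:(' . implode('|', array_keys($patterns)) . ')|)>#';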

$paramRoutes = array();
$paramCallbacks = array();

for ($i = 0; $i < 300; ++$i) { 
    $paramRoutes[] = "/{$i}/<user>/<test:num>";
    $paramCallbacks[] = "callback_{$i}";
}

$paramRoutes[] = '/foo/<name>/<id:num>';
$paramCallbacks[] = 'callback_success1';

$paramRoutes[] = '/aaa/bbb/ccc';
$paramCallbacks[] = 'callback_success2';

// To simulate $_SERVER['PATH_INFO']
$pathinfo = '/foo/bar/1234';

In the script above, the desired route is the penultimate one, so a loop that tests routes one at a time only reaches it after trying the 300 generated routes that come before it.
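
To put a rough number on it, the sketch below counts how many preg_match() calls each approach needs before it reaches a route at a given position, assuming chunks of 15 routes per regex as in group-routes.php. The function name pregMatchCalls is hypothetical, and the count deliberately ignores that each combined regex is larger and therefore more expensive to evaluate.

```php
<?php

// Hypothetical helper: preg_match() calls needed to reach the route at $targetIndex (0-based)
// when routes are tested one by one versus combined into chunks of $limit routes
function pregMatchCalls(int $targetIndex, int $limit): array
{
    return array(
        'ungrouped' => $targetIndex + 1,
        'grouped' => intdiv($targetIndex, $limit) + 1,
    );
}

// The 300 generated routes come first, so '/foo/<name>/<id:num>' sits at index 300
print_r(pregMatchCalls(300, 15));
```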

ungroup-routes.php

This script works similarly to what Teeny already does today: each route is converted and tested with its own preg_match() call.

<?php

require 'generate-routes.php';

$start = microtime(true);

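foreach ($paramRoutes as $index => $route) {
    // Turn <name> into (?P<name><>) and <name:type> into (?P<name><type>)
    $pRegex = preg_replace($getParams, '(?P<$1><$3>)', $route);

    // Untyped parameters match anything (lazily)
    $pRegex = str_replace('<>)', '.*?)', $pRegex);

    // Typed parameters get the regex registered for their type
    foreach ($patterns as $pattern => $regex) {
        $pRegex = str_replace('<' . $pattern . '>)', $regex . ')', $pRegex);
    }

    // One preg_match() call per route
    if (preg_match('#^(' . $pRegex . ')$#', $pathinfo, $matches)) {
        // Keep only the named, non-empty captures
        foreach ($matches as $key => $match) {
            if ($match === '' || is_int($key)) {
                unset($matches[$key]);
            }
        }

        var_dump($matches);

        break;
    }
}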

echo round(microtime(true) - $start, 6), "s\n";
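
The conversion steps inside that loop can be isolated in a small function, which makes it easier to see what each replacement does. This is a sketch under assumptions: compileRoute is a hypothetical name, the $getParams expression is an assumed definition of the <name> / <name:type> placeholder syntax, and only a subset of the placeholder types is included.

```php
<?php

// A subset of the placeholder types from generate-routes.php
$patterns = array(
    'alnum' => '[\da-zA-Z]+',
    'alpha' => '[a-zA-Z]+',
    'decimal' => '\d+\.\d+',
    'num' => '\d+',
    'noslash' => '[^\/]+',
    'nospace' => '\S+',
);

// Hypothetical helper: converts a route template into a regex with named groups
function compileRoute(string $route, array $patterns): string
{
    // Assumed placeholder syntax: <name> or <name:type>, where type is a key of $patterns
    $getParams = '#<([A-Za-z]\w*)(:(' . implode('|', array_keys($patterns)) . ')|)>#';

    // <name> becomes (?P<name><>) and <name:type> becomes (?P<name><type>)
    $regex = preg_replace($getParams, '(?P<$1><$3>)', $route);

    // An untyped placeholder matches anything (lazily)
    $regex = str_replace('<>)', '.*?)', $regex);

    // A typed placeholder gets the regex registered for its type
    foreach ($patterns as $type => $typeRegex) {
        $regex = str_replace('<' . $type . '>)', $typeRegex . ')', $regex);
    }

    return $regex;
}

$regex = compileRoute('/foo/<name>/<id:num>', $patterns);
echo $regex, "\n"; // prints /foo/(?P<name>.*?)/(?P<id>\d+)

if (preg_match('#^(' . $regex . ')$#', '/foo/bar/1234', $matches)) {
    echo $matches['name'], ' ', $matches['id'], "\n"; // prints bar 1234
}
```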

group-routes.php

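This script is the optimization proposal: up to 15 routes are combined into a single regex, so each preg_match() call tests a whole slice of routes at once.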

<?php

require 'generate-routes.php';

$start = microtime(true);

// Maximum number of routes combined into a single regex
$limit = 15;

for ($i = 0, $j = count($paramRoutes); $i < $j; $i += $limit) {
    $slice = array_slice($paramRoutes, $i, $limit);

    // Join the slice as alternatives: route0)|(route1)|(route2 ...
    $pRegex = implode(')|(', $slice);

    // Same placeholder conversion as ungroup-routes.php, applied to the whole slice at once
    $pRegex = preg_replace($getParams, '(?P<$1><$3>)', $pRegex);
    $pRegex = str_replace('<>)', '.*?)', $pRegex);

    foreach ($patterns as $pattern => $regex) {
        $pRegex = str_replace('<' . $pattern . '>)', $regex . ')', $pRegex);
    }

    // One preg_match() call per slice; (?J) allows duplicate group names across routes
    if (preg_match('#^((?J)(' . $pRegex . '))$#', $pathinfo, $matches)) {
        // Keep only the named, non-empty captures
        foreach ($matches as $key => $value) {
            if ($value === '' || is_int($key)) {
                unset($matches[$key]);
            }
        }

        var_dump($matches);
        break;
    }
}

echo round(microtime(true) - $start, 6), "s\n";
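
The grouped script stops at the first slice that matches, but it does not tell which route inside the slice matched, and that index is what selects the callback. One possible way to recover it, which is not what the script above does, is the PCRE (*MARK) verb: PHP adds the name of the last mark passed on the matching path to $matches['MARK']. The sketch assumes the routes were already converted to regexes; matchGrouped and $regexRoutes are hypothetical names.

```php
<?php

// Hypothetical pre-compiled routes, indexed in parallel with a callbacks list
$regexRoutes = array(
    '/0/(?P<user>.*?)/(?P<test>\d+)',
    '/1/(?P<user>.*?)/(?P<test>\d+)',
    '/foo/(?P<name>.*?)/(?P<id>\d+)',
    '/aaa/bbb/ccc',
);

// Hypothetical helper: tests routes in slices of $limit and returns the matched index and parameters
function matchGrouped(array $regexRoutes, string $pathinfo, int $limit = 15): ?array
{
    for ($i = 0, $j = count($regexRoutes); $i < $j; $i += $limit) {
        $alternatives = array();

        // Tag each alternative with (*MARK:index); preserve_keys keeps the absolute indexes
        foreach (array_slice($regexRoutes, $i, $limit, true) as $index => $regex) {
            $alternatives[] = '(' . $regex . ')(*MARK:' . $index . ')';
        }

        // (?J) at the start of the pattern allows duplicate group names in the whole regex
        if (preg_match('#(?J)^(?:' . implode('|', $alternatives) . ')$#', $pathinfo, $matches)) {
            // Keep only named, non-empty captures; MARK is not a route parameter
            $params = array_filter(
                $matches,
                function ($value, $key) {
                    return is_string($key) && $key !== 'MARK' && $value !== '';
                },
                ARRAY_FILTER_USE_BOTH
            );

            return array('index' => (int) $matches['MARK'], 'params' => $params);
        }
    }

    return null;
}

var_dump(matchGrouped($regexRoutes, '/foo/bar/1234'));
```

With the index in hand, the callback would be looked up in the parallel callbacks array, which fits the index-based layout described at the start of the article.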

Performing the tests

Although many benchmarks are run directly in the PHP environment, I chose ApacheBench because I needed to be sure of the effects when multiple HTTP requests arrive almost at the same time.

The first test ran the ungrouped version:

ab -n 1000 -c 10 http://localhost/benchmark-routes/ungroup-routes.php

The following result was obtained:

Requests per second:    1566.68 [#/sec] (mean)
Time per request:       6.383 [ms] (mean)
Time per request:       0.638 [ms] (mean, across all concurrent requests)
Transfer rate:          472.54 [Kbytes/sec] received

The second test ran the grouped version:

ab -n 1000 -c 10 http://localhost/benchmark-routes/group-routes.php

The following result was obtained:

Requests per second:    3995.64 [#/sec] (mean)
Time per request:       2.503 [ms] (mean)
Time per request:       0.250 [ms] (mean, across all concurrent requests)
Transfer rate:          1209.19 [Kbytes/sec] received

In tests like this, what is measured is how many requests can be handled per second on average; in other words, more HTTP requests per second is better. Of course, a change is only worth it when one version handles substantially more requests per second than the other; if the difference is minimal, the change may really be overkill.

According to the test, we got about 2400 more HTTP requests per second (approximately, since results always vary between runs); in other words, more than twice as many requests, which was clearly beneficial.
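
For reference, the two runs above work out as follows. These are single ab runs with the same settings, so the exact ratio should be read with some caution; the mean time per request gives the same factor, as expected.

```latex
3995.64 - 1566.68 = 2428.96\ \text{req/s}, \qquad
\frac{3995.64}{1566.68} \approx 2.55, \qquad
\frac{6.383\ \text{ms}}{2.503\ \text{ms}} \approx 2.55
```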

However, I needed to make sure that this apparent improvement would not harm a system with few routes, because what works well with many routes can sometimes be harmful with only a few. So I reduced the number of routes generated in generate-routes.php to 30 and ran the tests again:

ungroup-routes.php:

Requests per second:    5361.82 [#/sec] (mean)
Time per request:       1.865 [ms] (mean)
Time per request:       0.187 [ms] (mean, across all concurrent requests)
Transfer rate:          1617.39 [Kbytes/sec] received

group-routes.php:

Requests per second:    5361.82 [#/sec] (mean)
Time per request:       1.865 [ms] (mean)
Time per request:       0.187 [ms] (mean, across all concurrent requests)
Transfer rate:          1617.39 [Kbytes/sec] received

To my surprise, the new structure did not hurt systems with few routes: performance stayed practically the same.

The new version is available for download at https://github.com/inphinit/teeny/releases/tag/0.2.8 or via Composer:

composer create-project inphinit/teeny <project name>

Thanks

Thanks to @hbgl, whose suggestions led to this improvement.
