Content Trends with PHP

Internet loudmouth since 1996
Once upon a time I lived in a share house with a rotating cast of gnarly geeks and one cable modem, so I got pretty well acquainted with the pfSense firewall. One of the strange things about pfSense was that many of its system scripts had been rewritten in PHP; seeing rc.d scripts in PHP made me appreciate the idea of a batteries-included scripting language.

One of the more oddly-shaped batteries in PHP is metaphone(), an algorithm similar to (but more precise than) the venerable soundex for generating approximate phonetic keys for strings. Because metaphone() skips numerals and punctuation, it's a handy tool for fuzzy matching of text strings.
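To make the "ignores numerals and punctuation" point concrete, here is a minimal sketch. The helper name `same_shape()` is made up for illustration, and the sample lines are modeled on ordinary sshd log entries rather than taken from a real log. Two lines that differ only in timestamp and PID should collapse to the same metaphone key, while a line with different wording should not.

```php
<?php
// Hypothetical helper: true when two strings reduce to the same metaphone key.
function same_shape(string $a, string $b): bool
{
    return metaphone($a) === metaphone($b);
}

// Same wording, different timestamp and PID (digits and punctuation only).
$first  = "May 24 13:52:24 guidance sshd[29567]: Connection closed by [preauth]";
$second = "May 24 13:48:02 guidance sshd[30112]: Connection closed by [preauth]";

// Different wording altogether.
$other  = "May 24 13:51:01 guidance sodad[194844]: sodad (PID 194844) starting";

var_dump(same_shape($first, $second)); // bool(true)
var_dump(same_shape($first, $other));  // bool(false)
```

The key itself is just a string of consonant codes, which is what makes it usable as an array index for counting.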

Here's a fun example of finding content trends in a system or application log:


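    #!/usr/bin/env php
    <?php
    $keys = array();    // metaphone key => a sample line that produced it
    $counts = array();  // metaphone key => number of lines with that key
    $total = 0;         // total lines read

    foreach (file($_SERVER['argv'][1]) as $line) {
        $lphone = metaphone($line);
        $keys[$lphone] = trim($line);
        $counts[$lphone] = ($counts[$lphone] ?? 0) + 1;
        $total++;
    }

    // Most common shapes first; arsort() keeps the metaphone keys intact.
    arsort($counts);
    $topten = array_slice($counts, 0, 10, true);

    foreach ($topten as $comm => $count) {
        print round((($count * 100) / $total), 2)."%\t\"".$keys[$comm]."\"\n";
    }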

And this will let you get the "shape" of files like /var/log/messages:

    guidance: ~/trend.php /var/log/messages

    64.82%  "May 24 13:52:29 guidance sshd[30083]: Did not receive identification string from"
    11.55%  "May 24 13:49:34 guidance sshd[31567]: Received disconnect from 11: disconnected by user"
    10.27%  "May 24 13:12:55 guidance sshd[175458]: Accepted publickey for root from port 49141 ssh2"
    2.84%   "May 24 13:52:24 guidance sshd[29567]: Connection closed by [preauth]"
    1.21%   "May 24 13:48:13 guidance sshd[174773]: Received disconnect from 11: Closed due to user request. [preauth]"
    1.11%   "May 24 13:44:13 guidance altsshd[125993]: Received disconnect from 11: disconnected by user"
    0.96%   "May 24 13:39:43 guidance altsshd[105719]: Accepted publickey for op from port 44578 ssh2"
    0.95%   "May 24 13:50:32 guidance sodad[173222]: sodad-ipc_temp (PID 173222) exiting"
    0.95%   "May 24 13:51:01 guidance sodad[194844]: sodad (PID 194844) starting"
    0.42%   "May 24 13:41:59 guidance sshd[115807]: Authorized to lbonanomi, krb5 principal bonanomi@DEV.TO (krb5_kuserok)"

