Cuckoo Filters provide the ability to REMOVE entries from their hash table. This is something that has been attempted through extensions to Bloom Filters. But the attempts usually end up with a complicated, or sub-optimal solution. In addition to being able to delete entries, Cuckoo Filters have a small memory foot print, comparative false positive rates, and are intuitively easy to understand.
A cuckoo filter is a compact variant of a cuckoo hashtable [21] that stores only fingerprints—a bit string derived from the item using a hash function—for each item inserted
cuckoo filters use partial-key cuckoo hashing to find an item’s alternate location based on only its fingerprint
For an item x, our hashing scheme calculates the indexes of the two candidate buckets as follows: h1(x)=hash(x),h2(x)=h1(x)⊕hash(x’s fingerprint). The xor operation...ensures an important property: h1(x) can also be calculated from h2(x) and the fingerprint using the same formula. In other words, to displace a key originally in bucket i (no matter if i is h1(x) or h2(x)), we directly calculate its alternate bucket j from the current bucket index i...Hence, an insertion only uses information in the table, and never has to retrieve the original item x.
I recommend reading the whole paper. Its well written, pretty easy to grok, and throws some serious shade at Bloom filters.
P.S. Dig this nifty visualization
Better searching of Bash history
I have known about ctrl+r for a while as a way to reverse search your bash history. But, I did not know I could press ctrl+r again to scroll FURTHER back in history. I was just making my reverse search more specific all these years when I could just be pressing ctrl+r again! Also, ctrl+s allows you to search forward in history as well. No more history | grep <my search> for me :D.
How the Linux auditing system logs events
At my current company we have a project called auditbeat that tails the Linux Audit log and parses into a useful format for data shipping. Curiosity struck and I looked into how audit logs are written and formatted. As with the majority of interesting and useful things in Linux related tech; there is an amazing Digital Ocean article explaining the auditing system. I ❤️ Digital Ocean and their community tutorials.
How to calculate the Coefficient of determination (R2)
I was recently tasked to write up an efficient way to calculate the R2 metric for evaluating regression type models. While trying to get this done with existing Elasticsearch aggregation, I found out that R^2 = 1 - (ResidualSumOfSquares/TotalSumOfSquares). Both of these variables can be calculated in two unique aggregations:
Latest comments (2)
This —> dev.to/subramanya4shenoy/how-to-as...
How Cuckoo Filters work
Cuckoo filters are a probabilistic data structure named after the cuckoo bird due to the propensity for some sub-species stealing other birds nests.
Cuckoo Filters provide the ability to REMOVE entries from their hash table. This is something that has been attempted through extensions to Bloom Filters. But the attempts usually end up with a complicated, or sub-optimal solution. In addition to being able to delete entries, Cuckoo Filters have a small memory foot print, comparative false positive rates, and are intuitively easy to understand.
You can read the paper here. Some important implementation highlights:
I recommend reading the whole paper. Its well written, pretty easy to grok, and throws some serious shade at Bloom filters.
P.S. Dig this nifty visualization
Better searching of Bash history
I have known about
ctrl+r
for a while as a way to reverse search your bash history. But, I did not know I could pressctrl+r
again to scroll FURTHER back in history. I was just making my reverse search more specific all these years when I could just be pressingctrl+r
again! Also,ctrl+s
allows you to search forward in history as well. No morehistory | grep <my search>
for me :D.How the Linux auditing system logs events
At my current company we have a project called auditbeat that tails the Linux Audit log and parses into a useful format for data shipping. Curiosity struck and I looked into how audit logs are written and formatted. As with the majority of interesting and useful things in Linux related tech; there is an amazing Digital Ocean article explaining the auditing system. I ❤️ Digital Ocean and their community tutorials.
How to calculate the Coefficient of determination (R2)
I was recently tasked to write up an efficient way to calculate the R2 metric for evaluating regression type models. While trying to get this done with existing Elasticsearch aggregation, I found out that
R^2 = 1 - (ResidualSumOfSquares/TotalSumOfSquares)
. Both of these variables can be calculated in two unique aggregations:Now, all you need is
1 - sum_of_residuals.value / statistics.variance * statistics.count
sinceTotalSumOfSquares(X) = variance(X) * |X|