DEV Community

Lulu

Why SafeLine is better than traditional WAF?

Drawbacks of traditional WAFs

Traditional WAFs typically use regular expressions to define attack patterns. Take the well-known ModSecurity engine as an example, which is said to power 80% of the world's WAFs. Let's look at what its rules are like.

  • union[\w\s]?select: This rule defines an SQL injection pattern; it fires when the traffic contains "union" followed by "select", with at most one word or whitespace character in between.
  • \balert\s*\(: This rule defines an XSS pattern; it fires when the traffic contains the word "alert" followed by a left parenthesis "(".

Real attackers can easily avoid these keywords and thus circumvent the WAF's protection. Using the rules above, let's look at some examples of false negatives:

  • union /**/ select: Inserting a comment between "union" and "select" breaks the keyword pattern, so the attack goes undetected.
  • window['\x61lert']: Replacing the letter "a" with the escape sequence "\x61" breaks the keyword pattern, so the attack goes undetected.

These examples show that traditional regex-based WAFs cannot reliably prevent attacks, because attackers can keep finding variants that the patterns do not cover.

Furthermore, regular expressions also cause a high rate of false positives, resulting in genuine website users being affected. Let's look at some examples of false positives:

  • The union select members from each department to form a committee: It triggers the above-mentioned rule and gets mistakenly identified as an SQL injection attack, while it is just a simple English sentence.
  • Her down on the alert(for the man) and walked into a world of rivers: It triggers the above-mentioned rule and gets mistakenly identified as an XSS attack, while it is just a simple English sentence.
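
The false negatives and false positives above can be reproduced with a few lines of Python. This is only a minimal sketch: the SQL injection pattern is copied from the rule list, the XSS pattern is taken to be \balert\s*\( (an assumed well-formed reading of the rule), and the helper name triggered_rules and the exact sample payloads are made up for illustration. It is not how ModSecurity itself evaluates rules.

```python
import re

# SQL injection rule as quoted in the article.
SQLI_RULE = re.compile(r"union[\w\s]?select", re.IGNORECASE)
# XSS rule; the exact form \balert\s*\( is an assumption.
XSS_RULE = re.compile(r"\balert\s*\(", re.IGNORECASE)


def triggered_rules(text):
    """Return the names of the (hypothetical) rules that fire on text."""
    hits = []
    if SQLI_RULE.search(text):
        hits.append("sqli")
    if XSS_RULE.search(text):
        hits.append("xss")
    return hits


samples = [
    "union select 1",                       # plain attack
    "union /**/ select 1",                  # comment inserted between keywords
    r"window['\x61lert'](1)",               # "a" written as the escape \x61
    "The union select members from each department to form a committee",
    "Her down on the alert(for the man) and walked into a world of rivers",
]

for sample in samples:
    print(triggered_rules(sample), "<-", sample)
```

The two obfuscated payloads come back with no rule triggered, while both English sentences are flagged.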

Here, we share two readings to see how the masters from the Black Hat conference automate bypassing regex-based WAF protections:

How to use syntax analysis in WAF

Syntax analysis is the core capability of SafeLine WAF. Instead of matching attack traffic against simple regex patterns, it parses the user input in the traffic and analyzes the potential attack behavior behind it.

Taking SQL injection as an example, attackers need to meet two conditions to successfully carry out SQL injection attacks:

  • The traffic contains an SQL statement, and it must be a syntactically valid SQL statement fragment.
    • union select xxx from xxx where is a syntactically valid SQL statement fragment.
    • union select xxx from xxx xxx xxx xxx xxx where is not a syntactically valid SQL statement fragment.
    • 1 + 1 = 2 is a syntactically valid SQL statement fragment.
    • 1 + 1 is 2 is not a syntactically valid SQL statement fragment.
  • The SQL statement must have potentially malicious behavior, rather than being a meaningless statement.
    • union select xxx from xxx where has the potential for malicious behavior.
    • 1 + 1 = 2 has no practical meaning.
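
To make the two conditions concrete, here is a toy sketch in Python. It is not SafeLine's engine: the grammar covers only "union select X from Y [where [expr]]" and simple arithmetic or comparison expressions, and the names tokenize, parse_expr and classify are hypothetical. The point is that a lexer discards comments and whitespace before the syntax check, and that a valid fragment is then judged by what it does.

```python
import re

# Toy lexer: comments and whitespace are recognized and then dropped.
TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/)
  | (?P<ws>\s+)
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>[=+\-*/<>])
""", re.VERBOSE | re.DOTALL)

KEYWORDS = {"union", "select", "from", "where"}


def tokenize(text):
    """Return a list of (kind, value) tokens, or None if lexing fails."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return None
        pos = m.end()
        kind, value = m.lastgroup, m.group().lower()
        if kind in ("comment", "ws"):
            continue
        if kind == "word":
            kind = "kw" if value in KEYWORDS else "ident"
        tokens.append((kind, value))
    return tokens


def is_operand(tokens, i):
    return i < len(tokens) and tokens[i][0] in ("number", "ident")


def parse_expr(tokens, i):
    """Parse operand (op operand)*; return the index after it, or None."""
    if not is_operand(tokens, i):
        return None
    i += 1
    while i < len(tokens) and tokens[i][0] == "op":
        if not is_operand(tokens, i + 1):
            return None
        i += 2
    return i


def classify(text):
    """Return 'union-select', 'expression', or None (not valid in the toy grammar)."""
    tokens = tokenize(text)
    if not tokens:
        return None
    if tokens[:2] == [("kw", "union"), ("kw", "select")]:
        if not is_operand(tokens, 2) or tokens[3:4] != [("kw", "from")]:
            return None
        if not (len(tokens) > 4 and tokens[4][0] == "ident"):
            return None
        if len(tokens) == 5:
            return "union-select"
        if tokens[5] != ("kw", "where"):
            return None
        if len(tokens) == 6 or parse_expr(tokens, 6) == len(tokens):
            return "union-select"
        return None
    return "expression" if parse_expr(tokens, 0) == len(tokens) else None


for fragment in ["union select xxx from xxx where",
                 "union /**/ select xxx from xxx where",
                 "union select xxx from xxx xxx xxx xxx xxx where",
                 "1 + 1 = 2",
                 "1 + 1 is 2"]:
    print(repr(fragment), "->", classify(fragment))
```

In this sketch only the "union-select" class would count as potentially malicious; "expression" is valid but harmless, and None means the input is not a valid fragment at all, which is also what an ordinary English sentence containing "union select" produces.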

SafeLine WAF conducts attack detection based on the essence of SQL injection attacks, following a process similar to the one below:

  1. Parsing the HTTP traffic to find positions with potential inputs.
  2. Recursively decoding the parameters to recover the original user input.
  3. Checking if the user input conforms to SQL syntax.
  4. Detecting the possible intentions behind the SQL syntax.
  5. Scoring the malicious intentions and deciding whether to intercept.
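
Step 2 is the easiest one to illustrate. The sketch below assumes only two encodings, URL percent-encoding and HTML entities, and simply decodes until the value stops changing. The function name deep_decode and the max_rounds limit are hypothetical; a real WAF would likely handle many more encodings.

```python
from html import unescape
from urllib.parse import unquote


def deep_decode(value, max_rounds=5):
    """Repeatedly decode percent-encoding and HTML entities until stable.

    max_rounds bounds the work an attacker can force with deep nesting.
    """
    for _ in range(max_rounds):
        decoded = unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    return value


print(deep_decode("union%2520select"))    # double URL-encoded space
print(deep_decode("alert&#40;1&#41;"))    # HTML-entity-encoded parentheses
```

The first call needs two decoding rounds to reach "union select", and the second recovers "alert(1)". Only after this step does it make sense to run the syntax check on the input.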

SafeLine WAF has built-in compilers covering common programming languages. After deeply decoding the HTTP payload, it selects the syntax compiler that matches the language of the input, then matches the parsed result against a threat model to obtain a threat rating, which decides whether the request is allowed or blocked.

Why semantic analysis is more powerful

Anyone who has studied compiler principles in a computer science course will have met the Chomsky hierarchy, which divides formal grammars into four types:

  • Type 0 Grammar (Unrestricted Grammar): Recognizable by Turing Machines
  • Type 1 Grammar (Context-Sensitive Grammar): Recognizable by Linear Bounded Automata
  • Type 2 Grammar (Context-Free Grammar): Recognizable by Pushdown Automata
  • Type 3 Grammar (Regular Grammar): Recognizable by Finite State Automata

The expressive power of these four grammars weakens from Type 0 to Type 3. The languages we commonly use, such as SQL, HTML, and JavaScript, are mostly Type 2 grammars (with some Type 1 elements). Regular expressions, on the other hand, correspond to Type 3 grammars, the weakest of the four.

How weak is the expressive power of regular expressions? A classic example is that regular expressions cannot count: you cannot write a regular expression that recognizes every valid string of matched parentheses, because the nesting depth is unbounded.
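
A small Python sketch shows the difference. The pattern DEPTH2 is a hand-written example that accepts balanced parentheses nested at most two levels deep; any regular expression in the formal sense has some such limit. The function is_balanced uses a counter, which plays the role of a pushdown automaton's stack here. Both names are made up for illustration.

```python
import re

# Balanced parentheses, but only up to nesting depth 2.
DEPTH2 = re.compile(r"(?:\((?:\(\))*\))*")


def is_balanced(text):
    """Check matched parentheses with a depth counter (a one-symbol stack)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ")" with no matching "("
                return False
    return depth == 0


for text in ["()()", "(())", "((()))", "(()"]:
    regex_ok = DEPTH2.fullmatch(text) is not None
    print(f"{text!r}: regex={regex_ok} counter={is_balanced(text)}")
```

The regex agrees with the counter on the first two strings but rejects "((()))", which is perfectly balanced. Extending the pattern by one more level only moves the limit; it never removes it.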

Matching ever-changing attack payloads with the weak expressive power of Type 3 grammars is therefore a losing game. The limitation is inherent to rule-based attack recognition: Type 3 grammars are a strict subset of Type 2 grammars, so rules written as regular expressions cannot fully cover attack payloads written in programming languages. This is the fundamental reason why rule-based attack recognition in WAFs protects less well than expected.

Therefore, compared with regex-based pattern matching, syntax analysis offers higher accuracy and a lower false positive rate.

Finally, I recommend trying SafeLine: https://waf.chaitin.com/
Discord
GitHub