open-appsec is an open-source initiative built on machine learning. It uses a three-phase approach to detect and prevent web application and API attacks.
This blog explains how these three phases deliver accurate results with a very low rate of false positives, and how they protect the environment against both known attacks and unknown zero-day attacks with real-time protection.
Phase 1 – Payload Decoding
Effective machine learning requires a deep understanding of the underlying application protocols, which are continuously evolving. The engine analyzes all fields of the HTTP request, including the URL and the HTTP headers (which are critical in this case), extracts JSON/XML content, and normalizes the payload, for example by applying base64 and other decodings. A set of parsers covering common protocols feeds the relevant data into phase 2.
For example, in the case of Log4Shell attacks, some exploit attempts used base64 and escape encodings, which made it possible to pass a space character when supplying parameters.
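To make the idea concrete, here is a minimal sketch of a normalization step. This is not open-appsec's actual implementation; the function name, the set of decoders, and the number of decoding passes are all illustrative assumptions.

```python
import base64
import binascii
import json
import urllib.parse


def normalize_payload(raw: str) -> list[str]:
    """Produce progressively decoded views of a field value so that the
    indicator engine (phase 2) sees the payload in its plainest form."""
    views = [raw]

    # URL/percent decoding, applied repeatedly to catch double encoding.
    decoded = raw
    for _ in range(3):
        next_decoded = urllib.parse.unquote_plus(decoded)
        if next_decoded == decoded:
            break
        decoded = next_decoded
        views.append(decoded)

    # Best-effort base64 decoding of token-like substrings.
    for token in decoded.split():
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8", "replace"))
        except (binascii.Error, ValueError):
            pass

    # Flatten JSON bodies into individual field values.
    try:
        body = json.loads(decoded)
        if isinstance(body, dict):
            views.extend(str(v) for v in body.values())
    except ValueError:
        pass

    return views
```

The key point is that every decoded view, not just the raw bytes, is handed to the next phase, so an attacker cannot hide an indicator behind one more layer of encoding.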
Phase 2 – Attack Indicators
Following parsing and normalization, the network payload is fed into a high-performance engine that looks for attack indicators. An attack indicator is a pattern of exploitation drawn from various vulnerability families. We derive these attack patterns through ongoing offline supervised learning over a huge number of payloads, each of which is assigned a score according to the likelihood that it is benign or malicious. This score represents the confidence level that the pattern is part of an attack. Since combinations of these patterns can provide a stronger indication of an attack, a score is also calculated for each combination of patterns.
For example, in the case of the Log4Shell and Spring4Shell attacks, open-appsec matched several indicators from the Command Injection, Remote Code Execution, and Probing families, which flagged the payloads as malicious with a very high score. That would have been enough on its own, but to ensure accuracy and avoid false positives, the engine always moves to the third and last phase.
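A minimal sketch of how indicator matching and scoring could work is shown below. The indicator list, the per-pattern scores, and the combination bonus are illustrative assumptions; in the real engine the patterns and scores are derived offline from supervised learning over large labeled payload sets.

```python
import re
from dataclasses import dataclass


@dataclass
class Indicator:
    family: str          # e.g. "rce", "command_injection", "probing"
    pattern: re.Pattern  # compiled attack pattern
    score: float         # confidence that this pattern appears in attacks


# Illustrative indicators only.
INDICATORS = [
    Indicator("rce", re.compile(r"\$\{jndi:", re.I), 0.9),
    Indicator("command_injection", re.compile(r";\s*(cat|wget|curl)\b", re.I), 0.7),
    Indicator("probing", re.compile(r"\.\./\.\./", re.I), 0.5),
]


def indicator_score(views: list[str]) -> float:
    """Score a payload: each matched indicator contributes its score,
    and matches from multiple families raise the combined score."""
    matched = [ind for ind in INDICATORS
               if any(ind.pattern.search(v) for v in views)]
    if not matched:
        return 0.0
    base = max(ind.score for ind in matched)
    families = {ind.family for ind in matched}
    combo_bonus = 0.05 * (len(families) - 1)  # combined families indicate an attack more strongly
    return min(1.0, base + combo_bonus)
```

The input here is the list of decoded views produced in phase 1, and the output is the confidence value that phase 3 weighs against the request's context.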
Phase 3 – Contextual Evaluation Engine
The contextual engine uses machine learning techniques to make the final determination of whether the payload is malicious in the context of a specific customer/environment, user, URL, and field. These factors feed a weighted function that sums up to a confidence score; if the score exceeds the threshold, the request is dropped. A simplified scoring sketch follows the list of factors below.
These are the factors that are considered by the engine:
Reputation factor
For each request, the originator is assigned a score. The score represents the originator's reputation based on previous requests. This score is normalized and used to increase or decrease the confidence score.
Application awareness
Modern applications often allow users to modify web pages, upload scripts, use elaborate query search syntax, etc. These features provide a better user experience, but without application awareness they are detected as malicious attacks. We use ML to analyze and baseline the underlying application's behavior.
Learn user input format
The system can identify special user input types that are known to cause false detections and apply ML to adjust the detection process, allowing legitimate behavior without compromising attack detection.
False detection factor
If there is an inconsistency in detection, a correction factor is applied to the confidence score, based on the reputation factor per detection location.
Supervised learning module
An optional module that shows administrators payloads and asks them to classify them, thus accelerating the learning process.
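Putting the pieces together, a simplified version of the contextual evaluation could be a weighted sum over these factors. The factor names, weights, and threshold below are illustrative assumptions, not open-appsec's actual model, which learns its weighting per environment.

```python
from dataclasses import dataclass


@dataclass
class Context:
    indicator_score: float   # output of phase 2, in [0, 1]
    reputation: float        # originator reputation, normalized to [-1, 1]
    app_baseline: float      # deviation from learned application behavior, in [0, 1]
    input_format: float      # known false-positive-prone input type, in [0, 1]
    false_detection: float   # correction from past inconsistent detections, in [0, 1]


# Illustrative weights and threshold only.
WEIGHTS = {
    "indicator_score": 0.6,
    "reputation": 0.15,
    "app_baseline": 0.15,
    "input_format": -0.1,
    "false_detection": -0.1,
}
THRESHOLD = 0.5


def should_drop(ctx: Context) -> bool:
    """Compute the weighted confidence score and drop the request
    if it exceeds the threshold."""
    confidence = (
        WEIGHTS["indicator_score"] * ctx.indicator_score
        + WEIGHTS["reputation"] * ctx.reputation
        + WEIGHTS["app_baseline"] * ctx.app_baseline
        + WEIGHTS["input_format"] * ctx.input_format
        + WEIGHTS["false_detection"] * ctx.false_detection
    )
    return confidence > THRESHOLD
```

The negative weights reflect the role of the last two factors: input types and locations known to trigger false detections pull the confidence score down, while a strong phase-2 indicator score, a bad reputation, or a large deviation from the application baseline push it up.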