SafeLine WAF: Performance Testing and Optimization

SafeLine's Community Edition WAF has been on the market for a while now, and its semantic analysis technology is impressive for both its detection accuracy and its low false-positive rate. The Community Edition is said to use the same detection engine as the Enterprise version, although it may not be the latest release.

Performance Questions and Testing Approach

The key question remains: does the Community Edition compromise on detection performance? And how many resources does the WAF need so that it doesn't become a bottleneck at a given level of traffic on a website? To answer these questions, we ran a stress test to gauge SafeLine's actual performance and are sharing the data for reference. We also explored optimization methods to squeeze better performance out of the Community Edition within the available resources.

Test Setup

  • WAF Configuration: We set up a single site that forwards traffic to a business server running on the same machine.

  • Business Server: A basic Nginx server returning a simple 200 OK page (a minimal sketch of such a config follows this list).
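For reference, an upstream like this needs only a few lines of Nginx config. The snippet below is illustrative; the listen port and the exact config used in the test are assumptions.

```nginx
# Minimal "business server": respond to every request with a tiny 200 OK page.
server {
    listen 8080;   # port is arbitrary for this sketch

    location / {
        return 200 "OK\n";
    }
}
```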

Testing Tools

  • wrk: A basic HTTP performance testing tool.
  • wrk2: A modified version of wrk that allows testing at a fixed QPS (queries per second); example invocations for both are shown below.
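For readers who want to reproduce the setup, typical invocations look like the following. The target URL, thread count, and connection count are placeholders rather than the exact values from our runs, and wrk2 (which builds a binary also named wrk) is assumed to be on the PATH as wrk2.

```bash
# wrk: push as many requests as possible for 60 s using 4 threads and 100 connections
wrk -t4 -c100 -d60s http://127.0.0.1/

# wrk2: hold a constant rate of 1,000 requests per second and report latency percentiles
wrk2 -t4 -c100 -d60s -R1000 --latency http://127.0.0.1/
```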

Testing Strategy

Our primary focus in this test was the performance of various services related to traffic detection, defined as the maximum QPS that can be supported by a single service occupying a single CPU core.

We used two types of requests: a simple GET request without a request body, and a GET request with a 1K JSON body. The core metric for WAF performance is the number of HTTP requests that can be inspected per second, making QPS a more relevant parameter than network layer throughput.

Test Process

  1. Assessing Service Functionality: We started by sending a load of 1,000 QPS of simple GET requests to observe how the load on the different services fluctuates with QPS.

Looking at the load distribution, three services stood out whose load is directly tied to traffic:

  • safeline-tengine: A reverse proxy based on Tengine, Alibaba's modified fork of Nginx.
  • safeline-detector: Likely the detection service, receiving requests from Nginx for inspection.

  • safeline-mario: Presumably responsible for analyzing and persisting detection logs.

  2. Baseline Performance Testing with Simple Requests: We set the CPU limit for every service to 1 core and ran docker compose up -d in SafeLine's default installation directory (/data/safeline) to apply the change.

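The change itself is a one-line cpus limit per service in the compose file. The sketch below shows the idea; service names and file layout vary between SafeLine versions, so match them against your own /data/safeline compose file rather than copying this verbatim.

```yaml
# Compose file excerpt: cap each traffic-handling service at one CPU core.
# The service names below are illustrative and may not match your installation exactly.
services:
  detector:
    cpus: 1
  tengine:
    cpus: 1
  mario:
    cpus: 1
```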

Then, we used wrk to measure the maximum QPS.


The results showed a maximum QPS of 4,175, with the detector service hitting 100% CPU usage, making it the first bottleneck.


After analyzing the detector’s CPU usage, we noticed that the process, named snserver, was multi-threaded, with the number of threads roughly equal to the CPU core count. However, context switching overhead was high due to each thread getting only a small slice of CPU time.

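If you want to check this on your own deployment, per-thread CPU usage and context-switch rates can be inspected with standard Linux tools. The commands below are illustrative, not the exact ones from our run:

```bash
# Per-thread CPU usage of the detector process (snserver)
top -H -p "$(pidof snserver)"

# Per-thread voluntary/involuntary context switches, sampled every second
pidstat -t -w -p "$(pidof snserver)" 1
```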

To reduce context switching, we modified the detector’s configuration file (resources/detector/snserver.yml) to reduce the number of threads to 1.
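The edit is roughly the snippet below. The key name is an assumption on our part, so check what your version of resources/detector/snserver.yml actually calls its thread-count option before changing it.

```yaml
# resources/detector/snserver.yml (excerpt): pin the detection engine to a single
# worker thread. "thread_num" is a placeholder; use the key your file actually exposes.
thread_num: 1
```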

After restarting the detector, we observed a significant performance increase, with QPS rising to over 17,000.


  3. Performance Testing with Complex Requests: Next, we generated more complex requests using wrk's Lua scripting:

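The original script only survives as a screenshot, so here is a sketch of the kind of script we mean: it attaches a roughly 1 KB JSON body to every GET request (field names and sizes are arbitrary).

```lua
-- body.lua: give every GET request a ~1 KB JSON body
wrk.method = "GET"
wrk.headers["Content-Type"] = "application/json"

local fields = {}
for i = 1, 32 do
  -- each entry is ~30 bytes, so 32 entries come to roughly 1 KB
  fields[#fields + 1] = string.format('"field%02d":"%s"', i, string.rep("x", 16))
end
wrk.body = "{" .. table.concat(fields, ",") .. "}"
```

Run it with wrk -t4 -c100 -d60s -s body.lua http://127.0.0.1/ (same placeholder flags as before).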

After sending requests with a 1K body, we recorded a QPS of just over 10,000.


The detector remained the bottleneck, while Nginx and Mario both showed reduced CPU usage. This suggests that the detection engine needs more CPU per request as requests get larger and more complex.


We also tested Mario's single-core performance under load, noting that Mario’s memory usage would continue to increase under high load, posing an OOM (Out of Memory) risk. After fine-tuning, we found that Mario could handle around 11,000 QPS on a single core without significant memory buildup.


Testing Summary

The performance of the three critical services under load is summarized in the table below (requests include a 1K body):

| Service | Role | Single-Core Max QPS |
| --- | --- | --- |
| safeline-tengine | Reverse proxy | 28,000 |
| safeline-detector | Detection service | 10,000 |
| safeline-mario | Log analysis and persistence | 11,000 |

Based on these results, we can estimate the overall single-core QPS capacity, giving an idea of the load SafeLine can handle when deployed on a machine with limited CPU resources.

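As a rough back-of-the-envelope estimate (assuming each service scales linearly with cores and nothing else competes for CPU), a single 1K-body request costs about 1/28,000 + 1/10,000 + 1/11,000 ≈ 0.00023 core-seconds across the three services, which works out to roughly 4,400 QPS per core of total CPU budget, with the detector taking the largest share.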

Potential Optimization Points:

  1. Thread Contention in Detector: The multi-threaded nature of the detector service might introduce synchronization overhead, leading to higher CPU usage.
  2. Memory Usage in Mario: Mario's memory usage increases under high load, risking an OOM scenario. Addressing this issue could enable further performance tuning.

We hope Chaitin Technology addresses these optimization points in future updates. With these improvements, it would be interesting to see how much further we can push SafeLine's performance under full load, especially on machines with limited memory.
