DEV Community

Comiscience
Comiscience

Posted on

How an open-source feature flagging service supports tens of thousands of DUs on a free cloud instance

I developed an open-source feature flagging service written in .NET 6 and Angular. I have created a load test for the real-time feature flag evaluation service to understand my current service's bottlenecks better.

The evaluation service receives and holds the WebSocket connections sent by APPs, evaluates the variation of feature flags for each user/device, and sends them back to users via WebSocket. It's the most important service which can easily reach performance bottlenecks.

Here are some load test details:

Environment

A commonly available AWS EC2 service was used to host the Evaluation Server service for the tests. The instance type selected was AWS t2.micro with 1 vCPU and 1 GiB RAM, which is free tier eligible.

To minimize the network impact on the results, the load test service (K6) runs on another EC2 instance in the same VPC.

General Test Conditions

The tests were designed to simulate real-life usage scenarios. The following test conditions were considered:

  • Number of new WebSocket connections established (including data-sync (1)) per second
  • The average P99 response time (2)
  • User actions: make a data synchronization request after the connection is established

(1) data-sync (data synchronization): the process by which the evaluation server evaluates all of the user's feature flags and returns variation results to the user via the WebSocket.

(2) response time: the time between sending the data synchronization request and receiving the response

Tests Performed

  • Test duration: 180 seconds
  • Load type: ramp-up from 0 to 1000, 1100, 1200 new connections per second
  • Number of tests: 10 for each of the 1000, 1100 and 1200 per second use case

Test Results

The results of the tests showed that the Evaluation Server met the desired quality of service only up to a certain limit load. The service was able to handle up to 1100 new connections per second before P99 exceeded 200ms.

The response time

Number of new connections per second Avg (ms) P95 (ms) P99 (ms)
1000 5.42 24.7 96.70
1100 9.98 55.51 170.30
1200 34.17 147.91 254.60

Peak CPU Utilization %

Number of new connections per second Ramp-up stage Stable stage
1000 82 26
1100 88 29
1200 91 31

Peak Memory Utilization %

Number of new connections per second Ramp-up stage Stable stage
1000 55 38
1100 58 42
1200 61 45

how we run the load test

You can find how we run the load test (including code source and test dataset) on our GitHub repo:

https://github.com/featbit/featbit/tree/main/benchmark

Could you give us a star if you like it?

Conclusion

The Evaluation Server was found to be capable of providing a reliable service for up to 1100 new connections per second using a minimum hardware setting: AWS EC2 t2.micro (1 vCPU + 1 G RAM). The maximum number of connections held for a given time was 22000, but this is not the limit.

We believe the reported performance is sufficient for small businesses at negligible cost (free tier). Capacity can easily be multiplied by horizontally scaling the service as the business grows.

NOTE

All questions and feedbacks are welcome. You can join our Slack community to discuss. Or visit FeatBit GitHub Repo to get more information.

Blog Original article address: https://www.featbit.co

Top comments (0)