Let's start with a question: when you loaded this page, did you experience the value provided by the server's requests per second?
The web browser sent a request to the domain. For a standard stack, that will look something like:
- domain name lookup to one or more addresses
- one of these addresses is selected
- request is sent to the selected address
- arrives at load balancer
- dispatched to one or more nodes
- the selected node processes the request and replies
- the load balancer relays that reply back to the client
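The early steps of that path can be sketched in a few lines. This is a hedged stand-in, not what any particular resolver or client actually does: the addresses are hypothetical, and `pick_address` uses plain round-robin as one common selection strategy.

```python
import itertools

# Hypothetical A-record results for the domain (illustrative only).
addresses = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]

# One common client/load-balancer behavior: rotate through the
# resolved addresses so successive requests spread across them.
rotation = itertools.cycle(addresses)

def pick_address() -> str:
    """Select one address for the next request (round-robin stand-in)."""
    return next(rotation)

print(pick_address())  # first resolved address
print(pick_address())  # second resolved address
```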
Where in that did the user experience "requests per second"?
They didn't directly experience it for a (single) request. The user experience was of latency: they did not directly experience maximum throughput, or even per-node throughput.
The service provider does receive value from per-node throughput: typically, the higher the throughput, the more value per node in the system. That value depends on the customer. If the value per request warrants the cost, then the throughput can be scaled.
The service provider also receives value from the max throughput of the system: the higher the max throughput, the more customers can be supported without queueing.
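The relationship between these quantities can be made concrete with Little's Law (average requests in flight = arrival rate × latency) plus simple horizontal scaling. A sketch with hypothetical numbers:

```python
def max_throughput(nodes: int, per_node_rps: float) -> float:
    """System-wide max throughput when scaling horizontally."""
    return nodes * per_node_rps

def in_flight(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average requests in flight = rate * latency."""
    return throughput_rps * latency_s

# e.g. 4 nodes at 250 req/s each can serve 1000 req/s in total;
# at 50 ms latency that is roughly 50 requests in flight at once.
print(max_throughput(4, 250))
print(in_flight(1000, 0.050))
```

Beyond that in-flight capacity, additional requests queue, and queueing shows up to the user as latency.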
Imagine two scenarios:
- A service that provides a security alarm system at 10k requests/sec. For one customer. With a latency of 20 days per request.
That would probably not provide the experience the customer expects from a security system :) Even if each request is high value, with that latency the value is unlikely to be realized.
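A back-of-envelope check with Little's Law (requests in flight = rate × latency) shows just how absurd that scenario is:

```python
rps = 10_000                      # requests per second
latency_s = 20 * 24 * 60 * 60     # 20 days expressed in seconds

# Little's Law: average requests in flight = throughput * latency.
backlog = rps * latency_s
print(f"{backlog:,} alarms in flight")  # 17,280,000,000 alarms in flight
```

Over seventeen billion alarms sitting in the system at any moment: throughput was high, but no alarm arrives in time to matter.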
- A service that provides bank transactions at a latency of 1 second. With a per-node throughput of 1 request per second; the entire node processes only a single request at a time.
That might be entirely fine. Suppose each node cost $10 for the duration of the request... but the request provided the company a value of $100. I'd happily take that per-node throughput! The customer still experiences a fast transaction, and the company can afford to horizontally scale and still profit.
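The economics of that scenario reduce to one subtraction (the $10 and $100 figures are the hypothetical ones from above):

```python
def profit_per_request(value: float, node_cost: float) -> float:
    """Net value of one request when a node is fully dedicated to it."""
    return value - node_cost

# $10 node cost per request, $100 of value per request: each node
# nets $90 per request, so 1 req/s per node is a perfectly good deal,
# and adding nodes serves more customers at the same 1 s latency.
print(profit_per_request(100.0, 10.0))  # 90.0
```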
Certainly don't forget throughput when deciding the value of an implementation. Good performance engineering practice is to record this performance data, even when not explicitly optimizing for it. Designing systems to provide value to customers is always the priority, which likely means a focus on latency first and throughput second.
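Recording both numbers takes very little code. A minimal sketch using only Python's stdlib, where the squaring lambda stands in for real request-handling work:

```python
import time
from statistics import quantiles

def timed(fn, *args):
    """Run fn and return (result, latency in seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

latencies = []
wall_start = time.perf_counter()
for i in range(1000):
    _, seconds = timed(lambda x: x * x, i)  # stand-in for real work
    latencies.append(seconds)
elapsed = time.perf_counter() - wall_start

# quantiles(n=100) yields 99 percentile cut points: index 49 is p50,
# index 98 is p99. Throughput is just requests over wall-clock time.
cuts = quantiles(latencies, n=100)
p50, p99 = cuts[49], cuts[98]
print(f"p50={p50*1e6:.1f}us p99={p99*1e6:.1f}us "
      f"throughput={1000/elapsed:.0f} req/s")
```

Latency percentiles describe what each user experienced; the throughput line describes what the nodes delivered. Recording both keeps the trade-off visible.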