the system grows for a while (increasing λ), or we reduce th...

the system grows for a while (increasing λ), or we reduce the number of servers (decreasing μ) to realize our efficiency gains. That causes ⍴ to pop back up, and latency to return to where it was. This often leads people to be disappointed about the long-term effects of efficiency work, and sometimes under-invest in it.

The system we consider above is a gross simplification, both in complexity, and in kinds of systems. Streaming systems will behave differently. Systems with backpressure will behave differently. Systems whose clients busy loop will behave differently. These kinds of dynamics are common, though, and worth looking out for.

The bottom line is that high-percentile latency is a bad way to measure efficiency, but a good (leading) indicator of pending overload. If you must use latency to measure efficiency, use mean (avg) latency. Yes, average latency4.