Load shedding usually enters the conversation late. The system is already slow, the queue is already growing, and someone says, "Can we just reject some work?" Sometimes the answer is yes. More often the system has been rejecting work for weeks, just in the worst possible way.
A timeout is load shedding. A full queue is load shedding. A client that retries five times is load shedding with extra traffic. A database connection pool that makes every request wait behind one expensive report is also load shedding, only now the policy is accidental.
The question is not whether the system sheds load. It will. The question is who gets to decide what happens first.
In one service, the failure mode looked like a general slowdown. Under batch load, live API requests would wait behind expensive jobs. Nothing crashed. That made it harder to argue about. The service was "up" while users watched spinners and support collected screenshots.
The first useful change was separating classes of work. Live requests, background jobs, retries, and operator actions stopped sharing one moral universe. They got different queues, different limits, and different failure messages. The second change was making rejection cheap. If the system knows it cannot do the work, it should say so before it allocates half the world trying to be polite.
This is where product language matters. A 503 with a retry-after header is not just an HTTP detail. It tells the caller what kind of failure happened. A stale-but-marked response may be better than a slow perfect response. A disabled export button during overload may be kinder than letting every click join a queue that will never drain.
Good load shedding feels almost boring. The important path stays usable. The expensive path slows down or says no. Operators can see which policy fired. Customers get a clear failure instead of random behavior. Nobody has to pretend that infinite demand is an engineering requirement.
High-load work is often sold as making systems faster. Sometimes that is true. But the more valuable work is making the system honest under pressure. When it cannot do everything, it should still know what matters.