STONITH definition - MSMSoft glossary

STONITH stands for “Shoot The Other Node In The Head”. The name is harsh because the job is harsh: when a cluster cannot trust a node, it must make sure that node is no longer changing shared state. In Pacemaker-style clusters, STONITH is typically carried out by fence agents that drive power controllers, IPMI, cloud APIs, hypervisor controls, or storage-level isolation.

Imagine a two-node service where node A stops responding to cluster heartbeats but is still running the application. Node B is ready to take over. If B promotes itself without STONITH, both nodes may handle traffic or write to the same storage, a condition usually called split-brain. If STONITH succeeds first, B can take over knowing that A is powered off or isolated. The user-visible outage may be a little longer, but the system avoids ambiguous ownership.
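
The ordering in this scenario can be sketched in a few lines of Python. This is an illustration only, not Pacemaker code: `fail_over`, `FailoverResult`, and the `fence` and `promote` callables are hypothetical names standing in for whatever fencing mechanism and promotion step a real cluster uses. The point it shows is that promotion happens only after fencing is confirmed, and that a fencing error counts as "not fenced".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverResult:
    fenced: bool
    promoted: bool
    reason: str


def fail_over(
    peer: str,
    fence: Callable[[str], bool],
    promote: Callable[[], None],
) -> FailoverResult:
    """Promote the local node only after the peer is confirmed fenced.

    `fence` and `promote` are hypothetical hooks supplied by the caller;
    `fence` must return True only when the peer is known to be off or isolated.
    """
    try:
        fenced = fence(peer)
    except Exception as exc:
        # An error from the fencing device is treated as "not fenced".
        return FailoverResult(False, False, f"fencing {peer} raised: {exc}")

    if not fenced:
        # Without confirmation, the peer may still be writing shared state.
        return FailoverResult(
            False, False, f"fencing {peer} not confirmed; refusing to promote"
        )

    promote()
    return FailoverResult(True, True, f"{peer} fenced; promoted local node")
```

Refusing to promote when fencing is unconfirmed keeps the service down longer, which is the trade-off described above: a longer outage in exchange for unambiguous ownership.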

The operational detail matters: a configured STONITH resource is not the same as a reliable one. It needs credentials, network reachability, timeout settings, and regular drills that prove it still works after firmware, cloud, or access-policy changes.
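
One way to keep drills regular is to track when each fencing device last passed a test and flag it again after any relevant change. The sketch below is a hypothetical bookkeeping helper, not part of any cluster tool: `FenceDevice`, `needs_drill`, and the 90-day default are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FenceDevice:
    name: str
    # None means the device has never passed a drill.
    last_successful_drill: Optional[datetime]
    # Most recent firmware, cloud, or access-policy change, if any is known.
    last_change: Optional[datetime] = None


def needs_drill(
    device: FenceDevice,
    now: datetime,
    max_age: timedelta = timedelta(days=90),  # assumed interval, tune per site
) -> bool:
    """Return True if the device's last successful drill can no longer be trusted."""
    if device.last_successful_drill is None:
        return True
    if now - device.last_successful_drill > max_age:
        return True
    # A change made after the last drill invalidates that drill's result.
    if device.last_change is not None and device.last_change > device.last_successful_drill:
        return True
    return False
```

A device that passed a drill last week but had its credentials rotated yesterday would be flagged, which matches the idea that a drill only proves the configuration that existed when it ran.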