SPOF definition - MSMSoft glossary

SPOF stands for single point of failure: any one component whose failure alone stops the whole service. In production it is rarely just one server with no backup. It can be a DNS zone, one NAT gateway, one database primary, one human approval path, or one cron job that quietly prepares data for everything else. The important question is not “is there a spare?” but “what still has to be true for users to keep working?”

A concrete example is a web application with three application nodes behind a load balancer but only one Redis instance for sessions. The frontend looks redundant, yet a Redis restart logs everyone out or blocks every request waiting for session state. The failure mode is surprising because the architecture diagram has many boxes while the real dependency graph has one narrow bridge.
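A rough way to see that narrow bridge is to list, for each thing the service needs, the instances that can provide it, and then fail each instance in turn. The sketch below does that in Python. The inventory, the instance names (`lb-1`, `app-1`, `redis-1`, and so on) and the helper functions are all hypothetical, and the model assumes the service works as long as at least one instance in every group is up.

```python
from typing import Dict, List, Set

# Hypothetical inventory: requirement -> interchangeable instances that satisfy it.
# The load balancer is assumed to be a redundant pair; with a single instance
# it would be reported as a SPOF as well.
REQUIREMENTS: Dict[str, List[str]] = {
    "load_balancer": ["lb-1", "lb-2"],
    "app": ["app-1", "app-2", "app-3"],
    "session_store": ["redis-1"],
}


def service_up(requirements: Dict[str, List[str]], failed: Set[str]) -> bool:
    """The service is up if every requirement still has one healthy instance."""
    return all(
        any(instance not in failed for instance in instances)
        for instances in requirements.values()
    )


def find_spofs(requirements: Dict[str, List[str]]) -> List[str]:
    """Return every instance whose failure alone takes the service down."""
    all_instances = sorted({i for group in requirements.values() for i in group})
    return [i for i in all_instances if not service_up(requirements, {i})]


print(find_spofs(REQUIREMENTS))  # ['redis-1']
```

Under these assumptions, losing two of the three application nodes still leaves the service up, while losing `redis-1` alone does not. A real inventory would also have to include the less visible dependencies such as DNS, approval paths, and scheduled jobs.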

Removing a SPOF starts with making the dependency explicit. Then choose whether it needs replication, failover, caching, graceful degradation, or a simpler operational rule, and test the failure path before an incident tests it for you.
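As one illustration of graceful degradation and of testing the failure path, here is a minimal Python sketch for the session-store case. Everything in it is an assumption made for the example: the `SessionStoreUnavailable` exception, the `load_session` helper, and the decision to serve a visitor as anonymous when the store cannot be reached. A real application might prefer a different fallback.

```python
from typing import Callable, Optional


class SessionStoreUnavailable(Exception):
    """Hypothetical error raised when the session backend cannot be reached."""


def load_session(session_id: str,
                 fetch: Callable[[str], Optional[dict]]) -> dict:
    """Look up a session, degrading to an anonymous session on store failure."""
    try:
        session = fetch(session_id)
    except SessionStoreUnavailable:
        # Degrade instead of blocking: treat the user as anonymous for now.
        return {"authenticated": False, "degraded": True}
    if session is None:
        # The store answered, but the session does not exist.
        return {"authenticated": False, "degraded": False}
    return {**session, "authenticated": True, "degraded": False}


def broken_fetch(session_id: str) -> Optional[dict]:
    """Stand-in for the store client while the backend is restarting."""
    raise SessionStoreUnavailable("session store is restarting")


# Exercise the failure path on purpose rather than waiting for an outage.
result = load_session("abc123", broken_fetch)
assert result == {"authenticated": False, "degraded": True}
```

Passing the store lookup in as a function keeps the failure path easy to trigger in a test, which is the point: the degraded behaviour is chosen and verified ahead of time instead of discovered during an outage.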