Systems seem to run at the very edge of failure much of the time. The combination of high workload, limited resources, pressure for additional features and capability, and inherent software, hardware, and network fragility is a noxious kettle of stuff always about to boil over in the form of outages, degraded response, or functional breakdowns.
For insiders, the surprising thing about our systems is not that they fail so often but that they fail so rarely! This good performance in the face of adverse conditions is called resilience. An important conclusion from resilience studies is that resilience depends critically on human operators and their ability to anticipate threats, monitor the system, react to those threats, and sacrifice some goals to protect others.
YOUTUBE PGLYEDpNu60 Published Oct 15, 2013.
This talk will introduce resilience and a model of system dynamics useful in analyzing both failed and successful event management, and will offer an explanation for why our systems run at the edge of failure.
Risk management in a dynamic society: a modeling problem. Jens Rasmussen. elsevier
Also called normalization of deviance in analysis of the Challenger launch decision. wikipedia
Rasmussen's often replicated phase diagram illustrating the drift toward danger. google