Stratus: Architecture of a Fault-Tolerant System

How do you decide whether a system is fault tolerant? Put simply, a system can be called fault tolerant if it has the following characteristics.

Modularity: The hardware and software are constructed of modules of fine granularity. These modules constitute units of failure, diagnosis, service, and repair. Keeping the modules as decoupled as possible reduces the probability that a fault in one module will affect the operation of another.

Fail-Fast Operation: A fail-fast module either works properly or stops. Thus, each module is self-checking and stops upon detecting a failure. Hardware checks (through error-detecting codes) and software consistency tests support fail-fast operation.
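
To make the fail-fast idea concrete, here is a minimal Python sketch, not Stratus code. The names FailFastStore and ModuleStopped are made up for illustration, and a CRC-32 checksum stands in for whatever error-detecting code real hardware would use. The module verifies every record it reads and halts itself on the first mismatch instead of returning bad data.

```python
import zlib


class ModuleStopped(Exception):
    """Raised when the (hypothetical) fail-fast module has halted itself."""


class FailFastStore:
    """Illustrative storage module: works properly or stops."""

    def __init__(self):
        self._records = {}   # key -> (payload bytes, CRC-32 checksum)
        self.stopped = False

    def _ensure_running(self):
        # A stopped module refuses all further work.
        if self.stopped:
            raise ModuleStopped("module is stopped")

    def write(self, key, payload):
        self._ensure_running()
        self._records[key] = (payload, zlib.crc32(payload))

    def read(self, key):
        self._ensure_running()
        payload, checksum = self._records[key]
        # Self-check: recompute the error-detecting code and compare.
        if zlib.crc32(payload) != checksum:
            self.stopped = True  # stop instead of returning bad data
            raise ModuleStopped(f"checksum mismatch for {key!r}")
        return payload


store = FailFastStore()
store.write("balance", b"100")
print(store.read("balance"))

# Simulate a fault by corrupting the stored payload behind the module's back.
_, saved_checksum = store._records["balance"]
store._records["balance"] = (b"900", saved_checksum)
try:
    store.read("balance")
except ModuleStopped as err:
    print("module stopped:", err)
```

Once the module has stopped, every later read or write also raises ModuleStopped, so a caller never gets a silently wrong answer and can hand the work to a spare module.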

Single Failure Tolerance: When a single module (hardware or software) fails, another module immediately takes over. For processors, this means that a second processor is available. For storage modules, it means that the module and the path to it are duplicated.
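
Single failure tolerance can be sketched the same way. The example below is an assumption-laden illustration: Processor, RedundantPair, and ModuleFailed are hypothetical names, and the "work" is just doubling a number. It shows a backup module taking over as soon as the primary reports a failure.

```python
class ModuleFailed(Exception):
    """Raised by a (hypothetical) module that has stopped working."""


class Processor:
    """Illustrative processor module that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def run(self, x):
        if not self.healthy:
            raise ModuleFailed(self.name)
        return x * 2  # stand-in for real work


class RedundantPair:
    """Tries the primary first and falls back to the backup on failure."""

    def __init__(self, primary, backup):
        self.modules = [primary, backup]

    def run(self, x):
        for module in self.modules:
            try:
                return module.run(x)
            except ModuleFailed:
                continue  # this module is down, try the next one
        raise ModuleFailed("both modules failed")


primary = Processor("cpu-a")
backup = Processor("cpu-b")
pair = RedundantPair(primary, backup)

print(pair.run(21))      # served by the primary
primary.healthy = False  # simulate a single failure
print(pair.run(21))      # the backup takes over, same result
```

The pair survives one failure only. If the backup also fails before the primary is repaired, the call raises ModuleFailed, which is why single failure tolerance is usually combined with online repair.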

Online Maintenance: Hardware and software modules can be diagnosed, disconnected for repair and then reconnected, without disrupting the entire system’s operation.
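
Online maintenance can be illustrated with a small pool of interchangeable modules. This is only a sketch under simple assumptions: ModulePool and its method names are hypothetical, and "handling a request" is reduced to returning a string. A module is disconnected for repair and later reconnected while the pool keeps answering requests.

```python
class ModulePool:
    """Illustrative pool of modules that supports repair without downtime."""

    def __init__(self, names):
        self.online = set(names)
        self.in_repair = set()

    def disconnect(self, name):
        if name not in self.online:
            raise KeyError(name)
        # Refuse to take the last working module offline.
        if len(self.online) == 1:
            raise RuntimeError("cannot disconnect the last online module")
        self.online.remove(name)
        self.in_repair.add(name)

    def reconnect(self, name):
        self.in_repair.remove(name)  # KeyError if it was never disconnected
        self.online.add(name)

    def handle(self, request):
        # Any online module may serve the request; pick one deterministically.
        name = min(self.online)
        return f"{name} handled {request}"


pool = ModulePool(["disk-1", "disk-2"])
print(pool.handle("read"))   # disk-1 handled read

pool.disconnect("disk-1")    # take disk-1 out for repair
print(pool.handle("read"))   # disk-2 handled read, service continues

pool.reconnect("disk-1")     # repaired module rejoins the pool
print(pool.handle("read"))   # disk-1 handled read
```

The guard in disconnect reflects the point of the definition: maintenance is only "online" if the rest of the system can keep operating while one module is out.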

Author: Srini

Experienced software developer with skills in development, coding, testing, and debugging. Good data analytics skills (data warehousing and BI), plus mainframe skills.