The MDCD nature of our approach distinguishes it significantly from traditional software fault tolerance techniques. Most importantly, rather than controlling and mediating the information flow to prevent erroneous information from affecting a component, the MDCD approach allows interacting processes to communicate without restriction while keeping track of potential error contamination so that recovery actions can be taken. Accordingly, we validate correctness only at the system boundary, and use the validation results both to adjust our confidence in individual processes and to enable message-driven confidence-driven checkpoint establishment. Furthermore, we make use of the multiple software versions that are inherently available to us and of non-dedicated hardware redundancy, keeping both development and performance costs low.
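The following minimal Python sketch illustrates the idea described in the paragraph above; it is not the protocol's actual algorithm. All names (`Process`, `dirty`, `validate_at_boundary`) are hypothetical. The sketch assumes that each process carries a single flag marking potential contamination, that a clean process saves its state just before accepting a message from a potentially contaminated sender, and that a failed boundary validation rolls such processes back to that saved state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    name: str
    trusted: bool  # False for a low-confidence component (assumption: e.g. a new version)
    dirty: bool = field(default=False, init=False)  # True if state may be contaminated
    state: List[str] = field(default_factory=list)
    checkpoint: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Assumption: a low-confidence process is treated as potentially
        # contaminated from the start.
        self.dirty = not self.trusted

    def send(self, payload: str) -> dict:
        # Messages flow without restriction; they only carry the sender's flag.
        return {"payload": payload, "dirty": self.dirty}

    def receive(self, message: dict) -> None:
        # Message-driven, confidence-driven checkpoint: save state only when a
        # clean process is about to become potentially contaminated.
        if message["dirty"] and not self.dirty:
            self.checkpoint = list(self.state)
            self.dirty = True
        self.state.append(message["payload"])


def validate_at_boundary(passed: bool, processes: List[Process]) -> None:
    """Apply the result of a correctness check made at the system boundary."""
    for p in processes:
        if not p.trusted:
            continue  # handling of the low-confidence process is not modeled here
        if passed:
            # Confidence restored: the trusted process is considered clean again.
            p.dirty = False
            p.checkpoint = None
        elif p.checkpoint is not None:
            # Validation failed: roll back to the state saved before contamination.
            p.state = list(p.checkpoint)
            p.checkpoint = None
            p.dirty = False
```

In this sketch, a trusted process that receives a message from an untrusted one takes exactly one checkpoint, no matter how many further messages arrive before the next boundary validation. This reflects the aim of keeping performance cost low by tying checkpoints to changes in confidence rather than to every message.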
Factors other than upgrading, such as complexity, testability, and test coverage, may also lead us to discriminate among interacting software components in a distributed system with respect to our confidence in their trustworthiness. Those factors suggest that the MDCD approach can be used as a general-purpose, low-cost software fault tolerance technique for distributed embedded computing. Accordingly, we have extended the algorithms to permit checkpoint establishment based on fine-grained confidence adjustment, which enables the MDCD protocol to serve a general class of distributed embedded systems. Moreover, we have extended the algorithms so that the MDCD protocol can coordinate synergistically with an existing time-based checkpointing protocol to tolerate software and hardware faults simultaneously.
The algorithm generalization and extension make the GSU methodology feasible not only for NASA's long-life missions but also for various commercial applications that are subject to online software upgrading and require high availability and/or safety, such as transportation systems, airline reservation systems, telephone systems, and financial services.