Reliability of Computer Systems and Networks
Fault Tolerance, Analysis, and Design
1st Edition, February 2002
XXIV, 528 Pages, Hardcover
Textbook
Short Description
Fault-tolerant computing is the use of redundant elements in a system's design that allow the system to continue functioning when a component fails. Such designs are built into safety-critical systems, including transportation systems and network servers, among others. The author provides comprehensive coverage of this technology, with heavy emphasis on the fundamentals.
With computers becoming embedded as controllers in everything from network servers to the routing of subway schedules to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault-tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks.
Market: Systems and Networking Engineers, Computer Programmers, IT Professionals.
Introduction.
Coding Techniques.
Redundancy, Spares, and Repairs.
N-Modular Redundancy.
Software Reliability and Recovery Techniques.
Networked Systems Reliability.
Reliability Optimization.
Appendix A: Summary of Probability Theory.
Appendix B: Summary of Reliability Theory.
Appendix C: Review of Architecture Fundamentals.
Appendix D: Programs for Reliability Modeling and Analysis.
Name Index.
Subject Index.
"...a useful reference." (IEEE Computer-Review, August 2002)
"The author has created a wonderful toolbox for systems engineers. So much is right here in one place, and organized effectively. I recommend this book to anyone working on networks or systems where reliability is a concern." (IIE Transactions on Quality and Reliability Engineering)
"...very good practical hints...recommended for everyone who wants to learn either reliability fundamentals or know about the computer applications of reliability..." (Comsoc.org, April 2003)