
Heuristic Assignments of Redundant Software Versions and Processors in Fault-tolerant Computer Systems for Maximum Reliability

Published online by Cambridge University Press: 27 July 2009

Soo Kar Leow
Affiliation:
Graduate Program in Operations Research North Carolina State University Raleigh, North Carolina 27695
David F. McAllister
Affiliation:
Department of Computer Science North Carolina State University Raleigh, North Carolina 27695

Abstract

We address the problem of assigning multiple copies of n independently developed versions of a program to a set of m (m > n) possibly heterogeneous processors to maximize system reliability. This problem is viewed as a partition and assignment problem. We first partition the set of processors into n clusters or subgroups. A program version is then assigned to each cluster, so that every processor in the cluster executes a copy of the assigned version. The cluster's unreliability is the probability that all of its processors fail. Component i of this system is composed of the copies of version i and the assigned cluster of processors.
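The cluster model described in the abstract can be illustrated with a small sketch. The abstract states only that a cluster's unreliability is the probability that all of its processors fail. The sketch adds several assumptions that are not taken from the paper: processor failures are independent, a component works when its version is correct and at least one processor in its cluster survives, and all numeric values and names (`processor_failure`, `version_reliability`, `partition`) are hypothetical.

```python
from math import prod


def cluster_unreliability(failure_probs):
    """Probability that every processor in the cluster fails.

    Assumes independent processor failures (an assumption of this sketch).
    """
    return prod(failure_probs)


def component_reliability(version_rel, failure_probs):
    """Reliability of one component: a version plus its assigned cluster.

    Assumed model, not taken from the paper: the component works if the
    version produces a correct result and at least one processor survives.
    """
    return version_rel * (1.0 - cluster_unreliability(failure_probs))


# Hypothetical data: m = 5 processors, n = 3 versions.
processor_failure = [0.1, 0.2, 0.05, 0.3, 0.15]
version_reliability = [0.95, 0.90, 0.85]

# One possible partition of the processors into n clusters;
# cluster i is assigned version i.
partition = [[0, 1], [2], [3, 4]]

components = [
    component_reliability(
        version_reliability[i],
        [processor_failure[j] for j in cluster],
    )
    for i, cluster in enumerate(partition)
]

for i, r in enumerate(components):
    print(f"component {i}: reliability = {r:.5f}")
```

How the component reliabilities combine into a system reliability depends on the decision scheme applied to the n version outputs, which the abstract does not specify, so the sketch stops at the component level.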

Type
Articles
Copyright
Copyright © Cambridge University Press 1987

