A Probabilistic Framework for Evaluating Reliability in Distributed Computing Environments with Uncertain State Transitions

Authors

  • Elizabeth Danielle, Stochastic Systems Analyst, United States
  • George William, Probabilistic Modeling Engineer, United States

Keywords

Distributed computing, probabilistic modeling, Markov model, Bayesian inference, system reliability, state transitions, failure analysis

Abstract

Assessing reliability in distributed computing environments is difficult, particularly when state transitions are uncertain because of dynamic workloads, unpredictable failures, and non-deterministic behavior. This paper introduces a probabilistic modeling framework that incorporates stochastic state transitions to evaluate system reliability in distributed settings. Using Markov models and Bayesian inference, the framework estimates transition probabilities and failure likelihoods under various operational scenarios. By capturing non-deterministic system behavior through probabilistic abstraction, the methodology supports more robust reliability planning and risk assessment. Simulation results illustrate the framework's effectiveness in modeling complex distributed environments and highlight its adaptability and computational efficiency across varying topologies and failure rates.
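The abstract does not spell out the model, so the following is only a minimal sketch of how Markov models and Bayesian inference can be combined for this kind of estimate. It assumes a three-state discrete-time chain, a symmetric Dirichlet prior on each row of the transition matrix, and made-up transition counts. The state names, the counts, and the functions `posterior_transition_matrix` and `failure_probability` are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical state space for illustration; not taken from the paper.
STATES = ["healthy", "degraded", "failed"]
FAILED = STATES.index("failed")


def posterior_transition_matrix(counts: List[List[int]], prior: float = 1.0) -> Matrix:
    """Posterior-mean transition probabilities under a symmetric Dirichlet prior.

    counts[i][j] is the number of observed transitions from state i to state j.
    """
    matrix = []
    for row in counts:
        total = sum(row) + prior * len(row)
        matrix.append([(c + prior) / total for c in row])
    return matrix


def failure_probability(matrix: Matrix, start: int, failed: int, steps: int) -> float:
    """Probability of entering the failed state within `steps` transitions."""
    n = len(matrix)
    # Make the failed state absorbing so that any visit to it is counted.
    absorbing = [row[:] for row in matrix]
    absorbing[failed] = [1.0 if j == failed else 0.0 for j in range(n)]
    # Start with all probability mass on the initial state.
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        dist = [sum(dist[i] * absorbing[i][j] for i in range(n)) for j in range(n)]
    return dist[failed]


if __name__ == "__main__":
    # Made-up transition counts, e.g. tallied from monitoring logs.
    observed = [[90, 8, 2], [30, 60, 10], [0, 0, 0]]
    P = posterior_transition_matrix(observed)
    for horizon in (1, 10, 50):
        p_fail = failure_probability(P, STATES.index("healthy"), FAILED, horizon)
        print(f"steps={horizon:3d}  reliability={1.0 - p_fail:.4f}")
```

Here reliability over a horizon is taken to be one minus the probability of reaching the failed state within that many transitions. Using the posterior mean is a simplification: a fuller treatment would propagate the whole Dirichlet posterior, for example by sampling transition matrices, to obtain uncertainty bounds on the reliability estimate.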

Published

2020-02-20

How to Cite

A Probabilistic Framework for Evaluating Reliability in Distributed Computing Environments with Uncertain State Transitions. (2020). International Journal of Computing Science and Systems (IJCSS), 1(1), 1-7. https://ijcss.com/index.php/about/article/view/IJCSS_01012020