A Probabilistic Framework for Evaluating Reliability in Distributed Computing Environments with Uncertain State Transitions
Keywords:
Distributed computing, probabilistic modeling, Markov model, Bayesian inference, system reliability, state transitions, failure analysis
Abstract
Reliability assessment in distributed computing environments is a critical challenge, particularly when systems exhibit uncertainty in state transitions due to dynamic workloads, unpredictable failures, and non-deterministic behavior. This paper introduces a probabilistic modeling framework that incorporates stochastic state transitions to evaluate system reliability in distributed settings. Leveraging Markov models and Bayesian inference, the framework enables estimation of transition probabilities and failure likelihoods under various operational scenarios. The proposed methodology facilitates more robust reliability planning and risk assessment by capturing non-deterministic system behavior through probabilistic abstraction. Simulation results illustrate the effectiveness of the framework in modeling complex distributed environments, highlighting its adaptability and computational efficiency across varying topologies and failure rates.
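The abstract does not give the model's equations, so the following is only a minimal sketch of the general technique it names, not the authors' implementation. It assumes a discrete-time Markov chain over three hypothetical states ("up", "degraded", "failed"), invented transition counts, and a symmetric Dirichlet prior on each row of the transition matrix. The function names `posterior_mean_transitions` and `reliability` are illustrative choices, not names from the paper.

```python
from typing import List

# Hypothetical state space; the paper does not specify one.
STATES = ["up", "degraded", "failed"]


def posterior_mean_transitions(counts: List[List[int]], prior: float = 1.0) -> List[List[float]]:
    """Posterior mean of each transition row under a symmetric Dirichlet(prior) prior.

    counts[i][j] is the number of observed transitions from state i to state j.
    """
    matrix = []
    for row in counts:
        total = sum(row) + prior * len(row)
        matrix.append([(c + prior) / total for c in row])
    return matrix


def reliability(matrix: List[List[float]], start: int, failed: int, steps: int) -> float:
    """Probability that the chain has not entered `failed` within `steps` steps."""
    n = len(matrix)
    # Treat the failed state as absorbing so that its probability mass accumulates.
    p = [row[:] for row in matrix]
    p[failed] = [1.0 if j == failed else 0.0 for j in range(n)]
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return 1.0 - dist[failed]


# Invented transition counts, for illustration only.
observed_counts = [
    [90, 8, 2],    # from "up"
    [30, 60, 10],  # from "degraded"
    [0, 0, 0],     # from "failed" (no outgoing transitions observed)
]

transition_matrix = posterior_mean_transitions(observed_counts)
ten_step_reliability = reliability(transition_matrix, start=0, failed=2, steps=10)
```

Using the posterior mean gives a single point estimate of the transition matrix. Sampling rows from the Dirichlet posterior instead would give a distribution over reliability values, which is closer to a full Bayesian treatment of uncertain state transitions.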
License
Copyright (c) 2020 Elizabeth Danielle, George William (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.