Figure 1: The Musa-Okumoto Model Fit To Failure Data |
I was partly inspired for this post by a dispute between Matheus Grasselli and Philip Pilkington over the applicability of Bayesian statistics to non-ergodic processes. I happen to know of certain parametric software reliability models that describe software failures as non-stationary stochastic processes. Bev Littlewood is a prominent proponent of Bayesian models in software reliability engineering. And he has also recommended a certain approach to accessing the goodness of fit for software reliability models, including ones not originally put forth as Bayesian models.
Here I provide an example of an application of this approach for assessing goodness of fit. I do not see what Bayesianism has to do with this approach. It is not expounded in terms of prior and posterior distributions. I do see that this approach considers the model(s) being assessed as part of a larger system for making predictions, with the parameter estimates on which those predictions are based being updated with each new observation. Is this approach, therefore, Bayesian?
2.0 Failure DataThe data used for illustration here consists of 367 failures observed over 73 weeks, and they were provided by the Jet Propulsion Laboratory (JPL). The system under test was:
"A facility for tracking and acquiring data from earth resource satellites in high-inclination orbits... About 103,000 uncommented Source Lines Of Code in a mixture of C, Fortran, EQUEL, and OSL... Failure data were obtained from the development organization's anomaly reporting system during software integration and test." -- Nikora and Lyu (1996).
The blue line in Figure 1 above represents the data. This is a step function, where each step occurs after an interval of one week. The height of each step represents the number of failures observed in the week ending at the start of the step. Figure 1 also shows a point estimate of the mean value function, and 90% confidence intervals, based on fitting the Musa-Okumoto software reliability model to the data.
3.0 The Musa-Okumoto ModelAs in other software reliability models, the Musa-Okumoto model relies on a distinction between failures and faults. A failure is behavior, including unexpected termination, observed while the software system under test is running. A failure occurs when the system is observed to behave other than specified. A fault, also known as a bug, is a static property of the system. A failure occurs when the inputs to the system cause altered behavior from the software entering a state in which a bug is tripped.
The Musa-Okuomoto model is an example of a software reliability model based on the assumption that the system under test contains an infinite number of faults. No matter how many faults have been found and removed, more faults always exist. Faults found later, however, tend to have an exponentially decreasing impact on the failure rate. More formally, the assumptions of the model are:
- The numbers of failures observed in non-overlapping time intervals are stochastically independent.
- The probability of one failure occurring in a small enough time interval is roughly proportional to the length of the time interval.
- The probability of more than one failure occurring in a small enough time interval is negligible.
- Detected faults are immediately corrected.
- No new faults are introduced during the fault-removal process.
- Aside from such repair of faults, the software is unchanged during test. No new functionality is introduced.
- Testing is representative of operational usage.
- The failure rate decreases exponentially with the number of removed faults.
The first three assumptions imply that failures occur as a Poisson Process, where a Poisson Process describes events happening randomly in time, in some sense. The sixth and seventh assumptions are unlikely to be met for the data being analyzed here. For best use of these kinds of software reliability models, system test should be designed to randomly select inputs from a specified operational profile. Nevertheless, these sorts of models have often been applied to test data observed naturally. The remaining assumptions imply that failures occur as a Non-Homogeneous Poisson Process (NHPP), where the failure rate is the following function of the mean value function:
λ(t) = λ0 e-θ m(t)
where:
- λ(t) is the failure rate, in units of failures per week. (The distinction between the failure rate and the failure intensity is moot for Poisson Processes.)
- m(t) is the mean value function, that is, the number of failures expected by week t.
- λ0 is the initial failure rate.
- θ is the failure rate decay parameter.
Figure 2 plots the above function for the data for the system under test, with parameter estimates being based on all of the data. The mean value function is plotted with the data along the abscissa for each week by taking the cumulative number of failures observed by the end of that week. The corresponding ordinate value for the data is the quotient of the number of failures observed in each time interval (namely, a week) and the length of the time interval. The graph shows quite a bit of variation on this scale, clustered around the model estimate. A slight improvement in software reliability is noticeable, but the convexity in the estimate from the model is hard to see.
Figure 2: The Failure Rate As A Function Of The Mean Value Functon |
4.0 The U-Test: How Am I Doing
An attempt was made to estimate the model parameters after each week. For this model, maximum likelihood estimates are found by an iterative algorithm. The iterative algorithm did not converge at the end of week 49, leaving the twenty four parameter estimates shown in the table in Figure 3. Figure 3 also shows percent variability of the parameter estimates. (Percent variability is the ratio of the absolute difference of the estimates for two successive weeks to the estimate for the earlier of the two weeks.) The model estimates seem to have settled down towards the end of test period. They are unlikely to vary much over succeeding weeks if testing is to continue.
Figure 3: Percent Variability In Parameter Estimates |
In the Musa-Okumoto model, the number of failures observed in a given week is a realization of a random variable from a Poisson distribution. The mean of this distribution, for given model parameters, varies with the week. The expected number of failures declines in later weeks. In the approach illustrated here, the model parameters are re-estimated in each successive week. Specifically, define ui; i = 1, 2, ..., 23; as the value of the CDF for the number of failures in week 50 + i, where:
- The CDF is evaluated for the number of failures actually observed in the given week.
- The CDF is calculated with estimates of the model parameters based on the number of failures observed in all weeks up to, but not including the given week.
For example, u1 is an estimate of the CDF for the number of failures found in week 51, based on parameter estimates found with the number of failures observed in weeks 1 through 50. This procedure yields a sequence of (temporally ordered) numbers:
u1, u2, ..., u23
Each ui, being a probability, is between zero and one. These numbers, in effect, provide a measure of how well the model, with its estimation procedure, is predicting failures of the system under test. Under the null hypothesis that the model describes the test process generating the failure data, the sequence of uis is the realization of a random sample from a probability distribution uniformly distributed on the interval [0, 1].
The Kolmogorov-Smirnov statistic provides a formal test that the uis are realizations of identical and independently distributed random variables from a uniform distribution. Figure 4 compares and contrasts the CDF from a uniform distribution with the empirical probability distribution found from the data. The theoretical distribution for the uniform distribution is merely a 45 degree line segment sloping upward from the origin to the point (1, 1). That is, the probability that a realization of a random variable uniformly distributed on [0, 1] will be less than some given value u is merely that given value.
Figure 4: The U-Test For The Example |
The empirical CDF found from the Musa-Okumoto model applied to the data is a step function. The probability that a random variable from this distribution will be less than a given value u is the proportion of uis in the data less than that given value. The exact location of the step function is found as follows. First, sort the sample values in increasing order:
u(1) ≤ u(2) ≤ ... ≤ u(23)
The sorted values are known as order statistics. (Note the temporal order is lost by this sorting; this test does not reject the model if, for example, overestimations of failures in earlier weeks average out underestimations of failures in later weeks.) For this data, u(1) is approximately 0.007271, and u(23) is approximately 0.939. Plot these order statistics along the abscissa. A step occurs at each plotted order statistic, and each step is, in this case, of size 1/(23 + 1) (approximately 0.042) up.
The Kolmogorov-Smirnov statistic is the maximum vertical distance between the theoretical CDF, specified by the null hypothesis, and the empirical CDF, as calculated from the data. In the figure, this maximum vertical distance is 0.18869, found for the second step up, at u(2) (which is approximately 0.23). The p-value of 34.2% is the probability that the empirical CDF, for a random sample of 23 points drawn from a uniform sample, will be at least this far above or below the empirical CDF. Since the p-value exceeds the pre-selected conventional level of statistical significance of 10%, the data does not allow one to conclude that the model does not fit the data. (One should set a fairly low level of statistical significance so as to increase the power of the test.) Thus, one accepts that the Musa-Okumoto model is successful in predicting failures in weeks one step ahead.
5.0 ConclusionsThe above analysis suggests that, despite the violation of model assumptions in the test process, JPL could have used the Musa-Okumoto model as an aid in managing this test phase of the tracking and data acquisition system considered here.
In what sense, if any, is the method illustrated here for analyzing the model's goodness-of-fit Bayesian?
References- Sarah Brocklehurst and Bev Littlewood (1996). Techniques for Prediction Analysis and Recalibration, in Handbook of Software Reliability (edited by M. R. Lyu), McGraw-Hill.
- Grasselli, Matheus (2013). Keynes, Bayes, and the law.
- Nikora, Allen P. and Michael R. Lyu (1996). Software Reliability Measurement Experience, in Handbook of Software Reliability (edited by M. R. Lyu), McGraw-Hill.
- Pilkington, Philip (2013). Holey Models: A Final Response to Matheus Grasselli.