|This report is an archived publication and may contain dated technical, contact, and link information|
Publication Number: N/A
Producing Correct Software
Verification and Validation for Prediction Software
This web page accomplishes the following:
Predicting future or hypothetical events is an important application for computers. Computers are used, for example, to predict future traffic flows. When the domain has a precise theory by which predictions are made and the predictions have been extensively confirmed by experiment, one can apply the techniques for verifying software with algorithmic or logical specifications to the prediction software.
However, many prediction software programs make predictions when theory, past experimental verification, or both is missing. An example of this is prediction using a neural net. The following describes many of these applications:
When there is no precise, experimentally verified predictive theory, the only way to establish the correctness of prediction software is by experimentally verifying that the software predicts accurately.
In addition, the reliability of predictive software, particularly when experimentally verified theory is lacking, can be improved by doing the following:
The Required Level of Accuracy
For most predictive programs, it is known that the predictions made are not completely accurate because of the following causes, among others:
However, a prediction may be accurate enough for a given application, depending on the particular application. The application determines whether the prediction is accurate enough, but once a tolerance in the application domain is established, that tolerance can be changed to a required tolerance on that prediction.
Many decisions based on predictions depend on knowing the value of a function in another prediction. For example, finding the best ratio of green time for a freeway ramp meter, based on simulated traffic flows given traffic volume and road geometry, can be solved by finding the minimum waiting time for various green time ratios. If the waiting time function is accurately established from the simulation, one can find the approximate minimum of the function in the real world and make a sound traffic planning decision.
Let f be a real-world function of interest to a user of a prediction program P. Generally f depends on a number of variables:
For the y variables, P(yi) will denote the predicted value of yi, and R(yi) will denote the actual, real-world value. Likewise Rf and Pf will denote values of f based on predictions or real-world data.
The user wants the real-world value of f, Rf(X,Y,Z), which is the value of f computed at real-world values of the input variables, i.e.,
Rf(X,Y,Z) = f(X,RY,Z)
What the user actually gets from the prediction program is Pf, the value of f at the data point generated by the prediction program, i.e.,
Pf(X,Y,Z) = f(X,PY,Z)
If Pf is close enough to Rf to be within the application’s tolerance for f, Pf can be used in place of Rf. The problem for the user of P is to know when P is within the application tolerance.
One approach to this is to estimate the errors in f using a first order Taylor series approximation to R. Let Rf-Pf = E, the error of the prediction. To a first order approximation,
Rf = Pf + SUM df/dyi*ei, for 1<= i <=m.
where df/dyi is the partial derivative of f with respect to yi. From this equation one sees that Pf is a good approximation of Rf when the following conditions are met:
Wrapping a Prediction to Maintain Tolerance
The above formula provides a method for an application to wrap the input from a prediction program. P can be used for estimating f when fabs( f(P) - f(P+E)) < T, or when approximately SUM fabs(df/dyi)*fabs(ei) < T, if the following occurs:
If the errors in estimating the yi’s are known, the last inequality lets one estimate if P is useful for estimating f.
To wrap the prediction program and the application so that P is only used when it provides a close enough estimate of f, one can do the following:
Estimating the Error of a Prediction
To estimate the error of a prediction program P, one can compute the sample mean and variance for the errors of P for points in the following instances:
The generalized regression neural net provides a procedure for drawing such a local sample from a large database and for computing a sample mean and variance for the error.
Handling Uncertainties in P and E
Many prediction programs, particularly simulations, use random variables, so that the value of each yi on a particular run is a random variable. Also, the error in P will generally only be known until it reaches some accuracy represented by a sample variance. To insure that these errors do not exceed the tolerance for f, the application program should determine the region for possible values of Y+E and that the change in f in that region is within the tolerance.
Topics: Research, operations.
Keywords: Research, operations, software development, validation.
TRT Terms: Research, operations, Information organization, Information management, Data processing, Software, validation.