U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information 

Publication Number: N/A
Date: 1999 
Producing Correct Software
Applying Experimental Results
Summary
This web page presents techniques for applying a database of previously experimentally measured input-output pairs to test the correctness of software.
The Problem
Experimental values can be used to wrap a computer program. Each data record in the set of experimental values should contain the following:
When the program is asked to compute the outputs at a new point, the known experimental values are used to predict the outputs by means of an interpolation technique. Only rarely is a record for the new input point already in the database; usually the known experimental results for the inputs closest to the new point have to be combined to make an experimentally based prediction of the program output. If the output of the program agrees with the prediction based on experimental data within an acceptable tolerance, the results of the program are accepted. Otherwise the results of the program are rejected, i.e., not used without further examination, perhaps by alerting an expert to the discrepancy between the experimentally predicted value and the computed value.
To wrap a program with predictions based on observations, the following problems must be addressed:
Some Interpolation Techniques
Deciding Between Interpolation Techniques
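The wrapping scheme described above (predict the output from nearby experimental records, then accept the program's result only if it falls within a tolerance of that prediction) can be sketched as follows. This is a minimal illustration, not the method of the original page: the record layout, the names EXPERIMENTS, predict, and wrapped, the sample data, and the choice of inverse-distance weighting over the k nearest records are all assumptions made for the example.

```python
import math

# Hypothetical experimental records: (input point, measured output).
EXPERIMENTS = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((1.0, 1.0), 4.0),
]


def predict(x, records=EXPERIMENTS, k=3):
    """Predict the output at x from the k nearest experimental records.

    Assumed technique: inverse-distance-weighted average. Any other
    interpolation method could be substituted here.
    """
    nearest = sorted(records, key=lambda rec: math.dist(x, rec[0]))[:k]
    weighted = []
    for point, value in nearest:
        d = math.dist(x, point)
        if d == 0.0:
            return value  # the rare case: x is already in the database
        weighted.append((1.0 / d, value))
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total


def wrapped(program, x, tolerance):
    """Run program(x) and check it against the experimental prediction.

    Returns (computed, expected, accepted). A rejected result should be
    referred for further examination rather than used directly.
    """
    computed = program(x)
    expected = predict(x)
    accepted = abs(computed - expected) <= tolerance
    return computed, expected, accepted
```

For instance, `wrapped(lambda x: 1 + x[0] + 2 * x[1], (0.0, 0.0), 0.1)` accepts the result, because that hypothetical program reproduces the stored record exactly, while a program returning 100.0 at (0.5, 0.5) would be rejected.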
Comparing Estimators
Given a prediction function, p(x), one measures the error in the prediction when compared to measured experimental values. This error function is abs_err(p,x) = |p(x) - ex(x)|, the absolute value of the error. [Taking the absolute value prevents positive and negative errors from canceling each other out in the mean and other statistics.] For each sample test point x in S, one can obtain abs_err(pa,x) and abs_err(pb,x) from the measured experimental value ex(x) and the predictions pa(x) and pb(x). The sets {abs_err(pa,x) | x in S} and {abs_err(pb,x) | x in S} are matched pair data, i.e., measurements of two random variables are made at the same sample points.
One can compare functions on a matched pair sample by looking at the difference of the functions on the sample, i.e., {err_diff(x) | x in S}, where
err_diff(x) = abs_err(pa,x) - abs_err(pb,x).
The mean of err_diff on S is a measure of how much better pb predicts than pa predicts. If this mean is positive, pb has the smaller average error and is a better predictor of ex than pa. If, on the other hand, mean(err_diff) is nonpositive, pb is not a better predictor than pa.
Therefore, to test whether pb is a better predictor, one finds a confidence level for mean(err_diff) being positive. In the usual method of statistics, one finds the confidence level for mean(err_diff) > 0 by determining the probability of an observed statistic attaining at least its observed value if the desired conclusion (i.e., pb is better than pa, or in statistical terms, mean(err_diff) > 0) is false. The sample mean, scaled by its standard error, follows the t distribution centered around the population mean. The integral of the t distribution with mean 0 from the observed value on S to infinity is the probability of obtaining the observed difference of means if the predictor that was supposed to be worse is actually better or at least no worse than the supposed better predictor.
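The matched pair construction can be sketched in a few lines. The data and the two predictors below are hypothetical, chosen only to make the arithmetic easy to follow, and the sign convention is the one under which a positive mean favors pb.

```python
from statistics import mean


def abs_err(p, ex, x):
    """Absolute error of predictor p at sample point x."""
    return abs(p(x) - ex[x])


def err_diff(pa, pb, ex):
    """Matched pair differences: positive where pb has the smaller error."""
    return [abs_err(pa, ex, x) - abs_err(pb, ex, x) for x in ex]


# Hypothetical experimental values ex(x) on a sample S of four points.
ex = {1.0: 2.1, 2.0: 3.9, 3.0: 6.2, 4.0: 7.8}


def pa(x):
    return 2.0 * x + 0.5  # hypothetical predictor with a constant bias


def pb(x):
    return 2.0 * x  # hypothetical predictor without the bias


diffs = err_diff(pa, pb, ex)
mean_diff = mean(diffs)  # positive here, so pb looks better on this sample
```

Both predictors are evaluated at the same sample points, so each entry of diffs compares the two errors under identical conditions. Whether a positive mean_diff is large enough to be trusted is the question the t test answers.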
The t statistic with mean 0 is given by the formula
t = mean(err_diff on S) / (sd(err_diff on S) / N^(1/2))
where N is the number of sample points in S and
sd(err_diff) = (SUM((err_diff(xi) - mean(err_diff on S))^2) / (N-1))^(1/2) for xi in S.
[Note that a better computational formula is sd = ((N*SUM(err_diff(xi)^2) - (SUM(err_diff(xi)))^2) / (N*(N-1)))^(1/2).]
The quantity sd / N^(1/2) is the standard error of the sample mean. The confidence that the expected predictor is better than the other predictor is given by the probability that t has a value less than or equal to the observed value. If t0 is the observed value of t, the confidence is the integral of the t distribution from minus infinity to t0. Standard tables of the t distribution in most statistics books give values of t0 which guarantee standard confidences, e.g., 90 percent, 95 percent, 99 percent, etc. By comparing the value of t0 on S with the values in the table, one can determine which of the standard confidence levels has been reached. The tables are for the t distribution in standard form; dividing the sample mean by its standard error converts the sample mean to this standard form.
Use of Absolute Values
From inspection of a table for the t distribution, it becomes apparent that the confidence level increases as the value of t increases. In turn, t increases as mean(err_diff) increases. Suppose that the error of a predictor was simply predicted value - observed value instead of the absolute value of that difference. If the worse predictor badly underpredicts a large positive observed value, while the better predictor has only a small error, err_diff would be a negative number, which counts against the better predictor. The t test only returns useful confidence levels for t values near 2. Without taking absolute values of predictor errors, the t test can fail to detect significant differences in the magnitude of errors in predictors.
Example
Given
where
As a result, statistics for err_diff on the 5 sample points are
Properties of an Improved Approximation
Note that the level of confidence in whether p1 improves p2 depends only on the t statistic, the ratio of the difference of means of the absolute errors to the standard error of this difference. This means that the uniformity with which a prediction function predicts beyond its intended domain is a very desirable property for a prediction function, because it decreases the standard error of the error of the predictor, increasing the value of the t statistic.
Using the Normal Approximation
As the sample size increases, the t distribution approaches the normal distribution. The center of the distribution converges faster than the tail. For 30 degrees of freedom, the value of t required for a 90 percent confidence level is about 2 percent greater than the corresponding value of the normal distribution (1.310 vs. 1.282), while the value of t for 99.5 percent confidence is about 7 percent greater (2.750 vs. 2.576).
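The confidence computation and the comparison with the normal distribution can be checked numerically without statistical tables. The sketch below uses only the Python standard library; the function names t_pdf, t_cdf, and paired_confidence are assumptions of this example, and the t distribution is integrated with Simpson's rule rather than taken from a statistics package.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(t, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_coef = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
    coef = math.exp(log_coef) / math.sqrt(dof * math.pi)
    return coef * (1 + t * t / dof) ** (-(dof + 1) / 2)


def t_cdf(t0, dof, steps=2000):
    """P(T <= t0): symmetry about 0 plus Simpson's rule on [0, |t0|].

    steps must be even for Simpson's rule.
    """
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if t0 >= 0 else 0.5 - area


def paired_confidence(diffs):
    """Return (t0, confidence) for matched pair differences.

    t0 is the mean difference divided by its standard error; the
    confidence is P(T <= t0) with len(diffs) - 1 degrees of freedom.
    """
    n = len(diffs)
    std_error = stdev(diffs) / math.sqrt(n)
    t0 = mean(diffs) / std_error
    return t0, t_cdf(t0, n - 1)


# Normal quantiles for comparison with the t values quoted in the text.
z_90 = NormalDist().inv_cdf(0.90)
z_995 = NormalDist().inv_cdf(0.995)
```

Evaluating `t_cdf(1.310, 30)` and `t_cdf(2.750, 30)` should give values close to 0.90 and 0.995, while z_90 and z_995 give the corresponding normal quantiles. Calling `paired_confidence` on a list of err_diff values returns the observed t0 together with the confidence that the mean difference is positive.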

Topics: Research, operations. Keywords: Research, operations, software development, validation. TRT Terms: Research, operations, Information organization, Information management, Data processing, Software, validation. Updated: 04/25/2012
