This report is an archived publication and may contain dated technical, contact, and link information 

Publication Number: N/A
Date: 1999 
Producing Correct SoftwareCorrectness Given Mathematical SpecificationsSummary This web page presents ways to achieve software correctness for software that has, or could have, mathematically precise specifications. More detailed information about techniques introduced here will appear later in subdocuments linked to this document. Definition of Software Correctness This document is an introduction to some techniques for producing correct software. Correct software must accomplish the following:
Software Techniques to Produce Correctness G1 through G5 describe the desired outcome of a software project but do not explain how to reach that outcome. By following three tasks, G1 through G5 can be achieved:
Suggested techniques for addressing T3 (showing that software meets specifications) will be included in the following sections. This document will not address T1 (specifications) and T2 (system development). Although T1 and T2 are crucial to the success of the project, these topics are addressed in other sections.
On the other hand, while there has been considerable academic work on techniques for V&V, this work has two flaws:
Process Standards Are Insufficient A process standard is insufficient to insure software correctness (G1G5). This is because the behavior of a system depends on the state of the system at the time when it is used, not the history of its construction. If a process standard were used in civil engineering, one would require that a bridge be designed by a licensed professional engineer, and built by certified journeyman craftspersons, but one would not be required to do a structural analysis of the design nor a physical inspection of the bridge itself. Admittedly, it is more likely that quality personnel following careful procedures will produce a correct bridge or system; however, even the best professionals make mistakes, and less qualified personnel may produce a sound bridge or system. The built system itself determines its behavior. Top Level Subtasks of Verification and Validation Like the requirements for correctness (G1G5), the definition of verification and validation (V&V) does not indicate how it should be achieved. The following list of V&V subtasks breaks the V&V problem into parts that are more manageable:
The Best Situation The best situation is when the software is 100 percent correct. This is the case when T1 and VV1 through VV3 can be carried out. Together the two requirements, finding sufficient specifications (T1) and showing that the software satisfies them (T3), would imply that G1 through G4 are satisfied. Together VV1 (correctness on an ideal computer) and VV2 (actual approximation in the ideal computer), imply T3. VV3 is a more elaborate restatement of G5. Therefore, in the ideal case when T1 and VV1 through VV3 are achieved, theory suggests that the correctness conditions (G1 to G5) should always be satisfied. Note that in the section "Testing" below, it is strongly suggested that all theoretical predictions about software, including this one, be experimentally verified. Computations Using Formulas Software modules (e.g., functions in C) that plug numbers into a formula to compute a result are an important special example of VV1VV3 goals being achieved. This happens when the following occurs:
Therefore, to achieve correctness for programs that are based on formulas, it is sufficient to write the program in the following way:
If these two conditions are satisfied, and mathematical induction on the number of steps in the computation has been performed, the final results will approximate the ideal. However, in the real world, the closeness of approximation will decay as the number of steps increases. Therefore, it is necessary to show that the final result approximates the ideal computation closely enough for its intended application. Methods for writing numerical software that approximate the ideal on a domain that the software itself can recognize (VV2 and VV3) will be discussed in Safe Numerical Techniques [not yet available]. Software with Algorithmic Specifications Sometimes software specifications, as referred to in VV1, are stated as an algorithm. Such an algorithm is usually stated in some combination of natural language, mathematics, and/or pseudocode. For example, Euclid uses this method by stating an algorithm for finding the greatest common divisor. Euclid’s algorithm can be used as the specifications for a computer program to find the greatest common divisor. Euclid proved this algorithm correct and set a standard over 2000 years old, which is rarely matched by contemporary programmers. For software with algorithmic specifications, VV1 is established by showing that the computerized version of the algorithm is equivalent to the specification version of the algorithm. Two algorithms are equivalent if they compute the same function, i.e., for any point in the input domain of the specification algorithm, the specification and implementation algorithms compute the same value(s). One important way to do this, although not the only way, is to show that the computer implementation is an image of the original algorithm under a mapping that translates the syntax of the original into a programming language, while preserving the semantics of the original. Once VV1 is established, VV2 and VV3 can be established using the same techniques used in formulabased programs. Software with Logical Specifications Often the specifications of VV1 are stated as logical formulas containing both inputs and outputs. The built software is supposed to satisfy these formulas. One source of these formulas is a hazard analysis (see Ch. 14, Safeware). Another source is the application area for which the software is intended (e.g., knowledge about pavements for a pavement management system). A third source is basic science and mathematics. For example, a computed inverse matrix should satisfy the following formula of abstract algebra defining an inverse:
When logical specifications exist, there are two ways to establish them: wrapping and mathematical proof. Wrapping Wrapping is a technique that uses a modified version of a piece of software to compute its own correctness. To wrap a program P used in system S with a requirement R on P,
For example, let the following be a matrix inverse program that puts what it thinks the inverse of A into B: void inverse (matrix A, *matrix B) A wrapped version of this is boole wrapped_inverse(matrix A, *matrix B) { where
The other half of wrapping the matrix inverse is to make the program where it is used respond to the success or failure of the inverse program. Let inverse (A,&B); F(B); be a call to the matrix inverse program, where A and B are matrix variables, and F(B) is a code segment that uses B. A wrapped version of this is boole1 = wrapped_inverse (A,&B); if (boole1) F(B); else G; where boole1 is a boolean variable and G is a code segment that does not use G. Wrapping has several different V&V applications:
The following is an important theorem for wrapping: Let PROG be a program on which P(x) is to be proved. P(x) is a logical formula such that
Then if P(x) is approximately satisfied, P(x) is true for PROG. Mathematical Proofs About Programs Mathematical proofs about programs are sometimes useful for showing the correctness of programs with logical specifications. Verifying the correctness of a computation typically involves two steps:
To address the first step, one typically defines an ideal computer as a mathematical construct capable of running the program to be proved correct. The ideal computer often consists of a set of states and transitions, defined to mirror the semantics of a programming language. Program execution is simulated by transitions to new states in this ideal machine. A typical proof shows that for any inputs in the legal input domain, all computational paths reach end states where the program specifications are true. Verification & Validation for Prediction Software Predicting future or hypothetical events is an important application of computers, e.g., traffic simulation is used to predict traffic flows during the next hour. When the domain for the predictions has a precise theory by which predictions are made and the predictions have been extensively confirmed by experiment, the techniques for verifying software with algorithmic or logical specifications can be applied to the prediction software. However, many prediction software programs make predictions when theory, past experimental verification or both is missing. An example of this is prediction using a neural net. The following describes many neural net applications:
When there is no precise, experimentally verified predictive theory, the only way to establish the correctness of prediction software is by experimentally verifying that the software predicts accurately. In addition, the reliability of predictive software, particularly when experimentally verified theory is lacking, can be improved in these ways:
One must test the hypothesis of whether the program satisfies a logical formula. The logical formula does not have to be proved or used for wrapping. For example, for prediction programs, one must experimentally verify whether the formula prediction comes close to the observed values. If P is the logical formula to be verified, one designs an experiment to statistically test the hypothesis that P is satisfied by outputs of the software. To design this experiment, one must choose a confidence level (C) through which one wants to experimentally verify P. The traditional scientific values for C are 95 percent, 99 percent, or perhaps 99.9 percent. However for safetycritical computations, a higher confidence level, e.g., 99.9999 percent (only one failure in a million) may be appropriate. The result of testing is an error estimate, or an estimate of the difference between computed and observed values. This error estimate should only be applied to new data in the population sample. Furthermore, as the software runs on additional data, the original error estimate can be tested against results on these new runs. In addition, it is sometimes possible to improve the original error estimate by adding the new data points to the original sample, creating a new, larger, more reliable sample from which a more reliable error estimate can be computed. In addition, running the experiment produces an observed confidence level C', which is almost never exactly C (although C' may be close to C). Designing the experiment to achieve C does not cause the confidence level to be C, only that there are enough data points in the experiment to possibly achieve C.

Topics: Research, operations. Keywords: Research, operations, software development, validation. TRT Terms: Research, operations, Information organization, Information management, Data processing, Software, validation. Updated: 04/25/2012
