This report is an archived publication and may contain dated technical, contact, and link information

Publication Number: N/A
Date: 1999

Producing Correct Software

Correctness Given Mathematical Specifications

Summary

This web page presents ways to achieve software correctness for software that has, or could have, mathematically precise specifications. More detailed information about techniques introduced here will appear later in subdocuments linked to this document.

Definition of Software Correctness

This document is an introduction to some techniques for producing correct software. Correct software must accomplish the following:

G1: Compute accurate results.
G2: Operate safely and cause the system containing the software to operate safely.
G3: Perform the tasks required by the system containing the software, as specified in the software specifications.
G4: Achieve these goals for all inputs.
G5: Recognize inputs outside its domain.

Software Techniques to Produce Correctness

G1 through G5 describe the desired outcome of a software project but do not explain how to reach that outcome. By following three tasks, G1 through G5 can be achieved:

T1: Determine specification that is sufficient in the mathematical sense for G1-G5.
T2: Build the software.
T3: Demonstrate that the software satisfies the specifications from T1. This task is called verification and validation (V&V).

Suggested techniques for addressing T3 (showing that software meets specifications) will be included in the following sections. This document will not address T1 (specifications) and T2 (system development). Although T1 and T2 are crucial to the success of the project, these topics are addressed in other sections.

Systems analysis.
Knowledge engineering.
Safety engineering, e.g., see Leveson [references] for several techniques for producing specifications that imply system safety.
Software standards such as ISO 9001 and SEI CMM [see Hatton, p. 16-23 in refs.] focus on the software development process (T2).
Demonstrating that software satisfies specifications (T3) requires techniques not found in development process (T2) oriented standards (as discussed below).

On the other hand, while there has been considerable academic work on techniques for V&V, this work has two flaws:

It does not address all the problems faced by real-world software developers
Theory is used to check the completed design.
It is not in a form that is readily usable by actual software developers.

Process Standards Are Insufficient

A process standard is insufficient to insure software correctness (G1-G5). This is because the behavior of a system depends on the state of the system at the time when it is used, not the history of its construction. If a process standard were used in civil engineering, one would require that a bridge be designed by a licensed professional engineer, and built by certified journeyman craftspersons, but one would not be required to do a structural analysis of the design nor a physical inspection of the bridge itself. Admittedly, it is more likely that quality personnel following careful procedures will produce a correct bridge or system; however, even the best professionals make mistakes, and less qualified personnel may produce a sound bridge or system. The built system itself determines its behavior.

Top Level Subtasks of Verification and Validation

Like the requirements for correctness (G1-G5), the definition of verification and validation (V&V) does not indicate how it should be achieved. The following list of V&V subtasks breaks the V&V problem into parts that are more manageable:

VV1: Show that the software satisfies the desired conditions if the computation was performed in an ideal computer with infinite memory and precision.
VV2: Show that the actual computation approximates the ideal computation for inputs within the domain of application.
VV3: Show that the actual computation can identify inputs or computations for which the actual computation does not approximate the ideal computation of VV1.

The Best Situation

The best situation is when the software is 100 percent correct. This is the case when T1 and VV1 through VV3 can be carried out. Together the two requirements, finding sufficient specifications (T1) and showing that the software satisfies them (T3), would imply that G1 through G4 are satisfied. Together VV1 (correctness on an ideal computer) and VV2 (actual approximation in the ideal computer), imply T3. VV3 is a more elaborate restatement of G5. Therefore, in the ideal case when T1 and VV1 through VV3 are achieved, theory suggests that the correctness conditions (G1 to G5) should always be satisfied. Note that in the section "Testing" below, it is strongly suggested that all theoretical predictions about software, including this one, be experimentally verified.

Computations Using Formulas

Software modules (e.g., functions in C) that plug numbers into a formula to compute a result are an important special example of VV1-VV3 goals being achieved. This happens when the following occurs:

The required condition on the output is given by the formula.
In most current programming languages, the formula appears directly in the program. The version in the program is an image of the original formula under a syntactic mapping that preserves the semantics of the original.
The semantics of the programming language are such that if the formula were computed on an ideal computer, the result would satisfy the formula.

Therefore, to achieve correctness for programs that are based on formulas, it is sufficient to write the program in the following way:

All input data used in the computation approximate the corresponding ideal mathematical representations of the corresponding input data items.
Each computation approximates the computation on an ideal computer.

If these two conditions are satisfied, and mathematical induction on the number of steps in the computation has been performed, the final results will approximate the ideal. However, in the real world, the closeness of approximation will decay as the number of steps increases. Therefore, it is necessary to show that the final result approximates the ideal computation closely enough for its intended application. Methods for writing numerical software that approximate the ideal on a domain that the software itself can recognize (VV2 and VV3) will be discussed in Safe Numerical Techniques [not yet available].

Software with Algorithmic Specifications

Sometimes software specifications, as referred to in VV1, are stated as an algorithm. Such an algorithm is usually stated in some combination of natural language, mathematics, and/or pseudo-code. For example, Euclid uses this method by stating an algorithm for finding the greatest common divisor. Euclid’s algorithm can be used as the specifications for a computer program to find the greatest common divisor. Euclid proved this algorithm correct and set a standard over 2000 years old, which is rarely matched by contemporary programmers.

For software with algorithmic specifications, VV1 is established by showing that the computerized version of the algorithm is equivalent to the specification version of the algorithm. Two algorithms are equivalent if they compute the same function, i.e., for any point in the input domain of the specification algorithm, the specification and implementation algorithms compute the same value(s).

One important way to do this, although not the only way, is to show that the computer implementation is an image of the original algorithm under a mapping that translates the syntax of the original into a programming language, while preserving the semantics of the original.

Once VV1 is established, VV2 and VV3 can be established using the same techniques used in formula-based programs.

Software with Logical Specifications

Often the specifications of VV1 are stated as logical formulas containing both inputs and outputs. The built software is supposed to satisfy these formulas.

One source of these formulas is a hazard analysis (see Ch. 14, Safeware). Another source is the application area for which the software is intended (e.g., knowledge about pavements for a pavement management system). A third source is basic science and mathematics. For example, a computed inverse matrix should satisfy the following formula of abstract algebra defining an inverse:

A*B=B*A=I where
- A is the input matrix
- B is the output matrix
- I is the identity matrix

When logical specifications exist, there are two ways to establish them: wrapping and mathematical proof.

Wrapping

Wrapping is a technique that uses a modified version of a piece of software to compute its own correctness. To wrap a program P used in system S with a requirement R on P,

R is translated (using the techniques for translating formulas) into an equivalent representation R’ in the programming language of P.
R’ is inserted at the end of P, right before P returns its value(s). This insertion is done in such a way that the calling context of P can examine the result of R. The modified version of P will be called P’.
S is modified to
1. Test the value of R’.
2. Discard the values from P’ if R’ is FALSE.

For example, let the following be a matrix inverse program that puts what it thinks the inverse of A into B:

void inverse (matrix A, *matrix B)

A wrapped version of this is

boole wrapped_inverse(matrix A, *matrix B)

{
matrix X, Y;
boole result;
inverse(A,B);
matrix_mul(A,B,&X);
matrix_mul(A,B,&Y);

result = matrix_eq( X, I) && matrix_eq(Y, I);
return (result);
}

where

boole matrix_eq(matrix A, matrix B) is a test for matrix equality
void matrix_mul(matrix A, matrix B, *matrix X) puts the product of matrices into X.
I is the identity matrix (of the same number of dimensions as A and B)

The other half of wrapping the matrix inverse is to make the program where it is used respond to the success or failure of the inverse program.

Let

inverse (A,&B); F(B);

be a call to the matrix inverse program, where A and B are matrix variables, and F(B) is a code segment that uses B.

A wrapped version of this is

boole1 = wrapped_inverse (A,&B);

if (boole1) F(B);

else G;

where boole1 is a boolean variable and G is a code segment that does not use G.

Wrapping has several different V&V applications:

Wrapping a software module with a logical formula specification
Wrapping a software module with necessary but not sufficient conditions for correctness that the output must satisfy. This is useful when sufficient conditions for correctness are not available, e.g., with software that predicts values that have not yet been observed.
Wrapping inputs to all programs to insure that the inputs are in the domain for which the program is intended.

The following is an important theorem for wrapping:

Let PROG be a program on which P(x) is to be proved. P(x) is a logical formula such that

x is the only computed result in P(x).
All functions appearing in P(x) are continuous.
All computations in P(x) approximate ideal-computer computations when inputs are in domain D.
PROG detects when inputs are outside D.

Then if P(x) is approximately satisfied, P(x) is true for PROG.

Mathematical Proofs About Programs

Mathematical proofs about programs are sometimes useful for showing the correctness of programs with logical specifications. Verifying the correctness of a computation typically involves two steps:

Showing that the computation is correct on an ideal computer.
Showing that the ideal computation is approximated on an actual computer.

To address the first step, one typically defines an ideal computer as a mathematical construct capable of running the program to be proved correct. The ideal computer often consists of a set of states and transitions, defined to mirror the semantics of a programming language. Program execution is simulated by transitions to new states in this ideal machine. A typical proof shows that for any inputs in the legal input domain, all computational paths reach end states where the program specifications are true.

Verification & Validation for Prediction Software

Predicting future or hypothetical events is an important application of computers, e.g., traffic simulation is used to predict traffic flows during the next hour. When the domain for the predictions has a precise theory by which predictions are made and the predictions have been extensively confirmed by experiment, the techniques for verifying software with algorithmic or logical specifications can be applied to the prediction software.

However, many prediction software programs make predictions when theory, past experimental verification or both is missing. An example of this is prediction using a neural net. The following describes many neural net applications:

Nothing much is known about the function to be predicted.
The neural net uses an algorithm unrelated to the domain of application.
The domain on which predictions are to be made is poorly defined.

When there is no precise, experimentally verified predictive theory, the only way to establish the correctness of prediction software is by experimentally verifying that the software predicts accurately. In addition, the reliability of predictive software, particularly when experimentally verified theory is lacking, can be improved in these ways:

Wrapping the prediction software with tests on necessary conditions the prediction must satisfy (e.g., that the traffic flow not exceed the physical capacity of a road).
Combining several different prediction methods (e.g., by averaging or polling). Testing Even when there is a theoretical proof using the methods described above, a program should be statistically tested because there can be errors:
- In the proofs.
- In the wrapping code.
- In details beneath the level covered by the proofs or wrapping, e.g., hardware errors

One must test the hypothesis of whether the program satisfies a logical formula. The logical formula does not have to be proved or used for wrapping. For example, for prediction programs, one must experimentally verify whether the formula prediction comes close to the observed values.

If P is the logical formula to be verified, one designs an experiment to statistically test the hypothesis that P is satisfied by outputs of the software. To design this experiment, one must choose a confidence level (C) through which one wants to experimentally verify P. The traditional scientific values for C are 95 percent, 99 percent, or perhaps 99.9 percent. However for safety-critical computations, a higher confidence level, e.g., 99.9999 percent (only one failure in a million) may be appropriate.

The result of testing is an error estimate, or an estimate of the difference between computed and observed values. This error estimate should only be applied to new data in the population sample. Furthermore, as the software runs on additional data, the original error estimate can be tested against results on these new runs. In addition, it is sometimes possible to improve the original error estimate by adding the new data points to the original sample, creating a new, larger, more reliable sample from which a more reliable error estimate can be computed.

In addition, running the experiment produces an observed confidence level C', which is almost never exactly C (although C' may be close to C). Designing the experiment to achieve C does not cause the confidence level to be C, only that there are enough data points in the experiment to possibly achieve C.

[TOC] | [Next]