U.S. Department of Transportation
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000

Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

 This report is an archived publication and may contain dated technical, contact, and link information
 Federal Highway Administration > Publications > Research Publications > Safety > 04080 > Software Reliability: A Federal Highway Administration Preliminary Handbook
Publication Number: FHWA-RD-04-080
Date: September 2004

# Software Reliability: A Federal Highway Administration Preliminary Handbook

PDF files can be viewed with the Acrobat® Reader®

## Chapter 5: Informal Proofs

This chapter:

• Defines a framework for informal proofs about programs.
• Introduces proof techniques for informal proofs about programs.
• Contains simple examples of informal proofs.
• Discusses the application of informal proofs.
• Lists the limitations of informal proofs.

The following simple examples provide some context for the framework.

### Introduction

Informal proofs are proofs similar to those of high school geometry or calculus for engineers. Informal proofs seek to retain the certainty of knowledge that comes from doing a proof while reducing the work in actually doing the proof.

### Simple Examples

Here is a pair of simple illustrative proofs about the factorial function, a function that often is used to illustrate the syntax of a programming language. Although these proofs are shorter and simpler than proofs used in a real software project, they illustrate the general style and level of proof that can be achieved using the symbolic evaluation framework presented later in this chapter.

#### The Iterative Factorial

The iterative function is:

It will be proved that factorial (n) = n* ...*1 for positive integers n. (Note: factorial(n) = n*(n-1)*...*1.)

The proof is by mathematical induction. This is a common proof technique for proofs about programs. In an inductive proof, the result is proved for the simplest cases, in this case for n = 1. Then it is also proved that if the desired result holds for all inputs at a complexity of a certain amount or less, it must hold for inputs of slightly greater complexity. In this case, the second part of the proof consists of assuming the result for positive integers k or less, and proving it for k+1. In other situations, the parameter on which induction is proved might be the depth or number of nodes in a tree, or the number of execution steps in executing a program.

Proof: When n = 1, the 2nd branch of the if-else applies, and 1 is returned.

Now assume that for n = k, the function returns k*...*1, where k>1. In this case, neither branch of the if-else returns a value, and the while loop is executed. From the inductive assumption and the fact that result = 1 before the while loop, we know that when the while loop is executed with n = k, it multiplies the current value of result by k*...*1.

For n = k+1, neither branch of the if-else applies, and the while loop is executed. Because k+1>1, the loop body is executed. The first loop body execution multiplies result by k+1 and decreases n to k. After this first loop body execution, the conditions of the inductive assumption occur. Using this assumption, result is multiplied by k*...*1 during the rest of the loop execution. Altogether:

• Result begins at 1.
• The first time through the loop multiplies it by k+1.
• The rest of the loop iterations multiply it by k*...*1.

The value of result is (k+1)*k*...*1 after the loop. Because factorial returns result, the inductive part of the proof is complete, and factorial computes n*...*1.

#### The Recursive Factorial

The recursive function is:

It will be proved that factorial(n) = n* ...*1 for positive integers n.

Proof: As with the iterative case, the proof is by mathematical induction. For n = 1, the second branch of the if-else is taken, and 1 is the value of the function.

Now assume that for n = k,

factorial(k) = k*...*1

With the goal of proving

factorial(k+1) = (k+1)*k*...*1

Because k+1>1, the third branch of the if-else is taken, and (k+1)*factorial(k) is returned. By the inductive assumption, this is (k+1)*k*...*1, and the theorem is proved.

### Advantages and Limitations of Informal Proofs

Informal proofs for computer programs date back at least to the 1970s. As shown by the simple examples above, proofs generally are too labor intensive for large programs. However, proofs can be useful for critical software modules because:

• The number of test cases needed to completely test a module can be very large (e.g., millions for a medium-sized expert system).
• A successful proof is a good argument that a program's logic is correct.
• The process of constructing a proof forces the prover to study the workings of a program in detail, often exposing bugs.

Proofs by themselves are not sufficient to guarantee software correctness, because:

• The proof can contain errors. In particular, it is easy to see in a program what is needed for a proof but is not really in the program.
• Proofs are usually conducted on abstractions of the program that can overlook code that can cause errors, e.g., I/O statements that crash the program.

Another limitation of proofs is that they require the program source code.

### Symbolic Evaluation

Most proofs about computer programs use some form of symbolic evaluation. For example, the two example proofs presented above used symbolic evaluation. This section defines the process of symbolic evaluation for use in informal proofs.

A symbolic evaluation is an abstraction of an actual computer computation. The purpose of a symbolic computation is to provide a simpler object on which to carry out a correctness proof. In addition, the symbolic computation abstracts a set of actual computations, so a single proof on a symbolic computation can be helpful in proving the correctness of an entire set of actual computations.

The following sections detail the differences and relationship between an actual and a symbolic computation.

#### Contents of Memory

The contents of an actual computer memory are finite precision numbers, bytes, and characters that represent the data manipulated by a computer program. They are addressed by an integer location of the first byte in memory.

The contents of a symbolic computation memory are pure mathematical objects (e.g., numbers of infinite precision, characters, strings, matrices, functions, geometric figures, and any other objects used in the algorithm being considered). Each of these is the value of some variable, either an explicitly stated variable name from the symbolic program, or a description of what the data item is. Symbolic objects are addressed by asking for the value of this variable.

#### Currently True Statements

In a symbolic computation, assumptions in the form of equations, inequalities, and logical statements often are made about the variables in the symbolic memory. These assumptions express what is known about:

• The real-world objects the software concerns, e.g., ranges for design parameters such as lane width.
• Mathematical properties of input parameters and other variables, e.g., that the input to the factorial is a non-negative integer.

During the course of a symbolic computation, the set of currently true statements:

• Is changed by actions of the symbolically executing computer program.
• May be supplemented by mathematically proved or, in some applications, empirical observations.

Two sets of currently true statements are considered equivalent if each statement in one set can be proved using statements in the other set.

#### Symbolic Programs

A symbolic program is a pseudocode-like abstraction of an actual program. A symbolic programming language contains a known finite set of programming operations such as assignment; blocks; if-then-else branching; loops such as while, do, and for; and perhaps try-catch exception handling. Each of these operations is defined by a transition rule that changes the contents of symbolic memory, the current point of execution of the symbolic program, and the set of currently true statements; examples are provided below after some further definitions.

The syntax of a symbolic program is not specified in detail, but must satisfy the requirement that a symbolic program can be parsed unambiguously into a semantic tree structure in which:

• Each node is a symbolic programming operation.
• The subnodes and their relation to the parent are identified, e.g., that the test and loop body of a while statement can be identified.

#### Point of Current Execution

At any time, the point of current execution of a symbolic program is known. The point of current execution specifies the next statement in the program, if any, to be executed, or that the program has terminated. At the start of program execution, the point of current execution is before the first statement of the program.

Together, the contents of symbolic memory, the set of currently true statements, and the point of current execution define the state of a single thread of a symbolic computation. As noted below, a symbolic computation sometimes splits into multiple threads. For a theorem to be proved for a symbolic computation, it must be proved for all the threads that split off successor states to the start state of the symbolic program.

#### Transition Rules

Following are the transition rules for the symbolic form of some common programming operations. They describe how to change the symbolic memory, currently true statements, and point of current operation after executing the different types of statement in a symbolic programming language, which models the typical procedural programming language.

The changes these statements make is what would be expected based on experience with these programming languages; our experience with programming languages allows us to describe the effects of a single statement on a symbolic computation representing a set of actual computations. By reasoning about the cumulative effect of these statements as they are executed, one after another, for the entire program, something can be proved about the effect of running an entire program that belongs to the set represented by the symbolic computation.

#### Source of the Transition Rules

The transition rules presented below derive from the definitions of the basic statement types of standard procedural programming languages, as a result of asking, "Given how a statement changes an actual computer running a single program starting with a single set of inputs, how does that same statement affect a symbolic computer running multiple instances of a program each of which starts with different inputs?"

#### Assignment

When a statement

x = y

is executed,

• The variable x is added to symbolic memory if it was not there.
• The variable y replaces any previous value of x in symbolic memory.
• The equality x = y is added to the currently known statements and replaces any previous statements of the form x = [whatever].
• Any statement in the currently true statement that depends on the old x = [whatever] is deleted.
• The point of current execution moves to the statement after the assignment, or to the end of the program if there is no such statement.

In the assignment statement, x must be a variable and y a mathematical expression for the type of object that is the values of x.

While stating all this about assignment makes it appear complicated, the above statements just express our intuition about what assignment in a procedural language means.

#### Blocks

A block is a sequence of statements enclosed in syntactic markers (e.g., {and} in C and JavaTM. This converts a sequence of statements into a single statement in the programming language.

When a block is executed, each of the statements in the block is executed in the sequence they appear in the block. As each statement is executed, the changes it makes in the symbolic memory and currently known statements are carried out. After the last statement of the block is executed, the point of current execution moves to the statement after the block or to the end of the program.

#### If-Else

An if-else statement consists of a Boolean-valued test, a statement executed when the test evaluates to true, and an optional statement executed when the test evaluates to false.

In symbolically evaluating an if-else, an attempt is made to prove the test from the currently true statements. If this attempt succeeds, the then statement (the statement intended for execution if the test is true) is executed symbolically.

If the test cannot be proved, but can be shown to contradict the currently true statements, the else statement (the statement intended for execution if the test is false) is executed symbolically, if an else statement is present.

If the test is sometimes true and sometimes false given the currently true statements, or if the test can neither be proved nor disproved from the currently true statements, the symbolic computation is split into two symbolic computations. In one, the test is added to the currently true statements. In the other, the negation of the test is added. For a statement to be proved for the original symbolic computation, it must be proved in each of these new symbolic computations. This situation is similar to that in pure mathematics, when a theorem is proved by breaking it into a set of special cases, each of which is proved in turn.

After determining which branch or branches of the if-else statement must be executed, those branches are symbolically executed, with the test added to the currently true statements when executing the then statement and the negation of the test added when executing the else statement, if there is an else statement.

After executing the then and/or else statements, or after executing none if there is no else statement and the test is symbolically false, the point of current execution is set to the statement after the if-else, or to the end of the program, in the one or possibly two symbolic computations that now exist.

After executing the if-else statement, the currently true statements are as follows:

• In the computation where the test was symbolically true:
• The test is added to the incoming currently true statements.
• The currently true statements are modified using the symbolic evaluation rules that apply to the then statement.
• In the computation where the test was symbolically false:
• The negation of the test is added to the incoming currently true statements.
• If an else statement is present, the currently true statements are then modified using the symbolic evaluation rules that apply to the else statement.

#### While

A while statement contains a test and a loop body. In symbolically executing the while statement, it must be proved that after a finite number of symbolic executions of the loop body, the test can be proved to be false. (If the test can be proved to be false with no loop repetitions, this condition is satisfied.) If the test cannot be proved to eventually fail, the symbolic computation ends in an error state.

If the while loop is proved to terminate after symbolically executing a while statement, the point of execution is the statement after the while statement or the end of the program.

If the test is not provably false, the contents of symbolic memory are changed in accordance with the symbolic evaluation rules for enough loop repetitions to make the test false. If the test is provably false, the contents of symbolic memory are unchanged.

The currently true statements after a while loop are computed by:

• Starting with the currently true statements that have accumulated up to the while loop.
• Changing the state of the symbolic computation (if the test is not provably false), including the currently true statements, as the result of enough loop executions to make the test false.
• Adding the negation of the test to the currently true statements.

For and do loops are executed in a similar way.

#### Function Calls

When a function is called, variables are added to symbolic memory representing function parameters. For pass-by value, the variable value is copied. For pass-by reference, the new variable name is added as a name to reference an existing object in memory. Variables declared within the function body are added when values are assigned to those variables.

After adding parameters to memory, statements in the function body are executed using the rules of symbolic evaluation.

When the function code is completed, t he variables representing function parameters or declared in the function body are removed from symbolic memory. Statements involving these variables are removed from the currently true statements.

After a function call, the point of current execution is just after the function call or at the end of the program.

#### Object Creation

In object-oriented languages, if an object is created, a variable name and the new object as value are added to symbolic memory. If a variable name for the object appears in the program, it is used; otherwise a name that unambiguously identifies the new object is created for it. The point of execution moves to after the new (object-creation) operation. The currently true statements are unchanged.

#### Object Destruction

When an object is destroyed because the program moves outside the scope of the object or because of an explicit-free operation, the object is removed from symbolic memory. Statements involving the object are removed from the currently true statements. Execution moves to after the destroy operation.

As with assign, a read operation adds a variable name and corresponding value to symbolic memory. The currently true statements are updated to include any assumptions made about input values. The point of execution moves to after the read statement.

#### Write

To model the write operation, an abstract output destination is assumed. It is assumed that the value of objects in symbolic memory, including symbolic values (those containing variables), can be written to this destination. It is also assumed that the capacity of the symbolic destination is infinite, and that the values can be read by the outside world in the order in which the values were written.

A write statement adds a constant or a value in symbolic memory to the symbolic output. The currently true statements are generally unchanged, although the truth of statements of the form "x has been written" may change. The point of current execution moves to after the write statement or to the end of the program.

#### Try-Catch

In a symbolic try-catch computation, the recommended procedure is to split the symbolic computation into one try and one catch thread. The first represents normal execution and the second represents the raising of an exception. Both threads must be considered because it cannot be proved that an exception will not occur. This is because every computation uses a wealth of underlying software and depends on physical devices.

However, in some cases, an argument that an exception will never occur may become part of the informal proof.

A third option may be to restrict what is being proved to cases where an exception does not occur. In this case, this assumption should be stated clearly.

The rules for blocks are applied to evaluate the two threads of a try-catch computation, using the try block for one thread and the catch block for the other. As with if-else statements, a desired theorem must be proved for both threads, unless it is proved that an exception cannot occur (doubtful), or exceptions are excluded explicitly in the theorem to be proved.

### Using a Proof in V&V

A proof shows that the logic of the algorithm implemented by a program is correct. However, to show that a program actually is correct, one must demonstrate that the program implementing the algorithm and the physical computer on which it runs actually carry out the algorithm with enough accuracy for the application.

The following items are among those that should be checked to ensure that an algorithm proved to be correct works when implemented:

• The program accurately implements the algorithm.
• Numbers are within the range of computer arithmetic. The factorial functions only have this property for the small integer values.
• Numerical calculations are free from serious numerical errors. A technique for doing this is presented in Chapter 5.
• The computer has enough memory to carry out the computation.
• The computation finishes in a reasonable time.
• The program does not crash because of operating system, file, or the Graphical User Interface (GUI) errors.
Federal Highway Administration | 1200 New Jersey Avenue, SE | Washington, DC 20590 | 202-366-4000
Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101