Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-04-080
Date: September 2004
Software Reliability: A Federal Highway Administration Preliminary Handbook
Chapter 3: Software Testing
This chapter discusses:
Definition of Software Testing
Software testing is the process of experimentally verifying that a program operates correctly. It consists of:
Testing is a necessary part of establishing software correctness, even for software that has been proven correct. This is because testing catches:
Scope of Chapter
This handbook includes only some highlights of the extensive literature on software testing. Simple statistical procedures for analyzing test results are summarized in chapter 9 of Verification, Validation, and Evaluation of Expert Systems, an FHWA Handbook.(5)
What Constitutes Testing
From a statistical standpoint, testing is an experiment, and the conclusions one can draw are governed by the theorems of statistics. In particular, for the results of testing to be statistically valid:
For a sample to cover a population, every property of the inputs that might be expected to affect correctness should be represented sufficiently in the sample. "Sufficiently" in this context means that if a particular property causes the program to fail, inputs with that property occur frequently enough so that the failure is detectable statistically.
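One way to check that a sample represents each property "sufficiently" is to count how many test inputs carry each property and flag any that fall below a chosen minimum. The following is a minimal sketch, not a procedure from the handbook. The property names, the sample data, the function name underrepresented, and the threshold min_count are all hypothetical.

```python
from collections import Counter

def underrepresented(sample, required_properties, min_count):
    """Return {property: count} for each required property that
    appears in fewer than min_count inputs of the sample."""
    counts = Counter()
    for _input_value, properties in sample:
        counts.update(set(properties))
    return {p: counts[p] for p in required_properties if counts[p] < min_count}

# Hypothetical sample: each entry is (input, set of properties the input has).
sample = [
    (0, {"zero", "boundary"}),
    (-1, {"negative"}),
    (10**9, {"large"}),
    (5, set()),
    (-7, {"negative"}),
]

# Properties that need more test inputs before the sample covers them.
shortfalls = underrepresented(
    sample, ["zero", "negative", "large", "empty"], min_count=2
)
print(shortfalls)
```

The right value for the minimum count depends on how often a failure tied to that property would actually show up, so it should come from a statistical argument rather than a fixed rule of thumb.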
What to Include in the Test Sample
Wallace et al. have cataloged some of the kinds of inputs that should go into a test set.(4) These inputs must be in the test set because they have properties that might affect whether a program functions correctly.
How to Select the Test Sample
The test sample must be random, yet contain enough of the special inputs listed above (in statistics, this is called a stratified sample). To ensure randomness, inputs in the test set must not be used during program development. Once an input has been used in development it is no longer a random input, because the program may have been adjusted, deliberately or not, to handle it. There is no way to be sure that a program behaves the same on the inputs used during development as on truly random inputs. Neural nets, for example, can be over-trained so that they perform better on their training set than on a randomly chosen input set.
The best way to ensure that test inputs are not used during program development is to choose a test set before implementation begins, and lock that test set away from the development team.
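The selection step described above can be sketched as drawing a fixed number of inputs at random from each category (stratum) of inputs before implementation begins. This is only an illustration under stated assumptions. The strata names, the candidate inputs, the function name stratified_sample, and the seed are hypothetical, and how the chosen set is locked away from developers is an organizational matter that the code does not address.

```python
import random

def stratified_sample(strata, per_stratum, seed):
    """Draw per_stratum inputs at random, without replacement,
    from each named stratum. A fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    chosen = {}
    for name, candidates in strata.items():
        if len(candidates) < per_stratum:
            raise ValueError(f"stratum {name!r} has too few candidates")
        chosen[name] = rng.sample(candidates, per_stratum)
    return chosen

# Hypothetical strata: ordinary inputs plus two kinds of special inputs.
strata = {
    "typical": list(range(1, 1000)),
    "boundary": [0, 1, 999, 1000],
    "negative": list(range(-50, 0)),
}

# Chosen once, before implementation, then stored away from the developers.
test_set = stratified_sample(strata, per_stratum=3, seed=2004)
```

Sampling each stratum separately guarantees that the special inputs appear in the test set, while the random draw within each stratum keeps the individual test inputs unpredictable to the development team.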
How Many Inputs Should Be Tested
The number of inputs to test depends on the purpose of the test. To obtain a certification of correct performance to a certain confidence level, a sample that is large enough to provide that confidence level must be chosen. (See the FHWA handbook referenced above, or most elementary statistics texts for more details.) If the program is designed to compute an estimate rather than an exact value of its output, a statistically significant sample must be used.
However, for programs that compute exact values of their outputs, an argument can be made for testing a single point in a region of the input space for which:
If the output for a single input from the region is correct, that result is either a fluke or an event that would be replicated by further tests. For it to be a fluke, the program would have to apply some set of formulas other than the intended ones and still produce the intended output for a randomly chosen point. This is a very unlikely event.
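The argument can be illustrated with a toy case. The sketch below assumes a region in which the program is meant to apply one formula throughout. It compares a correct implementation and an implementation that applies a different formula against a reference at a single randomly chosen point. The formulas, the region bounds, and the function names are hypothetical and chosen only for illustration.

```python
import math
import random

def intended(x):
    """Reference formula for the region."""
    return 2.0 * x + 3.0

def buggy(x):
    """Implementation that applies a different formula in the region."""
    return 3.0 * x + 3.0

def single_point_check(program, reference, low, high, rng):
    """Compare program and reference at one random point in [low, high]."""
    x = rng.uniform(low, high)
    return math.isclose(program(x), reference(x), rel_tol=1e-9)

rng = random.Random(0)
ok_correct = single_point_check(intended, intended, 1.0, 10.0, rng)  # True
ok_buggy = single_point_check(buggy, intended, 1.0, 10.0, rng)       # False
```

Here the two formulas agree only at x = 0, which lies outside the region, so any point drawn from the region exposes the wrong formula. In general, two different formulas of this kind agree at only isolated points, and a randomly chosen point is very unlikely to land on one of them.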
Limitations of Testing
Rigorous testing of large software is not possible, because the number of test cases that must be run is impractically large. For example, the PAMEX pavement maintenance expert system contains approximately 325 rules, which give rise to millions of different possible execution paths. For this reason, testing should be supplemented with techniques such as informal proofs and/or wrapping.
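To give a feel for how quickly execution paths multiply, the sketch below counts the paths through a hypothetical program made up of independent two-way branches executed in sequence. This simplified model is an assumption for illustration only. It does not describe the structure of PAMEX or of any particular rule-based system, and the function names and the one-second test time are hypothetical.

```python
def path_count(branches):
    """Number of distinct paths through a sequence of independent
    two-way branches: each branch doubles the count."""
    return 2 ** branches

def hours_to_test(paths, seconds_per_test):
    """Time to run one test per path, in hours."""
    return paths * seconds_per_test / 3600

# Twenty independent branches already give over a million paths.
paths_20 = path_count(20)                                   # 1048576
hours_20 = hours_to_test(paths_20, seconds_per_test=1.0)
print(paths_20, round(hours_20, 1))
```

Even at one second per test, covering every path of this small hypothetical program would take hundreds of hours, and each additional branch doubles the total.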