 This report is an archived publication and may contain dated technical, contact, and link information
Optimal Acceptance Standards for Statistical Construction Specifications
 Publication Number: FHWA-RD-02-095

# Appendix G

As discussed in chapter 5, verification testing can be of two types: test method verification testing that is done on split samples, or process verification testing that is done on independent samples. The procedures are different for each of these types of verification testing.

### OC Curves for Test Method Verification

In chapter 5, two methods were considered for test method verification of split samples: the D2S method, which compares the contractor and agency results from a single split sample, and the paired t-test, which compares contractor and agency results from a number of split samples. OC curves, which plot the probability of detecting a difference versus the actual difference between the two populations, can be developed for either of these methods.

#### OC Curves for the D2S Verification Method

In the D2S method, a test is performed on a single split sample to compare agency and contractor test results. If we assume both of these samples are from normally distributed subpopulations, then we can calculate the variance of the difference and use it to calculate two standard deviation, or approximately 95 percent, limits for the sample difference quantity. Suppose the agency subpopulation has a variance $\sigma_a^2$ and the contractor subpopulation has a variance $\sigma_c^2$. Since the variance of the difference in two independent random variables is the sum of the variances, the variance of the difference in an agency observation and a contractor observation is $\sigma_a^2 + \sigma_c^2$. The D2S limits are based on the test standard deviation provided. Let us call this test standard deviation σtest. Under an assumption that $\sigma_a^2 = \sigma_c^2 = \sigma_{test}^2$, this variance of a difference becomes $2\sigma_{test}^2$. The D2S limits are set as two times the standard deviation (i.e., approximately 95 percent limits) of the test differences. This sets the D2S limits at $\pm 2\sqrt{2\sigma_{test}^2}$, which is $\pm 2\sqrt{2}\,\sigma_{test}$, or ±2.8284 σtest. Without loss of generality, we can assume σtest = 1, along with an assumption of a mean difference of 0, and use the standard normal distribution with the region between -2.8284 and +2.8284 as the acceptance region for the difference in an agency test result and a contractor test result. With these two limits fixed, we can calculate the power of this decision-making process relative to various true differences in the underlying subpopulation means and/or various ratios of the true underlying subpopulation standard deviations.

These power values can conveniently be displayed as a three-dimensional surface. If we vary the mean difference along the first axis and the standard deviation ratio along a second axis, we can show power on the vertical axis. The agency subpopulation, the contractor subpopulation, or both, could have standard deviations smaller, about the same, or larger than the supplied σtest value. Each of these cases is considered in the technical report for this project.[17] For simplicity, herein we will consider only the case where one of the two subpopulations has standard deviation equal to the supplied σtest. Figure 50 shows the OC curves for this case. Power values are shown where the ratio of the larger of agency or contractor standard deviation to the smaller of agency or contractor standard deviation is varied over the values 0, 1, 2, 3, 4, and 5. The mean difference given along the horizontal axis (values 0, 1, 2, 3) represents the difference in agency and contractor subpopulation means expressed as multiples of σtest.

As can be seen in the figure, even when the ratio of the contractor and agency standard deviations is 5 and the difference between the contractor and agency means is 3 times the value for σtest, there is less than a 70 percent chance of detecting the difference based on the results from a single split sample.

As is the case with any method based on a sample of size one, the D2S method does not have much power to detect differences between the contractor and agency populations. The appeal of the D2S method lies in its simplicity rather than its power.
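The power values discussed above can be approximated directly from the normal distribution. The following Python sketch is illustrative only: the function name `d2s_power` and its arguments are assumptions introduced here, not part of the report. It treats the split-sample difference as normally distributed with mean equal to the true mean difference and variance equal to the sum of the two subpopulation variances, and it uses the fixed acceptance region of ±2√2 σtest.

```python
import math
from statistics import NormalDist


def d2s_power(mean_diff, sd_a=1.0, sd_c=1.0, sd_test=1.0):
    """Chance that one split-sample difference falls outside the D2S limits.

    mean_diff, sd_a and sd_c are expressed in the same units as sd_test.
    (Hypothetical helper for illustration; not taken from the report.)
    """
    limit = 2.0 * math.sqrt(2.0) * sd_test       # about 2.8284 * sd_test
    sd_diff = math.sqrt(sd_a ** 2 + sd_c ** 2)   # std. dev. of the difference
    diff = NormalDist(mu=mean_diff, sigma=sd_diff)
    p_accept = diff.cdf(limit) - diff.cdf(-limit)
    return 1.0 - p_accept


# No true difference and both standard deviations equal to sd_test:
# the result is the false-alarm rate of the two-standard-deviation limits.
print(round(d2s_power(0.0), 4))

# Mean difference of 3 sd_test with one standard deviation 5 times the other.
print(round(d2s_power(3.0, sd_a=1.0, sd_c=5.0), 3))
```

Under these assumptions the second case stays below 0.70, in line with the reading of the figure given above.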

Figure 50. OC Surface for the D2S Test Method Verification Method (assuming the smaller σ = σtest). Axes: probability of detecting a difference (%), standard deviation ratio, and mean difference in σtest units.

#### OC Curves for the Paired t-test Method

As noted in chapter 5, for the case in which it is desirable to compare more than one pair of split sample test results, the t-test for paired measurements can be used. But the question arises, how many pairs of test results should be used? This is where an OC curve is helpful. The OC curve, for a given level of α, plots on the vertical axis either the probability of not detecting, β, or detecting, 1 - β, a difference between two populations. The standardized difference between the two population means is plotted on the horizontal axis.

For a t-test for paired measurements, the standardized difference, d, is measured as:

$$d = \frac{|\delta|}{\sigma_d}$$

where:

|δ| = the true absolute difference between the mean of the contractor's test result population (which is unknown) and the mean of the agency's test result population (which is unknown).

σd = the standard deviation of the true population of signed differences between the paired tests (which is unknown).

The OC curves are developed for a given level of significance, α. It is evident from the OC curves that for any probability of not detecting a difference, β, (value on the vertical axis), the required n will increase as the difference, d, decreases (value on the horizontal axis). In some cases the desired β or d may require prohibitively large sample sizes. In that case a compromise must be made between the discriminating power desired, the cost of the amount of testing required, and the risk of claiming a difference when none exists.

OC curves for paired t-tests for α values of 0.05 and 0.01 appear in figures 51 and 52, respectively.

To use these OC curves, the true standard deviation of the signed differences, σd, is assumed to be known (or approximated based on published literature). After experience is gained with the process, σd can be more accurately defined and a better idea of the required number of tests determined.

Example 1. The number of pairs of split sample tests for verification of laboratory-compacted air voids using the Superpave Gyratory Compactor (SGC) is desired. The probability of not detecting a difference, β, is chosen as 20 percent or 0.20. (Some OC curves use 1 - β, known as the power of the test, on the vertical axis, but the only difference is the scale change, with 1 - β, in this case, being 80 percent). Assume that the absolute difference between μc and μa should not be greater than 1.25 percent, that the standard deviation using the SGC is 0.5 percent, and that α is selected as 0.01. This produces a d value of 1.25 percent/0.5 percent = 2.5. Reading this value on the horizontal axis and a β of 0.20 on the vertical axis in figure 52 shows that about 5 paired split-sample tests are necessary for the comparison.
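As a rough cross-check on the chart reading in Example 1, the number of pairs can be estimated with a normal approximation plus a commonly used small-sample correction term. This is a sketch under stated assumptions, not the exact noncentral t calculation behind the OC curves; the function name `approx_pairs_needed` is hypothetical.

```python
from statistics import NormalDist


def approx_pairs_needed(delta, sd_diff, alpha=0.01, beta=0.20):
    """Approximate number of pairs for a two-sided paired t-test.

    Uses n ~ ((z_a + z_b) / d)^2 + z_a^2 / 2, where d = |delta| / sd_diff.
    The second term is an assumed small-sample correction; the result is
    an approximation and is returned unrounded.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    z_b = z.inv_cdf(1.0 - beta)          # value tied to the desired power
    d = abs(delta) / sd_diff             # standardized difference
    return ((z_a + z_b) / d) ** 2 + z_a ** 2 / 2.0


# Example 1 inputs: difference of 1.25 percent, standard deviation 0.5 percent.
print(round(approx_pairs_needed(1.25, 0.5, alpha=0.01, beta=0.20), 2))
```

With the Example 1 inputs the approximation lands between 5 and 6 pairs, which is consistent with the value of about 5 read from figure 52.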

Figure 51. OC Curves for a Two-Sided t-Test (α = 0.05). Horizontal axis: standardized difference, d. (Source: Experimental Statistics, by M. G. Natrella, National Bureau of Standards Handbook 91, 1963)

Figure 52. OC Curves for a Two-Sided t-Test (α = 0.01). Horizontal axis: standardized difference, d. (Source: Experimental Statistics, by M. G. Natrella, National Bureau of Standards Handbook 91, 1963)

### OC Curves for Process Verification

In chapter 5, two methods were considered for process verification using independently obtained samples: the F-test and t-test method, which compares the variances and means of sets of contractor and agency test results, and the single agency test method, which compares a single agency test result with 5 to 10 contractor test results. OC curves, which plot the probability of not detecting a difference, β, or detecting a difference, the power or 1 - β, versus the actual difference between the two populations, can be developed for either of these methods.

#### OC Curves for the F-test and t-test

One approach for comparing the contractor's test results with the agency's test results is to use the F-test and t-test comparisons of characteristics of the two data sets. To compare two populations that are assumed normally distributed, it is necessary to compare their means and their variabilities. An F-test is used to assess the size of the ratio of the variances, and a t-test is used to assess the degree of difference in the means. The question that needs to be answered is how much power these statistical tests have, when used with small to moderate size samples, to declare various differences in means and variances statistically significant. Some OC curves and examples of their use in power analysis follow.

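F-test for Variances: Equal Sample Sizes. Suppose we have two sets of measurements assumed to come from normally distributed populations and wish to conduct a test to see if they come from populations that have the same variances, i.e., $\sigma_x^2 = \sigma_y^2$. Further suppose we select a level of significance of α = 0.05, meaning we are allowing up to a 5 percent chance of incorrectly deciding the variances are different when they really are the same. If we assume these two samples are

$x_1, x_2, \ldots, x_{n_x}$ and $y_1, y_2, \ldots, y_{n_y}$,

calculate sample variances $s_x^2$ and $s_y^2$, and construct

$$F = \frac{s_x^2}{s_y^2},$$

we would accept $H_0: \sigma_x^2 = \sigma_y^2$ for values of F in the interval

$$\left[F_{\alpha/2,\,n_x-1,\,n_y-1},\; F_{1-\alpha/2,\,n_x-1,\,n_y-1}\right],$$

where $F_{p,\,\nu_1,\,\nu_2}$ denotes the value below which a fraction p of an F distribution with ν1 and ν2 degrees of freedom falls.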

For this two-sided or two-tailed test, figure 53 shows the probability we have accepted the two samples as coming from populations with the same variabilities. This probability is usually referred to as β, and the power of the test as 1 - β. Notice the horizontal axis is the quantity λ, where $\lambda = \sigma_x / \sigma_y$, the true standard deviation ratio. So for λ = 1, where the hypothesis of equal variance should certainly be accepted, it is accepted with probability 0.95, reduced from 1.0 only by the magnitude of our selected type I error risk, α. One major limiting factor for the use of figure 53 is the restriction that nx = ny = n.

Example 2. Suppose we have nx = 6 contractor tests and ny = 6 agency tests, conduct an α = 0.05 level test, and accept that these two sets of tests represent populations with equal variances. What power did our test have to discern if the populations from which these two sets of tests came were really rather different in variabilities? Suppose the true population standard deviation of the contractor tests (σx) was twice as large as that of the agency tests (σy), giving λ = 2. If we enter figure 53 with λ = 2 and nx = ny = 6, we find that β ≈ 0.74, or the power, 1 - β, is about 0.26. This tells us that with samples of nx = 6 and ny = 6, we only have 26 percent chance of detecting a standard deviation ratio of 2 (and correspondingly a four-fold difference in variance) as being different.

Example 3. Suppose we are not at all comfortable with the power of 0.26 in Example 2, and so subsequently we increase the number of tests used. Suppose we now have nx = 20 and ny = 20. If we again consider λ = 2, we can determine from figure 53 the power of detecting these sets of tests as coming from populations with unequal variances to be over 0.8, approximately 82 percent to 83 percent. If we proceed to conduct our F-test with these two samples, and conclude the underlying variances are equal, we certainly feel much more comfortable with our conclusions.

Figure 54 gives the appropriate OC curves to use if we choose to conduct an α = 0.01 level test. Again we see for equal variances σx² and σy², giving λ = 1, that β = 0.99, reduced from 1.0 only by the size of α.

F-test for Variances: Unequal Sample Sizes. Up to now the discussions and OC curves presented have been limited to the case when the two sample sizes are equal. Calculation routines were developed for this project to compute the power of this test for any combination of sample sizes nx and ny. Because the number of possible combinations of nx and ny is unlimited, OC curves cannot be presented for every possibility. Instead, three tables are provided herein that give power calculations for sample sizes of potential interest for comparing contractor and agency samples. The results are tabulated because there are too many variables to present in a single chart, and tables are more compact than a long series of charts. Table 37 gives power values for all combinations of sample sizes from 3 to 10, with the ratio of the two subpopulation standard deviations being 1, 2, 3, 4, and 5. Table 38 gives power values for the same sample sizes, but with the standard deviation ratios being 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0. Table 39 gives power values for all combinations for sample sizes of 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100, with the standard deviation ratio being 1, 2, or 3. An example below illustrates the use of the first of these tables to reference power for a hypothetical test.

Figure 53. OC Curves for the Two-Sided F-Test for Level of Significance α = 0.05 (Source: Engineering Statistics by A. H. Bowker and G. J. Lieberman.)

Figure 54. OC Curves for the Two-Sided F-Test for Level of Significance α = 0.01 (Source: Engineering Statistics by A. H. Bowker and G. J. Lieberman.)

Table 37. F-test Power Values for n = 3 to 10 and s-ratio, λ = 1 to 5

A row with three entries gives ny, nx, and power; a row with two entries gives nx and power; omitted values carry down from the row above.

λ ny nx Power
1 3 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
4 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
5 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
6 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
7 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
8 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
9 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
10 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
2 3 3 0.09939
4 0.09753
5 0.09663
6 0.09620
7 0.09600
8 0.09590
9 0.09586
10 0.09585
4 3 0.14835
4 0.15169
5 0.15385
6 0.15544
7 0.15668
8 0.15767
9 0.15848
10 0.15915
5 3 0.19036
4 0.20240
5 0.21041
6 0.21622
7 0.22064
8 0.22413
9 0.22694
10 0.22926
6 3 0.22309
4 0.24464
5 0.25968
6 0.27093
7 0.27968
8 0.28669
9 0.29243
10 0.29722
7 3 0.24820
4 0.27854
5 0.30055
6 0.31744
7 0.33086
8 0.34179
9 0.35087
10 0.35853
8 3 0.26768
4 0.30567
5 0.33401
6 0.35619
7 0.37410
8 0.38888
9 0.40129
10 0.41187
9 3 0.28308
4 0.32758
5 0.36144
6 0.38837
7 0.41036
8 0.42869
9 0.44421
10 0.45752
10 3 0.29549
4 0.34549
5 0.38414
6 0.41521
7 0.44081
8 0.46230
9 0.48060
10 0.49639
3 3 3 0.19034
4 0.19354
5 0.19556
6 0.19696
7 0.19798
8 0.19875
9 0.19934
10 0.19981
4 3 0.31171
4 0.33525
5 0.35007
6 0.36030
7 0.36777
8 0.37347
9 0.37795
10 0.38157
5 3 0.39758
4 0.44454
5 0.47603
6 0.49872
7 0.51588
8 0.52931
9 0.54011
10 0.54899
6 3 0.45403
4 0.51906
5 0.56396
6 0.59696
7 0.62225
8 0.64225
9 0.65846
10 0.67186
7 3 0.49230
4 0.57007
5 0.62436
6 0.66443
7 0.69516
8 0.71943
9 0.73906
10 0.75523
8 3 0.51945
4 0.60623
5 0.66693
6 0.71159
7 0.74565
8 0.77236
9 0.79378
10 0.81129
9 3 0.53955
4 0.63285
5 0.69797
6 0.74560
7 0.78161
8 0.80958
9 0.83177
10 0.84970
10 3 0.55494
4 0.65311
5 0.72136
6 0.77092
7 0.80803
8 0.83654
9 0.85890
10 0.87675
4 3 3 0.29251
4 0.30367
5 0.31010
6 0.31427
7 0.31717
8 0.31930
9 0.32093
10 0.32222
4 3 0.46558
4 0.51179
5 0.54104
6 0.56126
7 0.57608
8 0.58742
9 0.59637
10 0.60363
5 3 0.56455
4 0.63665
5 0.68356
6 0.71649
7 0.74084
8 0.75955
9 0.77437
10 0.78638
6 3 0.62143
4 0.70759
5 0.76314
6 0.80150
7 0.82932
8 0.85027
9 0.86652
10 0.87943
7 3 0.65697
4 0.75074
5 0.81002
6 0.84993
7 0.87808
8 0.89866
9 0.91416
10 0.92613
8 3 0.68090
4 0.77901
5 0.83976
6 0.87961
7 0.90692
8 0.92628
9 0.94042
10 0.95100
9 3 0.69798
4 0.79871
5 0.85988
6 0.89907
7 0.92520
8 0.94321
9 0.95598
10 0.96525
10 3 0.71073
4 0.81311
5 0.87423
6 0.91256
7 0.93751
8 0.95427
9 0.96583
10 0.97399
5 3 3 0.39165
4 0.41270
5 0.42481
6 0.43266
7 0.43815
8 0.44219
9 0.44530
10 0.44776
4 3 0.58713
4 0.64932
5 0.68814
6 0.71467
7 0.73394
8 0.74858
9 0.76007
10 0.76932
5 3 0.68068
4 0.76196
5 0.81171
6 0.84479
7 0.86811
8 0.88527
9 0.89836
10 0.90860
6 3 0.72975
4 0.81790
5 0.86956
6 0.90223
7 0.92409
8 0.93936
9 0.95041
10 0.95864
7 3 0.75893
4 0.84940
5 0.90024
6 0.93086
7 0.95030
8 0.96318
9 0.97201
10 0.97824
8 3 0.77800
4 0.86909
5 0.91845
6 0.94695
7 0.96423
8 0.97513
9 0.98225
10 0.98704
9 3 0.79133
4 0.88238
5 0.93024
6 0.95690
7 0.97244
8 0.98184
9 0.98772
10 0.99150
10 3 0.80115
4 0.89188
5 0.93838
6 0.96351
7 0.97767
8 0.98594
9 0.99092
10 0.99400

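Table 38. F-test Power Values for n = 3 to 10 and s-ratio, λ = 0.0 to 1.0

A row with three entries gives ny, nx, and power; a row with two entries gives nx and power; omitted values carry down from the row above.

λ ny nx Power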
0.0 3 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
4 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
5 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
6 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
7 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
8 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
9 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
10 3 1.00000
4 1.00000
5 1.00000
6 1.00000
7 1.00000
8 1.00000
9 1.00000
10 1.00000
0.2 3 3 0.39165
4 0.58713
5 0.68068
6 0.72975
7 0.75893
8 0.77800
9 0.79133
10 0.80115
4 3 0.41270
4 0.64932
5 0.76196
6 0.81790
7 0.84940
8 0.86909
9 0.88238
10 0.89188
5 3 0.42481
4 0.68814
5 0.81171
6 0.86956
7 0.90024
8 0.91845
9 0.93024
10 0.93838
6 3 0.43266
4 0.71467
5 0.84479
6 0.90223
7 0.93086
8 0.94695
9 0.95690
10 0.96351
7 3 0.43815
4 0.73394
5 0.86811
6 0.92409
7 0.95030
8 0.96423
9 0.97244
10 0.97767
8 3 0.44219
4 0.74858
5 0.88527
6 0.93936
7 0.96318
8 0.97513
9 0.98184
10 0.98594
9 3 0.44530
4 0.76007
5 0.89836
6 0.95041
7 0.97201
8 0.98225
9 0.98772
10 0.99092
10 3 0.44776
4 0.76932
5 0.90860
6 0.95864
7 0.97824
8 0.98704
9 0.99150
10 0.99400
0.4 3 3 0.14221
4 0.22806
5 0.29564
6 0.34398
7 0.37868
8 0.40429
9 0.42380
10 0.43906
4 3 0.14250
4 0.24034
5 0.32488
6 0.38884
7 0.43614
8 0.47159
9 0.49879
10 0.52015
5 3 0.14291
4 0.24808
5 0.34448
6 0.42028
7 0.47749
8 0.52079
9 0.55411
10 0.58029
6 3 0.14332
4 0.25345
5 0.35863
6 0.44371
7 0.50889
8 0.55851
9 0.59674
10 0.62671
7 3 0.14369
4 0.25739
5 0.36934
6 0.46187
7 0.53357
8 0.58837
9 0.63057
10 0.66355
8 3 0.14399
4 0.26041
5 0.37772
6 0.47638
7 0.55351
8 0.61261
9 0.65804
10 0.69341
9 3 0.14424
4 0.26278
5 0.38447
6 0.48825
7 0.56996
8 0.63266
9 0.68076
10 0.71805
10 3 0.14445
4 0.26470
5 0.39001
6 0.49813
7 0.58375
8 0.64952
9 0.69984
10 0.73868
0.6 3 3 0.07564
4 0.10273
5 0.12665
6 0.14614
7 0.16173
8 0.17425
9 0.18444
10 0.19283
4 3 0.07283
4 0.10212
5 0.13003
6 0.15430
7 0.17470
8 0.19170
9 0.20593
10 0.21791
5 3 0.07120
4 0.10174
5 0.13222
6 0.15988
7 0.18396
8 0.20461
9 0.22225
10 0.23736
6 3 0.07022
4 0.10157
5 0.13386
6 0.16407
7 0.19107
8 0.21472
9 0.23528
10 0.25314
7 3 0.06960
4 0.10153
5 0.13516
6 0.16736
7 0.19675
8 0.22292
9 0.24600
10 0.26628
8 3 0.06919
4 0.10155
5 0.13622
6 0.17003
7 0.20139
8 0.22972
9 0.25499
10 0.27741
9 3 0.06891
4 0.10161
5 0.13711
6 0.17223
7 0.20526
8 0.23545
9 0.26265
10 0.28698
10 3 0.06870
4 0.10168
5 0.13786
6 0.17409
7 0.20854
8 0.24035
9 0.26925
10 0.29529
0.8 3 3 0.05467
4 0.06163
5 0.06758
6 0.07248
7 0.07649
8 0.07980
9 0.08255
10 0.08487
4 3 0.05202
4 0.05929
5 0.06587
6 0.07156
7 0.07642
8 0.08057
9 0.08412
10 0.08719
5 3 0.05017
4 0.05755
5 0.06448
6 0.07067
7 0.07612
8 0.08090
9 0.08508
10 0.08875
6 3 0.04883
4 0.05626
5 0.06340
6 0.06995
7 0.07584
8 0.08109
9 0.08577
10 0.08994
7 3 0.04785
4 0.05529
5 0.06258
6 0.06938
7 0.07560
8 0.08124
9 0.08633
10 0.09092
8 3 0.04709
4 0.05453
5 0.06193
6 0.06893
7 0.07541
8 0.08136
9 0.08680
10 0.09175
9 3 0.04650
4 0.05393
5 0.06141
6 0.06856
7 0.07527
8 0.08148
9 0.08721
10 0.09248
10 3 0.04603
4 0.05345
5 0.06099
6 0.06827
7 0.07516
8 0.08159
9 0.08757
10 0.09312
1.0 3 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
4 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
5 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
6 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
7 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
8 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
9 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
10 0.05000
10 3 0.05000
4 0.05000
5 0.05000
6 0.05000
7 0.05000
8 0.05000
9 0.05000
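10 0.05000

Table 39. F-test Power Values for n = 5 to 100 and s-ratio, λ = 1 to 3

A row with three entries gives ny, nx, and power; a row with two entries gives nx and power; omitted values carry down from the row above.

λ ny nx Power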
1 5 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
10 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
15 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
20 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
25 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
30 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
40 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
50 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
60 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
70 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
80 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
90 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
100 5 0.05
10 0.05
15 0.05
20 0.05
25 0.05
30 0.05
40 0.05
50 0.05
60 0.05
70 0.05
80 0.05
90 0.05
100 0.05
2 5 5 0.21041
10 0.22926
15 0.23658
20 0.24043
25 0.24281
30 0.24442
40 0.24646
50 0.24770
60 0.24853
70 0.24913
80 0.24958
90 0.24993
100 0.25022
10 5 0.38414
10 0.49639
15 0.55109
20 0.58353
25 0.60501
30 0.62027
40 0.64053
50 0.65336
60 0.66221
70 0.66869
80 0.67363
90 0.67753
100 0.68068
15 5 0.45487
10 0.62152
15 0.70573
20 0.75560
25 0.78820
30 0.81099
40 0.84054
50 0.85870
60 0.87092
70 0.87969
80 0.88626
90 0.89137
100 0.89545
20 5 0.49087
10 0.68548
15 0.78230
20 0.83747
25 0.87192
30 0.89495
40 0.92304
50 0.93906
60 0.94918
70 0.95606
80 0.96099
90 0.96468
100 0.96753
25 5 0.51241
10 0.72299
15 0.82516
20 0.88085
25 0.91389
30 0.93485
40 0.95864
50 0.97099
60 0.97817
70 0.98272
80 0.98578
90 0.98795
100 0.98955
30 5 0.52669
10 0.74730
15 0.85174
20 0.90637
25 0.93725
30 0.95585
40 0.97551
50 0.98476
60 0.98968
70 0.99256
80 0.99436
90 0.99556
100 0.99639
40 5 0.54439
10 0.77664
15 0.88220
20 0.93379
25 0.96067
30 0.97548
40 0.98924
50 0.99462
60 0.99702
70 0.99821
80 0.99886
90 0.99923
100 0.99945
50 5 0.55491
10 0.79358
15 0.89881
20 0.94770
25 0.97160
30 0.98387
40 0.99414
50 0.99757
60 0.99888
70 0.99943
80 0.99969
90 0.99982
100 0.99989
60 5 0.56187
10 0.80456
15 0.90914
20 0.95588
25 0.97764
30 0.98820
40 0.99632
50 0.99869
60 0.99948
70 0.99977
80 0.99989
90 0.99995
100 0.99997
70 5 0.56683
10 0.81224
15 0.91614
20 0.96120
25 0.98137
30 0.99073
40 0.99745
50 0.99921
60 0.99972
70 0.99989
80 0.99996
90 0.99998
100 0.99999
80 5 0.57053
10 0.81791
15 0.92118
20 0.96490
25 0.98387
30 0.99235
40 0.99810
50 0.99947
60 0.99984
70 0.99994
80 0.99998
90 0.99999
100 1.00000
90 5 0.57339
10 0.82226
15 0.92497
20 0.96762
25 0.98564
30 0.99345
40 0.99851
50 0.99962
60 0.99989
70 0.99997
80 0.99999
90 1.00000
100 1.00000
100 5 0.57568
10 0.82571
15 0.92793
20 0.96968
25 0.98696
30 0.99425
40 0.99879
50 0.99972
60 0.99993
70 0.99998
80 0.99999
90 1.00000
100 1.00000
3 5 5 0.47603
10 0.54899
15 0.57700
20 0.59187
25 0.60108
30 0.60736
40 0.61537
50 0.62026
60 0.62355
70 0.62593
80 0.62772
90 0.62911
100 0.63024
10 5 0.72136
10 0.87675
15 0.92836
20 0.95158
25 0.96404
30 0.97154
40 0.97985
50 0.98420
60 0.98681
70 0.98853
80 0.98973
90 0.99062
100 0.99130
15 5 0.78336
10 0.93786
15 0.97640
20 0.98918
25 0.99431
30 0.99669
40 0.99860
50 0.99928
60 0.99957
70 0.99972
80 0.99980
90 0.99985
100 0.99988
20 5 0.80975
10 0.95808
15 0.98816
20 0.99597
25 0.99841
30 0.99930
40 0.99982
50 0.99994
60 0.99998
70 0.99999
80 0.99999
90 1.00000
100 1.00000
25 5 0.82417
10 0.96743
15 0.99254
20 0.99797
25 0.99936
30 0.99977
40 0.99996
50 0.99999
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
30 5 0.83321
10 0.97267
15 0.99463
20 0.99877
25 0.99968
30 0.99990
40 0.99999
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
40 5 0.84390
10 0.97822
15 0.99654
20 0.99938
25 0.99987
30 0.99997
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
50 5 0.84999
10 0.98107
15 0.99738
20 0.99960
25 0.99993
30 0.99999
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
60 5 0.85393
10 0.98279
15 0.99783
20 0.99971
25 0.99996
30 0.99999
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
70 5 0.85668
10 0.98394
15 0.99812
20 0.99976
25 0.99997
30 1.00000
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
80 5 0.85871
10 0.98476
15 0.99831
20 0.99980
25 0.99998
30 1.00000
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
90 5 0.86026
10 0.98537
15 0.99844
20 0.99983
25 0.99998
30 1.00000
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000
100 5 0.86150
10 0.98584
15 0.99855
20 0.99985
25 0.99998
30 1.00000
40 1.00000
50 1.00000
60 1.00000
70 1.00000
80 1.00000
90 1.00000
100 1.00000

Example 4. Suppose we have nx = 10 contractor tests and ny = 6 agency tests, conduct an α = 0.05 level test, and accept that these two tests represent populations with equal variances. What power did our test have to discern if the populations from which these two sets of tests came were really rather different in variabilities? Suppose the true population standard deviation of the contractor's test population (σx) was twice as large as that of the agency's test population (σy), giving a standard deviation ratio value, λ = 2. If we enter table 37 with λ = 2, nx = 10, and ny = 6, we find the power to be 0.29722. This tells us that with samples of nx = 10 and ny = 6, we have slightly less than a 30 percent chance of detecting a standard deviation ratio of 2 (and correspondingly a four-fold difference in variances) as being different.
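Power values of the kind tabulated above can be approximated with a short calculation. The sketch below is an illustration written for this purpose, not the calculation routine developed for the project; all function names are hypothetical. It assumes F = sx²/sy² with nx - 1 and ny - 1 degrees of freedom, λ = σx/σy, and an equal-tailed two-sided test, and it evaluates the F distribution through the regularized incomplete beta function using only the Python standard library.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        steps = (m * (b - m) * x / ((qam + m2) * (a + m2)),
                 -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)))
        for aa in steps:
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """Cumulative distribution function of the F distribution."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def f_quantile(p, df1, df2):
    """Quantile of the F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_test_power(lam, nx, ny, alpha=0.05):
    """Power of the two-sided F-test when sigma_x / sigma_y = lam."""
    df1, df2 = nx - 1, ny - 1
    lower = f_quantile(alpha / 2.0, df1, df2)
    upper = f_quantile(1.0 - alpha / 2.0, df1, df2)
    # Under the true ratio, F / lam^2 follows an F(df1, df2) distribution.
    return f_cdf(lower / lam ** 2, df1, df2) + 1.0 - f_cdf(upper / lam ** 2, df1, df2)


# Example 4 inputs: lam = 2, nx = 10 contractor tests, ny = 6 agency tests.
print(round(f_test_power(2.0, 10, 6), 5))
```

The printed value can be compared with the entry for λ = 2, ny = 6, nx = 10 in table 37.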

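t-test for Means. Suppose we have two sets of measurements, assumed to be from normally distributed populations, and wish to conduct a two-sided or two-tailed test to see if these populations have equal means, i.e., μx = μy. Suppose we assume these two samples are from populations with unknown, but equal, variances. If these two samples are $x_1, x_2, \ldots, x_{n_x}$ with sample mean $\bar{x}$ and sample variance $s_x^2$, and $y_1, y_2, \ldots, y_{n_y}$ with sample mean $\bar{y}$ and sample variance $s_y^2$, we can calculate

$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}, \qquad s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2},$$

and accept $H_0: \mu_x = \mu_y$ for values of t in the interval $\left[-t_{1-\alpha/2,\,n_x+n_y-2},\; t_{1-\alpha/2,\,n_x+n_y-2}\right]$, where $t_{p,\,\nu}$ denotes the value below which a fraction p of a t-distribution with ν degrees of freedom falls.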

For this test, figure 51 or 52, depending upon the α value, shows the probability we have accepted the two samples as coming from populations with the same means. The horizontal axis scale is

$$d = \frac{|\mu_x - \mu_y|}{\sigma}$$

where σ = σx = σy is the true common population standard deviation. We access the OC curves in figures 51 and 52 with a value for d of d* and a value for n of n' where

$$d^* = \frac{d}{\sqrt{n'\left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)}}$$

and

$$n' = n_x + n_y - 1.$$

Example 5. Suppose we have nx = 8 contractor tests and ny = 8 agency tests, conduct an α = 0.05 level test, and accept that these two sets of tests represent populations with equal means. What power did our test really have to discern if the populations from which these two sets of tests came had different means? Suppose we consider a difference in these population means of 2 or more standard deviations as a noteworthy difference that we would like to detect with high probability. This would indicate that we are interested in d = 2. Calculating

$$d^* = \frac{2}{\sqrt{15\left(\dfrac{1}{8} + \dfrac{1}{8}\right)}} \approx 1.03$$

and

$$n' = 8 + 8 - 1 = 15,$$

we find from figure 51 that β ≈ 0.05, so that our power of detecting a mean difference of 2 or more σ would be approximately 95 percent.

Example 6. Suppose we consider an application where we still have a total of 16 tests, but with nx = 12 contractor tests and ny = 4 agency tests. Suppose that we are again interested in t-test performance in detecting a means difference of 2 standard deviations. Again

$$n' = 12 + 4 - 1 = 15,$$

but now

$$d^* = \frac{2}{\sqrt{15\left(\dfrac{1}{12} + \dfrac{1}{4}\right)}} \approx 0.89.$$

We find from figure 51 that β ≈ 0.12, indicating a power of approximately 88 percent of detecting a mean difference of 2 or more standard deviations.
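The chart-entry values used in Examples 5 and 6 are simple arithmetic. The sketch below assumes the conversion n' = nx + ny - 1 and d* = d / sqrt(n'(1/nx + 1/ny)), which is the standard way of matching a two-sample t-test (nx + ny - 2 degrees of freedom) to single-sample OC curves; the function name `chart_entry` is hypothetical.

```python
import math


def chart_entry(d, nx, ny):
    """Return (d_star, n_prime) for entering single-sample t-test OC curves.

    Assumed conversion: n' = nx + ny - 1 and
    d* = d / sqrt(n' * (1/nx + 1/ny)).
    """
    n_prime = nx + ny - 1
    d_star = d / math.sqrt(n_prime * (1.0 / nx + 1.0 / ny))
    return d_star, n_prime


# Example 5 inputs: d = 2 with 8 contractor and 8 agency tests.
print(chart_entry(2.0, 8, 8))

# Example 6 inputs: d = 2 with 12 contractor and 4 agency tests.
print(chart_entry(2.0, 12, 4))
```

Both cases give the same n' because the total number of tests is unchanged, while the unbalanced split gives a smaller d* and therefore less power, which matches the direction of the β values read from figure 51.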

Figure 52 gives the appropriate OC curves for our use in conducting an α = 0.01 level test on means. This figure is accessed in the same manner as described above for figure 51.

### OC Curves for the Single Agency Test Method

This procedure involves comparing the mean of 5 to 10 contractor tests with a single agency test result. The two are considered to be similar if the agency test is within an allowable interval on either side of the mean of the contractor's test results. The allowable interval is determined by multiplying the sample range of the contractor's test results by a factor that depends on the number of contractor test results. The equations for computing the allowable intervals are shown in table 40.

This comparison method is adapted from an approach for calculating the confidence interval for estimating a population mean. A confidence interval for a population mean is calculated about a sample mean and defines an interval within which there is a given percent confidence that the true population mean falls. When the variability of the population is unknown, a t-distribution, rather than a normal distribution, is used to calculate the confidence interval for the population mean. The t-distribution is what is used to establish the critical values for the t-statistic that is used in the t-test procedure that was presented above.

When calculating a confidence interval for the population mean, the t-statistic, which is similar in general concept to the Z-statistic of a normal distribution, is used. The t-statistic depends upon the degrees of freedom, defined as n - 1 where n is the number of values used to obtain the sample mean. The confidence interval is defined by:

$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean and s is the sample standard deviation.

The value of t depends upon the number of degrees of freedom and the level of significance chosen for the confidence interval. For example, for a 98 percent confidence interval, the value of t would be the value such that 98 percent of a t-distribution with n - 1 degrees of freedom falls within ± t standard deviations of the mean.

The single agency test approach uses this 98 percent confidence interval to approximate the interval within which a single test result should fall if sampled from a population with mean and standard deviation equal to the sample mean and standard deviation of the contractor's test results. For simplicity, the sample range, R, instead of the sample standard deviation, is used to estimate the population standard deviation. The population standard deviation can be estimated by dividing the sample range by a factor known as d2. Therefore, R ÷ d2 is taken as an estimate of the population standard deviation.

The approach assumes that the population mean is equal to the sample mean of the contractor's tests and that the population standard deviation is equal to the contractor's sample range divided by d2. The interval within which the single agency test result must fall is defined by the interval within which 98 percent of the single test results should fall. The 98 percent confidence interval is calculated based on the t-statistic.

To arrive at the factors in the table for determining the interval around the contractor's test mean within which the agency test must fall, the t-statistic for a 98 percent confidence interval and n - 1 degrees of freedom is multiplied by (R ÷ d2). Since it is a two-sided confidence interval, a 98 percent confidence interval corresponds to the ± t-statistic, t.99, above or below which there is only 1 percent of the t-distribution. The values necessary to develop the interval factors for this comparison method are shown in table 40.

Table 40. Derivation of the Single Agency Test Method Allowable Intervals

| Sample Size, n | Degrees of Freedom, n - 1 | t-statistic for which there is a 1% chance of being exceeded, t.99 | d2 | Interval |
|---|---|---|---|---|
| 10 | 9 | 2.821 | 3.078 | |
| 9 | 8 | 2.896 | 2.970 | |
| 8 | 7 | 2.998 | 2.847 | |
| 7 | 6 | 3.143 | 2.704 | |
| 6 | 5 | 3.365 | 2.534 | |
| 5 | 4 | 3.747 | 2.326 | |
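The interval factors follow from the t.99 and d2 values in table 40 by the multiplication described above: the allowable interval is the contractor mean plus or minus (t.99 ÷ d2) times the contractor sample range. The sketch below shows that arithmetic; the function names and the sample data are hypothetical, and the computed factors are unrounded, so they may differ in the last digit from any published rounded values.

```python
# t.99 and d2 values as listed in table 40, keyed by sample size n.
T99 = {10: 2.821, 9: 2.896, 8: 2.998, 7: 3.143, 6: 3.365, 5: 3.747}
D2 = {10: 3.078, 9: 2.970, 8: 2.847, 7: 2.704, 6: 2.534, 5: 2.326}


def interval_factor(n):
    """Multiplier applied to the contractor sample range, t.99 / d2."""
    return T99[n] / D2[n]


def allowable_interval(contractor_results):
    """Return (low, high) limits for a single agency test result."""
    n = len(contractor_results)
    mean = sum(contractor_results) / n
    sample_range = max(contractor_results) - min(contractor_results)
    half_width = interval_factor(n) * sample_range
    return mean - half_width, mean + half_width


for n in sorted(T99, reverse=True):
    print(n, round(interval_factor(n), 3))

# Hypothetical contractor air void results (percent), n = 5.
print(allowable_interval([4.0, 4.2, 3.8, 4.1, 3.9]))
```

The factor grows as the number of contractor tests falls, so the interval widens and the comparison becomes less discriminating with fewer tests.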

To illustrate the lack of power that this method has to discern differences between populations, the computer program ONETEST was developed as part of FHWA Demonstration Project 89.[18] The ONETEST program assumes that the two sets of data have the same standard deviation value (an assumption that is part of the single test comparison method), and designates in standard deviation units the distance between the true means of the two datasets. The program then determines the probability of detecting the difference for various actual differences between the population means.

ONETEST was used to generate 6,000 comparisons for each of a number of different scenarios, i.e., comparing a single test result to samples of size 10, 9, 8, 7, 6, and 5. In each case, the two populations were assumed to have the same standard deviation, and the difference between the means of the two populations, stated in standard deviation units, Δ = (μ1 - μ2)/σ, varied from 0.0 to 3.0 in increments of 0.5. The results from this analysis are plotted as an OC curve in figure 55.

As can be seen in the OC curve in figure 55, even when the difference between population means was three standard deviations, the percentage of the time this procedure was able to determine a difference in populations ranged from only 58 percent for a sample size of 10 to 34 percent for a sample size of 5.
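A rough simulation in the spirit of the analysis described above can be written in a few lines. This is not the ONETEST program; it is a minimal sketch that assumes normally distributed test results with a common standard deviation of 1, uses interval factors computed as t.99 ÷ d2 from table 40, and counts how often the single agency result falls outside the allowable interval. The function name, the seed, and the default trial count are arbitrary choices.

```python
import random

# Interval factors t.99 / d2 computed from the values in table 40.
FACTOR = {
    10: 2.821 / 3.078, 9: 2.896 / 2.970, 8: 2.998 / 2.847,
    7: 3.143 / 2.704, 6: 3.365 / 2.534, 5: 3.747 / 2.326,
}


def detection_rate(n, delta, trials=6000, seed=1):
    """Fraction of simulated comparisons that flag a difference.

    n      number of contractor tests (5 to 10)
    delta  true difference in population means, in standard deviation units
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        contractor = [rng.gauss(0.0, 1.0) for _ in range(n)]
        agency = rng.gauss(delta, 1.0)
        mean = sum(contractor) / n
        half_width = FACTOR[n] * (max(contractor) - min(contractor))
        if abs(agency - mean) > half_width:
            flagged += 1
    return flagged / trials


for n in (10, 5):
    rates = [round(detection_rate(n, d / 2.0), 2) for d in range(0, 7)]
    print(n, rates)
```

Each printed list gives the detection rate for differences of 0.0 to 3.0 standard deviations in steps of 0.5. Because the results are simulated, they vary slightly with the seed and the number of trials.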

Figure 55. OC Curves for the Single Agency Test Method. Curves are shown for n = 10, 9, 8, 7, 6, and 5; horizontal axis: difference in population means, Δ = (μa - μb)/σ.