This report is an archived publication and may contain dated technical, contact, and link information

Optimal Acceptance Standards for Statistical Construction Specifications

Publication Number: FHWA-RD-02-095

Appendix G

As discussed in chapter 5, verification testing can be of two types: test method verification testing that is done on split samples, or process verification testing that is done on independent samples. The procedures are different for each of these types of verification testing.

OC Curves for Test Method Verification

In chapter 5, two methods were considered for test method verification of split samples: the D2S method, which compares the contractor and agency results from a single split sample, and the paired t-test, which compares contractor and agency results from a number of split samples. OC curves, which plot the probability of detecting a difference versus the actual difference between the two populations, can be developed for either of these methods.

OC Curves for the D2S Verification Method

In the D2S method, a test is performed on a single split sample to compare agency and contractor test results. If we assume both of these results are from normally distributed subpopulations, then we can calculate the variance of the difference and use it to calculate two standard deviation, or approximately 95 percent, limits for the sample difference. Suppose the agency subpopulation has a variance σ_a² and the contractor subpopulation has a variance σ_c². Since the variance of the difference of two independent random variables is the sum of the variances, the variance of the difference between an agency observation and a contractor observation is σ_a² + σ_c². The D2S limits are based on the test standard deviation provided. Let us call this test standard deviation σ_test. Under the assumption that σ_a² = σ_c² = σ_test², the variance of the difference becomes 2σ_test². The D2S limits are set at two times the standard deviation (i.e., approximately 95 percent limits) of the test differences. This sets the D2S limits at ±2√(2σ_test²), which is ±2√2 σ_test, or ±2.8284 σ_test. Without loss of generality, we can assume σ_test = 1, along with a mean difference of 0, and use the normal distribution with the region between -2.8284 and +2.8284 as the acceptance region for the difference between an agency test result and a contractor test result. With these two limits fixed, we can calculate the power of this decision-making process relative to various true differences in the underlying subpopulation means and/or various ratios of the true underlying subpopulation standard deviations.

These power values can conveniently be displayed as a three-dimensional surface. If we vary the mean difference along the first axis and the standard deviation ratio along a second axis, we can show power on the vertical axis. The agency subpopulation, the contractor subpopulation, or both, could have standard deviations smaller, about the same, or larger than the supplied σ_test value. Each of these cases is considered in the technical report for this project.^[17] For simplicity, herein we will consider only the case where one of the two subpopulations has standard deviation equal to the supplied σ_test. Figure 50 shows the OC curves for this case. Power values are shown where the ratio of the larger of agency or contractor standard deviation to the smaller of agency or contractor standard deviation is varied over the values 0, 1, 2, 3, 4, and 5. The mean difference given along the horizontal axis (values 0, 1, 2, 3) represents the difference in agency and contractor subpopulation means expressed as multiples of σ_test.

As can be seen in the figure, even when the ratio of the contractor and agency standard deviations is 5 and the difference between the contractor and agency means is 3 times the value for σ_test, there is less than a 70 percent chance of detecting the difference based on the results from a single split sample.

As is the case with any method based on a sample of size one, the D2S method does not have much power to detect differences between the contractor and agency populations. The appeal of the D2S method lies in its simplicity rather than its power.
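The power calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own calculation routine: it assumes σ_test = 1, assumes the smaller subpopulation standard deviation equals σ_test, and assumes the agency and contractor results are independent and normally distributed. The function names `normal_cdf` and `d2s_power` are hypothetical.

```python
from math import erf, sqrt

D2S_LIMIT = 2.0 * sqrt(2.0)  # +/-2.8284 when sigma_test = 1


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def d2s_power(mean_diff, sd_ratio):
    """Probability that one split-sample difference falls outside the D2S limits.

    mean_diff: true difference in subpopulation means, in sigma_test units.
    sd_ratio:  larger subpopulation standard deviation divided by the smaller,
               with the smaller assumed equal to sigma_test = 1 (assumption).
    """
    sd_diff = sqrt(1.0 + sd_ratio ** 2)  # variances of independent results add
    p_accept = (normal_cdf((D2S_LIMIT - mean_diff) / sd_diff)
                - normal_cdf((-D2S_LIMIT - mean_diff) / sd_diff))
    return 1.0 - p_accept


# Power for a few standard deviation ratios and mean differences of 0 to 3.
for ratio in (1, 2, 5):
    print(ratio, [round(d2s_power(delta, ratio), 3) for delta in (0, 1, 2, 3)])
```

Under these assumptions, a ratio of 5 with a mean difference of 3 gives a power below 0.70, which agrees with the observation made above about the single split sample.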

Figure 50. OC Surface for the D2S Test Method Verification Method (Assuming the smaller σ = σ_test). Vertical axis: probability of detecting a difference, percent. Horizontal axes: mean difference, in σ_test units, and standard deviation ratio.

OC Curves for the Paired t-test Method

As noted in chapter 5, for the case in which it is desirable to compare more than one pair of split sample test results, the t-test for paired measurements can be used. But the question arises, how many pairs of test results should be used? This is where an OC curve is helpful. The OC curve, for a given level of α, plots on the vertical axis either the probability of not detecting, β, or detecting, 1 - β, a difference between two populations. The standardized difference between the two population means is plotted on the horizontal axis.

For a t-test for paired measurements, the standardized difference, d, is measured as:

$$ d = \frac{|\mu_c - \mu_a|}{\sigma_d} $$

where:	|μ_c - μ_a|	=	the true absolute difference between the mean of the contractor's test result population (which is unknown) and the mean of the agency's test result population (which is unknown).
	σ_d	=	the standard deviation of the true population of signed differences between the paired tests (which is unknown).

The OC curves are developed for a given level of significance, α. It is evident from the OC curves that for any probability of not detecting a difference, β, (value on the vertical axis), the required n will increase as the difference, d, decreases (value on the horizontal axis). In some cases the desired β or d may require prohibitively large sample sizes. In that case a compromise must be made between the discriminating power desired, the cost of the amount of testing required, and the risk of claiming a difference when none exists.

OC curves for paired t-tests for α values of 0.05 and 0.01 appear in figures 51 and 52, respectively.

To use these OC curves, the true standard deviation of the signed differences, σ_d, is assumed to be known (or is approximated based on published literature). After experience is gained with the process, σ_d can be more accurately defined and a better estimate of the required number of tests determined.

Example 1. The number of pairs of split sample tests for verification of laboratory-compacted air voids using the Superpave Gyratory Compactor (SGC) is desired. The probability of not detecting a difference, β, is chosen as 20 percent or 0.20. (Some OC curves use 1 - β, known as the power of the test, on the vertical axis, but the only difference is the scale change, with 1 - β, in this case, being 80 percent). Assume that the absolute difference between μ_c and μ_a should not be greater than 1.25 percent, that the standard deviation using the SGC is 0.5 percent, and that α is selected as 0.01. This produces a d value of 1.25 percent/0.5 percent = 2.5. Reading this value on the horizontal axis and a β of 0.20 on the vertical axis in figure 52 shows that about 5 paired split-sample tests are necessary for the comparison.
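As a rough cross-check on a chart reading such as the one in Example 1, the number of pairs can be approximated numerically. The sketch below is an assumption-laden shortcut rather than the method behind figures 51 and 52: it uses a normal approximation with a z²/2 small-sample correction term, and the function name `approx_pairs` is hypothetical. It should be expected to agree with the OC curves only to within about one test.

```python
from statistics import NormalDist


def approx_pairs(d, alpha, beta):
    """Approximate number of pairs for a two-sided paired t-test.

    Normal approximation plus a z**2 / 2 small-sample correction (assumption);
    this does not replace reading the OC curve.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2.0


d = 1.25 / 0.5  # Example 1: allowable difference / standard deviation
print(round(approx_pairs(d, alpha=0.01, beta=0.20), 1))
```

For the Example 1 inputs, the approximation lands slightly above 5, in line with the value of about 5 read from figure 52.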


Figure 51. OC Curves for a Two-Sided t-Test (α = 0.05). Horizontal axis: standardized difference, d.
(Source: Experimental Statistics, by M. G. Natrella, National Bureau of Standards Handbook 91, 1963)

Figure 52. OC Curves for a Two-Sided t-Test (α = 0.01). Horizontal axis: standardized difference, d.
(Source: Experimental Statistics, by M. G. Natrella, National Bureau of Standards Handbook 91, 1963)

OC Curves for Process Verification

In chapter 5, two methods were considered for process verification using independently obtained samples: the F-test and t-test method, which compares the variances and means of sets of contractor and agency test results, and the single agency test method, which compares a single agency test result with 5 to 10 contractor test results. OC curves, which plot the probability of not detecting a difference, β, or detecting a difference, the power or 1 - β, versus the actual difference between the two populations, can be developed for either of these methods.

OC Curves for the F-test and t-test

One approach for comparing the contractor's test results with the agency's test results is to use the F-test and t-test comparisons of characteristics of the two data sets. To compare two populations that are assumed normally distributed, it is necessary to compare their means and their variabilities. An F-test is used to assess the size of the ratio of the variances, and a t-test is used to assess the degree of difference in the means. A question that needs to be answered is how much power these statistical tests have, when used with small to moderate size samples, to declare various differences in means and variances to be statistically significant. Some OC curves and examples of their use in power analysis follow.

F-test for Variances: Equal Sample Sizes. Suppose we have two sets of measurements assumed to come from normally distributed populations and wish to conduct a test to see if they come from populations that have the same variances, i.e., σ_x² = σ_y². Further suppose we select a level of significance of α = 0.05, meaning we are allowing up to a 5 percent chance of incorrectly deciding the variances are different when they really are the same. If we assume these two samples are

x₁, x₂, …, x_nx and y₁, y₂, …, y_ny,

calculate sample variances s_x² and s_y², and construct

$$ F = \frac{s_x^2}{s_y^2}, $$

we would accept H_o: σ_x² = σ_y² for values of F in the interval

$$ \left[ F_{1-\alpha/2;\; n_x-1,\; n_y-1},\;\; F_{\alpha/2;\; n_x-1,\; n_y-1} \right], $$

where F_{p; ν₁, ν₂} denotes the value exceeded with probability p by an F distribution with ν₁ and ν₂ degrees of freedom.

For this two-sided or two-tailed test, figure 53 shows the probability we have accepted the two samples as coming from populations with the same variabilities. This probability is usually referred to as β, and the power of the test as 1 - β. Notice the horizontal axis is the quantity λ, where λ = σ_x/σ_y, the true standard deviation ratio. So for λ = 1, where the hypothesis of equal variance should certainly be accepted, it is accepted with probability 0.95, reduced from 1.0 only by the magnitude of our selected type I error risk, α. One major limiting factor for the use of figure 53 is the restriction that n_x = n_y = n.
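The dependence of β on λ can be written out explicitly. The expression below is a sketch under the usual normal-theory assumption that (s_x²/σ_x²)/(s_y²/σ_y²) follows an F distribution with n_x - 1 and n_y - 1 degrees of freedom. The symbols F_L and F_U are shorthand introduced here (not notation from the source) for the lower and upper limits of the acceptance interval, and F_{n_x-1, n_y-1} denotes a random variable with that F distribution.

$$ \beta(\lambda) = P\left(F_L \le \frac{s_x^2}{s_y^2} \le F_U\right) = P\left(\frac{F_L}{\lambda^2} \le F_{n_x-1,\,n_y-1} \le \frac{F_U}{\lambda^2}\right), \qquad \text{power} = 1 - \beta(\lambda), \qquad \lambda = \frac{\sigma_x}{\sigma_y}. $$

At λ = 1 the expression reduces to β = 1 - α, which is the acceptance probability of 0.95 for an α = 0.05 test.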

Example 2. Suppose we have n_x = 6 contractor tests and n_y = 6 agency tests, conduct an α = 0.05 level test, and accept that these two sets of tests represent populations with equal variances. What power did our test have to discern if the populations from which these two sets of tests came were really rather different in variabilities? Suppose the true population standard deviation of the contractor tests (σ_x) was twice as large as that of the agency tests (σ_y), giving λ = 2. If we enter figure 53 with λ = 2 and n_x = n_y = 6, we find that β ≈ 0.74, or the power, 1 - β, is about 0.26. This tells us that with samples of n_x = 6 and n_y = 6, we only have 26 percent chance of detecting a standard deviation ratio of 2 (and correspondingly a four-fold difference in variance) as being different.

Example 3. Suppose we are not at all comfortable with the power of 0.26 in Example 2, and so subsequently we increase the number of tests used. Suppose we now have n_x = 20 and n_y = 20. If we again consider λ = 2, we can determine from figure 53 the power of detecting these sets of tests as coming from populations with unequal variances to be over 0.8, approximately 82 percent to 83 percent. If we proceed to conduct our F-test with these two samples, and conclude the underlying variances are equal, we can feel much more comfortable with our conclusions.

Figure 54 gives the appropriate OC curves to use if we choose to conduct an α = 0.01 level test. Again we see for equal variances σ_x² and σ_y², giving λ = 1, that β = 0.99, reduced from 1.0 only by the size of α.

F-test for Variances: Unequal Sample Sizes. Up to now, the discussion and OC curves have been limited to the case in which the two sample sizes are equal. Calculation routines were developed for this project to compute the power of this test for any combination of sample sizes n_x and n_y. Because there are infinitely many possible combinations of n_x and n_y, it is not possible to present OC curves for every possibility. Instead, three tables are provided herein that give power values for sample sizes of potential interest when comparing contractor and agency samples. The power values are presented in table form because there are too many variables to present in a single chart, and tables are more compact than a long series of charts. Table 37 gives power values for all combinations of sample sizes from 3 to 10, with the ratio of the two subpopulation standard deviations being 1, 2, 3, 4, and 5. Table 38 gives power values for the same sample sizes, but with the standard deviation ratios being 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0. Table 39 gives power values for all combinations of sample sizes of 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100, with the standard deviation ratio being 1, 2, or 3. An example below illustrates the use of the first of these tables to look up the power for a hypothetical test.

Figure 53. OC Curves for the Two-Sided F-Test for Level of Significance α = 0.05 (Source: Engineering Statistics by A. H. Bowker and G. J. Lieberman.)

Figure 54. OC Curves for the Two-Sided F-Test for Level of Significance α = 0.01 (Source: Engineering Statistics by A. H. Bowker and G. J. Lieberman.)

Table 37. F-test Power Values for n = 3 to 10 and s-ratio, λ = 1 to 5

λ	*n_y*	*n_x*	Power
1	3	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	4	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	5	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	6	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	7	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	8	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
1	9	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	10	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
2	3	3	0.09939
		4	0.09753
		5	0.09663
		6	0.09620
		7	0.09600
		8	0.09590
		9	0.09586
		10	0.09585
	4	3	0.14835
		4	0.15169
		5	0.15385
		6	0.15544
		7	0.15668
		8	0.15767
		9	0.15848
		10	0.15915
	5	3	0.19036
		4	0.20240
		5	0.21041
		6	0.21622
		7	0.22064
		8	0.22413
		9	0.22694
		10	0.22926
	6	3	0.22309
		4	0.24464
		5	0.25968
		6	0.27093
		7	0.27968
		8	0.28669
		9	0.29243
		10	0.29722
2	7	3	0.24820
		4	0.27854
		5	0.30055
		6	0.31744
		7	0.33086
		8	0.34179
		9	0.35087
		10	0.35853
	8	3	0.26768
		4	0.30567
		5	0.33401
		6	0.35619
		7	0.37410
		8	0.38888
		9	0.40129
		10	0.41187
	9	3	0.28308
		4	0.32758
		5	0.36144
		6	0.38837
		7	0.41036
		8	0.42869
		9	0.44421
		10	0.45752
	10	3	0.29549
		4	0.34549
		5	0.38414
		6	0.41521
		7	0.44081
		8	0.46230
		9	0.48060
		10	0.49639
3	3	3	0.19034
		4	0.19354
		5	0.19556
		6	0.19696
		7	0.19798
		8	0.19875
		9	0.19934
		10	0.19981
	4	3	0.31171
		4	0.33525
		5	0.35007
		6	0.36030
		7	0.36777
		8	0.37347
		9	0.37795
		10	0.38157
3	5	3	0.39758
		4	0.44454
		5	0.47603
		6	0.49872
		7	0.51588
		8	0.52931
		9	0.54011
		10	0.54899
	6	3	0.45403
		4	0.51906
		5	0.56396
		6	0.59696
		7	0.62225
		8	0.64225
		9	0.65846
		10	0.67186
	7	3	0.49230
		4	0.57007
		5	0.62436
		6	0.66443
		7	0.69516
		8	0.71943
		9	0.73906
		10	0.75523
	8	3	0.51945
		4	0.60623
		5	0.66693
		6	0.71159
		7	0.74565
		8	0.77236
		9	0.79378
		10	0.81129
	9	3	0.53955
		4	0.63285
		5	0.69797
		6	0.74560
		7	0.78161
		8	0.80958
		9	0.83177
		10	0.84970
	10	3	0.55494
		4	0.65311
		5	0.72136
		6	0.77092
		7	0.80803
		8	0.83654
		9	0.85890
		10	0.87675
4	3	3	0.29251
		4	0.30367
		5	0.31010
		6	0.31427
		7	0.31717
		8	0.31930
		9	0.32093
		10	0.32222
	4	3	0.46558
		4	0.51179
		5	0.54104
		6	0.56126
		7	0.57608
		8	0.58742
		9	0.59637
		10	0.60363
	5	3	0.56455
		4	0.63665
		5	0.68356
		6	0.71649
		7	0.74084
		8	0.75955
		9	0.77437
		10	0.78638
	6	3	0.62143
		4	0.70759
		5	0.76314
		6	0.80150
		7	0.82932
		8	0.85027
		9	0.86652
		10	0.87943
	7	3	0.65697
		4	0.75074
		5	0.81002
		6	0.84993
		7	0.87808
		8	0.89866
		9	0.91416
		10	0.92613
	8	3	0.68090
		4	0.77901
		5	0.83976
		6	0.87961
		7	0.90692
		8	0.92628
		9	0.94042
		10	0.95100
4	9	3	0.69798
		4	0.79871
		5	0.85988
		6	0.89907
		7	0.92520
		8	0.94321
		9	0.95598
		10	0.96525
	10	3	0.71073
		4	0.81311
		5	0.87423
		6	0.91256
		7	0.93751
		8	0.95427
		9	0.96583
		10	0.97399
5	3	3	0.39165
		4	0.41270
		5	0.42481
		6	0.43266
		7	0.43815
		8	0.44219
		9	0.44530
		10	0.44776
	4	3	0.58713
		4	0.64932
		5	0.68814
		6	0.71467
		7	0.73394
		8	0.74858
		9	0.76007
		10	0.76932
	5	3	0.68068
		4	0.76196
		5	0.81171
		6	0.84479
		7	0.86811
		8	0.88527
		9	0.89836
		10	0.90860
	6	3	0.72975
		4	0.81790
		5	0.86956
		6	0.90223
		7	0.92409
		8	0.93936
		9	0.95041
		10	0.95864
5	7	3	0.75893
		4	0.84940
		5	0.90024
		6	0.93086
		7	0.95030
		8	0.96318
		9	0.97201
		10	0.97824
	8	3	0.77800
		4	0.86909
		5	0.91845
		6	0.94695
		7	0.96423
		8	0.97513
		9	0.98225
		10	0.98704
	9	3	0.79133
		4	0.88238
		5	0.93024
		6	0.95690
		7	0.97244
		8	0.98184
		9	0.98772
		10	0.99150
	10	3	0.80115
		4	0.89188
		5	0.93838
		6	0.96351
		7	0.97767
		8	0.98594
		9	0.99092
		10	0.99400

Table 38. F-test Power Values for n = 3 to 10 and s-ratio, λ = 0.0 to 1.0

λ	*n_y*	*n_x*	Power
0.0	3	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	4	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	5	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	6	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	7	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	8	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
0.0	9	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
	10	3	1.00000
		4	1.00000
		5	1.00000
		6	1.00000
		7	1.00000
		8	1.00000
		9	1.00000
		10	1.00000
0.2	3	3	0.39165
		4	0.58713
		5	0.68068
		6	0.72975
		7	0.75893
		8	0.77800
		9	0.79133
		10	0.80115
	4	3	0.41270
		4	0.64932
		5	0.76196
		6	0.81790
		7	0.84940
		8	0.86909
		9	0.88238
		10	0.89188
	5	3	0.42481
		4	0.68814
		5	0.81171
		6	0.86956
		7	0.90024
		8	0.91845
		9	0.93024
		10	0.93838
	6	3	0.43266
		4	0.71467
		5	0.84479
		6	0.90223
		7	0.93086
		8	0.94695
		9	0.95690
		10	0.96351
0.2	7	3	0.43815
		4	0.73394
		5	0.86811
		6	0.92409
		7	0.95030
		8	0.96423
		9	0.97244
		10	0.97767
	8	3	0.44219
		4	0.74858
		5	0.88527
		6	0.93936
		7	0.96318
		8	0.97513
		9	0.98184
		10	0.98594
	9	3	0.44530
		4	0.76007
		5	0.89836
		6	0.95041
		7	0.97201
		8	0.98225
		9	0.98772
		10	0.99092
	10	3	0.44776
		4	0.76932
		5	0.90860
		6	0.95864
		7	0.97824
		8	0.98704
		9	0.99150
		10	0.99400
0.4	3	3	0.14221
		4	0.22806
		5	0.29564
		6	0.34398
		7	0.37868
		8	0.40429
		9	0.42380
		10	0.43906
	4	3	0.14250
		4	0.24034
		5	0.32488
		6	0.38884
		7	0.43614
		8	0.47159
		9	0.49879
		10	0.52015
0.4	5	3	0.14291
		4	0.24808
		5	0.34448
		6	0.42028
		7	0.47749
		8	0.52079
		9	0.55411
		10	0.58029
	6	3	0.14332
		4	0.25345
		5	0.35863
		6	0.44371
		7	0.50889
		8	0.55851
		9	0.59674
		10	0.62671
	7	3	0.14369
		4	0.25739
		5	0.36934
		6	0.46187
		7	0.53357
		8	0.58837
		9	0.63057
		10	0.66355
	8	3	0.14399
		4	0.26041
		5	0.37772
		6	0.47638
		7	0.55351
		8	0.61261
		9	0.65804
		10	0.69341
	9	3	0.14424
		4	0.26278
		5	0.38447
		6	0.48825
		7	0.56996
		8	0.63266
		9	0.68076
		10	0.71805
	10	3	0.14445
		4	0.26470
		5	0.39001
		6	0.49813
		7	0.58375
		8	0.64952
		9	0.69984
		10	0.73868
0.6	3	3	0.07564
		4	0.10273
		5	0.12665
		6	0.14614
		7	0.16173
		8	0.17425
		9	0.18444
		10	0.19283
	4	3	0.07283
		4	0.10212
		5	0.13003
		6	0.15430
		7	0.17470
		8	0.19170
		9	0.20593
		10	0.21791
	5	3	0.07120
		4	0.10174
		5	0.13222
		6	0.15988
		7	0.18396
		8	0.20461
		9	0.22225
		10	0.23736
	6	3	0.07022
		4	0.10157
		5	0.13386
		6	0.16407
		7	0.19107
		8	0.21472
		9	0.23528
		10	0.25314
	7	3	0.06960
		4	0.10153
		5	0.13516
		6	0.16736
		7	0.19675
		8	0.22292
		9	0.24600
		10	0.26628
	8	3	0.06919
		4	0.10155
		5	0.13622
		6	0.17003
		7	0.20139
		8	0.22972
		9	0.25499
		10	0.27741
0.6	9	3	0.06891
		4	0.10161
		5	0.13711
		6	0.17223
		7	0.20526
		8	0.23545
		9	0.26265
		10	0.28698
	10	3	0.06870
		4	0.10168
		5	0.13786
		6	0.17409
		7	0.20854
		8	0.24035
		9	0.26925
		10	0.29529
0.8	3	3	0.05467
		4	0.06163
		5	0.06758
		6	0.07248
		7	0.07649
		8	0.07980
		9	0.08255
		10	0.08487
	4	3	0.05202
		4	0.05929
		5	0.06587
		6	0.07156
		7	0.07642
		8	0.08057
		9	0.08412
		10	0.08719
	5	3	0.05017
		4	0.05755
		5	0.06448
		6	0.07067
		7	0.07612
		8	0.08090
		9	0.08508
		10	0.08875
	6	3	0.04883
		4	0.05626
		5	0.06340
		6	0.06995
		7	0.07584
		8	0.08109
		9	0.08577
		10	0.08994
0.8	7	3	0.04785
		4	0.05529
		5	0.06258
		6	0.06938
		7	0.07560
		8	0.08124
		9	0.08633
		10	0.09092
	8	3	0.04709
		4	0.05453
		5	0.06193
		6	0.06893
		7	0.07541
		8	0.08136
		9	0.08680
		10	0.09175
	9	3	0.04650
		4	0.05393
		5	0.06141
		6	0.06856
		7	0.07527
		8	0.08148
		9	0.08721
		10	0.09248
	10	3	0.04603
		4	0.05345
		5	0.06099
		6	0.06827
		7	0.07516
		8	0.08159
		9	0.08757
		10	0.09312
1.0	3	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	4	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
1.0	5	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	6	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	7	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	8	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	9	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000
	10	3	0.05000
		4	0.05000
		5	0.05000
		6	0.05000
		7	0.05000
		8	0.05000
		9	0.05000
		10	0.05000

Table 39. F-test Power Values for n = 5 to 100 and s-ratio, λ = 1 to 3

λ	n_y	n_x	Power
1	5	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	10	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	15	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
1	20	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	25	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	30	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
1	40	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	50	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	60	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
1	70	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	80	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
	90	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
1	100	5	0.05
		10	0.05
		15	0.05
		20	0.05
		25	0.05
		30	0.05
		40	0.05
		50	0.05
		60	0.05
		70	0.05
		80	0.05
		90	0.05
		100	0.05
2	5	5	0.21041
		10	0.22926
		15	0.23658
		20	0.24043
		25	0.24281
		30	0.24442
		40	0.24646
		50	0.24770
		60	0.24853
		70	0.24913
		80	0.24958
		90	0.24993
		100	0.25022
	10	5	0.38414
		10	0.49639
		15	0.55109
		20	0.58353
		25	0.60501
		30	0.62027
		40	0.64053
		50	0.65336
		60	0.66221
		70	0.66869
		80	0.67363
		90	0.67753
		100	0.68068
2	15	5	0.45487
		10	0.62152
		15	0.70573
		20	0.75560
		25	0.78820
		30	0.81099
		40	0.84054
		50	0.85870
		60	0.87092
		70	0.87969
		80	0.88626
		90	0.89137
		100	0.89545
	20	5	0.49087
		10	0.68548
		15	0.78230
		20	0.83747
		25	0.87192
		30	0.89495
		40	0.92304
		50	0.93906
		60	0.94918
		70	0.95606
		80	0.96099
		90	0.96468
		100	0.96753
	25	5	0.51241
		10	0.72299
		15	0.82516
		20	0.88085
		25	0.91389
		30	0.93485
		40	0.95864
		50	0.97099
		60	0.97817
		70	0.98272
		80	0.98578
		90	0.98795
		100	0.98955
2	30	5	0.52669
		10	0.74730
		15	0.85174
		20	0.90637
		25	0.93725
		30	0.95585
		40	0.97551
		50	0.98476
		60	0.98968
		70	0.99256
		80	0.99436
		90	0.99556
		100	0.99639
	40	5	0.54439
		10	0.77664
		15	0.88220
		20	0.93379
		25	0.96067
		30	0.97548
		40	0.98924
		50	0.99462
		60	0.99702
		70	0.99821
		80	0.99886
		90	0.99923
		100	0.99945
	50	5	0.55491
		10	0.79358
		15	0.89881
		20	0.94770
		25	0.97160
		30	0.98387
		40	0.99414
		50	0.99757
		60	0.99888
		70	0.99943
		80	0.99969
		90	0.99982
		100	0.99989
2	60	5	0.56187
		10	0.80456
		15	0.90914
		20	0.95588
		25	0.97764
		30	0.98820
		40	0.99632
		50	0.99869
		60	0.99948
		70	0.99977
		80	0.99989
		90	0.99995
		100	0.99997
	70	5	0.56683
		10	0.81224
		15	0.91614
		20	0.96120
		25	0.98137
		30	0.99073
		40	0.99745
		50	0.99921
		60	0.99972
		70	0.99989
		80	0.99996
		90	0.99998
		100	0.99999
	80	5	0.57053
		10	0.81791
		15	0.92118
		20	0.96490
		25	0.98387
		30	0.99235
		40	0.99810
		50	0.99947
		60	0.99984
		70	0.99994
		80	0.99998
		90	0.99999
		100	1.00000
2	90	5	0.57339
		10	0.82226
		15	0.92497
		20	0.96762
		25	0.98564
		30	0.99345
		40	0.99851
		50	0.99962
		60	0.99989
		70	0.99997
		80	0.99999
		90	1.00000
		100	1.00000
	100	5	0.57568
		10	0.82571
		15	0.92793
		20	0.96968
		25	0.98696
		30	0.99425
		40	0.99879
		50	0.99972
		60	0.99993
		70	0.99998
		80	0.99999
		90	1.00000
		100	1.00000
3	5	5	0.47603
		10	0.54899
		15	0.57700
		20	0.59187
		25	0.60108
		30	0.60736
		40	0.61537
		50	0.62026
		60	0.62355
		70	0.62593
		80	0.62772
		90	0.62911
		100	0.63024
3	10	5	0.72136
		10	0.87675
		15	0.92836
		20	0.95158
		25	0.96404
		30	0.97154
		40	0.97985
		50	0.98420
		60	0.98681
		70	0.98853
		80	0.98973
		90	0.99062
		100	0.99130
	15	5	0.78336
		10	0.93786
		15	0.97640
		20	0.98918
		25	0.99431
		30	0.99669
		40	0.99860
		50	0.99928
		60	0.99957
		70	0.99972
		80	0.99980
		90	0.99985
		100	0.99988
	20	5	0.80975
		10	0.95808
		15	0.98816
		20	0.99597
		25	0.99841
		30	0.99930
		40	0.99982
		50	0.99994
		60	0.99998
		70	0.99999
		80	0.99999
		90	1.00000
		100	1.00000
3	25	5	0.82417
		10	0.96743
		15	0.99254
		20	0.99797
		25	0.99936
		30	0.99977
		40	0.99996
		50	0.99999
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	30	5	0.83321
		10	0.97267
		15	0.99463
		20	0.99877
		25	0.99968
		30	0.99990
		40	0.99999
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	40	5	0.84390
		10	0.97822
		15	0.99654
		20	0.99938
		25	0.99987
		30	0.99997
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
3	50	5	0.84999
		10	0.98107
		15	0.99738
		20	0.99960
		25	0.99993
		30	0.99999
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	60	5	0.85393
		10	0.98279
		15	0.99783
		20	0.99971
		25	0.99996
		30	0.99999
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	70	5	0.85668
		10	0.98394
		15	0.99812
		20	0.99976
		25	0.99997
		30	1.00000
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
3	80	5	0.85871
		10	0.98476
		15	0.99831
		20	0.99980
		25	0.99998
		30	1.00000
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	90	5	0.86026
		10	0.98537
		15	0.99844
		20	0.99983
		25	0.99998
		30	1.00000
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000
	100	5	0.86150
		10	0.98584
		15	0.99855
		20	0.99985
		25	0.99998
		30	1.00000
		40	1.00000
		50	1.00000
		60	1.00000
		70	1.00000
		80	1.00000
		90	1.00000
		100	1.00000

Example 4. Suppose we have n_x = 10 contractor tests and n_y = 6 agency tests, conduct an α = 0.05 level test, and accept that these two sets of tests represent populations with equal variances. What power did our test have to discern if the populations from which these two sets of tests came were really rather different in variabilities? Suppose the true population standard deviation of the contractor's test population (σ_x) was twice as large as that of the agency's test population (σ_y), giving a standard deviation ratio value, λ = 2. If we enter table 37 with λ = 2, n_x = 10, and n_y = 6, we find the power to be 0.29722. This tells us that with samples of n_x = 10 and n_y = 6, we have slightly less than a 30 percent chance of detecting a standard deviation ratio of 2 (and correspondingly a four-fold difference in variances) as being different.
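For sample-size combinations that are not tabulated, the power can be computed directly. The sketch below is not the calculation routine developed for the project; it is one possible implementation under stated assumptions: an equal-tails two-sided test, F formed as the contractor sample variance over the agency sample variance, λ = σ_x/σ_y > 0, and normally distributed populations. The F distribution function is built from a continued-fraction evaluation of the regularized incomplete beta function, and all function names are hypothetical.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered and odd-numbered terms of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def f_quantile(p, df1, df2):
    """Value f with P(F <= f) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_test_power(lam, n_x, n_y, alpha=0.05):
    """Power of the two-sided F-test when sigma_x / sigma_y = lam (lam > 0)."""
    df1, df2 = n_x - 1, n_y - 1
    f_lo = f_quantile(alpha / 2.0, df1, df2)
    f_hi = f_quantile(1.0 - alpha / 2.0, df1, df2)
    # s_x^2 / s_y^2 is distributed as lam^2 times an F(df1, df2) variable.
    p_accept = (f_cdf(f_hi / lam ** 2, df1, df2)
                - f_cdf(f_lo / lam ** 2, df1, df2))
    return 1.0 - p_accept


# Example 4 inputs: lambda = 2, n_x = 10 contractor tests, n_y = 6 agency tests.
print(round(f_test_power(2.0, n_x=10, n_y=6), 5))
```

The printed value can be compared with the table 37 entry of 0.29722 used in Example 4. The same function accepts an `alpha` of 0.01 for comparison with figure 54.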

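t-test for Means. Suppose we have two sets of measurements, assumed to be from normally distributed populations, and wish to conduct a two-sided or two-tailed test to see if these populations have equal means, i.e., μ_x = μ_y. Suppose we assume these two samples are from populations with unknown, but equal, variances. If these two samples are x₁, x₂, …, x_nx with sample mean x̄ and sample variance s_x², and y₁, y₂, …, y_ny with sample mean ȳ and sample variance s_y², we can calculate

$$ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}, \qquad s_p^2 = \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2}, $$

and accept H_o: μ_x = μ_y for values of t in the interval

$$ \left[ -t_{\alpha/2;\; n_x+n_y-2},\;\; +t_{\alpha/2;\; n_x+n_y-2} \right], $$

where t_{α/2; ν} denotes the value exceeded with probability α/2 by a t distribution with ν degrees of freedom.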
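The test statistic can be computed as follows. This is a minimal sketch assuming the standard pooled-variance form of the two-sample t statistic for populations with equal variances; the function name `pooled_t` and the data values are hypothetical and for illustration only. The resulting |t| would then be compared with a tabulated critical t value for the chosen α and the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    n_x, n_y = len(x), len(y)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (((n_x - 1) * variance(x) + (n_y - 1) * variance(y))
                  / (n_x + n_y - 2))
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1.0 / n_x + 1.0 / n_y))
    return t, n_x + n_y - 2


# Hypothetical air-void results (percent), for illustration only.
contractor = [4.1, 3.8, 4.4, 4.0, 4.3, 3.9]
agency = [4.5, 4.2, 4.6, 4.1]
t_stat, dof = pooled_t(contractor, agency)
print(round(t_stat, 3), dof)
```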

For this test, figure 51 or 52, depending upon the α value, shows the probability we have accepted the two samples as coming from populations with the same means. The horizontal axis scale is

$$ d = \frac{|\mu_x - \mu_y|}{\sigma}, $$

where σ = σ_x = σ_y is the true common population standard deviation. We access the OC curves in figures 51 and 52 with a value for d of d* and a value for n of n' where

$$ d^* = \frac{d}{\sqrt{n_x + n_y - 1}} \sqrt{\frac{n_x n_y}{n_x + n_y}} $$

and

$$ n' = n_x + n_y - 1. $$

Example 5. Suppose we have n_x = 8 contractor tests and n_y = 8 agency tests, conduct an α = 0.05 level test, and accept that these two sets of tests represent populations with equal means. What power did our test really have to discern if the populations from which these two sets of tests came had different means? Suppose we consider a difference in these population means of 2 or more standard deviations as a noteworthy difference that we would like to detect with high probability. This would indicate that we are interested in d = 2. Calculating

$$d^* = \frac{2}{\sqrt{(8 + 8 - 1)\left(\dfrac{1}{8} + \dfrac{1}{8}\right)}} \approx 1.03$$

and

$$n' = n_x + n_y - 1 = 8 + 8 - 1 = 15$$

We find from figure 51 that β ≈ 0.05, so that our power of detecting a mean difference of 2 or more σ would be approximately 95 percent.

Example 6. Suppose we consider an application where we still have a total of 16 tests, but with n_x = 12 contractor tests and n_y = 4 agency tests. Suppose that we are again interested in t-test performance in detecting a means difference of 2 standard deviations. Again

$$n' = n_x + n_y - 1 = 12 + 4 - 1 = 15$$

but now

$$d^* = \frac{2}{\sqrt{(12 + 4 - 1)\left(\dfrac{1}{12} + \dfrac{1}{4}\right)}} \approx 0.89$$

We find from figure 51 that β ≈ 0.12, indicating a power of approximately 88 percent of detecting a mean difference of 2 or more standard deviations.
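The chart readings in Examples 5 and 6 can be cross-checked by simulation. The sketch below is not part of the original analysis. It assumes normally distributed results with a common standard deviation of 1, applies the pooled-variance t-test, and uses the two-sided α = 0.05 critical value for 14 degrees of freedom (2.145) from a standard t table; the function names are hypothetical.

```python
import random
from math import sqrt

T_CRIT_14 = 2.145  # t table value, alpha/2 = 0.025, 14 degrees of freedom


def _mean_and_ss(values):
    """Return the mean and the sum of squared deviations from it."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values)


def t_test_power(n_x, n_y, d, trials=20_000, seed=1):
    """Estimate the probability that the t-test rejects equal means when the
    true means differ by d standard deviations (valid here only when
    n_x + n_y - 2 = 14, because the critical value is fixed)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(d, 1.0) for _ in range(n_x)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_y)]
        mean_x, ss_x = _mean_and_ss(x)
        mean_y, ss_y = _mean_and_ss(y)
        s2_pooled = (ss_x + ss_y) / (n_x + n_y - 2)
        t = (mean_x - mean_y) / sqrt(s2_pooled * (1 / n_x + 1 / n_y))
        if abs(t) > T_CRIT_14:
            rejections += 1
    return rejections / trials


power_balanced = t_test_power(8, 8, 2.0)     # Example 5
power_unbalanced = t_test_power(12, 4, 2.0)  # Example 6
print(f"n_x=8,  n_y=8: power ~ {power_balanced:.3f}")
print(f"n_x=12, n_y=4: power ~ {power_unbalanced:.3f}")
```

The estimates should be close to the roughly 95 percent and 88 percent read from figure 51, allowing for chart-reading and simulation error.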

Figure 52 gives the appropriate OC curves for our use in conducting an α = 0.01 level test on means. This figure is accessed in the same manner as described above for figure 51.

OC Curves for the Single Agency Test Method

This procedure involves comparing the mean of 5 to 10 contractor tests with a single agency test result. The two are considered to be similar if the agency test is within an allowable interval on either side of the mean of the contractor's test results. The allowable interval is determined by multiplying the sample range of the contractor's test results by a factor that depends on the number of contractor test results. The equations for computing the allowable intervals are shown in table 40.

This comparison method is adapted from an approach for calculating the confidence interval for estimating a population mean. A confidence interval for a population mean is calculated about a sample mean and defines an interval within which there is a given percent confidence that the true population mean falls. When the variability of the population is unknown, a t-distribution, rather than a normal distribution, is used to calculate the confidence interval for the population mean. The t-distribution is what is used to establish the critical values for the t-statistic that is used in the t-test procedure that was presented above.

When calculating a confidence interval for the population mean, the t-statistic, which is similar in general concept to the Z-statistic of a normal distribution, is used. The t-statistic depends upon the degrees of freedom, defined as n - 1 where n is the number of values used to obtain the sample mean. The confidence interval is defined by:

$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean and s is the sample standard deviation.

The value of t depends upon the number of degrees of freedom and the confidence level chosen for the interval. For example, for a 98 percent confidence interval, t is the value such that 98 percent of a t-distribution with n - 1 degrees of freedom falls between -t and +t.

The single agency test approach uses this 98 percent confidence interval to approximate the interval within which a single test result should fall if sampled from a population with mean and standard deviation equal to the sample mean and standard deviation of the contractor's test results. For simplicity, the sample range, R, instead of the sample standard deviation, is used to estimate the population standard deviation. The population standard deviation can be estimated by dividing the sample range by a factor known as d₂. Therefore, R ÷ d₂ is taken as an estimate of the population standard deviation.

The approach assumes that the population mean is equal to the sample mean of the contractor's tests and that the population standard deviation is equal to the contractor's sample range divided by d₂. The interval within which the single agency test result must fall is defined by the interval within which 98 percent of the single test results should fall. The 98 percent confidence interval is calculated based on the t-statistic.

To arrive at the factors in the table for determining the interval around the contractor's test mean within which the agency test must fall, the t-statistic for a 98 percent confidence interval and n - 1 degrees of freedom is multiplied by (R ÷ d₂). Since it is a two-sided confidence interval, a 98 percent confidence interval corresponds to the ± t-statistic, t.₉₉, above or below which there is only 1 percent of the t-distribution. The values necessary to develop the interval factors for this comparison method are shown in table 40.

Table 40. Derivation of the Single Agency Test Method Allowable Intervals

Sample Size, n	Degrees of Freedom, n - 1	t-statistic for which there is a 1% chance of being exceeded, t.₉₉	d₂
10	9	2.821	3.078
9	8	2.896	2.970
8	7	2.998	2.847
7	6	3.143	2.704
6	5	3.365	2.534
5	4	3.747	2.326
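The interval factors follow directly from table 40 by dividing t.₉₉ by d₂. The sketch below computes them and applies the comparison to one set of results. The t.₉₉ and d₂ values are those in table 40; the function names and the sample data are hypothetical.

```python
# t.99 and d2 values from table 40, keyed by contractor sample size n
T99 = {10: 2.821, 9: 2.896, 8: 2.998, 7: 3.143, 6: 3.365, 5: 3.747}
D2 = {10: 3.078, 9: 2.970, 8: 2.847, 7: 2.704, 6: 2.534, 5: 2.326}


def interval_factor(n):
    """Multiplier applied to the contractor's sample range R."""
    return T99[n] / D2[n]


def agency_within_interval(contractor_results, agency_result):
    """True if the single agency result lies within mean +/- factor * R."""
    n = len(contractor_results)
    sample_mean = sum(contractor_results) / n
    sample_range = max(contractor_results) - min(contractor_results)
    half_width = interval_factor(n) * sample_range
    return abs(agency_result - sample_mean) <= half_width


for n in sorted(T99, reverse=True):
    print(f"n = {n:2d}: allowable interval = mean +/- {interval_factor(n):.2f} R")

# Hypothetical data: five contractor results and one agency result
contractor = [5.1, 4.8, 5.3, 5.0, 4.9]
print(agency_within_interval(contractor, 5.6))
```

The factor grows as n shrinks, from about 0.92 at n = 10 to about 1.61 at n = 5, so smaller contractor samples give a wider allowable interval.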

To illustrate the lack of power that this method has to discern differences between populations, the computer program ONETEST was developed as part of FHWA Demonstration Project 89.^[18] The ONETEST program assumes that the two sets of data have the same standard deviation value (an assumption that is part of the single test comparison method), and designates in standard deviation units the distance between the true means of the two datasets. The program then determines the probability of detecting the difference for various actual differences between the population means.

ONETEST was used to generate 6,000 comparisons for each of a number of different scenarios, i.e., comparing a single test result to samples of size 10, 9, 8, 7, 6, and 5. In each case, the two populations were assumed to have the same standard deviation, and the difference between the means of the two populations, stated in standard deviation units, Δ = (μ₁ - μ₂)/σ, varied from 0.0 to 3.0 in increments of 0.5. The results from this analysis are plotted as an OC curve in figure 55.

As can be seen in the OC curve in figure 55, even when the difference between population means was three standard deviations, the percentage of the time this procedure was able to determine a difference in populations ranged from only 58 percent for a sample size of 10 to 34 percent for a sample size of 5.
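A simulation in the same spirit can be sketched in a few lines. This is not the ONETEST program, whose internals are not described here. It assumes normal populations with a common standard deviation of 1, a contractor mean of 0, an agency mean of Δ, and interval factors computed as t.₉₉ ÷ d₂ from table 40; the function name is hypothetical.

```python
import random

# Interval factors t.99 / d2 from table 40, keyed by contractor sample size n
FACTORS = {
    10: 2.821 / 3.078, 9: 2.896 / 2.970, 8: 2.998 / 2.847,
    7: 3.143 / 2.704, 6: 3.365 / 2.534, 5: 3.747 / 2.326,
}


def detection_rate(n, delta, trials=6000, seed=1):
    """Fraction of simulated comparisons in which the single agency result
    falls outside the allowable interval around the contractor mean."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        contractor = [rng.gauss(0.0, 1.0) for _ in range(n)]
        agency = rng.gauss(delta, 1.0)
        sample_mean = sum(contractor) / n
        sample_range = max(contractor) - min(contractor)
        if abs(agency - sample_mean) > FACTORS[n] * sample_range:
            detected += 1
    return detected / trials


rate_n10 = detection_rate(10, 3.0)
rate_n5 = detection_rate(5, 3.0)
print(f"delta = 3.0, n = 10: detected {rate_n10:.0%} of the time")
print(f"delta = 3.0, n = 5:  detected {rate_n5:.0%} of the time")
```

Under these assumptions the detection rates at Δ = 3 should fall near the 58 percent and 34 percent reported above, within simulation error.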

Figure 55. OC Curves for the Single Agency Test Method (horizontal axis: difference in population means, (μ_a - μ_b)/σ; one curve for each sample size n = 10, 9, 8, 7, 6, and 5)


This page last modified on 03/08/2016