U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-02-095
This chapter continues the discussion of Phase II of the specification development process. This chapter is intended to provide "how to use" best practices in the development of procedures for determining the acceptability of and payment for materials and construction. The steps that are involved in this part of the process are identified in the flowchart in figure 11. The numbers in boxes before the titles of the following sections refer to the corresponding box in the flowchart.
The last chapter presented the steps in determining what quality characteristics should be measured as part of the acceptance decision. It must next be decided if each of the characteristics to be measured will be used in the payment factor determination. If the answer is yes, the next step is to determine the quality measure to be used. If not, the property may be used as a screening test on a pass/fail basis.
A screening test is one for which the results can be determined quickly enough to prevent "non-conforming" material from being incorporated into the project. A pass/fail criterion is often used with this procedure. The advantage of a screening test is that it can prevent poor quality material from being incorporated into the project, and thus does not require a payment relationship. A potential disadvantage of this type of requirement is that, if a small sample size is used, there is an increased risk of making an incorrect decision. This is illustrated in an example later in this chapter.
In practice, either the agency or the contractor may run screening tests. The agency would use the tests as an acceptance function, while the contractor may choose to use screening tests as a QC function. Either way, the intent is to keep nonconforming material from being incorporated into the project. With the decrease in on-site agency personnel, more screening tests are being performed by the contractor. If this is done, there may be a requirement that agency personnel either "witness" the testing or examine the results on the test reports. Since these tests are pass/fail tests, typically no payment determination is made for nonconforming product. Instead, if this occurs, the product is not incorporated into the project. An exception to this general rule is discussed in case study 1 in chapter 9.
If the quality characteristic is to be used for payment determination, the quality measure to be related to the payment must be decided upon. There are several quality measures that can be used. In past acceptance plans, the average, or the average deviation from a target value, was often used as the quality measure. However, the use of the average alone provides no measure of variability, and it is now recognized that variability is often an important predictor of performance.
Several quality measures, including percent defective (PD) and percent within limits (PWL), have been preferred in recent years because they simultaneously measure both the average level and the variability in a statistically efficient way. Other measures that have been used by some agencies include the average absolute deviation (AAD) and the moving average. An additional measure that may be considered by some agencies is the conformal index (CI). Some of these measures are more discriminating than others, and the choice of the most effective measure can translate directly into economic savings, due either to a reduced inspection or testing effort or to a lesser amount of poor product accepted, or to both.
The TRB glossary (2) includes the following definition (where LSL and USL represent lower and upper specification limits, respectively):
This quality measure uses the sample mean and the sample standard deviation to estimate the percentage of the population (lot) that is within the specification limits. This is called the PWL method, and is similar in concept to determining the area under the normal curve.
In theory, the use of the PWL (or PD) method assumes that the population being sampled is normally distributed. In practice, it has been found that statistical estimates of quality are reasonably accurate provided the sampled population is at least approximately normal, i.e., reasonably bell shaped and not bimodal or highly skewed. Normality tests and their application are discussed in appendix H.
32.1.1. Estimating PWL. Conceptually, the PWL procedure is based on the normal distribution. The area under the normal curve can be calculated to determine the percentage of the population that is within certain limits. Similarly, the percentage of the lot that is within the specification limits can be estimated. Instead of using the Z-value and the standard normal curve, a similar statistic, the quality index, Q, is used to estimate PWL. The Q-value is used with a PWL table to determine the estimated PWL for the lot.
A sample PWL table is shown in table 5. A different format for a table relating Q values with the appropriate PWL estimate is shown for a sample size of n = 5 in table 6. A more complete set of PWL tables in this format, for sample sizes from n = 3 to n = 30, is available. (18) Another way of relating Q and PWL values is presented in table 7. In this table a range of Q values is associated with each PWL value. This table was developed by an agency such that any estimated PWL is rounded up to the next integer PWL value. Other possible rounding rules could be used to develop similar tables. The rounding rule in table 7 is the one that is most favorable to the contractor since it rounds any PWL number up to the next whole number. For example, 89.01 is rounded up to 90.00 in table 7.
32.1.2. Calculation and Rounding Procedures. As the previous paragraph illustrates, the calculation procedures and rounding rules can influence the estimated PWL value that is obtained. This can become a point of contention, particularly if the payment determination is based on the estimated PWL value. It is therefore important that the agency stipulate the specific calculation process, including number of decimal places to be carried in the calculations, as well as the exact manner in which the PWL table is to be used.
For example, in table 5, is the PWL value to be selected by rounding up, rounding down, or linear interpolation? Each of these will result in a different estimated PWL value. For instance, if the sample size is n = 5, and the calculated Q value is 1.18, the estimated PWL values for rounding up, rounding down, and interpolating would be 89, 88, and 88.5, respectively.
32.1.3. The Quality Index and PWL. The Z-statistic that is used with the standard normal table, an example of which is shown in table 8, uses the population mean as the point of reference from which the area under the curve is measured:
$$Z = \frac{X - \mu}{\sigma}$$
The statistic Z, therefore, measures distance above or below the population mean, µ, using the number of population standard deviation units, σ, as the measurement scale. This is illustrated in figure 12.
|PWL||n = 3||n = 4||n = 5||n = 6||n = 7||n = 8||n = 9||n = 10|
|PWL||n = 12 to 14||n = 15 to 18||n = 19 to 25||n = 26 to 37||n = 38 to 69||n = 70 to 200||n = 201 to ∞|
Values in the body of the table are estimates of PWL corresponding to specific values of $Q_L = (\bar{x} - L)/s$ or $Q_U = (U - \bar{x})/s$, where L and U are the lower and upper specification limits. For negative Q values, the table values must be subtracted from 100.
Conceptually, the Q-statistic, or quality index, performs the same function as the Z-statistic, except that the reference point is now the mean of an individual sample, x̄, instead of the population mean, µ, the measurement scale is the sample standard deviation, s, and the points of interest with regard to areas under the curve are the specification limits:
$$Q_L = \frac{\bar{x} - LSL}{s} \qquad Q_U = \frac{USL - \bar{x}}{s}$$
The Q-statistic, therefore, represents the distance in sample standard deviation units that the sample mean is offset from the specification limit. A positive Q-statistic represents the number of sample standard deviation units that the sample mean falls inside the specification limit. Conversely, a negative Q-statistic represents the number of sample standard deviation units that the sample mean falls outside the specification limit. These cases are illustrated in figures 13 and 14.
$Q_L$ is used when there is a one-sided lower specification limit, while $Q_U$ is used when there is a one-sided upper specification limit. For two-sided specification limits, the PWL value is estimated as:
$$PWL_T = PWL_U + PWL_L - 100 \qquad (13)$$
where $PWL_U$ and $PWL_L$ are the PWL values estimated from $Q_U$ and $Q_L$, respectively.
An example using a simplified portland cement concrete specification can be used to explain the PWL concept. In this example, the minimum specification limit for strength is 21,000 kPa. One requirement of the PWL procedure is that the sample size must be greater than n = 2 since both the sample mean and sample standard deviation are necessary to estimate PWL. For this specification, the sample size has been chosen as n = 4. Furthermore, the specification requires that at least 95 percent of the lot exceed the minimum strength (i.e., PWL ≥ 95). Table 5 shows that the minimum Q value is 1.35 for 95 PWL and a sample size of n = 4. Whenever the sample mean is at least 1.35s above the specification limit, the lot is accepted. However, as used most frequently, the Q value will be used to compute the PWL and that will, in turn, be used to determine a payment factor.
For example, suppose that the acceptance tests for a lot have a sample mean of 25,000 kPa and a sample standard deviation of 3,400 kPa. Does this lot meet the specification requirement? The quality index value is calculated as:
$$Q_L = \frac{\bar{x} - LSL}{s} = \frac{25{,}000 - 21{,}000}{3{,}400} = 1.18$$
Using this Q-value, n = 4, and table 5, the estimated PWL for the lot is between 89 and 90. This is less than the required 95, so the lot does not meet the specified strength requirement.
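The calculation above can also be sketched in code. The manual estimates PWL from a printed table; the sketch below instead assumes, as is commonly published for tables of this kind but not stated in this chapter, that the tabulated values follow a symmetric beta distribution with shape parameter n/2 - 1. The function names are hypothetical, and the sketch covers only n ≥ 4.

```python
import math

def quality_index_lower(mean, std_dev, lsl):
    """Q_L: number of sample standard deviations the mean lies above LSL."""
    return (mean - lsl) / std_dev

def estimate_pwl(q, n, steps=2000):
    """Estimate one-sided PWL from a quality index q and sample size n.

    Assumption (not from the manual): PWL tables follow a symmetric beta
    distribution with shape a = n/2 - 1. This sketch handles n >= 4 only.
    """
    if n < 4:
        raise ValueError("this sketch handles n >= 4 only")
    a = n / 2 - 1
    # Map q onto the [0, 1] beta scale and clamp to the valid range.
    x = 0.5 - q * math.sqrt(n) / (2 * (n - 1))
    x = min(1.0, max(0.0, x))

    def density(t):
        # Unnormalized symmetric beta density.
        return (t * (1 - t)) ** (a - 1)

    def simpson(upper):
        # Composite Simpson's rule on [0, upper]; steps must be even.
        h = upper / steps
        total = density(0.0) + density(upper)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(i * h)
        return total * h / 3

    # Percent defective is the normalized area below x; PWL is the rest.
    return 100 * (1 - simpson(x) / simpson(1.0))

# Concrete strength example from the text (values in kPa, n = 4).
q = quality_index_lower(25_000, 3_400, 21_000)
pwl = estimate_pwl(q, n=4)
print(f"Q_L = {q:.2f}, estimated PWL = {pwl:.1f}")
```

Under this assumption the estimate for the example lot falls between 89 and 90, in agreement with the table lookup, and a Q value of 1.35 with n = 4 corresponds to 95 PWL. An agency would still need to stipulate whether table lookup or direct computation governs, since the two can differ in the last digit.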
Intuitively, PWL is a good measure of quality since it is reasonable to assume that the more of the material that is within the specification limits, the better the quality of the product should be. A detailed discussion and analysis of the PWL measure of quality is presented in the technical report for the project. (17)
As noted in the above definition for PWL, PD is related to the PWL by the simple relationship PWL = 100 - PD. The use of PD as a quality measure can have some advantages, particularly with two-sided specifications, because the PD below the lower specification limit can simply be added to the PD above the upper specification limit to obtain the total PD value. This is slightly easier than equation 13 that is required when using PWL. The relationship between PD and PWL is shown in figure 15.
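The equivalence of the two-sided PWL and PD calculations can be illustrated with a short sketch. The one-sided PWL values used here are hypothetical, as are the function names.

```python
def total_pwl(pwl_lower, pwl_upper):
    """Equation 13: combine one-sided PWL estimates for two-sided limits."""
    return pwl_upper + pwl_lower - 100

def total_pd(pd_lower, pd_upper):
    """Total percent defective is the sum of the two one-sided PD values."""
    return pd_lower + pd_upper

# Hypothetical one-sided estimates for a single lot.
pwl_lower, pwl_upper = 96.0, 92.0

pwl_t = total_pwl(pwl_lower, pwl_upper)
pd_t = total_pd(100 - pwl_lower, 100 - pwl_upper)
print(pwl_t, pd_t, pwl_t + pd_t)
```

Both routes describe the same lot: the total PWL and the total PD always sum to 100.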
Since PD and PWL can be converted to one another simply by subtracting from 100, they are equivalent quality measures, and any discussion of PWL applies equally to PD. Most agencies have preferred to measure how much of the material is "good," i.e., PWL, rather than how much is "bad," i.e., PD. The PWL approach, rather than PD, has also been promoted by the FHWA. For these reasons, most discussions in this manual center on PWL rather than PD. However, examples and case studies for both methods are included.
For specifications that have a target value, the average deviation from the target value has in the past sometimes been used as a means for determining acceptability of the product. This approach can have the effect of encouraging the contractor to manipulate its process during the production of a lot. For example, if two test results in the morning are below the target value, there is a strong incentive for the contractor to increase the process mean in the afternoon in an effort to get two higher test results so that the average of the four tests for the lot will be near the target value. In essence, this acceptance approach encourages the contractor to increase process variability by making frequent adjustments to the process mean.
The average deviation from the target value should NOT be used as the quality measure for QA acceptance plans.
To avoid the problem of over-adjusting the process in response to early test results, the average absolute deviation from the target can be used for the acceptance decision. The TRB glossary (2) includes the following definition:
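The equation for calculating AAD is as follows:
$$AAD = \frac{\sum_{i=1}^{n} |x_i - T|}{n}$$
where $x_i$ are the individual test results, T is the target value, and n is the number of tests per lot.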
By taking the absolute value of the deviation from the target, the contractor cannot benefit by any strategy other than aiming for the target value. For example, if two early tests on the lot indicate values below the target by -0.4 and -0.6, using AAD the contractor cannot offset these low values with two later values of +0.5 and +0.5. This is true because, while the average of these four numbers is (-0.4 -0.6 +0.5 +0.5) ÷ 4 = 0.0, the average of the absolute values of these numbers is (0.4 + 0.6 + 0.5 +0.5) ÷ 4 = 0.5.
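The arithmetic in this example can be checked with a few lines of code. The four deviations are those given in the text; the function names are hypothetical.

```python
def average_deviation(deviations):
    """Signed average: positive and negative deviations cancel."""
    return sum(deviations) / len(deviations)

def average_absolute_deviation(deviations):
    """AAD: signs are dropped, so deviations cannot offset one another."""
    return sum(abs(d) for d in deviations) / len(deviations)

# Deviations from the target for the four tests in the example.
deviations = [-0.4, -0.6, 0.5, 0.5]

avg = average_deviation(deviations)
aad = average_absolute_deviation(deviations)
print(f"average deviation = {avg:.1f}, AAD = {aad:.1f}")
```

The signed average hides the fact that every test missed the target by about half a unit, while the AAD reports it directly.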
Intuitively, AAD is a good measure of quality since it is reasonable to assume that the lower the average absolute deviation, the closer the process is to the target and hence the better the quality of the product should be. There are, however, some disadvantages associated with AAD acceptance plans. One primary drawback is that they can only be used when there is a target value. They cannot, therefore, be used when there is only one specification limit, such as might be the case with concrete compressive strength.
Another drawback with using AAD is that the variability of the material in the lot may not be adequately measured. Specifically, very different sets of test results can give identical AAD values. For example, the three sets of four tests shown in table 9 each give the same value for AAD. Each set of tests would therefore yield the same payment for a lot. Not only must it be wondered if equal payment is appropriate for these widely different conditions, the use of AAD fails to document these differences that could be useful for considering future modifications of the specification. Specifically, the sample means and sample standard deviations (measure of variability) vary considerably for the three cases. With acceptance based solely on AAD, these mean and variability differences would not be identified.
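The point can be illustrated with three hypothetical lots. The values below are not the data from table 9; they were chosen only so that each lot has the same AAD while the sample means and standard deviations differ. Results are expressed as deviations from the target.

```python
import statistics

def average_absolute_deviation(results, target):
    """Mean of the absolute deviations of the results from the target."""
    return sum(abs(x - target) for x in results) / len(results)

TARGET = 0.0  # results are expressed relative to the target value

# Hypothetical lots of four tests each (not the manual's table 9 data).
lots = {
    "Lot 1": [-0.5, 0.5, -0.5, 0.5],    # centered on target, widely spread
    "Lot 2": [0.4, 0.5, 0.5, 0.6],      # consistently above target
    "Lot 3": [-0.4, -0.5, -0.5, -0.6],  # consistently below target
}

summary = {}
for name, results in lots.items():
    summary[name] = (
        average_absolute_deviation(results, TARGET),
        statistics.mean(results),
        statistics.stdev(results),
    )
    aad, mean, sd = summary[name]
    print(f"{name}: AAD = {aad:.2f}, mean = {mean:+.2f}, s = {sd:.2f}")
```

All three lots would receive the same payment under an AAD plan, even though one is on target with high variability and the other two are off target with low variability.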
A detailed discussion and analysis of the AAD measure of quality is presented in the technical report for the project. (17)
|X - T|
|Test||Lot 1||Lot 2||Lot 3|
|Sample Std Dev||0.58||0.08||0.08|
The TRB glossary (2) includes the following definition:
Conceptually, the CI is similar to the AAD. While the AAD uses the average of the absolute values of the individual deviations from the target value, the CI uses the squares of the individual deviations from the target value. The CI is also similar in concept to the standard deviation. The standard deviation is the root mean square of differences from the mean, whereas the CI is the root mean square of differences from a target such as the job mix formula for HMAC. Like the AAD, the CI discourages mid-lot process adjustments by not allowing positive and negative deviations from the target to cancel out one another. As shown in the TRB glossary definition above, the CI is calculated as follows:
$$CI = \sqrt{\frac{\sum_{i=1}^{n} (x_i - T)^2}{n}}$$
where $x_i$ are the individual test results, T is the target value, and n is the number of tests.
The CI is very similar in practice to the AAD, and has the same disadvantages of not being appropriate for a one-sided specification and of potentially having the same CI value for vastly different sample results. Table 10 shows the CI, sample mean, and sample standard deviation values for three example lots.
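A short sketch shows how lots with very different characteristics can produce nearly the same CI. The lots are hypothetical and are not the data from table 10, and the CI is taken here as the root mean square deviation from the target with n in the denominator, which is an assumption about the glossary formula.

```python
import math
import statistics

def conformal_index(results, target):
    """Root mean square of the deviations from the target (divisor n)."""
    return math.sqrt(sum((x - target) ** 2 for x in results) / len(results))

TARGET = 0.0  # results are expressed relative to the target value

# Hypothetical lots of four tests each (not the manual's table 10 data).
lots = {
    "Lot 1": [-1.5, 1.5, -1.5, 1.5],    # centered on target, widely spread
    "Lot 2": [1.4, 1.5, 1.5, 1.6],      # consistently above target
    "Lot 3": [-1.4, -1.5, -1.5, -1.6],  # consistently below target
}

summary = {}
for name, results in lots.items():
    summary[name] = (
        conformal_index(results, TARGET),
        statistics.mean(results),
        statistics.stdev(results),
    )
    ci, mean, sd = summary[name]
    print(f"{name}: CI = {ci:.2f}, mean = {mean:+.2f}, s = {sd:.2f}")
```

As with the AAD, the CI alone does not reveal whether a lot missed the target because of its mean or because of its variability.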
A detailed discussion and analysis of the CI measure of quality is presented in the technical report for the project. (17)
|Test||Lot 1||Lot 2||Lot 3|
|Sample Std Dev||1.73||0.08||0.08|
A few agencies have developed acceptance procedures based on the moving average of the quality characteristic. For moving averages first a sample size, say n = 4, must be determined. The first average is then determined from the first four values. For the second moving average, the fifth value replaces the first value in the calculations. For the third moving average, the sixth value replaces the second value, and so on. This process is illustrated in figure 16.
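The procedure can be sketched as follows, using n = 4 and a hypothetical series of six test results. The function name is also hypothetical.

```python
def moving_averages(values, n=4):
    """Average of each run of n consecutive values, in test order."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

# Hypothetical test results in the order they were obtained.
results = [5.0, 5.2, 4.8, 5.0, 5.4, 4.6]

averages = moving_averages(results, n=4)
for number, value in enumerate(averages, start=1):
    print(f"moving average {number}: {value:.2f}")
```

Six results yield three moving averages, and the third and fourth results contribute to all three of them, which is the source of the correlation between successive averages.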
The use of the moving average provides a smoothing effect compared to plotting individual test results. Moving averages have typically been applied for process control purposes, and are particularly appropriate when continuous processes are involved. Moving averages have frequently been used for process control in the manufacture of chemicals.
Moving averages were not developed for use as an acceptance approach, and they have several drawbacks when applied in acceptance plans. Due to the nature in which it is calculated, the use of the moving average is not consistent with the use of lot-by-lot acceptance. Since each individual test result appears in multiple moving averages, it is difficult to determine when or where a lot begins or ends. Also, when acceptance is based on a lot, it is assumed that the various lots are independent of one another and that the material within each lot represents material from a constant process or population. Since each individual test result appears in several moving averages, the moving averages are obviously correlated and the results of one average are not independent of the next.
The use of moving averages also does not lend itself well to determining payment factors. Since successive moving averages are correlated, and since individual lots are not well defined, it is not easy to determine payment factors on a lot-by-lot basis. As a result, acceptance procedures based on the moving average often result in production shutdowns and plant adjustments rather than determining appropriate payment factors for specific production lots.
The moving average should NOT be used as the quality measure for QA acceptance plans.
It is necessary to measure both center and spread when characterizing a lot (population) of material. Even if appropriate typical process standard deviation and target miss values are used to establish the acceptance procedures, there are potential difficulties with AAD and CI quality measures. One drawback with AAD and CI acceptance plans is that since lot variabilities are not directly measured, a given lot AAD or CI can come from a number of different populations. For example, the population could be centered at the target, but have a relatively large standard deviation, i.e., larger than the one that was assumed when developing the AAD or CI acceptance limits or payment equation. Another population could have the same AAD or CI by being centered off the target and having the same standard deviation that was assumed when developing the acceptance values. A third population could have the same AAD or CI value by having a mean far from the target, but also having a relatively small standard deviation.
While some of these drawbacks may also apply to PWL (or PD) acceptance plans, such as the fact that a given PWL can represent many different populations, there are fewer drawbacks due to the fact that both sample mean and standard deviation are determined in the PWL method. Also, since the PWL method can be used with both one-sided and two-sided acceptance properties, it is more versatile since it does not require separate approaches for one-sided and two-sided cases. The PWL approach has been endorsed and recommended by the FHWA for many years, and it is also the method used in the AASHTO QA Guide Specification. (7)
PWL is the recommended quality measure. It should be noted that PD is equally suitable and provides all the same mathematical advantages as PWL. The case studies in chapter 9, for example, use PD as the quality measure. A detailed analysis and evaluation of the PWL, PD, AAD, and CI measures, including how non-normal populations impact these measures, is included in the technical report for this project. (17) Any agency that is considering using the AAD or CI quality measures should thoroughly review the technical report before making a final decision.
This is similar to the decision for quality measures that was made for payment determination characteristics, with the possible exception of the number of test results available for the decision. Because there may be only a few test results for the pass/fail decision, usually one or two, this may be insufficient to allow a rigorous statistical analysis. Thus, the acceptance decision may have to be made on a single test or the average of two tests. As mentioned above, this will increase the risks involved. Due to the small sample sizes involved, PWL and PD quality measures are not likely to be options. In most cases, the screening test becomes a simple pass/fail attributes plan. That is, does the one test fall within the allowable limits or does it not?
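To give a rough sense of the risk involved, the sketch below computes how often a poor lot would pass a screening test based on a single result or on the average of two results. It assumes normally distributed results, no testing error, a single upper limit, and hypothetical lot values; the function name is hypothetical as well.

```python
from statistics import NormalDist

def pass_probability(lot_mean, lot_sd, upper_limit, n_tests=1):
    """Chance that the average of n_tests results is at or below the limit."""
    sd_of_average = lot_sd / n_tests ** 0.5
    return NormalDist(lot_mean, sd_of_average).cdf(upper_limit)

# Hypothetical lot in which 40 percent of the material exceeds the limit.
limit = 7.0
sd = 1.0
mean = limit - NormalDist().inv_cdf(0.60) * sd

p_one = pass_probability(mean, sd, limit, n_tests=1)
p_two = pass_probability(mean, sd, limit, n_tests=2)
print(f"passes with one test: {p_one:.2f}")
print(f"passes with the average of two tests: {p_two:.2f}")
```

Under these assumptions, a lot with 40 percent of its material outside the limit still passes a single-test screen about 60 percent of the time, and somewhat more often when two tests are averaged, because the lot mean itself lies inside the limit.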
The AQL and RQL must be defined, and the specification limits and acceptance limits must be determined. The AQL, RQL, and specification limits are intimately related, and the decisions regarding these are typically made concurrently. Decisions regarding acceptance limits are related to the risks to the contractor and agency, and the acceptance limits are typically determined based on a risk analysis (see chapter 7).
Specification and acceptance limits are often confused. The TRB glossary (2) provides the following definitions:
Specification limits are based on engineering requirements and are usually expressed in the same units as those used for the quality characteristic of concern. The acceptance limits are usually expressed in statistical units (e.g., mean, PD, PWL, AAD, etc.). For accept or reject acceptance plans, the acceptance limits are established from a risk analysis (see chapter 7). For acceptance plans with payment adjustment provisions, additional acceptance limits, often expressed in the form of an equation or equations, are used to distinguish among the various possible payment levels. Payment provisions are presented later in this chapter, while the analysis of risks associated with payment adjustment provisions is presented in chapter 7.
The TRB glossary (2) offers the following definitions for AQL and RQL:
Establishing a specification requires defining acceptable (AQL) and unacceptable (RQL) material. These are both engineering decisions. The AQL decision defines acceptable material, and should address the material that will provide satisfactory service at an affordable cost when used for the intended purpose. What constitutes acceptable material is often determined based on what has performed well in the past. However, it is preferable if performance data are available to quantify performance. Statistics has been a valuable tool in defining the parameters (mean and standard deviation) for acceptable material. Caution should be exercised if a lower variability is chosen for the specification than has been shown to be readily achievable. Arbitrarily "tightening the specs" can increase the cost of the material above that which may be cost effective, or even attainable.
In addition to defining acceptable material, a sometimes more difficult decision must be made regarding what constitutes unacceptable, or RQL, material. Unacceptable material is that which is unlikely to perform as planned. It should have a low probability of being accepted or will be accepted only under the conditions of a reduced payment schedule.
Selecting the specification limit(s) must be done in concert with the quality measure and the AQL and RQL. For instance, if PWL is used, the AQL might be set at 90 PWL. This means that when the sample statistics estimate the population to have 90 percent of the product within specification limits, the product is completely acceptable. However, it is conceivable that comparable product could be defined at an AQL of 85 PWL with more restrictive specification limits, or many other possible combinations of AQL and specification limits. An example may help to clarify this situation.
Suppose that an agency has decided, based on a large amount of project data that it collected and analyzed, that a "typical" standard deviation (see chapter 5) for a lot defined in the acceptance plan for asphalt content for HMAC is 0.18 percent. How could this information be used to establish specification limits, and AQL and RQL values for asphalt content?
Decide on Quality Measure. The first issue to address is what measure of quality is to be used. Assume that PWL will be used as the quality measure. How can this measure be related to the definitions of AQL and RQL materials and to the specification limits?
Define AQL Material. Since asphalt content has a stipulated target value, i.e., the job mix formula (JMF) asphalt content, the agency may choose to define AQL material as a lot for which the average asphalt content is equal to the JMF target value and for which the standard deviation is equal to or less than the "typical" value of 0.18 percent. This defines AQL material in terms of the desired population mean and standard deviation, but the AQL definition must also be related to the required quality measure, which in this case is PWL.
Set Specification Limits. The specification limits and the AQL are related. For example, the agency might decide to set the AQL as a value of 90 PWL. This selection of 90 PWL for the AQL is arbitrary, but is a commonly used value. The AQL population defined from past projects in terms of mean and standard deviation should just meet this PWL definition for AQL. So, in this case, the specification limits would be set such that a population with a mean at the JMF and a standard deviation of 0.18 percent would have 90 percent of its area within the specification limits. These limits can be determined by finding the Z-value from a standard normal table that corresponds to an area of 0.90 within the mean ± Z standard deviations (i.e., µ ± Zσ).
Table 11 presents some typical ± Z regions within which selected areas of the normal distribution fall. From this table it is seen that 0.90 (or 90 percent) of the normal distribution falls within ±1.645 standard deviations from the population mean. Figure 17 shows a graphical representation of a population of this AQL material.
The specifications might therefore be set at the JMF asphalt content plus or minus 1.645 times the typical standard deviation value, or JMF ± 1.645(0.18 percent) = JMF ± 0.30 percent. In this case, the AQL is 90 PWL and the specification limits are JMF ± 0.30 percent.
Alternatively, using the same AQL population, the agency could decide to establish the AQL as 85 PWL. In this case, the specification limits would be set at the JMF plus or minus 1.439 times the typical standard deviation value (see table 11), or JMF ± 1.439(0.18 percent) = JMF ± 0.26 percent. The specification limits are different in this case because the definition for AQL in terms of PWL is different.
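The two sets of specification limits can be reproduced with a short sketch that replaces the table lookup with the inverse normal function from the Python standard library. The function names are hypothetical, and the computed Z-values may differ from tabulated values in the third decimal place.

```python
from statistics import NormalDist

def z_for_central_area(area):
    """Z such that the given fraction of a normal population lies within ±Z."""
    return NormalDist().inv_cdf(0.5 + area / 2)

def spec_limit_offset(aql_pwl, typical_sd):
    """Distance from the target to each specification limit."""
    return z_for_central_area(aql_pwl / 100) * typical_sd

TYPICAL_SD = 0.18  # percent asphalt content, from the example

offset_90 = spec_limit_offset(90, TYPICAL_SD)
offset_85 = spec_limit_offset(85, TYPICAL_SD)
print(f"AQL = 90 PWL: JMF ± {offset_90:.2f} percent")
print(f"AQL = 85 PWL: JMF ± {offset_85:.2f} percent")
```

The same AQL population therefore leads to wider limits when the AQL is expressed as 90 PWL than when it is expressed as 85 PWL.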
Define RQL Material. The RQL must now be defined. There is no single correct way to establish either the AQL or the RQL. In this case, once the AQL and specification limits are established, the RQL could be established in a number of ways. One way would be to decide that the material should be rejected once a "large" percentage of material is outside the specification limits. What constitutes a "large" percentage would then need to be decided. The agency could decide that the material is "bad" once half of it is outside of the specification limits. In this case, the RQL would be established as a PWL value of 50. Any lot with an estimated PWL value of 50 or less would then be required to be removed and replaced.
Alternatively, the highway agency might base the definition of RQL on the analysis of past project data. For instance, the agency might decide that past projects had performed inadequately when the average asphalt content was 0.25 percent above or below the JMF target value. The agency might then set the RQL based upon the PWL value for a population that has a mean 0.25 percent above or below the JMF target and a standard deviation equal to the "typical" value of 0.18 percent. The PWL for the RQL population depends upon which specification limits apply: those based on AQL = 90 PWL or those based on AQL = 85 PWL. Figure 18 illustrates the RQL population when the specification limits are JMF ± 0.30 percent. The PWL for the RQL population corresponds to the area of the population that lies within the specification limits, which here is approximately 60 percent, so the RQL might be defined as a lot with a PWL value of 60.
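The PWL of such an RQL population can be computed directly from the normal distribution. The sketch below expresses asphalt content relative to the JMF target and assumes a normal population; the function name is hypothetical.

```python
from statistics import NormalDist

def population_pwl(mean, sd, lsl, usl):
    """Percent of a normal population lying between the two limits."""
    dist = NormalDist(mean, sd)
    return 100 * (dist.cdf(usl) - dist.cdf(lsl))

SD = 0.18        # typical standard deviation, percent asphalt content
RQL_MEAN = 0.25  # RQL population mean, relative to the JMF target

# Specification limits of JMF ± 0.30 percent (AQL = 90 PWL case).
rql_pwl_wide = population_pwl(RQL_MEAN, SD, -0.30, 0.30)
# Specification limits of JMF ± 0.26 percent (AQL = 85 PWL case).
rql_pwl_narrow = population_pwl(RQL_MEAN, SD, -0.26, 0.26)
# AQL population (mean on target) checked against the ± 0.30 limits.
aql_pwl = population_pwl(0.0, SD, -0.30, 0.30)

print(f"RQL population, limits ± 0.30: {rql_pwl_wide:.0f} PWL")
print(f"RQL population, limits ± 0.26: {rql_pwl_narrow:.0f} PWL")
print(f"AQL population, limits ± 0.30: {aql_pwl:.0f} PWL")
```

The same RQL population corresponds to a noticeably lower PWL under the narrower limits, which is why the RQL value must be chosen together with the specification limits.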
This approach to defining RQL material contains a number of simplifying assumptions. For example, it looked only at how far the population mean departed from the target value and did not consider the population standard deviation. This, in essence, assumes that the typical standard deviation value of 0.18 percent will be achieved on all projects. This approach also does not consider the interaction and effect of other quality characteristics, such as density, thickness, etc., on the performance of past projects.
Ideally, since the current trend is to write PRS for which the quality measure is related to expected performance in some known, quantitative way, the agency may wish to analyze past project data in terms of the chosen quality measure. For example, if the agency has chosen PWL as the statistical quality measure, it may be worthwhile to use any available data (which might be in-house data, or equivalent data from other agencies) to seek a relationship between PWL and expected performance. If sufficient data are available, this may be the most direct way to determine realistic values for both the AQL and the RQL. Various methods to develop suitable performance relationships are discussed in more detail later in this manual.
As noted previously, there is no single "correct" method for establishing the AQL and RQL values and the specification limits. Another example may help to further illustrate how these are all related and how they can be established.
An agency needs to establish an acceptance plan for the percent passing the 75-micrometer (µm) sieve for an aggregate base course. Experience from past projects indicates that base courses perform well if the amount of material passing the 75 µm sieve is 7 percent or less, and that they perform poorly if the amount of material passing the 75 µm sieve exceeds 10 percent. A typical standard deviation for this material has been found from analysis of past project data to be about 1.1 percent.
Decide on Quality Measure. From the past project information in the previous paragraph, the agency believes that the base will perform well as long as most of the material has less than 7 percent passing the 75 µm sieve. This indicates that a convenient quality measure is the percentage of the material with less than 7 percent passing the 75 µm sieve. This makes PWL a convenient and appropriate quality measure.
Define AQL Material. Based on the information in the preceding paragraph, to define the AQL it will be necessary for the agency to decide what PWL value corresponds to "most." While this is an arbitrary decision, the agency might choose to define "most" as 90 percent or more. Other choices are obviously possible, but 90 PWL is a common choice. Thus, the AQL is set at 90 PWL. This is a relatively conservative definition because, even if the standard deviation were considerably larger than the typical value, there is little chance that any of the material in the normal distribution representing AQL quality would reach the known critical value of 10 percent passing the 75 µm sieve. The diagram in figure 19 illustrates AQL material.
Define RQL Material. The PWL value for the RQL must also be determined. If the extreme upper tail of a normal distribution with a standard deviation of 1.1 percent is placed at the known critical value of 10 percent, then the mean of that distribution will be at approximately 10 percent - (3 × 1.1 percent) = 6.7 percent. The table of areas under the standard normal curve, table 8, can be used to determine that this corresponds to approximately 60 percent of the population below the known satisfactory value of 7 percent (or approximately 40 percent above the satisfactory value). On those occasions where the standard deviation was larger than the typical value of 1.1 percent, a relatively small portion of the distribution would extend above the critical value of 10 percent. As the amount of material with 7.0 percent or less passing the 75 µm sieve decreases below 60 percent, however, progressively more will exceed the critical value of 10 percent and serious performance problems might be expected to develop. Thus, the RQL is chosen as 60 PWL. The diagram in figure 20 illustrates RQL material.
Set Specification Limit. The upper specification limit, based on past project data and the definition of AQL, is 7.0 percent passing the 75 µm sieve.
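The AQL and RQL figures in this example can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, assuming, as the example does, a normal distribution with a standard deviation of 1.1 percent. The variable names are illustrative and are not part of the original example, and the exact normal-curve values may differ slightly from figures read from table 8.

```python
from statistics import NormalDist

USL = 7.0        # upper specification limit, percent passing the 75 µm sieve
CRITICAL = 10.0  # value above which poor performance is expected
SIGMA = 1.1      # typical standard deviation from past project data

std_normal = NormalDist()  # standard normal curve (mean 0, sd 1)

# RQL population: the extreme upper tail (mean + 3 sigma) sits at the critical value.
rql_mean = CRITICAL - 3 * SIGMA
rql_pwl = 100 * std_normal.cdf((USL - rql_mean) / SIGMA)

# AQL population: 90 percent of the material lies at or below the USL.
aql_mean = USL - std_normal.inv_cdf(0.90) * SIGMA
aql_above_critical = 100 * (1 - std_normal.cdf((CRITICAL - aql_mean) / SIGMA))

print(f"RQL mean = {rql_mean:.1f} percent, PWL = {rql_pwl:.1f}")
print(f"AQL mean = {aql_mean:.2f} percent")
print(f"AQL material above the critical value = {aql_above_critical:.4f} percent")
```

The last quantity supports the statement that 90 PWL is a conservative AQL: only a very small share of AQL material would reach the critical value of 10 percent.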
Because of the severe consequences imposed when RQL work is detected, such as requiring removal and replacement at the contractor's expense, or the assignment of a minimum payment factor, it is important that the RQL be set at a sufficiently low level of quality that the agency could, if challenged, defend this decision. An additional reason for setting the RQL at a relatively low level of quality is to reduce the risk of mistakenly identifying an RQL condition due to the imprecision of the sampling and testing process. (See further discussion of this in chapter 7 on evaluating risks.)
For screening tests, the determination of the specification limits is usually simpler than that for payment determination. The acceptance plan for screening quality characteristics may have specification limits, acceptance limits, or both. The specification limits are the limiting values that yield the desired performance. The acceptance limits are the limiting values that permit acceptance of the product. Deciding on the AQL and RQL requires a determination of what is acceptable and unacceptable material.
The same discussion that applied to establishing AQL, RQL, and specification limits for quality characteristics used for payment determination applies to characteristics used for screening tests. However, because the sample size is likely to be one, or at most two, the analysis is less involved: PWL, AAD, and CI are not likely to be the quality measure, and the measure of quality will usually be the single test result or the average of two test results. As a result, the acceptance limits and the specification limits are likely to be the same. (Another way of looking at this is that there are only specification limits, since there is no additional quality measure on which to base acceptance limits.) In theory, an AAD or CI value could be calculated for a single test result, so these could be thought of as potential quality measures for screening tests. However, when the sample size is one, calculating AAD or CI is the same process as comparing a single test result to a set of specification limits.
With only a single test result on which to decide whether or not to incorporate the material into the project, there is no way to measure variability. Therefore, the screening test is really just a measure of the mean of the quality characteristic for the material being evaluated. The fact that a single test, or the average of two tests, is not a particularly good measure of the mean of a population indicates that screening tests by nature can have potentially high risks. While risks are mentioned briefly in the following examples, risks are addressed in detail in chapter 7.
A simple example will help to illustrate how specification limits might be set for a screening test.
Suppose that an agency decides to use slump as a screening test for PCC. Guidance for the determination of AQL and RQL for slump might be obtained from agency historical records or from published standards such as those from the ASTM or the American Concrete Institute (ACI).
For example, a slump range of 25 millimeters (mm) to 75 mm might be obtained from ACI 211, "Standard Practice for Selecting Proportions for Normal, Heavyweight, and Mass Concrete." Then, specification limits of ±12.5 mm for specified slump of 50 mm or less and ±25 mm for specified slumps of more than 50 mm through 100 mm might be obtained from ASTM C 94, "Standard Specification for Ready-Mixed Concrete." A slump of 75 mm might be specified, yielding a lower specification limit of 50 mm and an upper specification limit of 100 mm [i.e., 75 mm ± 25 mm].
Based on the specification limits from the above example, if the result of a screening test for slump fell within the range of 50 mm to 100 mm, the material would be considered acceptable for incorporation into the project. If the slump test were outside of this range, the material would be considered unacceptable for incorporation into the project. Questions regarding whether or not to allow addition of water, or other measures, to bring the concrete mix within the slump requirements would also need to be addressed as part of the technical aspect of the specification.
This shows how engineering decisions can be used to arrive at the specification limits for quality characteristics to be used as screening tests. However, while the process seems simple, the small sample size can lead to high risks of making the wrong decision. Suppose that the records of the agency indicate that 12.5 mm is a typical value for the standard deviation for slump. The following example shows how this information can be used to investigate the risks involved in the screening test.
If the average slump is desired to be 75 mm, the typical standard deviation for slump is 12.5 mm (i.e., σ = 12.5 mm), and slump is assumed to follow a normal distribution, then the desired (or AQL) population is as indicated in figure 21. The specification limits of 50 mm to 100 mm are also indicated in this figure. It is seen that some of the AQL population falls outside of the specification limits. It is therefore possible that the contractor could produce exactly the desired population, but still have a slump test fall outside of the specification limits, thereby leading to the rejection of the material even though it actually meets the AQL requirement. By calculating the Z-values and using the standard normal table in table 8, this possibility can be quantified as follows:

$$Z_{L} = \frac{50 - 75}{12.5} = -2.0 \quad \text{and} \quad Z_{U} = \frac{100 - 75}{12.5} = +2.0 \qquad (17)$$
So, the probability of rejecting this population, i.e., the probability of a single test result being either less than 50 mm or greater than 100 mm, is 1.0 minus the area of the normal distribution that is between Z = -2.0 and Z = +2.0. From table 8, this probability can be calculated as 1.0000 - 0.9544 = 0.0456, or 4.56 percent. This is a risk to the contractor, and is represented by the shaded regions in figure 21.
Now, suppose that the agency decides, based on historical records, engineering calculations, or engineering judgment, that the material should definitely not be accepted for incorporation into the project if the mean slump for the population is 25 mm or more above the target slump of 75 mm. This defines RQL material. An example of an RQL population is shown in figure 22, along with the specification limits of 50 mm to 100 mm. As can be seen from the figure, fully half of the RQL population falls within the specification limits. Therefore, there is a 50 percent probability (this can be verified by calculating the Z-value and using table 8) that the RQL population will be accepted by virtue of the screening test for slump yielding a value within the specification limits. This is a risk to the agency, and is represented by the shaded regions in figure 22.
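Both risks in this slump example can be reproduced with a few lines of code. This is a minimal sketch, assuming normally distributed slump with the 12.5 mm standard deviation used above. The function and variable names are illustrative only, and the exact normal-curve values differ slightly from those read from a rounded table such as table 8.

```python
from statistics import NormalDist

LSL, USL = 50.0, 100.0   # specification limits for slump, mm
SIGMA = 12.5             # typical standard deviation for slump, mm

def prob_within_limits(mean, sigma=SIGMA):
    """Probability that a single slump test falls within the specification limits."""
    population = NormalDist(mu=mean, sigma=sigma)
    return population.cdf(USL) - population.cdf(LSL)

# AQL population (mean at the 75 mm target): chance a single test rejects it.
contractor_risk = 1 - prob_within_limits(75.0)

# RQL population (mean 25 mm above the target): chance a single test accepts it.
agency_risk = prob_within_limits(100.0)

print(f"Risk of rejecting AQL material: {100 * contractor_risk:.2f} percent")
print(f"Risk of accepting RQL material: {100 * agency_risk:.2f} percent")
```

Changing `SIGMA` shows how strongly both risks depend on the assumed variability, which a single test result cannot verify.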
While this is a simple example, it nevertheless clearly illustrates the fact that any single screening test, due to the small sample size, has high risks of accepting rejectable material. It must also be noted that with a single test there is no measure of variability. The single-test screening process therefore is based on the implicit assumption that the standard deviation for the lot is no greater than the one used to derive the specification limits. This may provide a false sense of security that the specification requirements are being met. In reality, at best, screening tests are intended to keep highly nonconforming material from being incorporated into the project. They are neither really intended to nor able to identify material that is "marginally" nonconforming. They will not, therefore, be able to discern RQL material if the definition for RQL is not considerably different from that for AQL material.
Specific decisions must be made regarding how many test results will be used and how the acceptance/rejection procedures will be used. As has been mentioned several times, the more test results on which to base a decision, the more accurate the decision is likely to be. However, a primary purpose for a screening test is to allow a decision on the quality of the product to be made quickly. Thus, by their nature, screening tests will be few in number. This means, by default, that the risk level may not be as low as desired, but it must certainly still be within the realm of practicality.
By the nature of screening tests, the acceptance and rejection procedure will most likely be to take one test, allow the material to be used if the test result is within the specification limits, and disallow its use if the result is outside them. Due to the expense involved if acceptable material is incorrectly rejected, the agency could decide to establish a retest procedure that calls for acceptance if the test result is within certain tolerances, retest if it is outside these tolerances but within a wider set of "reject" tolerances, and rejection if the test result is outside the reject tolerances. If the material is retested without first reworking the material, then it must be stipulated whether the second test will replace or be combined with the first test.
Unless there is a reason to suspect that the first test result is in error, such as the test having been run improperly, the first test result should not be discarded and replaced with the retest. Discarding the first result in favor of the retest is a practice that has been used incorrectly for many years. However, since the material is tested before it is incorporated into the project, there are occasions in which failing material can be reworked or altered and then retested. If this is feasible, the procedures for reworking and retesting must be detailed. When a product is reworked, it is normally considered to be a new population and, when retested, it is considered as if it had not previously been tested.
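The accept/retest/reject logic described above can be written as a small decision function. The sketch below is only an illustration: the accept limits reuse the 50 mm to 100 mm slump limits from the earlier example, while the wider "reject" limits of 40 mm and 110 mm and the rule of averaging the two results are hypothetical choices that an agency would have to set for itself.

```python
ACCEPT_LIMITS = (50.0, 100.0)   # from the slump example above, mm
REJECT_LIMITS = (40.0, 110.0)   # hypothetical wider "reject" tolerances, mm

def first_test_decision(result):
    """Classify a single screening result as accept, retest, or reject."""
    if ACCEPT_LIMITS[0] <= result <= ACCEPT_LIMITS[1]:
        return "accept"
    if REJECT_LIMITS[0] <= result <= REJECT_LIMITS[1]:
        return "retest"
    return "reject"

def decision_with_retest(first, second):
    """Combine a retest with the first result instead of replacing it.

    Hypothetical rule: when a retest is called for, the average of the
    two results is compared with the accept limits.
    """
    decision = first_test_decision(first)
    if decision != "retest":
        return decision
    average = (first + second) / 2
    if ACCEPT_LIMITS[0] <= average <= ACCEPT_LIMITS[1]:
        return "accept"
    return "reject"

print(first_test_decision(105.0))          # falls in the retest band
print(decision_with_retest(105.0, 85.0))   # average of 95 mm
```

Averaging keeps the first result in the decision, so a failing result is never simply overwritten by a later one.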
There are several decisions that must be made concerning payment relationships. These are extremely important. Experience of the authors has shown payment relationships in an acceptance plan to be the most important factor from a contractor's perspective. The contractor submits a bid with a certain expectation of the amount of payment for the product. Achieving this amount (or more) of payment is critical in maintaining a viable business. Therefore, maintaining close liaison with the task force that is developing the acceptance plan and keeping the industry apprised of payment factor decisions is imperative when establishing these procedures. Relating quality and performance to payment is the most desirable form of payment relationship because the relationship supports and defends the decision. This is true because negative payment adjustments are typically viewed with skepticism by the contracting industry. However, when the payment schedule can be shown to be related to quality and, preferably, to performance, it is viewed to be more credible than when it is established arbitrarily.
LCC analyses, which relate quality to performance, are being developed for some materials, and the use of this concept is encouraged. Performance-related payment relationships, such as LCC, require a model relating quality to performance. However, these models may not exist for all properties. Thus, payment relationships other than those that are performance-related are used. These other relationships may be exclusive of or integrated with performance-related payment relationships. These may include the use of incentives/ disincentives, minimum payment provisions, remove and replace provisions, and retest provisions. When used with a payment factor, the AQL should be set such that it yields an expected payment of 100 percent of the unit bid price. When the RQL is used with a payment factor, the agency must decide whether to require removal and replacement or the assignment of a minimum payment factor at the RQL.
The use of incentives for exceptional quality is becoming common practice for many agencies and is viewed as a way to encourage the contractor to improve quality. However, incentives are not viewed positively by all agencies. Some think that an incentive amounts to paying extra for what is typical quality. For this reason, it is important to ensure that the AQL is properly established so that incentives are paid only for exceptional quality.
During the latter part of the 20th century the highway profession developed the idea of acceptance of construction work by payment adjustment. This approach foreshadowed the current trend toward performance-related specifications that use mathematical models to predict expected life that, in turn, is combined with LCC analysis to develop appropriate payment schedules.
The strongest argument for this approach is its practicality. While many statistical acceptance procedures used in the private sector tend to characterize a lot as either acceptable or unacceptable, such a sharp distinction is not considered appropriate for most highway construction items. Highway engineers felt more comfortable defining a high level of quality that is clearly acceptable (AQL) and another, substantially lower, level of quality that is clearly rejectable (RQL). In between, the work is not so defective that removal and replacement is required, but neither does it warrant full payment.
Another strong motivating factor was the gradual shift toward end-result specifications. Under the earlier, method-type specifications, agencies had to specify in precise detail how an item was to be constructed, had to devote considerable personnel to inspection activities to make sure the detailed instructions were followed, and, in spite of this, still found that they were often legally responsible if the finished product did not in some way measure up to the desired result. Conversely, with end-result specifications, the agency had the far simpler task of defining a measurable result, and the contractor was given considerable latitude to use its expertise to determine how best to accomplish that result. Besides being a simpler process, this method had the added advantage that it placed the bulk of the responsibility for producing a satisfactory product on the contractor. Since the finished product was evaluated in a quantitative way, this approach lent itself extremely well to the use of adjusted payment schedules to award payment in proportion to the degree to which the desired end result was achieved.
It was eventually realized that, if it made sense to reduce payment for substandard work, it would also make sense to offer some degree of monetary incentive for superior work that exceeds the AQL. Just as the justification for reducing payment for marginally defective work is based on the anticipated increase in future maintenance and repair costs, it was recognized that extra quality usually benefits agencies by reducing these same costs. Therefore, it is justifiable to pass some of these savings back to the conscientious contractor in the form of modest incentive payments in addition to the contract bid price. The incentive payment concept was initially supported by the FHWA as an experimental feature. (19) After several years of satisfactory experience, it was approved for general use and is now a standard feature in many highway construction specifications.
Historically, the payment adjustment approach has been used with a variety of statistical quality measures. In the early 1960s, analysis of data from the AASHO (now AASHTO) Road Test (1958-1960) demonstrated in a dramatic way just how variable most construction characteristics could be. (1) It was soon recognized that construction quality cannot adequately be described by a single point value, but is better characterized as a statistical distribution. It was found that, in a great many cases, the distributions were sufficiently normal that normal curve theory could be used both to describe the quality level desired and to assess the quality level actually achieved. This led many agencies to define the AQL and RQL in terms of PWL, or its counterpart, PD (the percent of the lot falling outside specification limits), both of which are believed to be indicators of performance. A few agencies have used other statistical measures, such as the mean or the AAD, and the CI has also been proposed as a statistical quality measure upon which payment schedules could be based.
Today, nearly all agencies use the payment-adjustment approach for at least some construction items, and an increasing number have begun to use some form of positive-incentive provision.
The primary purpose of a payment schedule is to provide sufficient incentive to produce the desired level of quality at the time of initial construction. Effective payment schedules encourage contractors to apply appropriate QC measures to assure that the finished product will equal or exceed the desired level of quality a high percentage of the time. The rationale of the agency is that the small additional cost of good QC practices expended in advance is a better bargain than being faced with the anticipated future costs of poor quality construction, which may lead to premature failure of pavements, excessive maintenance repairs, possibly unsafe driving conditions, etc.
A secondary purpose of the payment schedule is to recoup at least part of the anticipated future costs that are likely to occur when poor quality is received. For a variety of reasons, there will occasionally be times when QC measures are either absent or ineffective, leading to less-than-acceptable work. Provided the work is not too seriously deficient, it usually is both impractical and unnecessary to require removal and replacement, and the better solution in these cases is to accept the work at a reduced price. This is consistent with the legal principle of liquidated damages, a well-established means for recovering losses that are difficult to quantify precisely at the time the contract is executed.
37.2.1. Legal Considerations. In essence, an adjusted payment schedule serves the same purpose as a liquidated damages clause because its function is to state an agreed upon monetary remedy for a breach of contract (i.e., the failure to provide the level of quality specified) for a situation in which the monetary damages are not known precisely and can only be estimated.
It is quite clear that the magnitude of the payment reduction must be reasonably appropriate for the amount of damage actually suffered. This stresses the importance of developing the necessary quality-performance relationships that make it possible to estimate the effects of poor quality. However, this need not be interpreted to mean that the amount of damage must be estimated with great precision. With regard to liquidated damages, the U.S. Supreme Court, in Wise v. United States, 249 U.S. 361, 365 (1919), stated:
In simpler language, this states that two contracting parties may agree on the amount to be withheld in the event of noncompliance, and that the courts will uphold this agreement provided that the stipulated amount is reasonably appropriate for the damages actually suffered and there is no element of deception, whether conscious or inadvertent. This decision seems to provide solid support for the payment adjustment concept.
Although the liquidated-damages concept has traditionally been applied to losses related to delay of completion, there is no apparent reason why this same rationale should not also apply to losses resulting from a failure to provide the specified level of quality. A logical extension of that argument is that it should also apply to monetary incentives awarded for superior quality. This acknowledgment that extra quality translates into additional value lends credibility to the payment-adjustment concept as a whole.
First, an attempt should be made to demystify the terminology. Particularly in the highway field, in which the idea of receiving extra payment for extra quality is still relatively new, there has been some reluctance to refer to this as a "bonus clause" for fear that this might imply that something is being given away. Those who have had to make a persuasive case for the use of these clauses to top-level administrators, legal counsel, or even legislative bodies, have done so on the basis that these extra payments are not a gift but, in fact, have been earned in return for extra attention paid to QC. The term "bonus" is firmly rooted in the engineering lexicon and, indeed, is frequently heard in the conversations of highway specification writers. It is proposed that either term, bonus or positive incentive, is acceptable, and that the more important consideration is the manner in which such a provision is actually applied.
37.3.1. Fairness Issue. So, whether it is referred to as a positive incentive or a bonus, there are several arguments that can be made in its favor. The first is the matter of fairness. An actual example from a highway agency may help to explain the fairness issue. Several years ago, the agency did not believe that it was necessary to include an incentive clause. This was not based on any type of analysis; it was just a universal opinion that was held at that time.
What changed the agency's thinking was the field trial of a new specification for PCC compressive strength. The agency had explained to the contractors and suppliers that, to be considered acceptable under this specification, at least 90 percent of the lot must have compressive strengths greater than the class design strength (i.e., the AQL can be expressed as PWL = 90). This was one of the earliest field trials, and payment adjustments were to be computed but not actually assessed. The total project consisted of about $2 million worth of PCC and, when all the results were in, a surprising thing had occurred. The QC had been very consistent, the project was almost exactly at the AQL of 90 PWL, but the average payment factor came out to be 97 percent. In other words, in return for supplying exactly the level of quality that the specification had defined as acceptable, the contractor would have received a payment reduction of about $60,000!
Needless to say, this was brought to the agency's attention by the construction industry, but that was not necessary. The agency was already scrambling to try to figure out what had gone wrong with an acceptance procedure that was not unlike many others being used around the country at that time. This time an analysis was conducted to see what had happened. The cause of the problem was immediately apparent and was first reported in 1980. (20)
The explanation is quite simple. The standard method for estimating PWL was known to be an unbiased statistical procedure, so that was not the problem. The problem was related to the fact that any statistical estimation procedure has some degree of variability associated with it, particularly at the smaller sample sizes conventionally used for highway construction specifications. In other words, while the average of a large number of estimates will be very close to the true population value (which happened to be the AQL of PWL = 90 in this particular case), the individual estimates will be both above and below it to varying degrees. Those that were below PWL = 90 all received some degree of payment reduction, but all those above 90 PWL were limited to the maximum payment factor of 100 percent. Obviously, this will cause the average payment factor to be biased downward and, in this particular field trial, it produced an unwarranted payment reduction of about $60,000.
Just as the explanation is simple, so is the solution. Even a relatively small positive incentive (or bonus) for work exceeding the specified AQL will correct this situation. In all cases, the OC curve should be constructed to confirm that the acceptance procedure is working as desired and, in particular, that the average payment factor at the AQL is 100 percent. The subject of OC curves is addressed in detail in chapter 7.
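The downward bias described above, and the way an incentive removes it, can be illustrated with a small simulation. The sketch below rests on several assumptions that are not taken from the field trial: five tests per lot, a hypothetical linear payment schedule of PF = 55 + 0.5 PWL, and a simple normal-curve estimate of PWL rather than the standard unbiased estimation procedure. Because of that last simplification, the average with the incentive need not land exactly on 100 percent. The point of the sketch is the gap between the two averages.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)                 # fixed seed so the run is repeatable
std_normal = NormalDist()

LSL = 0.0                      # lower specification limit, arbitrary units
SIGMA = 1.0                    # population standard deviation, arbitrary units
TRUE_MEAN = LSL + std_normal.inv_cdf(0.90) * SIGMA   # true quality is exactly 90 PWL
N_TESTS, N_LOTS = 5, 5000      # assumed tests per lot and number of simulated lots

def estimated_pwl(sample):
    """Simplified PWL estimate: normal-curve area above the lower limit."""
    q = (mean(sample) - LSL) / stdev(sample)
    return 100 * std_normal.cdf(q)

def payment_factor(pwl, cap):
    """Hypothetical schedule PF = 55 + 0.5 PWL, limited to a maximum of `cap`."""
    return min(cap, 55 + 0.5 * pwl)

capped, with_incentive = [], []
for _ in range(N_LOTS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N_TESTS)]
    pwl = estimated_pwl(sample)
    capped.append(payment_factor(pwl, cap=100))          # no incentive
    with_incentive.append(payment_factor(pwl, cap=105))  # incentive up to 105

avg_capped = mean(capped)
avg_incentive = mean(with_incentive)
print(f"Average payment factor, capped at 100: {avg_capped:.1f}")
print(f"Average payment factor, with incentive: {avg_incentive:.1f}")
```

With the cap at 100 percent, every lot whose estimate falls below 90 PWL is reduced while no lot can offset it, so the average must fall below 100 even though the contractor is producing exactly AQL work.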
37.3.2. Effect on Quality. The introduction of a new acceptance procedure with a payment adjustment clause fully in effect tends to produce significant increases in the level of quality received. While there have been no controlled studies to attempt to separate the effect of a bonus provision from the effect of the payment schedule as a whole, it seems quite likely that a bonus provision can only serve to enhance the motivation to produce good work.
37.3.3. Construction Industry Relations. Virtually every agency that has introduced statistical acceptance procedures with adjusted payment schedules has had to overcome considerable resistance from the construction industry. Although much of the resistance can be attributed to a general fear of the unknown, there is no denying that specifications of this type can impact severely on a contractor's means of livelihood. It is not difficult to understand why there might be strong resistance to a system that is perceived to be only punitive, that only penalizes for poor performance but does not reward excellent performance. The most effective way to counter this perception is to make it possible to earn tangible rewards for superior performance. The inclusion of positive incentives (or bonuses) casts the whole system in a new light and is especially appealing to those contractors who have made QC an integral part of their operations.
37.3.4. Economic Benefits. From an economic standpoint, a bonus provision can be claimed to be beneficial if the added value of increased quality produced by the bonus provision exceeds the amount of money paid out in actual bonuses. More precisely, the added value only has to exceed the amount by which the total contract cost was increased.
Although it is nearly impossible to find suitable projects to serve as test cases and controls to quantify the effect of bonus provisions, it is easy to show (see appendix I) that only modest increases in expected pavement life would be necessary to justify the use of bonus clauses.
However, a word of caution is in order regarding the magnitude of bonus payments. Because there may be other modes of failure besides those involving the quality characteristic for which the bonus is being paid, many agencies have either identified a maximum amount of bonus, or else have made the bonus payment conditional upon satisfactory completion of other parts of the contract, or both.
37.3.5. Effect on Bidding Process. There appears to be growing evidence that there is yet another benefit of bonus provisions that was not initially anticipated or recognized. A construction firm that has made QC an integral part of its operation may be able to submit a lower bid because it is confident that it can earn a substantial bonus. If this effect is found to be typical, the use of bonus provisions will be an effective way for agencies, which are bound by the competitive bidding system, to put more work in the hands of highly qualified contractors.
The earliest payment schedules were usually stepped schedules, such as that shown in table 12 and plotted in figure 23.
| Estimated PWL | Payment Factor (%) |
| 95.0 - 100.0 | 102 |
| 85.0 - 94.9 | 100 |
| 50.0 - 84.9 | 90 |
| 0.0 - 49.9 | 70 |
More recently, there has been a tendency to use continuous (equation-type) payment schedules such as that shown in equation 18 below and also plotted in figure 23.
Although risk analysis (see chapter 7) would show these two payment schedules to have very nearly the same long-term performance, there is a distinct advantage associated with the continuous form. When the true quality level of the work happens to lie close to a boundary in a stepped payment schedule, the quality estimate obtained from the sample may fall on either side of the boundary due primarily to chance. Depending upon which side of the boundary the estimate falls, there may be a substantial difference in payment level, which may lead to disputes over measurement precision, round-off rules, and so forth. This potential problem can be completely avoided with continuous payment schedules that provide a smooth progression of payment as the quality measure varies.
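The boundary effect can be seen by coding the stepped schedule of table 12 next to a continuous one. In the sketch below, the stepped function follows table 12, with the assumption that estimates between tabulated ranges (for example, 84.95) are assigned by treating each lower bound as a threshold. The continuous function is a hypothetical linear schedule, PF = 55 + 0.5 PWL, chosen only for illustration and not claimed to be equation 18.

```python
def stepped_payment_factor(pwl):
    """Payment factor (%) from the stepped schedule in table 12."""
    if not 0.0 <= pwl <= 100.0:
        raise ValueError("PWL must be between 0 and 100")
    if pwl >= 95.0:
        return 102
    if pwl >= 85.0:
        return 100
    if pwl >= 50.0:
        return 90
    return 70

def continuous_payment_factor(pwl):
    """Hypothetical continuous schedule used only for comparison."""
    if not 0.0 <= pwl <= 100.0:
        raise ValueError("PWL must be between 0 and 100")
    return 55 + 0.5 * pwl

# Two estimates that differ only by sampling chance or round-off.
for pwl in (84.9, 85.0):
    print(pwl, stepped_payment_factor(pwl), continuous_payment_factor(pwl))
```

A difference of 0.1 in the estimated PWL moves the stepped payment factor by 10 percentage points, while the continuous schedule changes by only a small fraction of a point.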
Ordinarily, a pavement is designed to sustain a specified number of load applications before major repair (such as resurfacing) is required. If, due to construction deficiencies, the pavement is not capable of withstanding the design loading, it will fail prematurely. The necessity of repairing this pavement at an earlier date results in an additional expense that, since it usually occurs long after any contractual obligations have expired, must be borne by the agency. Therefore, one possible purpose for an adjusted payment schedule might be to withhold sufficient payment at the time of construction to cover the extra cost anticipated in the future as the result of deficient quality work.
Pavements are usually designed to withstand a required number of equivalent single axle loads (ESALs). For those quality characteristics used in the design procedure, the as-built values can be compared to the design values to estimate the fraction of design loadings the pavement is capable of sustaining. As an approximate estimate, this fraction can be multiplied by the design life to obtain the expected life of the pavement. If greater precision is desired, a traffic growth rate can be assumed, the effect of which is to extend slightly the expected life (since fewer of the allowable loads will occur in the early part of a pavement's life).
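The two estimates of expected life described above can be sketched as follows. The simple estimate multiplies the load fraction by the design life. The refined estimate assumes, purely for illustration, that annual traffic grows at a constant compound rate, and then solves for the time at which the cumulative loads reach the as-built capacity. The function name and the growth model are assumptions and are not taken from this manual.

```python
import math

def expected_life(load_fraction, design_life, growth_rate=0.0):
    """Estimate expected pavement life in years.

    load_fraction: as-built load capacity divided by the design loading
    design_life:   design life in years
    growth_rate:   assumed constant annual traffic growth (0.03 = 3 percent)
    """
    if growth_rate == 0.0:
        # Approximate estimate: loads accumulate evenly over time.
        return load_fraction * design_life
    g = growth_rate
    # Cumulative loads after t years, in units of first-year traffic:
    # ((1 + g)**t - 1) / g
    design_loads = ((1 + g) ** design_life - 1) / g
    capacity = load_fraction * design_loads
    # Solve ((1 + g)**t - 1) / g = capacity for t.
    return math.log(1 + g * capacity) / math.log(1 + g)

print(expected_life(0.8, 20))         # no traffic growth
print(expected_life(0.8, 20, 0.03))   # 3 percent annual growth
```

With growth, fewer of the allowable loads occur early in the pavement's life, so the refined estimate is somewhat longer than the simple product.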
To estimate the cost to the agency of premature pavement failure, it is necessary to determine the net present value of the various actions made necessary by early failure. Unlike various intuitive methods that often require unrealistic assumptions, the LCC basis presented in this section assumes a practical repair strategy that an agency may employ.
For example, suppose that experience has shown that overlays typically last about 10 years. If the initial resurfacing were to fail one or two years prematurely, it is not likely that an agency would do a minor repair to extend the life of the pavement to the originally expected value of 10 years. A much more practical decision would be to reschedule the overlay that was planned for the 10th year and do it one or two years sooner. However, if the 10th year overlay is rescheduled to an earlier date, and overlays typically last 10 years, then all future overlays must as well be moved earlier in time.
Because it provides a valid and rational way to estimate the net present value of future actions such as these, LCC analysis makes it possible to obtain a realistic estimate of the cost of the actions resulting from premature failure. The procedure involves the calculation of a series of debits and credits and turns out to be relatively easy. Moving the 10th year overlay to the 8th year, for example, would result in a debit in net present value terms because it represents a cost in the 8th year that was not planned. However, there will also be a credit for no longer having to do an overlay in the 10th year. Since the 10th year overlay is farther in the future, the credit for this action is discounted to a greater degree, resulting in a net debit for the rescheduling of the 10th year overlay. While it is true that the net debits from the rescheduling of overlays farther in the future are discounted to a greater extent, and soon become insignificant, ignoring them altogether would substantially underestimate the true cost of pavement failure. Alternatively, selecting a specific analysis period would require an assumption about the residual value of a partially depleted overlay, information that is not readily available.
Fortunately, this is an easy problem to solve mathematically and the derivation is given in appendix I. This produces the expression given in equation 19, which requires input information that is readily available or can easily be obtained.
Suppose that, based on an appropriate performance relationship, the as-constructed resurfacing, for which an appropriate payment adjustment is to be determined, is expected to last E = 8 years instead of the design value of D = 10 years. The cost of this premature failure would be estimated by first computing R = 1.04 / 1.08 = 0.963 and then applying equation 19, as follows:
In the same way, the appropriate payment adjustments for other values of expected life can be calculated as summarized in table 13.
|Expected Life, years||Appropriate Payment Adjustment|
It is seen from table 13 that the cost of premature failure can be substantial, terminating at the initial cost of resurfacing of $23.92 per square meter (m2) for zero expected life. It is common practice for most agencies to make use of an RQL provision that gives them the option to require removal and replacement at no additional expense when the quality falls below some predetermined, seriously deficient level. Such a provision would probably apply before the lower portion of this table is reached, but if for some reason an agency elected not to require removal and replacement, this method provides the levels of payment adjustment that would be justified for extremely poor quality.
It can also be seen from this table that the method properly recognizes that a tangible benefit results when the as-constructed quality exceeds the design standard and extends the expected life of the pavement, thus justifying incentive payments for superior quality.
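The debit-and-credit reasoning described above can be checked numerically. The following is a minimal sketch in Python, not the derivation in appendix I. It assumes that R is the ratio (1 + inflation rate) / (1 + interest rate), that every overlay lasts the same 10 years as the design life, and that the present cost of resurfacing is the $23.92 per m2 quoted with table 13. The function names and the closed-form expression are assumptions introduced for illustration; the closed form is simply the sum of the geometric series of credits and debits.

```python
# Sketch of the LCC payment adjustment under the stated assumptions.
# All names here are hypothetical and are not taken from the report.

def pay_adjustment(cost, design_life, expected_life, overlay_life, r):
    """Assumed closed form: net present value of moving every future
    overlay from its planned date to its actual (shifted) date."""
    return cost * (r**design_life - r**expected_life) / (1.0 - r**overlay_life)


def pay_adjustment_by_series(cost, design_life, expected_life,
                             overlay_life, r, terms=200):
    """Same quantity, summed overlay by overlay: a credit for the overlay
    no longer done on its planned date, a debit for doing it on the new date."""
    total = 0.0
    for k in range(terms):
        planned_year = design_life + k * overlay_life
        actual_year = expected_life + k * overlay_life
        credit = cost * r**planned_year
        debit = cost * r**actual_year
        total += credit - debit
    return total


R = 1.04 / 1.08   # ratio used in the example in the text
COST = 23.92      # dollars per square meter, from the discussion of table 13

early = pay_adjustment(COST, 10, 8, 10, R)      # fails 2 years early
early_series = pay_adjustment_by_series(COST, 10, 8, 10, R)
zero_life = pay_adjustment(COST, 10, 0, 10, R)  # immediate failure
bonus = pay_adjustment(COST, 10, 12, 10, R)     # lasts 2 years longer

print(f"E = 8:  {early:.2f} (series: {early_series:.2f})")
print(f"E = 0:  {zero_life:.2f}")
print(f"E = 12: {bonus:.2f}")
```

Under these assumptions, an expected life of zero returns the full resurfacing cost as a reduction, which is consistent with the terminal value noted for table 13, and an expected life longer than the design life returns a positive adjustment.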
To apply the LCC basis for payment schedules in a manner that is both fair to all parties and legally defensible, it is necessary to have at least an approximate performance relationship. The purpose of the performance relationship is to predict from quality characteristics measured at the jobsite what the expected service life of the construction item will be. This is the independent variable to be entered into the LCC equation presented in the previous section.
Research is currently in progress to develop mathematical models upon which valid performance-related specifications can be based, and some prototype models are in the development stage. However, it may still require several years before these models are available for widespread implementation. This does not mean that the development of useful and effective payment schedules must be put on hold until these models become available. In many cases, current knowledge, combined with engineering and mathematical principles, may be sufficient to develop interim performance models that can be demonstrated to be serviceable.
37.6.1. Polynomial Model. Prototype models can be developed by identifying a sufficient number of "known" points that are then used to develop generalized mathematical models. Appropriate assumptions must be made concerning mathematical form, and engineering considerations are used to establish realistic boundary conditions. The procedure is presented in sufficient detail in appendix J to allow the reader the opportunity to use different known values or assumptions to develop models that are suited for specific applications.
For example, table 14 illustrates a typical performance matrix that might be used to develop the following type of polynomial performance model for two quality characteristics.
|Air Voids Quality||Thickness Quality, PD = 10||Thickness Quality, PD = 90|
|PD = 10||20 yrs.||10 yrs.|
|PD = 75||10 yrs.||5 yrs.|
The method involves using the four pieces of data in table 14 to write four simultaneous equations that can be solved to obtain the four unknown equation coefficients, producing the performance model given in equation 21 below. (The complete development is contained in appendix J.) Values of EXPLIF for selected combinations of PDVOIDS and PDTHICK have been calculated and are presented in table 15.
|PDVOIDS||PDTHICK||EXPLIF, in years|
|10 (AQL)||10 (AQL)||20.0 (Design)|
It can be seen from table 15 that when both quality measures are at the AQL value of PD = 10, the expected life equals the design life of 20 years. For excellent quality with PDVOIDS = PDTHICK = 0, the EXPLIF equation predicts that the pavement life will be extended to almost 23 years, while, for extremely poor quality with PDVOIDS = PDTHICK = 100, the expected life is reduced to less than 3 years. For any combination of PDVOIDS and PDTHICK in between, the EXPLIF equation gives an appropriate estimate for expected life.
The accuracy of this equation is obviously dependent upon the use of realistic values for expected life in the performance matrix in table 14. The values used in the example in this table have been estimated by the New Jersey Department of Transportation (NJDOT), which believes that the resultant values for expected life predicted by the performance model are realistic. This model has been used as the basis for the composite quality measure (described in appendix K) used by the NJDOT in their current acceptance procedures for HMAC pavement (see case study 2 in chapter 9).
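The four-equation solution described above can be sketched as follows. This is an illustration only, not the development in appendix J. It assumes the model has the form EXPLIF = c0 + c1 PDVOIDS + c2 PDTHICK + c3 PDVOIDS x PDTHICK, which is one natural reading of "four unknown equation coefficients" for two quality characteristics; the solver and all names are hypothetical. The four known points are the entries of table 14.

```python
# Sketch: fit a four-coefficient model to the four points in table 14.
# The model form and all names are assumptions made for illustration.

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# (PDVOIDS, PDTHICK, expected life in years), read from table 14
known = [(10, 10, 20.0), (75, 10, 10.0), (10, 90, 10.0), (75, 90, 5.0)]

rows = [[1.0, v, t, v * t] for v, t, _ in known]
lives = [life for _, _, life in known]
c0, c1, c2, c3 = solve_linear(rows, lives)


def explif(pd_voids, pd_thick):
    """Expected life in years for the assumed model form."""
    return c0 + c1 * pd_voids + c2 * pd_thick + c3 * pd_voids * pd_thick


print("coefficients:", [round(c, 6) for c in (c0, c1, c2, c3)])
print("both at AQL:", round(explif(10, 10), 2))
print("both at PD = 0:", round(explif(0, 0), 2))
print("both at PD = 100:", round(explif(100, 100), 2))
```

With the table 14 values, this assumed form gives an expected life of almost 23 years at PD = 0 and less than 3 years at PD = 100 for both measures, in line with the values quoted from table 15.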
37.6.2. Exponential Model. Although the above polynomial procedure is easy to apply and can produce a very serviceable model when only two quality characteristics are involved, another approach is more effective when more than two quality measures must be included in the acceptance procedure. A detailed presentation of this method is contained in appendix L and involves the use of the exponential model in equation 22.
It is explained in appendix L why PD is better suited than PWL as the statistical quality measure for this particular model. However, PWL can be used with this model simply by substituting (100 - PWL) for each of the PD terms, if desired.
This model has certain important advantages. It tends to produce a sigmoidal ("S") shape that is believed to be an appropriate form for many performance relationships. Also, because this particular model form produces a maximum of "A" and a minimum as close to zero as desired (but not below zero), it can easily be made to fit most real-world situations. Finally, it requires relatively straightforward data and simple mathematics to accommodate as many acceptance characteristics as are likely to be necessary. (A detailed example involving in-place air voids, thickness, and smoothness of HMAC pavement is presented in appendix L.)
If the method is to be valid, it must be based on realistic data, and if it is to be practical, the required data must be readily obtainable. Table 16 is a generic data matrix that must be completed to develop the exponential model.
For example, consider a resurfacing project for which historical data have shown the typical expected life to be about 10 years. A typical value for the AQL is PD = 10, while RQL values tend to vary more widely, depending on what quality level the agency believes justifies removal and replacement at the contractor's expense. For purposes of this example, suppose the agency has decided to use RQL values of PD = 65, 75, and 85, respectively, because it is believed that these levels correspond to approximately a 50 percent loss of pavement life, or an expected life of 5 years. These assumptions lead to the completed data matrix shown in table 17.
|PDVOIDS||PDTHICK||PDSMOOTH||EXPLIF, in years|
|10 (AQL)||10 (AQL)||10 (AQL)||10 (AQL)|
|65 (RQL)||10 (AQL)||10 (AQL)||5 (poor voids)|
|10 (AQL)||75 (RQL)||10 (AQL)||5 (poor thickness)|
|10 (AQL)||10 (AQL)||85 (RQL)||5 (poor smoothness)|
The ease of applying this method should now be apparent. All that remains is to use the information in the data matrix to solve for the unknown coefficients in the exponential performance equation for EXPLIF. To accomplish this, it is first necessary to take logarithms of both sides, producing equation 23:
The complete set of equations is presented in appendix L along with the solution, leading to the following performance model:
A lengthy series of tests is described in appendix L to confirm that the model will produce realistic values of expected life for any combination of quality levels in the three measures. Once these checks have been completed, the model can be relied upon to produce the value of EXPLIF needed for the LCC equation, which is equation 19.
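The step of taking logarithms and solving for the coefficients can be illustrated with the data matrix in table 17. The sketch below assumes the simplest form consistent with that step, ln(EXPLIF) = B0 + B1 PDVOIDS + B2 PDTHICK + B3 PDSMOOTH, so that each row of the matrix yields one linear equation. This form is an assumption; the model in appendix L may carry additional shape terms that cannot be recovered from the text here, and all names are hypothetical.

```python
import math

# Sketch: solve an assumed log-linear exponential model from table 17.
# Because each RQL row changes only one measure from the AQL row, the
# four equations can be solved one coefficient at a time.

AQL = 10.0          # PD at the acceptable quality level (all measures)
DESIGN_LIFE = 10.0  # years, when every measure is at the AQL
RQL_LIFE = 5.0      # years, when one measure is at its RQL
RQL = {"voids": 65.0, "thick": 75.0, "smooth": 85.0}

# Slope for each measure: the change in ln(life) between the AQL row
# and that measure's RQL row, divided by the change in PD.
slopes = {
    name: math.log(RQL_LIFE / DESIGN_LIFE) / (rql - AQL)
    for name, rql in RQL.items()
}
# Intercept from the AQL row.
intercept = math.log(DESIGN_LIFE) - AQL * sum(slopes.values())


def explif(pd_voids, pd_thick, pd_smooth):
    """Expected life in years under the assumed log-linear form."""
    exponent = (intercept
                + slopes["voids"] * pd_voids
                + slopes["thick"] * pd_thick
                + slopes["smooth"] * pd_smooth)
    return math.exp(exponent)


print("intercept:", round(intercept, 4))
print("slopes:", {k: round(v, 5) for k, v in slopes.items()})
print("all at AQL:", round(explif(10, 10, 10), 2))
print("poor voids:", round(explif(65, 10, 10), 2))
```

By construction, the fitted model returns the four expected lives in table 17. Whether it behaves realistically between and beyond those points is exactly what the checks in appendix L are meant to establish.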
Most specifications will contain multiple quality characteristics. How can these be combined to come up with a single payment factor for a lot? The ideal situation is to have a performance model that can predict long-term performance of the pavement. In such a case, the quality characteristic measurements can be input to the model and the payment factor can be based on the predicted performance of the in-place pavement as compared to the desired performance. Unfortunately, such models are either not available or are not widely accepted at this time. Therefore, other methods for determining the composite payment factor are currently in use.
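Various agencies have considered at least four different approaches for combining a number of payment factors for individual acceptance quality characteristics into a single composite payment factor. These approaches include averaging the individual payment factors, multiplying them, summing them, and using the minimum individual payment factor as the composite.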
The approach using the minimum individual payment factor for the composite is based on the "weak link" theory, i.e., the lowest payment factor indicates the value of all the quality characteristics. For the other three approaches the concept is that all individual factors contribute to the total. However, the composite payment for the three approaches can be quite different depending on the value of the individual payment factors.
An example will help to show how composite payment factors can be determined from individual payment factors. Suppose that for a PCC pavement, the quality characteristics are compressive strength, permeability, and thickness. The composite payment factor determined by various methods can be quite different depending on the magnitude of each individual payment factor. Table 18 shows the composite payment factors for various methods for combining the individual payment factors.
|Individual Payment Factor||Composite Payment Factor|
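The four combining methods can be compared with a short sketch. The individual payment factors below are hypothetical values chosen for illustration and are not taken from table 18. The "sum" method is assumed here to mean adding the individual adjustments (each factor minus 1.00) to a base of 1.00; an agency's actual definition may differ.

```python
from math import prod

# Sketch: four ways to combine individual payment factors.
# Input values and the definition of the "sum" method are assumptions.

def composite_factors(factors):
    """Return the composite payment factor under each combining method."""
    values = list(factors.values())
    return {
        "average": sum(values) / len(values),
        "product": prod(values),
        # Assumed meaning of summing: add the individual adjustments.
        "sum": 1.0 + sum(v - 1.0 for v in values),
        "minimum": min(values),
    }


# Hypothetical PCC pavement lot
individual = {"strength": 1.02, "permeability": 0.95, "thickness": 0.90}
composite = composite_factors(individual)

for method, value in composite.items():
    print(f"{method:8s} {value:.4f}")
```

For these hypothetical inputs, the averaging method is the most lenient and the multiplying and summing methods are the most severe, which illustrates how far apart the composite values can be for the same lot.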
The averaging, multiplying, and summing methods for combining individual payment factors implicitly assume that each individual payment property is equally important. Several agencies have chosen to weight the payment factors with the concept that some quality characteristics are more important than others. For HMAC, when mixture properties and field compaction are used as quality characteristics, the in-place air voids are often weighted more heavily than the mixture properties.
For example, a weighting system that is used by one agency is as follows:
Another agency uses a different weighting system for the same HMAC properties:
Weighted average composite payment factors such as these are intuitively appealing since it is very likely that not all payment quality characteristics have an identical impact on pavement performance. A drawback to this approach, however, is that there is no obvious methodology for determining the appropriate weightings. The weightings, therefore, are subjective in nature and, as the above equations show, will vary from agency to agency depending upon agency or individual preferences.
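A weighted average composite can be sketched in a few lines. The weights and payment factors below are hypothetical and do not reproduce either agency's system; they only illustrate how weighting in-place air voids more heavily changes the result relative to equal weighting.

```python
# Sketch: weighted average composite payment factor.
# All weights and factors are hypothetical illustration values.

def weighted_composite(factors, weights):
    """Weighted average of individual payment factors."""
    total_weight = sum(weights[name] for name in factors)
    weighted_sum = sum(weights[name] * factors[name] for name in factors)
    return weighted_sum / total_weight


factors = {"air_voids": 0.90, "asphalt_content": 1.00, "gradation": 1.02}
heavy_voids = {"air_voids": 0.50, "asphalt_content": 0.25, "gradation": 0.25}
equal = {"air_voids": 1.0, "asphalt_content": 1.0, "gradation": 1.0}

pf_weighted = weighted_composite(factors, heavy_voids)
pf_equal = weighted_composite(factors, equal)

print(f"air voids weighted at 50 percent: {pf_weighted:.4f}")
print(f"equal weights:                    {pf_equal:.4f}")
```

Dividing by the total weight means the weights do not have to sum to 1.00, so the same function covers both percentage-style and point-style weighting schemes.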
As noted in the previous section, statistical construction specifications based on multiple quality characteristics frequently use payment equations that include a separate term for each of the quality characteristics so that the resultant payment adjustment is a function of the combined effect of all quality measures. An alternate method to accomplish the same purpose is to base the payment equation on a single quality measure that is a composite of the individual quality measures. This latter approach, because it keys the various decisionmaking steps to a single performance indicator, simplifies the procedure and offers several practical advantages.
For example, the performance relationship that was presented previously for expected pavement life can be used to develop a single, composite quality measure upon which all of the various acceptance decisions (accept, reject, retest, payment adjustment, etc.) can be based. Equation 21 for expected pavement life, which was developed in a previous section above, is as follows:
As described in detail in appendix K, a simple transformation converts this equation into the composite quality measure given by equation 27:
PD* progresses smoothly from zero to 100 percent as the individual quality measures, PDVOIDS and PDTHICK, vary throughout the same range. Table 19 presents a few selected examples of this. More extensive tables and graphs are contained in appendix K.
It can be seen in table 19 that the case in which PDVOIDS and PDTHICK are both equal to 50 produces essentially the same level of expected life as the case in which PDVOIDS = 25 and PDTHICK = 75. This result flows directly from the manner in which the EXPLIF equation was derived and is realistic because an increase in the quality of one measure might be expected to offset a decrease of quality in the other. Appropriately, both cases produce virtually the same value of PD* in the last column of the table, indicating that PD* is well-suited as a measure upon which a QA specification can be based.
This property of the composite quality measure, which properly accounts for the combined effect of multiple quality characteristics, also makes it possible to develop an RQL provision that is far superior to the alternative of defining separate RQL provisions for the individual quality measures. For the example in table 20 it is assumed that the agency has defined for air voids and thickness separate RQL provisions of PDVOIDS > 75 and PDTHICK > 90. Clearly, case 3 in table 20 is by far the worst case, yet it is not recognized as an RQL condition when using individual RQL provisions, while the other two cases are.
|Case||Air Voids Quality||Thickness Quality||RQL by Individual Provisions?||PD*|
|1||PD = 75 (RQL)||PD = 0 (Excellent)||Yes||60.5|
|2||PD = 0 (Excellent)||PD = 90 (RQL)||Yes||60.2|
|3||PD = 74 (Almost RQL)||PD = 89 (Almost RQL)||No||87.9|
To demonstrate the effectiveness of an RQL provision based on the composite quality measure, equation 27 was used to compute the corresponding values for PD* that appear in the last column of table 20. In this example, a PD* value of 60, or more, would be regarded as rejectable and, as can be seen in the last column, case 3 is properly recognized as being well into the rejectable region. The actual development of RQL provisions based on multiple quality measures is covered in greater detail in appendix K.
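The comparison in table 20 can be sketched as follows. This is an illustration, not the transformation in appendix K. It assumes that PD* is a linear rescaling of expected life, equal to 0 when both measures are at PD = 0 and 100 when both are at PD = 100. The expected-life coefficients are those obtained by solving the four equations implied by table 14, rounded to three significant figures; both the rounding and the names are assumptions.

```python
# Sketch: composite quality measure PD* and a composite RQL check.
# The rescaling and the rounded coefficients are assumptions.

C0, C_VOIDS, C_THICK, C_CROSS = 22.9, 0.163, 0.135, 0.000961


def explif(pd_voids, pd_thick):
    """Expected life in years (assumed polynomial form)."""
    return (C0 - C_VOIDS * pd_voids - C_THICK * pd_thick
            + C_CROSS * pd_voids * pd_thick)


BEST_LIFE = explif(0, 0)
WORST_LIFE = explif(100, 100)


def pd_star(pd_voids, pd_thick):
    """Assumed rescaling of expected life onto a 0 to 100 PD scale."""
    return 100.0 * (BEST_LIFE - explif(pd_voids, pd_thick)) / (BEST_LIFE - WORST_LIFE)


def individually_rql(pd_voids, pd_thick):
    """Separate RQL provisions from the example: 75 and 90."""
    return pd_voids >= 75 or pd_thick >= 90


def composite_rql(pd_voids, pd_thick, limit=60.0):
    """Composite RQL provision from the example: PD* of 60 or more."""
    return pd_star(pd_voids, pd_thick) >= limit


cases = {1: (75, 0), 2: (0, 90), 3: (74, 89)}
for case, (v, t) in cases.items():
    print(case, round(pd_star(v, t), 1),
          individually_rql(v, t), composite_rql(v, t))
```

Under these assumptions the sketch appears to reproduce the PD* column of table 20 to one decimal place, and case 3 is flagged by the composite provision even though neither individual provision catches it.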
There are several reasons why an agency might choose to perform a second set of tests before making the final decision concerning the acceptability or rejectability of a construction item:
If a retest provision is to be used, it must be spelled out clearly in the contract documents, including precisely how the retest results are to be processed. There are two distinctly different ways to do this:
Advocates of the first method argue that it makes maximum use of the available information and that it is wasteful to discard any valid information. Advocates of the second method would question whether the original sample was truly valid. If the low quality level were the result of some malfunction of the testing process, then it would be more appropriate to disregard the questionable data.
Each agency must decide for itself which method is more appropriate in any particular situation, depending on the quality characteristic that is being measured and the test method that is being used. If the decision is made to combine the retest results with those obtained from the initial sample, caution must be exercised in computing the OC curve for this procedure since the probabilities of failing the original test and passing the retest are not statistically independent, but are correlated to some degree.
Most construction and materials acceptance plans are what can be called "single sampling" plans. That is, a single sample of a stipulated size is taken and a decision is made based on the test results from this sample. The preceding discussion on retesting indicated some reasons that additional sampling and testing might be conducted. Sampling plans based on more than one level of sampling and testing have been used in manufacturing applications for many years. They are typically based on "accept or reject" attributes acceptance plans rather than the payment adjustment variables acceptance plans that are predominant in highway construction.
37.10.1. What Are They? In "double sampling" plans it is possible that a decision could be made after a smaller first sample is taken and tested, but the decision may be deferred until a second sample is obtained and tested. For example, such a plan might be phrased as follows:
To further reduce the amount of sampling and testing done, it is customary to curtail testing when the rejection number of defects is attained. For the above acceptance plan for example, if the first sample had 2 defective items then a second sample would be obtained. If 2 additional defective items were obtained after testing 3 items from the second sample, then testing would cease and the lot would be rejected because the reject number of 4 had been reached.
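The decision logic of a double sampling plan with curtailed testing can be sketched as follows. The reject number of 4 and the scenario of 2 defective items in the first sample come from the example above. The sample sizes of 5 items and the first-sample acceptance number of 1 are hypothetical values introduced only to make the sketch runnable.

```python
# Sketch: double sampling attributes plan with curtailed inspection.
# Sample sizes and the first-sample acceptance number are hypothetical.

def double_sampling_decision(first_sample, second_sample,
                             accept_first=1, reject_number=4):
    """Return (decision, items_tested).

    Each sample is a sequence of booleans, True meaning a defective item.
    Testing stops as soon as the cumulative number of defective items
    reaches the reject number.
    """
    defects = 0
    tested = 0

    for is_defective in first_sample:
        tested += 1
        defects += 1 if is_defective else 0
        if defects >= reject_number:
            return "reject", tested

    if defects <= accept_first:
        return "accept", tested  # decided on the first sample alone

    for is_defective in second_sample:
        tested += 1
        defects += 1 if is_defective else 0
        if defects >= reject_number:
            return "reject", tested  # curtailed: remaining items not tested

    return "accept", tested


# Scenario from the text: 2 defective items in the first sample, then
# 2 more found within the first 3 items of the second sample.
first = [False, True, False, True, False]
second = [True, False, True, False, False]
decision, items_tested = double_sampling_decision(first, second)
print(decision, items_tested)
```

In this scenario the lot is rejected after 8 of the 10 available items have been tested, which is the saving that curtailment is intended to provide.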
The concept of double sampling can be extended to "multiple sampling," which represents the case when three or more samples might be obtained before a final decision on the lot is obtained. Multiple sampling acceptance plans may allow for as many as seven samples before a final decision must be made.
"Sequential sampling" is generally used to represent the case where a decision is possible after each item is tested and there are no specified limits on the total number of items that will be tested. Sequential sampling acceptance plans have been used in manufacturing operations primarily when a destructive testing method is required. The underlying purpose behind sequential sampling acceptance plans is to keep to a minimum the number of items that must be destructively tested and still be consistent with the risk levels that are desired. Their use in highway construction may be questionable given the amount of time that it often takes to obtain a test result once a sample has been obtained.
37.10.2. Applications in Highway Construction. While various double and multiple sampling plans have been developed for manufacturing operations, there are some limitations to their use for highway construction and materials acceptance plans. In manufacturing operations the product that is being produced is generally easy to sample and test, and usually does not change properties from the first sample to the second sample taken from the lot. For example, if a sample of 10 bolts is obtained and tested from a lot of 300 bolts, there is no reason to believe that the lot will have changed in any way prior to obtaining the second required sample of 10 bolts. This may not be true for many highway construction materials.
The use of double or multiple sampling may be limited in highway construction due to the nature of the way in which samples are obtained and tested. For highway construction it is quite possible, even likely, that the population to be sampled could change between the first and second sample. These potential changes in the population must also be considered if a retest provision is selected for use in the acceptance plan.
For example, if HMAC samples for asphalt content determination are obtained from the back of the truck, production for the lot may be completed by the time the test results from the first sample are available. It would therefore not be possible to obtain a second, or "double," sample from trucks on this lot. It would also probably not be appropriate to take cores from the pavement to represent the second sample since there is no reason to believe that cored samples will have the same sampling and testing variabilities as sampling and testing from the truck. Remember that chapter 5 stressed the importance of developing specification limits based on the appropriate process variability. Specification limits based on sampling and testing from the truck would not be appropriate for use with sampling and testing of cores.
Similarly, if acceptance were based on 28-day compressive strengths of cylinders cast when the concrete was placed, the only way to obtain a "double" sample would be to cast all of the cylinders for both samples when the concrete was initially placed. It is questionable whether a "second" sample could be obtained by coring and testing the cores since these are such different processes.
There may be cases where it would be possible to develop double sampling acceptance plans, but these must be carefully scrutinized to ensure that the two samples are indeed from the same population. For example, suppose that an HMAC overlay is cored for in-place air voids or density determination, and is then opened to traffic. If the first sample results were not in the "accept" range, would a second set of cores obtained a week later still represent the same population, or would compaction under traffic loading make this a different population? It is likely that the specification limits for this characteristic would have been developed based on the results from cores that had not been subjected to traffic loading.
The double, multiple, and sequential acceptance plans that have been used in manufacturing have been primarily, if not exclusively, for accept or reject decisions. They have not been applied in a situation, such as highway construction and materials, where there may also be an option to accept the lot at an adjusted payment. While the risks and expected payments can conceivably be developed for these more complex procedures, it would be quite a bit more complicated than the case for single sampling plans.
The complications of using more than a single sample that are discussed in this section may be among the reasons that agencies have almost exclusively used single sampling acceptance plans for highway construction and materials. While double, multiple, or sequential sampling acceptance plans could possibly be considered for highway construction materials, the procedures must be developed with caution and a serious degree of reservation because of the factors discussed in the preceding paragraphs.
The typical approach to construction or material that is deficient in quality has been to require it to be removed and replaced if the quality is too deficient, or to accept it at a reduced payment if the deficiency is such that it would not be economically justifiable to require removal and replacement. In some instances, correction or repair of the deficiency might be an additional option.
For example, if coring indicates a deficiency in thickness for an HMAC overlay, then it may be possible to increase the thickness with an additional overlay. Similarly, a pavement with sections that are deficient in smoothness may be corrected by selective grinding. If correction and repair are to be considered as options, then the agency needs to address how to incorporate this into the acceptance procedures, and whether or not it will have any impact on the way in which samples are obtained and their results are analyzed.
For example, one approach that some agencies have used with respect to pavement thickness is to further investigate any coring location for which the thickness is deficient to a certain extent or greater. In the event of a core that is deficient in thickness, some specifications call for additional coring, moving outward from the original core location, to determine the extent of the deficient thickness. The deficient section can then either be corrected or removed and replaced. Similarly, the extent of smoothness deficiencies could be required to be identified and corrected.
If a correct or repair strategy is employed, the agency must be careful regarding how this information is treated with respect to other test results. For example, in the event that a deficient core thickness is identified and then removed and replaced, that core should not be included with other cores from the sample when calculating averages, standard deviations, or PWL or PD values. In fact, if the intent of the coring is to identify areas of deficient thickness, then a random sampling procedure to estimate population parameters may not be the most appropriate sampling approach. A systematic sampling scheme, such as taking a core every 300 meters or 1,000 feet, might be a better way to look for regions that are deficient in thickness. If this is done, however, it must be realized that the data obtained in this systematic manner would not be appropriate to use for estimating population parameters such as mean and standard deviation.
A lot is the amount of material that is to be judged acceptable or unacceptable on the basis of a sample comprising a stated number of test results. The determination of lot size is primarily an economic decision. Since each lot is tested, for very small lots the cost of testing may exceed the benefits. Very large lots may allow acceptance of a large amount of material of less than desirable quality, or may severely reduce the price received by the contractor. However, as contractors increasingly perform both QC and acceptance testing, the likelihood of this occurring is diminished. This is because the contractor will most likely monitor the payment factor concurrently with the quality measure to ensure that less than desirable quality, and consequently low payment factors, do not occur. As a result, some agencies are starting to use relatively large lot sizes. For example, the California DOT and the FHWA Federal Lands Highway Division each use a project or an entire item of construction as a lot.
Lots can be established based on time, on quantity (e.g., tonnage, area, or volume), or on an entire construction/material item for the total project. Each choice has advantages and disadvantages. It is important for the agency to recognize the predominant contractor operation and use that knowledge to select the best definition for a lot.
As an example, a day's production is one choice for a lot size. The advantage is that the operation goes through a complete cycle of start-up, run, and shut down. The disadvantage is that the quantity of material included in each lot may vary considerably from lot-to-lot because of production interruptions caused by inclement weather, equipment breakdowns, etc.
Another example for a lot size might be a specified tonnage such as 1800 megagrams (Mg). This has the advantage of a consistent amount of material in each lot, while the disadvantage is that each lot may have a different number of production start-up, run, and shut down cycles. Ideally, a lot should represent a single population. That is, the material in the lot should have all been produced from essentially the same process under essentially the same production conditions. This is less likely to occur if materials from several production days are incorporated into one lot for acceptance and payment determination. Combining material from more than one production day increases the chances that materials from more than one population will be combined into a joint population for the lot. This will tend to increase the variability associated with the combined population.
Typically, a lot is subdivided into equally sized sublots. This procedure promotes the use of stratified random sampling plans. Stratified sampling is used to ensure that the specimens for the sample are obtained from throughout the lot, and are not concentrated in one portion or section of the lot. Figure 24 illustrates the basic principle of stratified random sampling. The large rectangle represents a lot, perhaps one day's paving from which cores are to be obtained. Using a random selection process, it is possible (but not necessarily likely) that all of the cores could be selected within the first half of the lot. To avoid this possibility, the lot can be stratified into a number of sublots equal to the sample size to be selected from the lot. One core is then randomly selected from within each sublot. This ensures that each portion of the lot has the same chance of being selected while, at the same time, ensuring that the sample is spread out over the entire lot.
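Stratified random sampling along the length of a lot can be sketched as follows. The lot length of 1,500 m, the sample size of 5, and the function name are hypothetical; the sketch covers only the longitudinal position of each core and ignores the transverse offset that a real plan would also randomize.

```python
import random

# Sketch: one random core location within each equally sized sublot.
# Lot length, sample size, and names are hypothetical.

def stratified_random_locations(lot_length, sample_size, rng=None):
    """Return one random station (distance from the start of the lot)
    within each of sample_size equal sublots."""
    rng = rng or random.Random()
    sublot_length = lot_length / sample_size
    return [
        i * sublot_length + rng.random() * sublot_length
        for i in range(sample_size)
    ]


LOT_LENGTH = 1500.0  # meters
SAMPLE_SIZE = 5

locations = stratified_random_locations(LOT_LENGTH, SAMPLE_SIZE,
                                        rng=random.Random(2002))
for sublot, station in enumerate(locations, start=1):
    print(f"sublot {sublot}: core at {station:7.1f} m")
```

Passing a seeded generator makes the selected locations reproducible for documentation purposes, while the default unseeded generator keeps the locations unpredictable in routine use.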
The sample size is the number of test results used to judge the quality of a lot and, thus, is directly related to the lot size. One of the reasons to use larger lot sizes is the potential resultant increase in sample size. This tends to lower the risks, other considerations remaining equal. However, as noted above, a major assumption that is required is that all of the material and/or construction processes remain consistent throughout the total lot. This may be more difficult to achieve if the lot spans several production days, and obviously even more difficult to achieve if the entire project is used as the lot size.
From the above discussions it is obvious that there is a definite relationship between the lot size and sample size selections. Small lot sizes are not compatible with large sample sizes due to the large amount of testing that will be required. This can only work if the tests that will be used are nondestructive and can be completed quickly. This might be the case, for example, with the use of nuclear gauges for estimating density of HMAC pavements. Larger sample sizes can be used with large lot sizes to decrease risks of making incorrect decisions. However, the likelihood of combining materials from possibly different populations must be taken into consideration. For this to work, the variability data used to establish the AQL, RQL, and specification limits must also have been obtained from similar large lots.
Chapter 5, in discussing how to establish the appropriate process variability, stressed the importance of determining a process standard deviation that is consistent with the way in which a lot will be defined for acceptance. However, in practice the decision regarding lot size often cannot be determined with certainty early in the data collection process. It may be possible to determine whether or not the total project will be a lot, or whether acceptance will be on a lot-by-lot basis. In reality, the final decision regarding the sample size per lot cannot be made until an evaluation of the risks has been completed (see chapter 7).
If a major change in the definition of the lot, such as changing from lot-to-lot acceptance to project acceptance, is made after the typical process standard deviation has been calculated, then it will be necessary to re-evaluate the data to determine if a new typical process standard deviation should be used. If a revised standard deviation is selected, then it may be necessary to see if this will affect prior decisions regarding the AQL, RQL, and specification limits.
The literature review suggested earlier in the manual should reveal the practices of other agencies. A 1998 TRB paper, entitled "Summary of Current QC/QA Practices for Hot-Mix Asphalt Concrete," contains a survey of typical lot sizes, sample sizes, sample locations, etc., for HMAC. (21) Agencies can also be contacted directly to ascertain their experiences with the development of QA specifications.
Typical lot and sample sizes and sampling locations for payment determination are listed below for HMAC and PCC. The HMAC values are from an NCHRP project. (21) For screening tests, the tests are most often performed on a unit basis, similar to that used in attributes acceptance plans. The result is that acceptance is usually by the truckload or some other small discrete quantity with an accompanying small sample size.
For HMAC material properties: (21)
For HMAC roadway compaction: (21)
For paving PCC:
For structural PCC:
The discussion above is related to the initial establishment of lot sizes and sample sizes. These decisions are very likely to be economic ones, based on the resources available. These initial selections for lot size and sample sizes may need to be modified based on the analyses of risks that will be conducted next. The development of OC curves and their corresponding risk analyses are discussed in detail in the next chapter.