This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-HRT-12-027
Date: May 2012
Task B of the contract required the research team to conduct a critical review of a recently published paper in TRR 1813, "Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications," and to present the findings, including conclusions and recommendations. This appendix contains that assessment.
Task B.1 consisted of a critical review of “Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications” (TRR 1813).
The definition of effectiveness proposed by the Florida study is "Quality Desired = Quality Specified = Quality Received" (p. 164).
Under QA specifications, the desired quality level is typically expressed in statistical terms. Often a quality level is measured, but the measured quantity may not be what the agency actually wants. For example, FDOT measures average density. Does it really want at least 98 percent on average? Moreover, because variability is not measured, what influence does variability have on what the State wants?
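To make the variability question concrete, the sketch below assumes that density results within a lot are normally distributed and compares two hypothetical lots that share a 98.5 percent average but have different standard deviations, measured against a 98 percent value. The lot parameters and the normality assumption are illustrative only; they are not FDOT data.

```python
from statistics import NormalDist


def fraction_below(mean: float, sd: float, limit: float) -> float:
    """Fraction of a normally distributed lot falling below `limit`."""
    return NormalDist(mu=mean, sigma=sd).cdf(limit)


# Hypothetical lots: identical averages, different variability.
LIMIT = 98.0
tight_lot = fraction_below(98.5, 0.5, LIMIT)
loose_lot = fraction_below(98.5, 2.0, LIMIT)

# Both lots pass an average-only check, yet the loose lot has far more
# material below 98 percent.
print(f"tight lot: {tight_lot:.1%} below {LIMIT}")  # about 15.9%
print(f"loose lot: {loose_lot:.1%} below {LIMIT}")  # about 40.1%
```

An average-only requirement treats these two lots identically, even though they differ substantially in how much material falls below the value the agency presumably cares about.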
A quality level can be assumed to be what the agency desires only insofar as it is tied to the PF. Few States know or specify their AQL. The AASHTO Quality Assurance Guide Specification and the AASHTO Quality Assurance Implementation Guide suggest an AQL of 90 PWL for both PCC and HMA.[6,7] Even so, few States explicitly identify the AQL when applying the PWL/PD approach.
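As a rough illustration of how a lot is compared with a PWL-based AQL, the sketch below computes the quality index Q from the lot mean and standard deviation and converts it to an estimated PWL. It is a simplification, not the AASHTO procedure: published PWL tables use an estimator that depends on sample size, whereas this sketch feeds Q directly into the standard normal distribution. The lot values and the 4.5 lower limit are hypothetical.

```python
from statistics import NormalDist, mean, stdev

AQL_PWL = 90.0  # AQL suggested in the AASHTO guides for PCC and HMA
_STD_NORMAL = NormalDist()


def estimate_pwl(values, lower=None, upper=None):
    """Approximate percent within limits (PWL) for one lot.

    Simplification: published PWL tables use a sample-size-dependent
    estimator; here the quality index Q is fed into the standard normal
    CDF instead, which is only an approximation for small samples.
    """
    xbar, s = mean(values), stdev(values)
    pd_lower = 0.0 if lower is None else 1 - _STD_NORMAL.cdf((xbar - lower) / s)
    pd_upper = 0.0 if upper is None else 1 - _STD_NORMAL.cdf((upper - xbar) / s)
    return max(0.0, 100.0 * (1 - pd_lower - pd_upper))


# Hypothetical lot of five test results with a lower limit of 4.5.
lot = [4.9, 5.3, 5.1, 4.7, 5.0]
pwl = estimate_pwl(lot, lower=4.5)
print(f"PWL = {pwl:.1f}, meets AQL: {pwl >= AQL_PWL}")
```

Stating the AQL explicitly in this form is what allows a lot's estimated PWL to be judged against what the agency wants.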
The relationship "quality wanted = quality specified = quality delivered" cannot be expected to hold as an exact equality. There must be allowable deviations (or gray areas) within which a specification still meets a definition of "effective." The paper does not determine what range of ratios defines "effectiveness."
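As a minimal sketch, one way to express such a gray area is to compare the ratios specified/wanted and delivered/specified against an allowable band around 1.0. The 5 percent band and the PWL-based inputs below are hypothetical placeholders; the paper does not supply these values.

```python
def effectiveness_check(wanted, specified, delivered, tolerance=0.05):
    """Check whether wanted = specified = delivered holds within a band.

    `tolerance` is a HYPOTHETICAL allowable deviation of each ratio from
    1.0; the paper does not define this value.
    """
    ratios = (specified / wanted, delivered / specified)
    effective = all(abs(r - 1.0) <= tolerance for r in ratios)
    return effective, ratios


# Quality levels expressed as PWL (hypothetical values).
print(effectiveness_check(90, 90, 86))  # delivered slightly short
print(effectiveness_check(90, 90, 80))  # delivered well short
```

Choosing the band is exactly the open question raised above; the code only shows where such a value would enter.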
It is generally agreed that specifying only the average will not produce an effective specification, and the analysis in the paper confirms this for several reasons. It is understood that the paper nevertheless had to proceed with the analysis in order to test the effectiveness procedure, and the procedure passed this test.
It appears that the test method for measuring asphalt content changed during the analysis period. The two methods most likely have different testing variability. How did this affect the analysis? Presumably the tolerances for asphalt content remained the same. If so, contractors using the less variable test method would have had an advantage in meeting the specification over contractors using the more variable method. Does the use of two methods with different variability to measure the same quality characteristic affect what FDOT (or any other State) wants? Not unless the specifications clearly state the acceptance limits associated with each test method.
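The advantage described above can be sketched by assuming that testing error is independent of production variability, so that their variances add. Under that assumption, and with results normally distributed around the target, the hypothetical example below compares the chance that a single asphalt content result falls within the same tolerance under a less variable and a more variable test method. All standard deviations and the tolerance are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_within_tolerance(process_sd, test_sd, tolerance, offset=0.0):
    """P(|result - target| <= tolerance) for one test result.

    Assumes production and testing errors are independent and normal,
    so the observed variance is process_sd**2 + test_sd**2.
    `offset` is the process mean minus the target.
    """
    observed = NormalDist(mu=offset, sigma=sqrt(process_sd**2 + test_sd**2))
    return observed.cdf(tolerance) - observed.cdf(-tolerance)


# Hypothetical values (percent asphalt content): same production process,
# same tolerance, two test methods with different testing variability.
TOLERANCE = 0.40
p_method_a = prob_within_tolerance(0.20, 0.10, TOLERANCE)
p_method_b = prob_within_tolerance(0.20, 0.25, TOLERANCE)
print(f"method A: {p_method_a:.1%}, method B: {p_method_b:.1%}")
```

With identical production quality, the contractor tested with the more variable method fails the tolerance noticeably more often, which is the inequity the text raises.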
This section of the paper makes several critical points, chief among them the importance of the State included in the study having sufficient information or data to answer three key questions: what is wanted, what is specified, and what is delivered.
As stated previously, the present study was intended to be a data-based exercise to determine the levels of quality the agencies have typically received, but this was not possible because the team was unable to obtain comprehensive data from the agencies involved. As a result, the scope of the project was modified to evaluate the agencies’ PCC and HMA pavement QA specifications. Because even the information needed for the specification reviews was not fully available, some parameters had to be estimated statistically.
The third question, "What is delivered?," can best be addressed with a computerized State database designed to monitor the specifications. The database should contain the individual materials and construction acceptance test results for each acceptance quality characteristic, and it should tie each test result to a lot or sublot and to a mix design. The paper recognizes this weakness in the FDOT density data, which contained only lot averages. Although lot averages made it possible to determine whether the specification was met and what the PF was, they did not allow any further analysis of within-lot variability.
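A minimal sketch of such a record layout follows. The class name, field names, and sample values are hypothetical and do not represent FDOT's actual schema. The point is that storing individual results keyed to lot, sublot, and mix design lets both the lot average and the within-lot standard deviation be recovered, whereas storing only the lot average discards the latter.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class AcceptanceResult:
    """One acceptance test result (hypothetical schema)."""
    mix_design: str
    lot: str
    sublot: int
    characteristic: str  # e.g., "density" or "asphalt_content"
    value: float


def lot_statistics(results):
    """Group results by (mix design, lot, characteristic) and summarize."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.mix_design, r.lot, r.characteristic)].append(r.value)
    summary = {}
    for key, values in groups.items():
        # Variability can only be estimated with two or more results.
        sd = stdev(values) if len(values) >= 2 else None
        summary[key] = {"n": len(values), "mean": mean(values), "sd": sd}
    return summary


results = [
    AcceptanceResult("MD-01", "L1", 1, "density", 97.2),
    AcceptanceResult("MD-01", "L1", 2, "density", 98.9),
    AcceptanceResult("MD-01", "L1", 3, "density", 98.4),
    AcceptanceResult("MD-01", "L2", 1, "density", 98.0),
]
for key, stats in lot_statistics(results).items():
    print(key, stats)
```

Lot L2, with a single result, shows why single-sample acceptance leaves variability unknown.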
The specifics of the procedure must depend on the specifications, the database, and the type and amount of information available to answer the three key questions. The procedure must also be flexible enough to handle different types of quality measures; the paper did this using the average and AAD. However, given the numerous combinations of quality measures in agency acceptance plans, it may not be feasible to address them all. The point made in the paper, "They simply require a good conceptual understanding of specification effectiveness along with the sound application of some basic principles of statistics" (p. 167), is well taken and was paramount in the research team’s approach.
What is Wanted?
Does what an agency wants equal what it has? Most specifications have been developed around what agencies think they have been getting, and how well agencies actually know this is open to question. In addition, if the quality measure changes as the specification is developed, how well has the old quality measure (e.g., average only or single sample) been translated into the new one (e.g., PWL or AAD)?
A database containing individual values, from which averages, variances, and offsets from target can be computed, makes the question "What do we have?" simpler to answer and allows the question "What do we want?" to be answered as well. Agency input may be needed to tie these two questions together, and it should not be assumed that they have the same answer. "What we have" may not have been performing well, in which case "what we want" should be better. Only the agency can make this decision.
The paper also discusses the statistical parameters describing the population of what is wanted. This raises a critical question: how many States have defined "What do we want?" as a population? It can be surmised that many States that still use only an average or single samples to accept one or more quality characteristics have not done so. What these States typically lack is an understanding of variability and the data needed to measure it.
The FDOT materials and construction engineers who were surveyed could not consistently answer the question "What do we want?" This should not be surprising; most States would likely be similarly uninformed. However, the question is basic and has not received the attention it deserves. One goal of the present study is to raise awareness of this question so that States will be inclined to ask and answer it more often.
"Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications" makes a good point about the timeframe to which the conclusions apply. Ideally, the database should span several years so that the conclusions are not a "1-year wonder." How consistently the relationship quality wanted = quality specified = quality delivered holds over the years analyzed should also be an input to judging the effectiveness of the specification. How consistent are the ratios?
What is Specified?
At this point, the need to define the AQL must be considered. Over the past 40 years of teaching QA, and particularly over the past 15 years of teaching relatively high-placed State engineers, participants have very seldom been able to identify the AQL their States use. Defining the AQL as the quality level that receives 100 percent pay is one way to make this determination. As mentioned previously, the AASHTO Quality Assurance Guide Specification suggests an AQL of 90 PWL for HMA and PCC.
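If the AQL is defined as the quality level receiving 100 percent pay, it can be read directly from the pay equation. The sketch below assumes a hypothetical linear pay equation, PF = 55 + 0.5 × PWL, chosen only because it yields 100 percent pay at 90 PWL, consistent with the AASHTO-suggested AQL. An agency's actual pay equation would replace it.

```python
def pay_factor(pwl, intercept=55.0, slope=0.5):
    """Hypothetical linear pay equation: PF = intercept + slope * PWL (percent)."""
    return intercept + slope * pwl


def aql_from_pay_equation(intercept=55.0, slope=0.5):
    """PWL at which the linear pay equation gives exactly 100 percent pay."""
    return (100.0 - intercept) / slope


print(aql_from_pay_equation())             # 90.0
print(pay_factor(90.0), pay_factor(70.0))  # 100.0 90.0
```

An agency that cannot state its AQL can often recover it this way from the pay equation it already uses.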
Two aspects of "What is specified?" not discussed in "Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications" are how to set specification limits and the RQL. In setting specification limits, consideration must be given to whether single or double limits are needed. Single specification limits are somewhat easier to establish than double limits and give contractors more latitude in process variability. Double limits must take careful account of the variability typical of contractors' processes.
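The effect of single versus double limits can be illustrated with a hypothetical normal process centered at 5.0 with limits of 4.5 and 5.5. For a centered process, the percent defective with double limits is exactly twice that with a single lower limit, so the same increase in variability costs more PWL. The process values are illustrative only.

```python
from statistics import NormalDist


def population_pwl(mu, sigma, lower=None, upper=None):
    """Percent of a normal population within the given limit(s)."""
    dist = NormalDist(mu, sigma)
    below = dist.cdf(lower) if lower is not None else 0.0
    above = 1.0 - dist.cdf(upper) if upper is not None else 0.0
    return 100.0 * (1.0 - below - above)


for sigma in (0.2, 0.3):
    single = population_pwl(5.0, sigma, lower=4.5)
    double = population_pwl(5.0, sigma, lower=4.5, upper=5.5)
    print(f"sigma={sigma}: single-limit PWL={single:.1f}, double-limit PWL={double:.1f}")
```

With double limits, a process that is only moderately more variable can drop toward the 90 PWL AQL even though it would comfortably exceed it under a single limit.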
Establishing the RQL is also an important aspect of setting specification limits, because it allows States to control what they do not want. This subject is unfamiliar to most State specification writers.
What is Delivered?
"Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications" indicates that answering the question "What are we getting?" requires an analysis of the construction quality data generated through the use of the QA specifications whose effectiveness is being assessed. This requires determining the typical lot-by-lot statistical parameters, which can be difficult. When acceptance is based on single samples, within-lot variability cannot be estimated at all, and the specification becomes essentially a pass/fail decision. In addition, some acceptance procedures allow different lot sizes and different sample sizes (e.g., both of the FDOT quality characteristics that were analyzed).
The finding that the density specification was generally easy to meet should not be surprising. When only an average is specified, the requirement is generally easier to meet than when variability is also included. For density, when QC testing turns up a low value, the roller operator can be instructed to increase the number of passes so that the next test raises the average. This increases variability, but because variability is not measured, it is not a factor. This helps explain why pay decreases are seldom assessed on lots. In fact, when a pay decrease is assessed, either the mix was difficult to compact and additional rolling did not help, or someone was not paying attention to, or not conducting, the QA tests.
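The compensation effect can be shown with two hypothetical sets of four density results (percent) checked against an illustrative average-only requirement of 98. Both sets pass, but the set in which low results are followed by extra rolling has a much larger standard deviation, which the average-only check never sees. The numbers are illustrative, not FDOT data.

```python
from statistics import mean, stdev

AVERAGE_LIMIT = 98.0  # illustrative average-only density requirement


def passes_average_only(values, limit=AVERAGE_LIMIT):
    """Accept the lot if the average meets the limit; variability is ignored."""
    return mean(values) >= limit


consistent = [98.0, 98.0, 98.0, 98.0]
compensated = [96.5, 99.5, 96.5, 99.5]  # low result, then extra roller passes

for name, values in (("consistent", consistent), ("compensated", compensated)):
    print(name, passes_average_only(values), round(stdev(values), 2))
```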
The inconsistencies in the specifications are not surprising and probably exist in most State specifications. As mentioned previously, using different sample sizes with the same limits on the average is probably more typical than varying the limits with sample size. Once again, the statistical concept of the central limit theorem is unfamiliar to many SHAs. Limits that can be kept constant regardless of sample size are a great advantage of PWL/PD, but experience indicates that even States that use PWL/PD do not realize this advantage.
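Under the central limit theorem, the standard deviation of an average of n results is σ/√n, so a fixed limit on the average becomes easier or harder to meet as n changes, depending on which side of the limit the process mean lies. The sketch assumes normally distributed results and a hypothetical process (mean 97.8, standard deviation 1.0) slightly below a 98.0 limit.

```python
from math import sqrt
from statistics import NormalDist


def prob_average_meets_limit(mu, sigma, n, lower_limit):
    """P(average of n results >= lower_limit) for normal(mu, sigma) results."""
    return 1.0 - NormalDist(mu, sigma / sqrt(n)).cdf(lower_limit)


# Hypothetical process slightly below the limit.
for n in (1, 3, 5):
    p = prob_average_meets_limit(97.8, 1.0, n, 98.0)
    print(f"n={n}: P(accept) = {p:.3f}")
```

For a process slightly below the limit, acceptance becomes less likely as n grows; for a process above the limit the trend reverses. Keeping the same limit on the average across sample sizes therefore changes the risks, whereas limits on individual values used with PWL do not need to change with n.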
FDOT should be commended for improving compaction training for inspectors and contractors. However, was the training prompted by dissatisfaction with the densities being received? If so, this indicates that FDOT was not satisfied, at least intuitively, with "What they were getting."
"Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications" states that the asphalt content specification has been relatively more difficult to meet. This is at least partly due to the quality measure used, AAD. Although several different combinations of average and standard deviation can produce the same AAD, at least a measure of variability enters the calculation. AAD does not allow a contractor to offset a low result by raising the next one, because deviations on either side of the target add to the AAD rather than cancel.
The conclusion that contractors are providing essentially the same quality level at each sample size is interesting. That the delivered quality level is independent of the specified quality level suggests the relationship between the two is not ideal. Is this a function of AAD? Would the same relationship exist if PWL were used?
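The AAD behavior discussed above can be illustrated with hypothetical asphalt content results, assuming AAD is computed as the mean absolute difference between individual results and the target. Offsetting high and low results leave the average on target but inflate the AAD, and different combinations of bias and spread can yield the same AAD. The target and results are illustrative.

```python
from statistics import mean


def aad(values, target):
    """Average absolute deviation of individual results from the target."""
    return mean(abs(v - target) for v in values)


TARGET = 5.50  # hypothetical asphalt content target, percent

on_target = [5.45, 5.55, 5.50, 5.50]
offsetting = [5.10, 5.90, 5.20, 5.80]  # lows offset by highs; same average
biased = [5.30, 5.30, 5.30, 5.30]      # no spread, but 0.2 below target
spread = [5.30, 5.70, 5.30, 5.70]      # on target on average, with spread

for name, vals in (("on_target", on_target), ("offsetting", offsetting),
                   ("biased", biased), ("spread", spread)):
    print(f"{name:<10} mean={mean(vals):.2f} AAD={aad(vals, TARGET):.3f}")
```

The "biased" and "spread" sets have the same AAD despite very different averages and standard deviations, which is one reason PWL might behave differently.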
As "Procedure for Monitoring and Improving Effectiveness of Quality Assurance Specifications" states, it should not be surprising that the procedure for assessing specification effectiveness found inconsistencies.  The critical issue is how to measure the effectiveness. Is the procedure described in the paper adequate or can another procedure be developed that will improve the measure of effectiveness?
States should decide whether and how to improve a specification when inconsistencies or less-than-desirable effectiveness are found. The lack of evolving specifications is also a concern; many States have not appreciably changed their specifications in 20–30 years. With the improvements in quantifying risk developed over this period, many more questions about specification effectiveness can now be readily answered. States that still accept on single samples or averages alone may especially benefit from risk analyses of the type illustrated in this study.
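As a minimal sketch of such a risk analysis, the Monte Carlo simulation below estimates the seller's risk (a lot at the AQL of 90 PWL being rejected) and the buyer's risk (a lot at a hypothetical RQL of 50 PWL being accepted). It assumes a hypothetical plan that takes five results per lot and accepts when the estimated PWL is at least 70. PWL is estimated by plugging the quality index into the standard normal distribution, a simplification of table-based estimators. The RQL, acceptance threshold, and sample size are assumptions for illustration; this is not SPECRISK or any agency's plan.

```python
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()


def estimated_pwl(sample, lower):
    """Normal-approximation PWL from the sample quality index Q_L."""
    q = (fmean(sample) - lower) / stdev(sample)
    return 100.0 * STD_NORMAL.cdf(q)


def prob_acceptance(true_pwl, n=5, accept_at=70.0, trials=5000, seed=1):
    """Monte Carlo probability that a lot of given true PWL is accepted."""
    rng = random.Random(seed)
    lower, sigma = 0.0, 1.0
    # Place the process mean so that true_pwl percent lies above `lower`.
    mu = lower + STD_NORMAL.inv_cdf(true_pwl / 100.0) * sigma
    accepted = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if estimated_pwl(sample, lower) >= accept_at:
            accepted += 1
    return accepted / trials


seller_risk = 1.0 - prob_acceptance(90.0)  # lot at AQL (90 PWL) rejected
buyer_risk = prob_acceptance(50.0)         # lot at hypothetical RQL (50 PWL) accepted
print(f"seller's risk ~ {seller_risk:.3f}, buyer's risk ~ {buyer_risk:.3f}")
```

Repeating the calculation over a range of true PWL values traces the operating characteristic curve, which is the usual way such risks are presented.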
The need to maintain a good materials and construction database cannot be overemphasized. To underscore this point, the inability to access the databases of the participating States was the reason for the scope change in the present study.
Identifying the most practical "best" analytical procedure and beginning to develop information on acceptable and less-than-acceptable ratios among quality wanted, quality specified, and quality delivered were worthy goals of the present study. However, the lack of the necessary data required a change in scope.