FHWA-HRT-07-019, Chapter 5
Advanced Quality Systems: Guidelines for Establishing and Maintaining Construction Quality Databases

CHAPTER 5. ANALYSIS OF DATA IN CONSTRUCTION QUALITY DATABASES - EXAMPLES

5.1 DATA ANALYSIS – POTENTIAL AND USES

Thus far, attention has been called to the various uses of a well-developed and organized construction quality database system. The implementation of a system that has well-integrated components (or individual databases) that can be linked with each other using a common reference system has additional benefits to the owner agency. These benefits can range across the technical, administrative, and legislative levels to improve the quality of construction and enhance the agency’s operations overall. The system can be designed to generate periodic reports, the nature of which would depend on the complexity of the database system and the sophistication of the linkage between the various individual database components. This section presents potential analyses that agencies can perform.

At the simplest level of a construction quality database system, the data would include basic materials and construction AQCs, such as lot-by-lot acceptance test results for smoothness, strength, and thickness for PCC paving; and smoothness, density, and mixture properties for HMA paving. These data can be used to determine fundamental statistical parameters and assess variability in the test results. Note that these analyses can be performed on data groups representing a specific year of construction, test equipment, contractor, district, project, or other factors. In addition, if the database houses both agency and contractor test results, suitable data can be extracted to compare the two test populations using standard statistical hypothesis tests, such as the t-test and F-test.
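As a minimal sketch of the grouping described above, the following Python snippet computes the sample size, mean, standard deviation, and coefficient of variation of acceptance test results for each contractor. The record layout, the function name `summarize`, and the thickness values are hypothetical and serve only to illustrate the idea; they are not taken from the tables in this chapter.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical acceptance records: (contractor, lot, core thickness in mm).
records = [
    ("A", 1, 201.0), ("A", 1, 205.0), ("A", 2, 199.0), ("A", 2, 203.0),
    ("B", 11, 190.0), ("B", 11, 214.0), ("B", 12, 196.0), ("B", 12, 212.0),
]


def summarize(rows, key_index=0, value_index=2):
    """Group rows by one field and return n, mean, sample std dev, and COV (%)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])

    summary = {}
    for key, values in groups.items():
        m = mean(values)
        s = stdev(values)  # sample standard deviation (n - 1 denominator)
        summary[key] = {
            "n": len(values),
            "mean": m,
            "std": s,
            "cov_pct": 100.0 * s / m,
        }
    return summary


by_contractor = summarize(records)
for name, stats in sorted(by_contractor.items()):
    print(name, stats)
```

Changing `key_index` to the lot field would give the same summary lot by lot; grouping by year, district, or test equipment would work the same way if those fields were stored with each record.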
The availability of contractor QC test results will provide additional information for use during forensic investigations when AQC specifications are not met or when premature pavement failures occur.

At an intermediate level of construction quality database maintenance, the database would be linked to a condition survey database, such as the agency’s PMS. In this case, the analysis could be extended to correlate the material and construction test data with material performance. These analyses can then form the basis for establishing thresholds for warranty specifications and for estimating risks in warranties.

Agencies maintaining an advanced level of construction quality database systems that can be integrated with other databases or project information, such as a cost database, can perform complex analyses to assess the cost-effectiveness of the agency’s specifications and practices. Cost databases should typically include material and construction costs, maintenance costs, and user costs to determine life-cycle costs accurately. In such cases, cost analysis can be performed to strike an optimum balance between quality and cost or performance and cost for the specific material types and construction practices followed by the agency. Further, life-cycle costs can be established as an AQC for different pavement types. The objective of the analysis is to minimize the life-cycle cost for each pavement type, and this information can be used to refine pavement type selection procedures within the agency over time. For example, studies that compared the cost-effectiveness of pavement types (ACPA, 2001; Cross and Parsons, 2002) would benefit from a comprehensive QA database linked to the agency’s cost database.

5.2 ANALYSIS ILLUSTRATIONS

This section provides examples of statistical analyses that can be performed for all three levels of databases discussed above.
These analyses are simple examples that were developed using field test data or simulated data so that the various analysis capabilities of a construction quality database can be illustrated. Also, these examples represent only a subset of the potential analyses that can be performed and by no means signify the overall scope of analysis using a construction quality database.

5.2.1 Estimation of Variability and Comparison of Test Results

This example illustrates the use of statistical analyses to test for differences of means between contractor and agency testing, to test the normality of the data, to compare the variation in construction of two contractors, and to evaluate the adequacy of the sampling plan for thickness. Data were generated through simulation for this example.

Description of Project and AQCs

Two different contractors constructed jointed plain concrete (JPC) pavements under two separate contracts along a highway. The lots were numbered 1 through 10 for contractor A and 11 through 20 for contractor B. Each lot was approximately 0.8 to 2.4 km (0.5 to 1.5 mi) long and consisted of two-lane paving and four sublots. The AQCs included slab thickness, which is the key parameter analyzed here.

Simulated Construction Process

Two randomly located core samples were taken and measured for thickness from each of the four sublots within a given lot. The contractor testing was done by an independent laboratory, while the State testing was performed by the district laboratory. Note that the random samples were simulated from a normal distribution for the purpose of this example. The population means of sublots along each lot were varied slightly to simulate results that would occur along a project. Each project had the same design thickness, and the population means were set equal. The project variances were set at different values to determine whether this difference could be found through the sampling and testing process.
Simulated results from construction testing for slab thickness for the two contracts are shown in tables 5 and 6.
Analysis of Data The basic data provide an opportunity for a highway agency to evaluate the following, through appropriate statistical parameters and analyses:
The mean and standard deviation were calculated for the thickness values recorded in the project by the State and the contractor, as tabulated in table 7. This table shows that the mean thicknesses appear to be similar, but the standard deviation of the thickness measurements in lots 11 through 20 built by contractor B is higher than that measured in lots 1 through 10 built by contractor A. Data in table 5 show that the thicknesses ranged from 183 to 224 mm (7.2 to 8.8 in) for contractor A, and table 6 shows a range of 155 to 241 mm (6.1 to 9.5 in) for contractor B. The target value was 203 mm (8.0 in) and the specification had a minimum rejection level of 178 mm (7.0 in), so some of these samples are out of specification. This aspect is not discussed further herein. The test data indicate that paving operations performed by contractor B show a higher variability. But is it significantly higher?

The first evaluation is to determine whether the contractor results are significantly different from those of the State for each project. With contractor A (contract 1), the mean is 201 mm (7.92 in) versus the State’s value of 203 mm (8.01 in). The closeness of these values indicates no practical difference. If there were a significant difference, this would indicate to the State that the contractor’s data were suspect. It is also possible that the State’s measurement system may be out of calibration.

The first evaluation is made using the t-test with paired lot mean values (contractor mean lot values compared with State mean lot values). The null hypothesis to be tested is that there is no difference between the means of the paired-lot samples for either contract. Based on a 0.05 level of significance, the results in table 8 show that the calculated t-statistic is far below the critical value. Thus, there is no evidence that the contractor’s results are significantly different from the State’s for either contractor.
Since it is known that the data from the contractor and State were randomly selected from the same normal distribution, this result agrees with the true underlying population of thickness values in each contract. Next, the analysis can be extended to compare the mean sublot core thickness obtained from the contractor testing and from the State testing. Results show reasonable correlation. This is in agreement with the t-test that failed to reject the null hypothesis at the 0.05 level of significance that the mean lot values were from the same population. Figure 9 shows no significant bias of results between the State and contractor (e.g., one being consistently above or below the other). The best-fit lines show a slope of 1.01 and 1.00 (both close to 1.0) for contractors A and B, respectively.
Figure 9. Comparison of State and contractor mean sublot core thicknesses.
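The paired comparison of contractor and State lot means described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used to produce table 8: the function name `paired_t_statistic` and the five pairs of lot means are hypothetical, and the critical value is the standard two-tailed t value for a 0.05 level of significance and 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(contractor_means, agency_means):
    """Return (t, degrees of freedom) for a paired t-test on lot means."""
    if len(contractor_means) != len(agency_means):
        raise ValueError("each contractor lot mean needs a matching agency lot mean")
    # Work on the lot-by-lot differences, which removes lot-to-lot variation.
    diffs = [c - a for c, a in zip(contractor_means, agency_means)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical lot mean thicknesses in mm (not the values in tables 5 and 6).
contractor = [201.0, 204.0, 199.0, 206.0, 202.0]
agency = [203.0, 202.0, 200.0, 205.0, 204.0]

t_stat, dof = paired_t_statistic(contractor, agency)
T_CRIT = 2.776  # two-tailed critical t, alpha = 0.05, 4 degrees of freedom

print(f"t = {t_stat:.3f}, df = {dof}")
print("means differ significantly" if abs(t_stat) > T_CRIT else "no significant difference")
```

With more lots, the critical value would need to be looked up again for the new degrees of freedom (number of paired lots minus one).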
Another important issue is whether the contractor and agency test results show significantly different variation. This can be evaluated using the standard F-test, which compares the two variances. The F-test is easily implemented using the Data Analysis functions in Excel®. For the F-test, the first step is to compute the variance of the contractor’s tests, $S_c^2$, and of the agency’s tests, $S_a^2$, for each contract. The F-test examines the null hypothesis $H_0: S_a^2 = S_c^2$. The F-value is calculated as the ratio of these variances (always use the larger of the variances in the numerator so the ratio will be greater than 1). The closer this value is to 1.0, the closer the variances of the two data sets. After selecting a level of significance for the test (0.05 in this example), the critical F-value (Fcrit) can be determined from tables of the F-distribution, which are a function of the level of significance and the degrees of freedom (n − 1) associated with each set of test results. Excel® computes both the F and Fcrit values.

The results of the F-tests are presented in table 9. The F-test performed for the data in this analysis shows that the null hypothesis is not rejected at the 0.05 level for contractor B, indicating that the variances in the thickness results of the contractor and the State are not significantly different. However, the results for contractor A show that the variances in thickness measured by the contractor and the State are significantly different. In fact, it is demonstrated that with the number of thickness readings collected by the State, the variance in State readings for contract 2 is lower than for contract 1. Note that individual readings in each sublot (and not lot means) were included in this test.
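The variance-ratio calculation described above can be sketched as follows. The readings and the function name `f_ratio` are hypothetical; the critical value of 7.15 is the upper 0.025 point of the F-distribution with 5 and 5 degrees of freedom, which is the value a standard F-table gives for a two-tailed test at the 0.05 level with six readings in each set.

```python
from statistics import variance


def f_ratio(sample_1, sample_2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = variance(sample_1), variance(sample_2)  # sample variances (n - 1)
    if v1 >= v2:
        return v1 / v2, len(sample_1) - 1, len(sample_2) - 1
    return v2 / v1, len(sample_2) - 1, len(sample_1) - 1


# Hypothetical individual thickness readings in mm.
contractor = [201.0, 205.0, 199.0, 203.0, 198.0, 206.0]
agency = [202.0, 204.0, 200.0, 203.0, 201.0, 205.0]

f_value, df_num, df_den = f_ratio(contractor, agency)
F_CRIT = 7.15  # upper 0.025 point of F(5, 5): two-tailed test at alpha = 0.05

print(f"F = {f_value:.3f} with ({df_num}, {df_den}) degrees of freedom")
print("variances differ significantly" if f_value > F_CRIT else "no significant difference")
```

Because the larger variance is always placed in the numerator, the ratio is never less than 1 and only the upper tail of the F-distribution needs to be consulted.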
The next result illustrates the statistical analysis to determine if one contractor has a significantly higher variation in slab thickness than the other contractor. Results of F-test analysis (one-tailed to determine if one contractor has higher variance than the other) performed with the data collected by the two contractors are presented in table 10. The F-value of 3.03 exceeds the critical value of 1.45, indicating that there is a significant difference in variation in the thickness measurements between the two contractors. The variance in thickness measurements for contractor B is much higher than that for contractor A, indicating that the quality of construction of the former is less controlled and perhaps could exhibit a different performance over time. Locations along the project having thinner slabs would show a reduced life, all other properties being equal.
Another interesting comparison that can be made is to derive the frequency distributions for each project and test them for normality. This is accomplished using the Chi-square test for goodness of fit, which examines whether the distribution of the data about the mean follows a normal distribution. The distributions for thickness measurements by contractors A and B are presented in figures 10 and 11. The larger variability in contractor B’s results is evident through a comparison of the distributions shown in these figures. The computed Chi-square statistics are 1.488 and 1.927 for contracts 1 and 2, which are well below the critical values of 3.84 and 7.8 for these contracts, respectively. These results indicate that both data sets follow a normal distribution at the 95% confidence level, which is expected because the data were generated using a normal distribution for the purpose of this example.

Finally, the test data can be combined to calculate the percent deficient slab thickness for each of the contracts. For these projects, the specifications call for special remedial action (removal and replacement) for any area that has a thickness of less than 178 mm (7 in). The percent area less than 178 mm (7 in) can be estimated as follows. All of the contractor and State data can be combined to compute an overall mean and standard deviation for each contract. It has already been shown that the thickness values approximately follow a normal distribution. Since this results in a fairly large data set (e.g., > 30), it can be assumed that the population mean and standard deviation are known and the standardized normal deviate, Z, can be used to make the calculation.
$$Z = \frac{x - \mu}{\sigma}$$

where x is the thickness limit of 178 mm (7 in), µ is the mean, and σ is the standard deviation of the entire contract dataset, based on contractor and State data. The value of Z was calculated to be −1.78 and −1.31 for contracts 1 and 2, respectively. From a normal distribution table, these values correspond to areas of 3.75% for contract 1 and 9.5% for contract 2, which estimate the percentage of defective samples.

Figure 10. Frequency histogram and validation of normal distribution through Chi-square test for thickness measurements by contractor A.
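The percent deficient estimate above amounts to reading the lower-tail area of the standard normal distribution at Z. A minimal Python sketch, using only the Z values quoted in the text, is shown below; the helper names `z_score` and `percent_below` are illustrative.

```python
from statistics import NormalDist


def z_score(limit, mean, std_dev):
    """Standardized normal deviate of a thickness limit."""
    return (limit - mean) / std_dev


def percent_below(z):
    """Percent of a normal population lying below the standardized deviate z."""
    return 100.0 * NormalDist().cdf(z)


# Z values reported in the text for contracts 1 and 2.
pct_contract_1 = percent_below(-1.78)
pct_contract_2 = percent_below(-1.31)

print(f"contract 1: {pct_contract_1:.2f} percent below 178 mm")
print(f"contract 2: {pct_contract_2:.2f} percent below 178 mm")
```

Given a contract's combined mean and standard deviation, `z_score(178, mean, std_dev)` would supply the Z value directly.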
Figure 11. Frequency histogram and validation of normal distribution through Chi-square test for thickness measurements by contractor B.
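The Chi-square goodness-of-fit check behind figures 10 and 11 can be sketched as follows. This is a simplified illustration, not the calculation used for the figures: the bin boundaries, the function name `chi_square_normal`, and the synthetic thickness values are assumptions, and the degrees of freedom are taken as the number of bins minus one minus the two parameters (mean and standard deviation) estimated from the data. The critical values quoted in the text, 3.84 and 7.8, are the 0.05-level Chi-square values for 1 and 3 degrees of freedom.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normal(values, edges):
    """Chi-square statistic against a normal curve fitted to the data.

    edges: interior bin boundaries in ascending order; the first and last
    bins are open-ended. Returns (chi-square statistic, degrees of freedom).
    """
    fitted = NormalDist(mean(values), stdev(values))
    n = len(values)
    bounds = [float("-inf")] + list(edges) + [float("inf")]

    chi2 = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for v in values if lo <= v < hi)
        p_lo = 0.0 if lo == float("-inf") else fitted.cdf(lo)
        p_hi = 1.0 if hi == float("inf") else fitted.cdf(hi)
        expected = n * (p_hi - p_lo)
        chi2 += (observed - expected) ** 2 / expected

    dof = (len(bounds) - 1) - 1 - 2  # bins - 1 - two estimated parameters
    return chi2, dof


# Synthetic, evenly spaced quantiles of a normal distribution (mean 203 mm,
# standard deviation 10 mm) standing in for measured core thicknesses.
source = NormalDist(203.0, 10.0)
thickness = [source.inv_cdf((i + 0.5) / 40) for i in range(40)]

chi2_stat, dof = chi_square_normal(thickness, edges=[193.0, 203.0, 213.0])
CHI2_CRIT = 3.84  # 0.05 level, 1 degree of freedom

print(f"chi-square = {chi2_stat:.3f}, df = {dof}")
print("reject normality" if chi2_stat > CHI2_CRIT else "normality not rejected")
```

In practice the bins should be chosen so that each has a reasonable expected count; bins with very small expected counts are commonly merged with their neighbors.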
Contract 2 has a much greater percent deficient, indicating that contract 2 resulted in much higher variation in the construction process than contract 1. Thus, even though the mean thicknesses are the same between the two contracts, the higher variation in slab thickness will lead to an earlier onset of cracking in contract 2 than in contract 1 because thinner slabs experience more rapid fatigue cracking. In fact, 25 mm (1 in) is a very significant difference that could reduce the time to, say, 10 percent cracked slabs (often used as a failure criterion) by one-half or more.

Summary and Recommendations

In summary, the statistical analyses in this example, which are based on basic QA data for two projects, show the following results:
Thus, even basic QA data can be analyzed to reach a number of important practical conclusions about the as-constructed characteristics of a given contract or multiple contracts. More analyses can be performed that address such things as within-contract variability from lot to lot and how these data may be used to improve the specification.

5.2.2 Contractor Test Results Used in the Acceptance Decision

Although most SHAs have transferred primary responsibility for QC of asphalt materials to the producers doing the work, the issue of whether to allow contractor test results for acceptance control and payment remains. A database that tracks both contractor and agency test results would allow SHAs to verify contractors’ QC tests and help resolve issues such as contractor payment.

The FHWA allows SHAs to use the contractor’s test results for payment, provided the States verify that the contractor’s results are representative of the actual material being produced. This requirement is statistically challenging. Verification testing could be limited. For example, the State may randomly select a single sub-lot to test, whereas the contractor tests each sub-lot for QC. While the State and contractor tests may compare within statistical limits, the risk to the agency of accepting unacceptable quality work (Type II error) is great. This problem is being addressed under NCHRP Project 10-58 (02), "Using Contractor-Performed Tests in Quality Assurance."

NCHRP Synthesis 346 (Hughes, 2005) discusses State construction QA programs and the Title 23, CFR 637 regulation adopted by FHWA in 1995 that requires each SHA to develop a QA program for the National Highway System (NHS). It emphasizes that verification of contractor test results by the State be done through the use of independent samples. The use of split samples can only address differences in test procedures.
A database of both State assurance and contractor control test results collected independently (and randomly), spanning several lots or days of work combined, would allow a more thorough evaluation of material conformance.

Verification of Contractor Test Results

As mentioned previously, there are two procedures for verification of independently obtained test results, the F-test and the t-test, which usually are used together. This procedure involves two hypothesis tests, where the null hypothesis, H0, for each test is that the contractor’s tests and the agency’s tests are from the same population. The F-test is applied to check that the variabilities of the two data sets are equal and the t-test to check that the means of the two data sets are equal. Both tests require more than one agency test result before a comparison can be made.

The application of this verification procedure is easily illustrated in this example. Consider the contractor and agency air void test results shown in table 11. For each of the four sublots from every lot of AC paved, the contractor obtains a sample randomly and performs a density test to determine the air voids content of the sample. This measure is used to evaluate compaction density and is used for pay factor calculation. To verify the result, the agency independently runs the same test on a single, randomly obtained core from a lot. One procedure used when comparing a single agency test result with multiple contractor test results is to define the allowable interval within which the agency test result must fall as:
where

Description of Project and AQCs

Table 11 presents the air void test results on a flexible pavement construction project undertaken by a contractor. The project included 14 lots, each of which was divided into 4 sublots by the contractor. Contractor results include measurements in each sublot, while the agency’s acceptance testing was performed on each lot.
Data Analysis

Both the F-test and t-test can be implemented easily using the Data Analysis functions in Excel®. Table 12 shows the results of an F-test run on the example data shown in table 11. The test was run using a level of significance, or alpha, of 0.01. This is the probability of rejecting a null hypothesis when it is actually true. Typical levels of significance are 0.1, 0.05, and 0.01. Note that the Excel®-calculated F values are determined for a one-tail F distribution. For a two-tail F-test in Excel®, the level of significance input must be halved (α = 0.005). This will be the case when evaluating the null hypothesis $H_0: S_a^2 = S_c^2$. Since F < Fcrit (i.e., 1.225 < 3.885), there is no reason to believe that the two sets of data have different variabilities. That is, they could have come from the same population.
Given that the standard deviations of the two data sets are sufficiently similar, the Student t-test can be used to evaluate the hypothesis of equal means. If not, an alternative—the Cochran variant of the t-test (assuming unequal variances)—must be run to evaluate the hypothesis of equal means. Both t-test variants can be evaluated using the Data Analysis routines in Excel®. Table 13 shows the results of a t-test assuming equal variances at the 0.01 significance level.
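The equal-variance (pooled) form of the t-test can be sketched as follows. The air void values and the function name `pooled_t_statistic` are hypothetical and are not the table 11 data; the critical value of 3.250 is the standard two-tailed t value for a 0.01 level of significance and 9 degrees of freedom, which matches the sample sizes assumed here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_c, sample_a):
    """Return (t, degrees of freedom) for a two-sample t-test with equal variances."""
    n_c, n_a = len(sample_c), len(sample_a)
    dof = n_c + n_a - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_c - 1) * variance(sample_c) + (n_a - 1) * variance(sample_a)) / dof
    std_error = sqrt(pooled_var * (1.0 / n_c + 1.0 / n_a))
    return (mean(sample_c) - mean(sample_a)) / std_error, dof


# Hypothetical air void contents in percent.
contractor = [6.1, 5.8, 6.4, 6.0, 5.7, 6.3, 6.2, 5.9]
agency = [6.0, 6.5, 5.6]

t_stat, dof = pooled_t_statistic(contractor, agency)
T_CRIT = 3.250  # two-tailed critical t, alpha = 0.01, 9 degrees of freedom

print(f"t = {t_stat:.3f}, df = {dof}")
print("means differ significantly" if abs(t_stat) > T_CRIT else "equal means not rejected")
```

If all 56 contractor results (14 lots of 4 sublots) and all 14 agency results were used, the pooled test would have 56 + 14 − 2 = 68 degrees of freedom. When the F-test rejects equal variances, the pooled variance above is no longer appropriate and an unequal-variance form of the test would be used instead.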
Results of the t-test shown in table 13 indicate that the t-statistic is less than t-crit (0.284 < 2.650); therefore, the hypothesis of equal means cannot be rejected at the 99 percent confidence level. Note that use was made of the two-tail distribution for evaluating the hypothesis of equal means. We can therefore assume that both data sets came from the same population and that the agency results verify the contractor results. This provides the agency with confidence in using the contractor’s results for acceptance decisions.

5.2.3 Illustration of Relationships between Pavement Construction AQCs and Performance

This example was derived using simulation and the latest models from the Mechanistic-Empirical Pavement Design Guide (MEPDG) (Applied Research Associates, 2004). The example shows through reasonable simulation that it is possible to obtain a relationship between measured AQC test results and the subsequent performance of the pavement over a period of 15 years. This information can be used for many purposes, including the improvement of the construction specification.

Description of Project and AQCs

Portions of a JPC pavement project comprising a total of 20 lots were constructed over 2 months (July and October). Each lot was approximately 0.8 to 2.4 km (0.5 to 1.5 mi) long, consisting of two-lane paving and four sublots. The AQCs included initial IRI, compressive strength at 28 days, and slab thickness.

Simulated Construction Process

Two random samples of strength (cylinders behind the paver), cores for slab thickness, and IRI averaged in the wheelpaths of each lane were taken from each of the sublots within a given lot. The random samples were actually simulated from normal distributions for the purpose of this example. The population mean of each lot was varied to demonstrate what might occur with an inconsistent contractor (e.g., higher variability between lots, which might be built on different days).
The mean AQCs obtained from sampling (from a normal distribution) varied substantially from lot to lot along the project, demonstrating inconsistent quality of construction. Results from construction testing for slab thickness, compressive strength at 28 days, and initial IRI are shown in table 14.

Future Performance Prediction

The beginning and ending of each lot were referenced in the field and recorded, making it possible to correlate the construction data directly with performance data measured by the pavement management bureau over a period of 15 years. This link is required to make the correlation. The pavement showed fairly wide-ranging performance along the project over the 15 years. A summary of performance data measured at the end of the 15-year period for slab cracking (percent of slabs with transverse cracks), mean joint faulting, and IRI (in the outer traffic lane) is shown in table 15. Note that the performance data were simulated using the MEPDG prediction models.

Analysis of Data

Simple plots showing the mean lot AQCs versus projected cracking, faulting, and initial IRI illustrate the correlations that might be achieved in an actual situation. Figures 12 through 15 show the correlations that appear to be significant. The other combinations did not show any relationship. All three of the AQCs for this project appear to have an impact on distress and IRI after 15 years of performance of this project. Further, as shown in figure 16, the month of construction was found to have a significant effect on the magnitude of joint faulting.
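Once lot-level AQCs are linked to performance records, the strength of a relationship such as thickness versus cracking can be summarized with a correlation coefficient and a least-squares slope. The sketch below assumes five hypothetical lots; the values and the function name `pearson_r_and_slope` are illustrative and are not taken from tables 14 and 15.

```python
from math import sqrt
from statistics import mean


def pearson_r_and_slope(x, y):
    """Return (Pearson correlation, least-squares slope of y on x)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical lot means: slab thickness (mm) and percent slabs cracked at 15 years.
thickness_mm = [190.0, 195.0, 200.0, 205.0, 210.0]
cracked_pct = [20.0, 16.0, 11.0, 8.0, 5.0]

r, slope = pearson_r_and_slope(thickness_mm, cracked_pct)
print(f"r = {r:.3f}, slope = {slope:.2f} percent cracked slabs per mm of thickness")
```

A negative slope here would mean that thicker lots show less cracking; the same function could be applied to strength or initial IRI against each performance measure.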