This report is an archived publication and may contain dated technical, contact, and link information

Publication Number: FHWA-RD-03-037
Date: May 2005

Validation of Accident Models for Intersections

FHWA Contact: John Doremi,
HRDI-10, (202) 493-3052, John.doremi@dot.gov

PDF files can be viewed with the Acrobat® Reader®

2. VALIDATION OF ACCIDENT MODELS

This chapter presents validation results for the five types of rural intersections. It first provides a description of the overall validation approach. Second, the data sources and issues are discussed. Third, the individual validation activities and their results are presented. Finally, a discussion of these results is provided.

2.1 VALIDATION APPROACH AND PRELIMINARIES

Several objectives were identified to guide the validation efforts:

Determine whether the existing accident models developed by FHWA were "over-fit" to the estimation data, thus making elements (factors) appear to affect accidents when, in fact, they do not.
Determine whether important variables were omitted from the models due to lack of representative data, insufficient sample size, lack of power, or some combination of thereof.
Determine whether the functional forms of the models have been properly specified.
Determine whether the models are valid at other locations (across space) and at the same intersections in the future (across time).

To meet these objectives, researchers conducted two aspects of model validation: internal and external model validations. The first aspect of model validation consists of qualitative assessments of statistical models, including examining the collection of variables used to identify missing or irrelevant variables, inspecting the functional form of the models, and assessing the implications of the models regarding an underlying theory of the data generating process-motor vehicle crashes, in this case. External model validation is more quantitative and is focused on various quantitative measures of a statistical model's prediction ability. Three model validation activities comprising both internal and external validation activities were undertaken:

Validation of the models against additional years of accident data for the same intersections used in the calibration. Because the existing accident models were developed as direct inputs into the IHSDM's Crash Prediction Module, highway agencies will use the models to estimate the safety performance of an existing or proposed roadway. Therefore, the models should be able to forecast accidents over time and space. This validation was used to assess the models' ability to forecast accidents across time. Temporal stability of a model suggests that it will predict accident frequencies well across time; the effect of time, or covariates that are influenced by time, are either not important or, if important, are included in the model by some relevant explanatory variable.
Validation of the models against Georgia data. This validation was used to assess the models' ability to forecast accidents over space. The primary application of the models is to forecast the impact of design considerations and countermeasures in regions and jurisdictions not represented in the calibration data. Thus, this validation exercise attempts to assess the ability of the models to capture differences across regions as reflected through relevant variable expressions in the models.
Validation of the Accident Prediction Algorithm. Validation of the Accident Prediction Algorithm as a whole was considered to be important. This validation activity addresses the logical defensibility of the algorithm used for predicting accidents, and provides quantitative evidence of how well it is predicting accidents.

The validation exercises in this chapter focus primarily on external validation-validation concerned with assessing performance of the models when compared to external data. The discussion section mentions internal validation concerns-the internal coherence, structure, theoretical soundness, and plausibility of the models proposed. And more focus is given to internal validation in the recalibration activities documentation, which follows the report on these validation efforts. In these validation activities, all model specifications were validated "as is," that is, no changes to model specifications were considered or assessed.

Throughout the report the subjective criteria of alpha () equal to 0.10 is applied. The support for this level of a is as follows. In statistical models of accident occurrence, a type II error can be argued to be more serious than a type I error. With a type I error, the analyst concludes that the null hypothesis is false, when in fact it is true with probability. This translation is not precise, but the precise and correct conditional probability interpretation is cumbersome and, for practical purposes, does not lend any additional insights. This means that the analyst would conclude, for example, that the presence of a left-turn lane reduces accidents when, in reality, it does not. As a result of this conclusion, one might install left-turn lanes without realizing a reduction in crashes. A type II error occurs with beta () probability. In general, choosing a larger results in a smaller, all else being equal. Thus, continuing with the previous example, making a type II error results in concluding that the presence of a left-turn lane does not reduce crashes when in fact it does. The risk is in failing to install an effective countermeasure. To summarize, committing a type I error results in applying an ineffective countermeasure, while committing a type II error results in failing to apply an effective countermeasure. Computing the actual in negative binomial models is extremely difficult. However, applying a liberal equal to 0.10 suggests that this study has simultaneously accepted a smaller level of beta.

Several goodness-of-fit (GOF) statistics to assess model fit to validation data were employed. Comparisons between models, however, are generally subjective. In the documentation to follow, the terms "serious," "moderate," and "marginal" denote subjective evaluations of GOF comparisons between models. Serious differences in GOF are suggestive of noteworthy or significant model deficiencies. Moderate differences in GOF suggest cases where models could be improved, but improvements might be difficult to obtain. Marginal differences in GOF are thought to be negligible and are potentially explained by random fluctuations in the observed data.

2.2 DATA SOURCES AND ISSUES

This section presents the data sources used for the validation and some general data issues. The data used came from three sources:

The original data used for the calibration of the main models for total accidents were obtained from the researchers who developed those models.
HSIS data were obtained for additional years for the same intersections used in the calibration for the additional years available.
An independent validation data set was assembled specifically for this project.

The first validation activity employed the original datasets with the objective of reproducing the original models. In the second validation activity, additional years of accident data (1990 to 1998) from Minnesota were used to validate models I and II over time. The Washington data collected, but not used, for the final development of models I and II, plus accident data for one more year (1996) also were utilized. For models III, IV, and V, additional years of accident data (1996 and 1997) from California and Michigan were used to validate the models over time. For the third validation activity, data from Georgia were assembled during this project and used to further assess the models' ability to predict accidents over space (in a different jurisdiction). The fourth validation activity assessed the "Red Book" accident prediction algorithm, including recommended base models.⁽³⁾

Models I and II, which were developed using Minnesota data, modeled police reported "intersection" or "intersection-related" accidents which occurred within 76.25 m (250 ft) of the intersection. Models III, IV, and V were each developed using Michigan and California data for two dependent variables. The first used all accidents occurring within 76.25 m (250 ft) of the intersection. The second used only those accidents considered to be "intersection-related" and occurring within 76.25 m (250 ft) of the intersection. Special criteria were needed for the latter case because California data does not include a variable indicating if an accident was "intersection-related." An accident was considered to be intersection related accidents if it was:

Vehicle-pedestrian accident.
An accident where one vehicle involved was making a left turn, right turn or U-turn before the crash.
A multivehicle accident in which the accident type is either a sideswipe, rearend, or broadside/angle.

Vehicle-bicyclist, head-on, and run-off-the-road crashes may could possibly be classified as intersection related but were not included in the analysis in order to maintain consistency and comparability with previously estimated models from which this research was based.⁽²⁾ According to discussions with FHWA, for four-leg STOP-controlled intersections, these crashes typically represent about 6 percent of the total crashes occurring within 76.25 m (250 ft) of the intersection. For three-leg STOP-controlled intersections, these crashes typically represent about 13 percent of the total crashed occurring within 76.25 m (250 ft) of the intersection.

Table 1. Basic Statistics for the Data Sources

Basic statistics for the data sources (Blank cells indicate accident types not validated; x indicates accident types validated)		Model I¹	Model II¹	Model III²	Model IV²	Model V²
Sample size	Original Years	389	327	84	72	49
Sample size	Georgia Data	121	114	52	52	51
States³	Original years	MN	MN	CA and MI	CA and MI	CA and MI
	Subsequent Years	MN and WA	MN and WA	CA and MI	CA and MI	CA and MI
	Georgia Data	GA	GA	GA	GA	GA
Years covered	Original Years	5(1985 to 1989)	5(1985 to 1989)	3(1993 to 1995)	3(1993 to 1995)	3(1993 to 1995)
	Subsequent Years	9(1990 to 1998 for MN1993 to 1996 for WA)	9(1990 to 1998 for MN1993 to 1996 for WA)	2(1996 to 1997)	2(1996 to 1997)	2(1996 to 1997)
	Georgia Data	2(1996 to 1997)	2(1996 to 1997)	2(1996 to 1997)	2(1996 to 1997)	2(1996 to 1997)
Accident types validated	Total	X	X
	Injury	X	X
	TOTACC⁴			X	X	X
	TOTACCI⁴			X	X	X
	INJACC			X	X	X
	INJACCI			X	X	X

¹ Vogt and Bared 1998, (pp. 60-67)

² Vogt, 1999, (pp. 53-64)

³ MN: Minnesota, WA: Washington, CA: California, MI: Michigan, GA: Georgia.

⁴ TOTACC: Total number of accidents within 76.25 m (250 ft); TOTACCI: Only those crashes considered intersection-related and within 76.25 m (250 ft); Similar distinction between INJACC and INJACCI

The Georgia data specially collected for this validation also do not include a variable coded as "intersection" or "intersection-related" by the police. As such, the above criteria for determining an "intersection-related" accident were used.

To use the Georgia data to validate the original models, consistency of accident location definitions had to be resolved. The original models included accidents only within 76.25 m (250 ft) from the intersection center. However, the accident data recorded in Georgia measures the distance of an accident from an intersection within two decimal places of a mile, i.e., within 8.05 m (26.4 ft). The issue was whether or not to include accidents within 0.06 or .08 kilometers (km) (0.04 or 0.05 miles, or 211 or 264 ft). In the end, both the 0.04- and 0.05-mile buffers were used in separate validation efforts, mainly to check the sensitivity of the results to this definition.

2.3 MODEL PERFORMANCE MEASURES

Several GOF measures were used to assess model performance. It is important to note at the outset that only after an assessment of many GOF criteria is made can the performance of a particular model or set of models be assessed. In addition, a model must be internally plausible, and agree with known theory about crash causation and processes. The GOF measures used were:

Pearson's Product Moment Correlation Coefficients Between Observed and Predicted Crash Frequencies

Pearson's product moment correlation coefficient, usually denoted by r, is a measure of the linear association between the two variables Y₁ and Y₂ that have been measured on interval or ratio scales. A different correlation coefficient is needed when one or more variable is ordinal. Pearson's product moment correlation coefficient is given as:

Equation 2. Pearson's product moment correlation coefficient, lowercase R, equals the sigma of Y subscript I1 minus the mean of the Y subscript I observations, Y bar, subscript 1, times Y subscript I2 minus Y bar subscript 2, all divided by the following raised to the one-half power: the sigma of Y subscript I1 minus Y subscript 1, all squared, times the sigma of Y subscript I2 minus Y subscript 2, all squared.

(2)

where

= the mean of the observations.

A model that predicts observed data perfectly will produce a straight line plot between observed (Y₁) and predicted values (Y₂), and will result in a correlation coefficient of exactly 1. Conversely, a linear correlation coefficient of 0 suggests a complete lack of a linear association between observed and predicted variables. The expectation during model validation is a high correlation coefficient. A low coefficient suggests that the model is not performing well and that variables influential in the calibration data are not as influential in the validation data. Random sampling error, which is expected, will not reduce the correlation coefficient significantly.

Mean Prediction Bias (MPB)

The MPB is the sum of predicted accident frequencies minus observed accident frequencies in the validation data set, divided by the number of validation data points. This statistic provides a measure of the magnitude and direction of the average model bias as compared to validation data. The smaller the average prediction bias, the better the model is at predicting observed data. The MPB can be positive or negative, and is given by:

(3)

where

n = validation data sample size; and

= the fitted value observation.

A positive MPB suggests that on average the model overpredicts the observed validation data. Conversely, a negative value suggests systematic underprediction. The magnitude of MPB provides the magnitude of the average bias.

Mean Absolute Deviation (MAD)

MAD is the sum of the absolute value of predicted validation observations minus observed validation observations, divided by the number of validation observations. It differs from MPB in that positive and negative prediction errors will not cancel each other out. Unlike MPB, MAD can only be positive.

(4)

where

n = validation data sample size.

The MAD gives a measure of the average magnitude of variability of prediction. Smaller values are preferred to larger values.

Mean Squared Prediction Error (MSPE) and Mean Squared Error (MSE)

MSPE is the sum of squared differences between observed and predicted crash frequencies, divided by sample size. MSPE is typically used to assess error associated with a validation or external data set. MSE is the sum of squared differences between observed and predicted crash frequencies, divided by the sample size minus the number of model parameters. MSE is typically a measure of model error associated with the calibration or estimation data, and so degrees of freedom are lost (p) as a result of producing Y_hat, the predicted response.

MSE =

Equation 5. Mean squared error, MSE, equals the sum from I equals 1 to N of the sum of Y subscript I minus Y tophat subscript I, squared, all divided by the estimation data sample size, N subscript 1, minus degrees of freedom lost, P.

(5)

MPSE =

Equation 6. Mean squared prediction error, MSPE, equals the sum from I equals 1 to N of the sum of Y subscript I minus Y tophat subscript I, squared, all divided by the validation data sample size, N subscript 2.

(6)

where

n₁ = estimation data sample size; and

n₂ = validation data sample size.

A comparison of MSPE and MSE reveals potential overfitting or underfitting of the models to the estimation data. An MSPE that is higher than MSE may indicate that the models may have been overfit to the estimation data, and that some of the observed relationships may have been spurious instead of real. This finding could also indicate that important variables were omitted from the model or the model was misspecified. Finally, data inconsistencies could cause a relatively high value of MSPE. Values of MSPE and MSE that are similar in magnitude indicate that validation data fit the model similar to the estimation data and that deterministic and stochastic components are stable across the comparison being made. Typically this is the desired result.

To normalize the GOF measures to compensate for the different numbers of years associated with different data sets, GOF measures were computed on a per year basis. For MPB and MAD per year, MPB and MAD were divided by number of years. However, since MSPE and MSE are the mean values of the squared errors (MPB or MAD were squared and divided by n or n-p), MSPE and MSE were divided by the square of number of years to calculate MSPE and MSE per year, resulting in a fair comparison of predictions based on different numbers of years.

Overdispersion Parameter, K

The overdispersion parameter, K, in the negative binomial distribution has been reported in different forms by various researchers. In the model results presented in this report, K is reported from the variance equation expressed as:

Equation 7. The estimated variance of the mean accident rate, Var open bracket M closed bracket, equals the estimated mean accident rate, E open bracket M closed bracket, plus the estimated overdispersion parameter, K, times E open bracket M closed bracket squared.

(7)

where

Var{m} = the estimated variance of the mean accident rate;

E{m} = the estimated mean accident rate; and

K = the estimated overdispersion parameter.

Variance overdispersion in a Poisson process can lead to a negative binomial dispersion of errors, particularly when the Poisson means are themselves approximately gamma distributed, or possess gamma heterogeneity. The negative binomial distribution has been shown to adequately describe errors in motor vehicle crash models in many instances. Because the Poisson rate is overdispersed, the estimated variance term is larger than the same under a Poisson process. As overdispersion gets larger, so does the estimated variance, and consequently all of the standard errors of estimates become inflated. As a result, all else being equal, a model with smaller overdispersion (i.e., a smaller value of K) is preferred to a model with larger overdispersion.

2.4 VALIDATION ACTIVITY 1: VALIDATION USING ADDITIONAL YEARS OF ACCIDENT DATA

The acquisition of subsequent years of accident data allowed for the validation of the models across time. For the type I and II models, although data from both Minnesota and Washington were collected, only data from Minnesota were used for the final calibration. Only the models developed using Minnesota data were validated because a report by Vogt and Bared states that "in view of the small size of the Washington State sample ... the non-random and ad hoc character of the Washington intersections, the lesser quality of the collected Washington data ... we take the Minnesota models as fundamental." (p. 123)⁽¹⁾ However, the Washington data were still obtained to further assess the transferability of these models over time and across jurisdictions. The original report performed a similar validation exercise with Washington data but with fewer years of accident data. The validation undertaken for this activity included applying the original models to the new data and assessing various measures of GOF. It also included recalibrating the models using the additional years of data and comparing the parameter estimates with those of the original models.

The Type I and II models were originally calibrated on accident data from 1985 to 1989. In the original report, 1990 to 1993 data were used to validate the models in addition to the Washington data. The results for the MAD are given in Table 2.

Table 2. Validation Statistics from Original Report¹

Measure	Model I (90-93 Minnesota data)	Model II (90-93 Minnesota data)	Model I (93-95 Washington data)	Model II (93-95 Washington data)
MAD	1.02	1.28	1.17	2.68

¹ Vogt and Bared, 1998, (p. 131-132)

For the new validation of Type I and II models, accident data from 1990 to 1998 were obtained to expand the validation dataset.

For the Type III, IV, and V models, the original models were developed with accident data that were collected between 1993 and 1995. Accident data collected between 1996 and 1997 at the same locations were acquired to validate the models.

Data Limitations in the Minnesota Data

Because site characteristics change over time, the original sites were examined on important variables to determine which ones were no longer suitable for inclusion. Also, any sites where any year of accident data was missing was not included. Of the 327 original four-legged sites, 315 were retained while 367 of the original 389 three-legged sites were retained. The sites that were excluded typically changed from being a rural to a suburban environment, changed in the number of legs, and changed traffic control or had missing years of accident data.

Subsequent to the analysis and draft report, it was discovered that the mile log information that identifies the location of an intersection for some sites had changed over the 1990 to 1998 period. Although errors will exist in the accident counts used at these sites, these errors were found to be negligible. For example, for Type I sites, the average number of accidents per site per year in the validation data was 0.25 and 0.11 for total and injury data, respectively. In the corrected data, these averages are 0.26 and 0.11. Therefore, the conclusions drawn from the analysis are not affected and the analysis has not been redone.

Data Limitations in the Michigan Data

Traffic volumes from 1993 to 1995 were not updated for this validation exercise because AADT information for major and minor roads for 1996 and 1997 were not available. One of the complexities regarding AADT acquisition is outlined in the report published originally by Vogt.⁽²⁾ In the original data, minor road AADT from 1993 to 1995 was not available for Michigan intersections. Therefore in the report, major AADT plus peak-hour turning movement counts were used to estimate missing years of minor road AADT.

Subsequent to the analysis and draft report it was discovered that for type V sites from Michigan, the original researchers manually identified crossroad mileposts for about 40 percent or more of these intersections (State routes) and counted accidents that occurred on the crossroads. At non-State route crossroads, accidents are mileposted to the mainline. However, the Michigan accident data for 1996 and 1997 years obtained did not include the crossroad accidents at intersections with a State route crossroad because the validation team did not have the crossroad milepost information at that time. As a result, the later year crossroad accident numbers at these sites should be systematically lower than expected.

2.4.1 Model I

The summary statistics shown in table 3 indicate that there are similar accident frequencies between the original (1985-89 data for 389 sites) and the additional (1990-98 data for 367 sites) years. Note that the statistics compare 5 years of accident data in the original set to 9 years of accident data for validation over time. The latter years of data exhibit an increased variability amongst the sites. The Washington data shows a higher average accident frequency and a lower variability between sites compared to the original Minnesota data.

Table 3. Accident Summary Statistics for Type I Sites

Dataset	No. of Sites	Mean	Median	Std. Deviation	Max.
Minnesota Total (85-89)¹	389	1.35 (0.27/year)	0.00	2.88	39
Minnesota Injury (85-89)¹	389	0.59 (0.12/year)	0.00	N/A²	17
Minnesota Total (90-98)	367	2.21 (0.25/year)	1.00	3.89	32
Minnesota Injury (90-98)	367	0.95 (0.11/year)	0.00	1.79	13
Washington Total (93-96)	181	1.43 (0.36/year)	0.00	2.48	14
Washington Injury (93-96)	181	0.66 (0.17/year)	0.00	1.33	10

¹ Vogt and Bared, 1998, (p. 60)

² N/A: not available

Total Accident Model

The model was recalibrated with the additional years of accident data for the original Minnesota sites. Recall that the intersection related variable for vertical curvature, VCI1, did not exactly match those statistics given in the report. The parameter estimates, their standard errors, and p-values are provided in table 4, which reveals differences in the parameter estimates of the variables between the two time periods.

VCI1 and RT were estimated with opposite signs to the original models, although in the original calibration these parameter estimates had large standard errors and were not statistically significant. This indicates that variables with large standard errors compared to the parameter estimates should not be included in the model even if engineering "common sense" suggests that they should. The other parameters were estimated with the same sign as originally and in some cases similar magnitude and significance to the original models. The overdispersion parameter, K, was estimated with a similar magnitude.

Table 4. Parameter Estimates for Type I Total Accident Model Using Additional Years of Data

Variable	Original Estimates¹ (s.e., div-value)	Recalibrated Estimates (s.e., p-value)	Additional Years Estimates (s.e., div-value)
Constant	-12.9922 (1.1511, 0.0001)	-12.90 (1.16, <0.001)	-13.48 (1.07, <0.001)
Log of AADT1	0.8052 (0.0639, 0.0001)	0.8051 (0.0784, <0.001)	0.8199 (0.0714, <0.001)
Log of AADT2	0.5037 (0.0708, 0.0001)	0.4991 (0.0660, <0.001)	0.4808 (0.0584, <0.001)
HI1	0.0339 (0.0327, 0.3004)	0.0339 (0.0220, 0.124)	0.0145 (0.0206, 0.481)
VCI1	0.2901 (0.2935, 0.3229)	0.1900 (0.2260, 0.402)	-0.245 (0.269, 0.363)
SPD1	0.0285 (0.0177, 0.1072)	0.0273 (0.0144, 0.058)	0.0375 (0.0130, 0.004)
HAZRAT1	0.1726 (0.0677, 0.0108)	0.1806 (0.0754, 0.017)	0.0779 (0.0655, 0.234)
RT MAJ	0.2671 (0.1398, 0.0561)	0.2690 (0.1420, 0.058)	-0.077 (0.126, 0.539)
HAU	0.0045 (0.0032, 0.1578)	0.0043 (0.0024, 0.075)	0.00145 (0.00222, 0.513)
K²	0.481	0.485	0.500

¹ Vogt and Bared, 1998, (p. 115)

² K: Overdispersion value

GOF measure comparisons are shown in table 5. The Pearson product-moment correlation coefficient was marginally higher for the original data than when the model based on that data were used to predict the additional years of data. The MPB is higher for the prediction of the additional years of data. The MADs per year are similar for the predictions for the two time periods. The MSPE per year squared is lower than the MSE per year squared.

Table 5. Validation Statistics for Type I Total Accident Model Using Additional Years of Data

Measure	Recalibrated 1985-89 Model	Original 1985-89 Model	Recalibrated 1985-89 Model
Years used for validation	1985-89	1990-98	1990-98
Number of sites	389	367	367
Pearson product-moment correlation coefficients	0.662	0.612	0.614
MPB	-0.01	-0.050	-0.52
MPB/yr	0.00	-0.06	-0.06
MAD	1.03	1.84	1.84
MAD/yr	0.21	0.20	0.20
MSE	4.64	N/A¹	N/A¹
MSE/yr²	0.19	N/A¹	N/A¹
MSPE	N/A¹	10.93	10.94
MSPE/yr²	N/A¹	0.13	0.14

¹ N/A: not available

The model was also recalibrated with the Washington data. The parameter estimates, their standard errors, and p-values are provided in table 6, which reveals differences in the parameter estimates of the variables between the two locations.

All parameters were estimated with the same sign as the original models, and in some cases with similar magnitude and significance. Notable exceptions are VCI1 and HAZRAT1 which were estimated with the same sign but with a large difference in magnitude. For VCI1 both the original and newly estimated parameters had large standard errors. For the new Washington data HAZRAT1 had a large standard error and the overdispersion parameter, K, was higher than the original Minnesota model.

A comparison of validation measures for the model recalibrated with the original data and the original model applied to the Washington data is shown in table 7.

The Pearson product-moment correlation coefficient was slightly higher for the original data as compared to Washington data. The MPB, mean absolute deviations and mean squared prediction errors were similar in magnitude.

Table 6. Parameter Estimates for Type I Total Accident Model Using Washington Data

Variable	Original Estimates (s.e., p-value)¹	Recalibrated Estimate (s.e., p-value)	Washington Data Estimates (s.e., p-value)
Constant	-12.9922 (1.1511, 0.0001)	-12.90 (1.16, <0.001)	-12.59 (2.01, <0.001)
Log of AADT1	0.8052 (0.0639, 0.0001)	0.8051 (0.0784, <0.001)	0.8730 (0.1750, <0.001)
Log of AADT2	0.5037 (0.0708, 0.0001)	0.4991 (0.0660, <0.001)	0.4858 (0.0805, <0.001)
HI1	0.0339 (0.0327, 0.3004)	0.0339 (0.0220, 0.124)	0.0170 (0.0356, 0.632)
VCI1	0.2901 (0.2935, 0.3229)	0.1900 (0.2260, 0.402)	0.6200 (0.4440, 0.162)
SPD1	0.0285 (0.0177, 0.1072)	0.0273 (0.0144, 0.058)	0.0122 (0.0179, 0.495)
HAZRAT1	0.1726 (0.0677, 0.0108)	0.1806 (0.0754, 0.017)	0.1030 (0.1010, 0.309)
RT MAJ	0.2671 (0.1398, 0.0561)	0.2690 (0.1420, 0.058)	0.2780 (0.2750, 0.311)
HAU	0.0045 (0.0032, 0.1578)	0.0043 (0.0024, 0.075)	0.0032 (0.0127, 0.802)
K²	0.481	0.485	0.769

¹ Vogt and Bared, 1998, (p. 115)

² K: Overdispersion value

Table 7. Validation Statistics for Type I Total Accident Model Using Washington Data

Measure	Recalibrated 1985-89 Model	Original 1985-89 Model
Years used for validation	1985-89 (Minnesota)	1993-96 (Washington)
Number of sites	389	181
Pearson product-moment correlation coefficients	0.662	0.579
MPB	-0.01	-0.30
MPB/yr	0.00	-0.08
MAD	1.027	1.39
MAD/yr	0.205	0.35
MSE	4.64	N/A¹
MSE/yr²	0.19	N/A¹
MSPE	N/A¹	4.45
MSPE/yr²	N/A¹	0.28

¹ N/A: not available

Injury Accident Model

Table 8 shows the parameter estimates for the type I injury accident model using additional years of data. VCI1, RT MAJ, and HAUHAUHa were estimated with opposite signs than the original model, but this is not surprising when the large standard errors are considered. The other parameters were estimated with the same direction of effect, and in some cases similar magnitude and significance as the original model. The overdispersion parameter, K, was estimated to be slightly higher than that in the original model.

Table 8. Parameter Estimates for Type I Injury Accident Model Using Additional Years of Data

Variable	Original Estimates (s.e., p-value)¹	Additional Years Estimates (s.e., p-value)
Constant	-13.0374 (1.7908, 0.0001)	-15.41 (1.44, <0.001)
Log of AADT1	0.8122 (0.0973, 0.0001)	0.7774 (0.0932, <0.001)
Log of AADT2	0.4551 (0.0977, 0.0001)	0.5815 (0.0760, <0.001)
HI1	0.0335 (0.0327, 0.3047)	0.0067 (0.0268, 0.802)
VCI1	0.1869 (0.3657, 0.6092)	-0.359 (0.393, 0.361)
SPD1	0.0156 (0.0269, 0.5618)	0.0514 (0.0179, 0.004)
HAZRAT1	0.2065 (0.0930, 0.0263)	0.1618 (0.0857, 0.059)
RT MAJ	0.3620 (0.1814, 0.0460)	-0.260 (0.163, 0.159)
HAU	0.0051 (0.0045, 0.2594)	-0.00109 (0.00282, 0.699)
DRWY1	-0.0120 (0.0714, 0.8671)	-0.0008 (0.0541, 0.988)
K²	0.494	0.526

¹ Vogt and Bared, 1998, (p. 116)

² K: Overdispersion value

Validation statistics are shown in table 9 for the additional years of injury accident data. Since the original injury accident counts were not obtained these measures are not provided for the original years. The Pearson product-moment correlation coefficient was lower (0.553) than that for total accidents (0.614 for the recalibrated model) and the MAD per year (0.106) was about one half of that for total accidents.

The model was also recalibrated with the Washington data. The parameter estimates, their standard errors, and p-values are provided in table 10, which reveals differences in the parameter estimates of the variables between the two States.

Table 9. Validation Statistics for Type I Injury Accident Model Using Additional Years of Data

Measure	Additional Years 1990-98
Number of sites	367
Pearson product-moment correlation coefficients	0.553
MPB	-0.219
MPB/yr	-0.024
MAD	0.955
MAD/yr	0.106
MSPE	2.530
MSPE/yr²	0.031

Table 10. Parameter Estimates for Type I Injury Accident Model Using Washington Data

Variable	Original Estimates (s.e., p-value) ¹	Washington Data Estimates (s.e., p-value)
Constant	-13.0374 (1.7908, 0.0001)	-14.64 (2.49, <0.001)
Log of AADT1	0.8122 (0.0973, 0.0001)	1.012 (0.222, <0.001)
Log of AADT2	0.4551 (0.0977, 0.0001)	0.5237 (0.0968, <0.001)
HI1	0.0335 (0.0327, 0.3047)	0.0055 (0.0404, 0.892)
VCI1	0.1869 (0.3657, 0.6092)	0.578 (0.526, 0.272)
SPD1	0.0156 (0.0269, 0.5618)	0.0118 (0.0227, 0.602)
HAZRAT1	0.2065 (0.0930, 0.0263)	0.071 (0.119, 0.552)
RT MAJ	0.3620 (0.1814, 0.0460)	0.190 (0.305, 0.534)
HAU	0.0051 (0.0045, 0.2594)	0.0103 (0.0163, 0.526)
DRWY1	-0.0120 (0.0714, 0.8671)	-0.0111 (0.0695, 0.873)
K²	0.494	0.513

¹ Vogt and Bared, 1998, (p. 116)

² K: Overdispersion value

The parameters were all estimated with the same sign but in several cases the difference in magnitude was large. The overdispersion parameter, K, was estimated with a similar magnitude.

Validation measures for the Washington data are shown in table 11. The Pearson product-moment correlation coefficient was lower (0.505) than that for total (0.579) accidents, and the MADs per year are about one half that for total accidents.

Table 11. Validation Statistics for Type I Injury Accident Model Using Washington Data

Measure	Washington Data 1993-96
Number of sites	181
Pearson product-moment correlation coefficients	0.505
MPB	-0.084
MPB/yr	-0.021
MAD	0.712
MAD/yr	0.178
MSPE	1.343
MSPE/yr²	0.08

2.4.2 Model II

The summary statistics shown in table 12 indicate that there is little difference in the mean accident frequencies between original (1985-89) and additional (1990-98) years for the Minnesota sites, although there is more variation in the latter data. The Washington sites exhibit a large increase in the mean accident frequency and a larger variability between sites.

Total Accident Model

The model was recalibrated with the additional years of accident data for the Minnesota sites. Recall that the intersection related variables did not exactly match those statistics given in the report. The parameter estimates, their standard errors, and p-values are provided in table 13, which reveals differences in the parameter estimates of the variables between the two time periods.

Table 12. Accident Summary Statistics for Type II Sites

Dataset	No. of Sites	Mean	Median	Std. Deviation	Maximum
Minnesota Total (85-89)¹	327	1.51 (0.30/year)	1.00	2.36	16
Minnesota Injury (85-89)¹	327	0.77 (0.15/year)	0.00	N/A²	9
Minnesota Total (90-98)	315	2.83 (0.31/year)	2.00	5.17	67
Minnesota Injury (90-98)	315	1.50 (0.17/year)	1.00	2.95	36
Washington Total (93-96)	90	3.97 (0.99/year)	1.00	5.07	20
Washington Injury (93-96)	90	2.46 (0.62/year)	1.00	3.38	14

¹ Vogt and Bared, 1998, (p. 64)

² N/A: not available

The parameters were estimated with the same sign as the original model and in several cases with similar magnitude and significance. VCI1 and HAU were estimated with a large difference in magnitude. The overdispersion parameter, K, was almost twice as large as the original model.

Table 13. Parameter Estimates for Type II Total Accident Model Using Additional Years of Data

Variable	Original Estimates (s.e., p-value) ¹	Additional Years Estimates (s.e., p-value)
Constant	-10.4260 (1.3167, 0.0001)	-10.74 (1.08, <0.001)
Log of AADT1	0.6026 (0.0836, 0.0001)	0.6673 (0.0768, <0.001)
Log of AADT2	0.6091 (0.0694, 0.0001)	0.6135 (0.0622, <0.001)
HI1	0.0449 (0.0473, 0.3431)	0.0702 (0.0456, 0.123)
VCI1	0.2885 (0.2576, 0.2628)	0.066 (0.199, 0.741)
SPD1	0.0187 (0.0176, 0.2875)	0.0130 (0.0159, 0.415)
DRWY1	0.1235 (0.0519, 0.0173)	0.0988, (0.0417, 0.018)
HAU	-0.0049 (0.0033, 0.1341)	-0.00006 (0.00141, 0.967)
K²	0.205	0.385

¹ Vogt and Bared, 1998, (p. 115)

² K: Overdispersion value

A comparison of validation measures for the original data additional years is shown in table 14.

The Pearson product-moment correlation coefficient was higher for the original data than for the original model applied to the additional years. The MADs per year are similar. For the additional years model, the MSPE per year squared is higher than the MSE per year squared, indicating that the model is not performing as well on the additional years of data.

Table 14. Validation Statistics for Type II Total Accident Model Using Additional Years of Data

Measure	Original Data 1985-89	Additional Years 1990-98
Number of sites	327	315
Pearson product-moment correlation coefficients	0.760	0.668
MPB	-0.017	-0.392
MPB/yr	-0.003	-0.043
MAD	1.034	2.060
MAD/yr	0.207	0.229
MSE	2.364	N/A¹
MSE/yr²	0.095	N/A¹
MSPE	N/A¹	15.000
MSPE/yr²	N/A¹	0.185

¹ N/A: not available

The model was also recalibrated with the Washington data. The parameter estimates, their standard errors, and p-values are provided in table 15, which reveals differences in the parameter estimates of the variables between the two States.

HI1, VCI1, and HAU were estimated with opposite signs, which might be expected on the basis of the large standard errors both in the original model and the model estimated using Washington data. Again, this would seem to indicate that variables with large standard errors in the parameter estimates should not be included in the model even if engineering common sense suggests they should. The other parameters were estimated with the same sign but with generally large differences in magnitude. The overdispersion parameter, K, was estimated to be over three times as large as that for the original model.

Table 15. Parameter Estimates for Type II Total Accident Model Using Washington Data

Variable	Original Estimates (s.e., p-value)¹	Washington Data Estimates (s.e., p-value)
Constant	-10.4260 (1.3167, 0.0001)	-8.64 (2.06, <0.001)
Log of AADT1	0.6026 (0.0836, 0.0001)	0.3680 (0.2030, 0.071)
Log of AADT2	0.6091 (0.0694, 0.0001)	0.7340 (0.1080, <0.001)
HI1	0.0449 (0.0473, 0.3431)	-0.1646 (0.0766, 0.032)
VCI1	0.2885 (0.2576, 0.2628)	-0.0140 (0.2230, 0.950)
SPD1	0.0187 (0.0176, 0.2875)	0.0139 (0.0185, 0.454)
ND	0.1235 (0.0519, 0.0173)	0.0118 (0.0815, 0.884)
HAU	-0.0049 (0.0033, 0.1341)	0.0251 (0.0129, 0.051)
K²	0.205	0.667

¹ Vogt and Bared, 1998, (p. 115)

² K: Overdispersion value

A comparison of validation measures for the original data and the original model applied to the Washington data is shown in table 16. The Pearson product-moment correlation coefficient was higher for the original data as compared to the Washington data. The MAD per year is much higher for the Washington data, indicating that the model is not performing well on these data.

Table 16. Validation Statistics for Type II Total Accident Model Using Washington Data

Measure	Original Data 1985-89	Washington Data 1993-96
Number of sites	327	90
Pearson product-moment correlation coefficients	0.760	0.517
MPB	-0.017	-0.42
MPB/yr	-0.003	-0.11
MAD	1.034	3.19
MAD/yr	0.207	0.80
MSE	2.364	N/A¹
MSE/year²	0.095	N/A¹
MSPE	N/A¹	20.15
MSPE/yr²	N/A¹	1.26

¹ N/A: not available

Injury Accident Model

For the injury accident model, the parameter estimates, their standard errors, and p-values are provided in table 17. HAU was estimated with the opposite sign to the original model and with a large standard error. The other variables were estimated with the same sign as the original model, but in some cases with large differences in magnitude. The overdispersion parameter, K, was estimated to be more than twice that of the original model.

Table 17. Parameter Estimates for Type II Injury Accident Model Using Additional Years of Data

Variable	Original Estimates (s.e., p-value)¹	Additional Years Estimates (s.e., p-value)
Constant	-10.7829 (1.7656, 0.0001)	-12.19 (1.39, <0.001)
Log of AADT1	0.6339 (0.1055, 0.0001)	0.6497 (0.0948, <0.001)
Log of AADT2	0.6229 (0.0870, 0.0001)	0.6727 (0.0815, <0.001)
HI1	0.0729 (0.0635, 0.2513)	0.0935 (0.0549, 0.089)
VCI1	0.2789 (0.4623, 0.5464)	0.109 (0.240, 0.650)
SPD1	0.0112 (0.0251, 0.6567)	0.0287 (0.0204, 0.159)
HAZRAT1	-0.1225 (0.0720, 0.0889)	-0.1313 (0.0749, 0.080)
RT MAJ	0.0451 (0.1665, 0.7865)	0.014 (0.148, 0.924)
HAU	-0.0043 (0.0044, 0.3258)	0.00105 (0.00176, 0.552)
ND	0.0857 (0.0639, 0.1799)	0.0791 (0.0513, 0.123)
K²	0.1811	0.435

¹ Vogt and Bared, 1998, (p. 117)

² K: Overdispersion value

Validation measures for the additional years of data are provided in table 18. Because the original injury accident counts were not obtained these measures are not provided for the original years. The Pearson product-moment correlation coefficient was lower (0.671) than that for total accidents (0.668), and the MAD per year is about twice that for total accidents.

Table 18. Validation Statistics for Type II Injury Accident Model Using Additional Years of Data

Measure	Additional Years 1990-98
Number of sites	315
Pearson product-moment correlation coefficients	0.641
MPB	-0.152
MPB/yr	-0.170
MAD	1.230
MAD/yr	0.137
MSPE	5.130
MSPE/yr²	0.063

The model was also recalibrated with the Washington data. The parameter estimates, their standard errors, and p-values are provided in table 19, which reveals differences in the parameter estimates of the variables between the two States.

HI1, RT MAJ, and HAU were estimated with opposite signs and large differences in magnitude compared to the original estimates. The other variables were estimated with the same sign but with generally large differences in magnitude. The overdispersion parameter, K, was estimated to be more than three times that of the original model.

Table 20 shows the validation statistics for the original injury accident model applied to the Washington data. The Pearson product-moment correlation coefficient was lower (0.482) than that for total accidents (0.517) and the MADs per year are about sixty percent of that for total accidents.

Table 19. Parameter Estimates for Type II Injury Accident Model Using Washington Data

Variable	Original Estimates (s.e., p-value)¹	Washington Data Estimates (s.e., p-value)
Constant	-10.7829 (1.7656, 0.0001)	-11.62 (2.39, <0.001)
Log of AADT1	0.6339 (0.1055, 0.0001)	0.504 (0.223, 0.024)
Log of AADT2	0.6229 (0.0870, 0.0001)	0.845 (0.135, <0.001)
HI1	0.0729 (0.0635, 0.2513)	-0.1740 (0.0992, 0.080)
VCI1	0.2789 (0.4623, 0.5464)	0.025 (0.223, 0.911)
SPD1	0.0112 (0.0251, 0.6567)	0.0463 (0.0207, 0.025)
HAZRAT1	-0.1225 (0.0720, 0.0889)	-0.299 (0.130, 0.022)
RT MAJ	0.0451 (0.1665, 0.7865)	-0.775 (0.292, 0.008)
HAU	-0.0043 (0.0044, 0.3258)	0.0220 (0.0136, 0.105)
ND	0.0857 (0.0639, 0.1799)	0.0589 (0.0845, 0.486)
K²	0.181	0.556

¹ Vogt and Bared, 1998, (p. 117)

² K: Overdispersion value

Table 20. Validation Statistics for Type II Injury Accident Model Using Washington Data

Measure	Washington Data 1993-96
Number of sites	90
Pearson product-moment correlation coefficient	0.482
MPB	0.345
MPB/yr	0.086
MAD	2.071
MAD/yr	0.518
MSPE	8.86
MSPE/yr²	0.55

2.4.3 Model III

The summary statistics shown in table 21 indicate that there are differences in the accident frequencies between the original (1993-95) and additional (1996-97) years. Note that the statistics compare 3 years of accident data in the original data to 2 years of accident data for validation using the additional years of data. For example, the means per year of TOTACC and TOTACCI for 1993-95 are 1.29 and 0.87, respectively. The means per year of TOTACC and TOTACCI for 1996-97 are 0.75 and 0.62, respectively. Note that the 1993-95 data for the original INJACC and INJACCI models could not be obtained.

Table 21. Accident Summary Statistics of Type III Sites

Dataset	Mean	Median	Std. Deviation	Maximum
TOTACC (93-95)¹	3.88 (1.29/year)	2	4.33	19
TOTACC (96-97)	1.50 (0.75/year)	1	2.42	12
TOTACCI (93-95)¹	2.62 (0.87/year)	1	3.36	13
TOTACCI (96-97)	1.23 (0.62/year)	0	2.09	10
INJACC (96-97)	0.55 (0.28/year)	0	1.19	7
INJACCI (96-97)	0.46 (0.23/year)	0	1.02	7

¹ Vogt, 1999, (p. 53)

Previous | Table of Contents | Next

Page Owner: Office of Research, Development, and Technology, Office of Safety, RDT

Topics: research, safety, intersection safety
Keywords: research, safety, Accident modification factors, Traffic safety, Signalized intersections, Crash models, Crash model validation, Interactive highway safety design model
TRT Terms: Traffic accidents–United States–Forecasting, Roads–United States–Interchanges and intersections–Mathematical models, Rural roads–United States, Low-volume roads–United States, signalized intersections
Scheduled Update: Archive - No Update needed

This page last modified on 03/08/2016