U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-HRT-04-100
Date: September 2005
Safety Effects of Marked Versus Unmarked Crosswalks at Uncontrolled Locations Final Report and Recommended Guidelines
To assess the validity of the final crash prediction model for the available database, several types of tests were conducted. These tests included:
Below is an excerpt from the PROC GENMOD output (table 14). In assessing the goodness-of-fit of the negative binomial regression model for crosswalks, we can see that the scaled deviance and the Pearson chi-square are small, indicating that the model fits the data well.
We can test for overdispersion with a likelihood ratio test based on the Poisson and negative binomial distributions. This test evaluates the equality of the mean and the variance imposed by the Poisson distribution against the alternative that the variance exceeds the mean. For the negative binomial distribution, variance = mean + k × mean² (k ≥ 0; the negative binomial distribution reduces to the Poisson when k = 0). The null hypothesis is H0 : k = 0, and the alternative hypothesis is Ha : k > 0.
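The variance relationship above can be illustrated with a minimal Python sketch. The function name `nb_variance` and the input values are hypothetical and chosen only for illustration; they are not taken from the report's data.

```python
def nb_variance(mean: float, k: float) -> float:
    """Variance of a negative binomial count: mean + k * mean**2.

    k is the dispersion parameter; k = 0 gives the Poisson case,
    where the variance equals the mean.
    """
    if k < 0:
        raise ValueError("dispersion parameter k must be non-negative")
    return mean + k * mean ** 2


# Illustrative values only (not from the report's data).
for k in (0.0, 0.5, 1.0):
    print(f"mean = 2.0, k = {k}: variance = {nb_variance(2.0, k)}")
```

With k = 0 the variance equals the mean (the Poisson case); any k > 0 makes the variance grow faster than the mean, which is the overdispersion that the test is designed to detect.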
To test the functional form, we used the likelihood ratio (LR) test; that is, we computed the LR statistic, −2 (LL (Poisson) − LL (negative binomial)). The asymptotic distribution of the LR statistic is a mixture with a probability mass of one-half at zero and one-half on a chi-square distribution with 1 df. (40) To test the null hypothesis at significance level α, use the critical value of the chi-square distribution corresponding to significance level 2α; that is, reject H0 if the LR statistic > χ² (1 − 2α, 1 df).
Table 15 is an excerpt from the PROC GENMOD output for a Poisson regression model with the same independent variables as in the final negative binomial model.
−2 (LL (Poisson) - LL (negative binomial)) =
−2* (−568.4558 − (−548.7469)) =
2* (568.4558 − 548.7469) = 39.4178
Thus, the null hypothesis is rejected at α = 0.01, because the LR statistic of 39.4178 far exceeds the critical value χ² (0.98, 1 df) ≈ 5.41, and we conclude that the Poisson distribution is inadequate for this model. (40)
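The likelihood ratio calculation above can be reproduced with a short Python sketch that uses only the standard library. The function name `overdispersion_lr_test` is hypothetical; the two log-likelihood values are the ones reported above. The sketch relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable, so the chi-square quantile at 1 − 2α equals the squared normal quantile at 1 − α. It assumes α < 0.5.

```python
import math
from statistics import NormalDist


def overdispersion_lr_test(ll_poisson: float, ll_negbin: float, alpha: float = 0.01):
    """Boundary LR test of H0: k = 0 (Poisson) against Ha: k > 0.

    Returns (lr, critical, p_value, reject).
    """
    lr = -2.0 * (ll_poisson - ll_negbin)

    # Critical value: chi-square(1 df) quantile at 1 - 2*alpha,
    # computed as the squared standard normal quantile at 1 - alpha.
    critical = NormalDist().inv_cdf(1.0 - alpha) ** 2

    # p-value under the half-and-half mixture: 0.5 * P(chi2_1 > lr).
    # For 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0)) if lr > 0 else 1.0

    return lr, critical, p_value, lr > critical


# Log-likelihoods reported in the text for the two fitted models.
lr, critical, p_value, reject = overdispersion_lr_test(-568.4558, -548.7469, alpha=0.01)
print(f"LR = {lr:.4f}, critical value = {critical:.4f}, reject H0: {reject}")
```

Halving the chi-square tail probability has the same effect as using the 2α critical value: both account for the null value k = 0 lying on the boundary of the parameter space.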
Because generalized estimating equations (GEE) were used, the interpretation of residuals is problematic and no residual analysis was undertaken.
Multicollinearity is certainly a concern because the marked and unmarked crosswalks were matched geographically; as a result, the number of lanes, median type, and traffic ADT are distributed very similarly in the marked and the unmarked crosswalks.
Multicollinearity was explored using the regression diagnostics suggested by Belsley, Kuh, and Welsch. (41) They suggest two different measures: the variance inflation factor (VIF) and the proportion of variation. The VIF gauges the influence that potential near dependencies may have on the estimated standard errors of the regression parameters. The proportion of variation is a diagnostic that permits the detection of more complex dependencies. The predictor variables in the final model were: an indicator for marked versus unmarked, pedestrian ADT, and traffic ADT; two indicators for number of lanes; two indicators for type of median; an interaction between the indicator for marked versus unmarked and pedestrian ADT; and an interaction between the indicator for marked versus unmarked and traffic ADT. The largest VIF was 4.0; this is not high (VIF < 10), but it exceeds the suggested criterion of 1.55. Three predictors exceeded that criterion: the indicator for marked versus unmarked (VIF = 3.5), traffic ADT (VIF = 2.5), and the interaction of these two predictor variables (VIF = 4.0). There is therefore some variance inflation in this model. Because none of the VIFs is greater than 10, we can conclude that the model has not been degraded by collinearity. However, the results should be interpreted with some care because three predictors have VIFs greater than 1.55.
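To help interpret the reported VIFs, the sketch below uses the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the others. It converts each reported VIF into the implied R² and into the factor by which the standard error is inflated (the square root of the VIF). The function names and the labels "negligible", "mild", and "severe" are hypothetical conveniences for this illustration, not terminology from the report; the thresholds 1.55 and 10 are the ones cited above.

```python
import math


def r_squared_from_vif(vif: float) -> float:
    """Invert VIF = 1 / (1 - R^2) to get the implied R^2."""
    return 1.0 - 1.0 / vif


def se_inflation(vif: float) -> float:
    """Factor by which the coefficient's standard error is inflated."""
    return math.sqrt(vif)


def classify(vif: float, low: float = 1.55, high: float = 10.0) -> str:
    """Illustrative labels based on the two thresholds cited in the text."""
    if vif > high:
        return "severe"
    if vif > low:
        return "mild"
    return "negligible"


# VIF values reported in the text for the three flagged predictors.
reported = {
    "marked vs. unmarked indicator": 3.5,
    "traffic ADT": 2.5,
    "marked x traffic ADT interaction": 4.0,
}

for name, vif in reported.items():
    print(
        f"{name}: VIF = {vif}, implied R^2 = {r_squared_from_vif(vif):.3f}, "
        f"SE inflation = {se_inflation(vif):.2f}x, {classify(vif)}"
    )
```

For example, the largest reported VIF of 4.0 corresponds to an R² of 0.75 among the predictors and a doubling of the standard error relative to the uncorrelated case.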
The proportion-of-variation diagnostic suggested by Belsley, Kuh, and Welsch, with a condition index of 9.4, indicates a weak dependency among three predictors: the indicator for marked versus unmarked, traffic ADT, and the interaction of these two predictor variables. It is not surprising that an interaction is correlated with its main effects.
In conclusion, the model does have a weak dependency among the predictor variables. This does not inflate the variance too much; thus, reasonable tests may be conducted. The mild nature of the collinearity does not present a threat to the interpretability of the model. (41)