U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-02-089
Date: July 2002
Safety Effectiveness of Intersection Left- and Right-Turn Lanes
APPENDIX B. NEGATIVE BINOMIAL REGRESSION MODELS
In each of the three approaches to before-after evaluation discussed in Section 5, an adjustment for differences in traffic volumes was made. In the YC approach, a simple proportional traffic volume adjustment was used. In the CG and EB approaches, an adjustment based on a regression relationship between accident frequencies and traffic volumes was used. This appendix discusses the development of these regression relationships through negative binomial modeling of accident frequencies as a function of traffic volumes and other variables. The application of these models has been illustrated in Figures 5 and 6 in the main text of this report.
Accident counts at a given intersection are inherently discrete, nonnegative integers, and are often small, as in the case of fatal and injury accidents. Furthermore, the distribution of accidents is often skewed in that most sites experience few accidents while a small number of sites experience many more. The Poisson distribution is generally the first choice for modeling rare discrete events such as accidents. It has only one parameter, its mean, and its variance is, by definition, equal to its mean. This relationship between the mean and the variance is often violated for accident counts because of inherent overdispersion in the data (i.e., the variance of accident counts typically exceeds the mean). A flexible distribution that can effectively model overdispersed count data is the negative binomial (NB) distribution. This distribution has two parameters, the mean and a dispersion parameter. As the overdispersion vanishes (i.e., as the variance approaches the mean), the NB distribution approaches the Poisson distribution.
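A quick way to see whether a set of counts departs from the Poisson assumption is to compare the sample variance with the sample mean. The sketch below is illustrative only: the function name `dispersion_index` and the site counts are hypothetical and are not taken from the study data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean.

    A value near 1 is consistent with a Poisson distribution;
    a value well above 1 indicates overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical yearly counts: most sites have few accidents, a few have many.
site_counts = [0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 3, 0, 1, 0, 12, 0, 0, 1, 9, 0]
ratio = dispersion_index(site_counts)
print(f"mean = {mean(site_counts):.2f}, variance/mean = {ratio:.2f}")
```

A ratio well above 1, as in this made-up sample, is the pattern that motivates the negative binomial model.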
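The relationship between the expected number of accidents, μi = E(Yi), at intersection i and a set of q intersection parameters, Xi1, Xi2, ..., Xiq, is
$$\mu_i = \exp(\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_q X_{iq}) \qquad \text{(B-1)}$$
where β0, β1, ..., βq are the regression coefficients, with the assumption that the number of accidents, Yi, follows a negative binomial distribution with parameters α and δ (with 0 ≤ α ≤ 1 and δ ≥ 0). That is, the probability that an intersection defined by a known set of predictor variables, Xi1, Xi2, ..., Xiq, experiences Yi = yi accidents can be expressed as:
$$P(Y_i = y_i;\, \alpha, \delta) = \frac{\Gamma(y_i + \delta)}{\Gamma(\delta)\, y_i!} \cdot \frac{\alpha^{y_i}}{(1 + \alpha)^{y_i + \delta}}, \qquad y_i = 0, 1, 2, \ldots \qquad \text{(B-2)}$$
where yi! denotes the factorial of yi and Γ(·) denotes the gamma function.
The mean and variance of the negative binomial distribution of accident counts can then be expressed in terms of the parameters α and δ as follows:
$$E(Y_i) = \mu_i = \delta\alpha \qquad \text{(B-3)}$$
$$\mathrm{Var}(Y_i) = \delta\alpha + \delta\alpha^2 = \mu_i + \frac{\mu_i^2}{\delta} \qquad \text{(B-4)}$$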
The term μi can be referred to as the Poisson variance function and μi²/δ as the extra component arising from combining the Poisson distribution with a gamma distribution for the mean to obtain the negative binomial distribution. The overdispersion parameter δ is not known a priori, but can be estimated so that the mean deviance becomes unity or the Pearson chi-square statistic equals its expectation (i.e., equals its degrees of freedom).(74)
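The Pearson-based criterion in the preceding paragraph can be sketched as a one-dimensional root search. The example assumes that fitted means are already available and that the variance is the mean plus the squared mean divided by the overdispersion parameter (named `delta` in the code). The function names, search bounds, and data are hypothetical, and the cited procedure may differ in detail.

```python
def pearson_chi_square(y, mu, delta):
    """Pearson statistic using the NB variance function mu + mu**2 / delta."""
    return sum((yi - mi) ** 2 / (mi + mi ** 2 / delta) for yi, mi in zip(y, mu))


def estimate_delta(y, mu, n_params, lo=1e-6, hi=1e6, iters=200):
    """Find delta so that the Pearson statistic equals its degrees of freedom.

    The statistic increases with delta, so bisection on a log scale works.
    Returns None when even the near-Poisson limit (delta = hi) does not
    reach the target, i.e., no overdispersion is detected.
    """
    target = len(y) - n_params
    if pearson_chi_square(y, mu, hi) <= target:
        return None
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if pearson_chi_square(y, mu, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


# Hypothetical observed counts and fitted means for eight site-years.
observed = [0, 3, 1, 7, 0, 2, 10, 1]
fitted = [1.0, 2.0, 1.5, 3.0, 1.0, 2.0, 4.0, 1.5]
delta_hat = estimate_delta(observed, fitted, n_params=3)
print(f"estimated delta = {delta_hat:.3f}")
```

In practice, the means depend on the regression coefficients, so this step would alternate with refitting the coefficients until both stabilize.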
The model regression coefficients, β0, β1, ..., βq, are estimated by the method of maximum likelihood. The asymptotic normality of maximum likelihood estimates is used to obtain tests of significance of the parameters and goodness-of-fit measures for the models.
The parameters α and k of the negative binomial distribution can be indirectly estimated using a generalized linear model to obtain the model regression coefficients β0, β1, ..., βq. The commercially available software SAS provides a procedure, PROC GENMOD (a generalized linear model procedure), that can be used to estimate the regression coefficients.(75)
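Maximum likelihood estimation amounts to choosing the coefficients and the dispersion value that maximize the sum of log-probabilities of the observed counts. The sketch below writes out that objective under two assumptions: a log link between the mean and the predictors, and the mean-dispersion form of the NB distribution with variance equal to mean + mean²/delta. The function names, coefficients, and data are hypothetical, and this is not the PROC GENMOD implementation.

```python
import math


def nb_log_pmf(y, mu, delta):
    """Log-probability of count y for an NB distribution with mean mu
    and variance mu + mu**2 / delta (mu > 0, delta > 0)."""
    return (math.lgamma(y + delta) - math.lgamma(delta) - math.lgamma(y + 1)
            + delta * math.log(delta / (delta + mu))
            + y * math.log(mu / (delta + mu)))


def nb_log_likelihood(beta, delta, X, y):
    """Sum of log-probabilities with mean exp(beta . x) for each row x."""
    total = 0.0
    for row, count in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, row)))
        total += nb_log_pmf(count, mu, delta)
    return total


# Hypothetical rows: [1, ln(major-road ADT), ln(minor-road ADT)].
X = [
    [1.0, math.log(12000), math.log(1500)],
    [1.0, math.log(8000), math.log(900)],
    [1.0, math.log(20000), math.log(3000)],
]
y = [3, 1, 6]

# Two candidate coefficient vectors; the larger log-likelihood fits better.
ll_a = nb_log_likelihood([-9.0, 0.70, 0.45], 2.0, X, y)
ll_b = nb_log_likelihood([-5.0, 0.70, 0.45], 2.0, X, y)
print(f"log-likelihood A = {ll_a:.3f}, B = {ll_b:.3f}")
```

An optimizer would search over the coefficients and delta to maximize `nb_log_likelihood`; that search is omitted here.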
To assess the goodness of fit of a model, a number of statistics are available:
Selection of Independent and Dependent Variables in the Regression Model
Using the reference group data, yearly accident counts were modeled as a function of three independent or explanatory variables, as appropriate:
Based on experience in previous intersection modeling by Bauer and Harwood (20) and preliminary modeling in this study, the regression models included separate terms for major- and minor-road traffic volumes rather than a combined term for the total traffic volume entering the intersection. In all of the NB models developed in this study, the natural logarithm of the major-road and minor-road traffic volumes was used. Thus, in the NB model described in Equation (B-1), X1 and X2 generally represent log(MajADT) and log(MinADT), respectively.
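Using the natural logarithm of each traffic volume as a predictor means that, under a log link (assumed here), each volume enters the predicted accident frequency as a power term. The sketch below checks that the two forms agree; the coefficient values and function names are hypothetical and are not taken from tables B-1 through B-6.

```python
import math


def expected_accidents_log_form(b0, b1, b2, maj_adt, min_adt):
    """Predicted frequency written with log-transformed volumes."""
    return math.exp(b0 + b1 * math.log(maj_adt) + b2 * math.log(min_adt))


def expected_accidents_power_form(b0, b1, b2, maj_adt, min_adt):
    """The same prediction written as a product of power terms."""
    return math.exp(b0) * maj_adt ** b1 * min_adt ** b2


# Hypothetical coefficients and volumes (vehicles per day).
b0, b1, b2 = -9.0, 0.6, 0.4
base = expected_accidents_log_form(b0, b1, b2, 12000, 1500)
same = expected_accidents_power_form(b0, b1, b2, 12000, 1500)
doubled = expected_accidents_log_form(b0, b1, b2, 24000, 1500)
print(f"log form = {base:.4f}, power form = {same:.4f}")
print(f"doubling major-road ADT multiplies the prediction by {doubled / base:.4f}")
```

With a coefficient below 1, doubling the major-road volume raises the prediction by a factor of 2 raised to that coefficient, which is less than 2.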
A state factor was included because the multistate database assembled for the study exhibited large state-to-state variations that needed to be accounted for in the CG and EB approaches to ensure that these state effects were not mistaken for treatment effects. In most cases, the negative binomial modeling was limited to intersections in the comparison and reference groups that had no existing turn lanes. For modeling of urban signalized intersections, a fourth independent variable, the number of existing left-turn lanes, was added because the comparison and reference groups contained too few such intersections without turn lanes for modeling.
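In summary, the multiplicative model relating the expected accident counts and the selected independent variables can be rewritten as:
$$\mu = \exp(\beta_0)\,(\mathrm{MajADT})^{\beta_1}\,(\mathrm{MinADT})^{\beta_2}\,\exp(\beta_3)\,\exp(\beta_4) \qquad \text{(B-5)}$$
where β3 and β4, the coefficients for the categorical variables (state and existing left-turn lanes), vary with the levels of those variables.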
As discussed in section 5, a number of dependent variables (safety measures) were considered for modeling, including:
Selection of Intersection Types
Regression relationships were developed for as many combinations of the following intersection characteristics as possible using the comparison and reference site data:
Negative Binomial Regression Results
The coefficients β0, β1, β2, β3, and β4 of the negative binomial regression in Equation (B-5) and the dispersion parameter, k, were estimated by maximum likelihood using PROC GENMOD of SAS. In all cases, a 10 percent significance level was chosen. Of the 300 available sites in the reference group, models were developed for a total of 252 sites, grouped as follows:
The eight types of safety measures discussed above were then considered as dependent variables in the NB modeling, resulting in 96 models to be estimated (8 safety measures x 12 combinations of types of sites).
In each case, a variation of the model shown in Equation (B-5) was investigated, including either all possible independent variables, or excluding selected ones, to assess which model best fits the data. The following four models (Model Types 1 through 4) were investigated:
In summary, an attempt was made to estimate the regression coefficients and dispersion parameter of a total of 256 models (4 variations on 96 models), and of these 256, select the best model, if one was available, for each of the 96 cases.
An investigation of the number of accidents in each of the eight safety measure categories found that the number of some types of accidents in a group of intersections defined by area type, traffic control, number of lanes, and number of legs was too small (less than 10 over the entire study period) to warrant modeling. This was true in the following situations:
Thus, no modeling was attempted in any of these 34 cases. This left a total of 62 (96 − 34) models to estimate.
The decision process for choosing a model in a particular case considered the significance of the model as a whole and of the individual regression coefficients, the magnitude and signs of the measures of fit discussed above, whether the maximum likelihood algorithm used to estimate the regression coefficients converged, and whether the coefficients made engineering sense. Using these criteria, models for the final 62 cases were selected. The models were rated as follows:
The negative binomial regression results are shown in tables B-1 through B-6 for six of the eight types of safety measures. No tables are shown for fatal and injury project-related accidents because there were no statistically significant models for these safety measures. Each table includes the following statistics:
Overall Assessment of the Final Models
The combination of types of sites and safety measures required a total of 96 models to be estimated. Of these 96 models, 26 models (27 percent) could be estimated with fully satisfactory results and 13 models (14 percent) could be developed, but were not statistically significant. These latter models were used, despite the lack of statistical significance, because they represented the best available model. No models could be estimated in 23 cases (24 percent). In 34 cases (35 percent), models could not be developed because of sparse accident data over the entire study period. The R² and R²FT values range from 1.7 to 57.6 percent and from 0.9 to 50.3 percent, respectively, for the 26 statistically significant models. The R² and R²FT values range from 2.33 to 43.9 percent and from 0.68 to 36.0 percent, respectively, for the 13 models that were not statistically significant but were still used.
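The counts and percentages in this assessment can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the paragraph above; the dictionary labels are paraphrases chosen for illustration.

```python
# Outcome counts as stated in the overall assessment.
outcomes = {
    "statistically significant model": 26,
    "model used but not statistically significant": 13,
    "no model could be estimated": 23,
    "too few accidents to attempt modeling": 34,
}

total = sum(outcomes.values())
attempted = total - outcomes["too few accidents to attempt modeling"]
shares = {label: round(100 * n / total) for label, n in outcomes.items()}

print(f"total cases = {total}, modeling attempted in {attempted}")
for label, pct in shares.items():
    print(f"{label}: {outcomes[label]} ({pct} percent)")
```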
The types of models used in adjusting accident frequencies for traffic volumes, state effects, and, where applicable, the effect of existing left-turn lanes in the CG and EB approaches can be summarized as follows:
Models like those in tables B-1 through B-6 are intended for predicting annual accident frequencies. Caution should be exercised in interpreting the individual coefficients in the model as representing the effect of an individual factor on safety. However, it is interesting to note that the coefficient of EXLEFT for urban four-leg signalized intersections on multilane highways shown in table B-1, when evaluated with the average of the eight state effects shown in the table, represents an accident reduction effectiveness of 12 percent for installation of a left-turn lane on one intersection approach. This is in good agreement with the 10 percent effectiveness for this project type determined with the EB approach, as shown in table 47.
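One common way to read a coefficient from a log-link model as a percentage effect is to exponentiate the difference between two levels of the variable. The sketch below illustrates that conversion only. It assumes a log link, it does not reproduce the averaging over state effects described above, and the coefficient difference is a hypothetical value rather than a figure from table B-1.

```python
import math


def percent_reduction(coefficient_difference):
    """Percentage reduction in predicted accidents implied by a change of
    `coefficient_difference` in the linear predictor of a log-link model."""
    return 100.0 * (1.0 - math.exp(coefficient_difference))


# Hypothetical difference between the coefficient with one existing
# left-turn lane and the coefficient with none.
hypothetical_difference = -0.128
reduction = percent_reduction(hypothetical_difference)
print(f"implied accident reduction = {reduction:.1f} percent")
```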
Use of the Negative Binomial Regression Models in the CG and EB Evaluation Approaches
The overall adjustment procedures to account for traffic volume changes in the CG and EB evaluation approaches are discussed separately in section 5. To use any of the regression equations shown in tables B-1 through B-6, proceed as follows: (a) select the proper table (i.e., type of safety measure) and type of site within that table; (b) use the coefficients shown for the intercept, major-road and minor-road traffic volumes; (c) select the coefficient for the appropriate state, if state is included in the model; and (d) select the coefficient for zero existing left-turn lanes, if that parameter is included in the model (four cases only: Model Types 3 and 4). In these four cases, the number of existing left-turn lanes was set equal to zero because the models were applied to sites with no existing turn lanes.
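Steps (a) through (d) above can be sketched as a coefficient lookup followed by one exponentiation, assuming a log link. The dictionary layout, the state labels, and every coefficient value below are hypothetical placeholders; real values would come from the selected table and site type.

```python
import math

# Hypothetical coefficient set for one safety measure and one site type,
# i.e., the result of step (a).
MODEL = {
    "intercept": -8.5,
    "log_maj_adt": 0.55,
    "log_min_adt": 0.35,
    "state": {"A": 0.20, "B": -0.10, "C": 0.0},   # present only if in the model
    "exleft": {0: 0.15, 1: 0.05, 2: 0.0},          # present only in some models
}


def predict_annual_accidents(model, maj_adt, min_adt, state=None):
    """Predicted annual accident frequency for a site with no existing turn lanes."""
    # Step (b): intercept and the two traffic volume terms.
    linear = (model["intercept"]
              + model["log_maj_adt"] * math.log(maj_adt)
              + model["log_min_adt"] * math.log(min_adt))
    # Step (c): state coefficient, if the model includes a state factor.
    if "state" in model:
        linear += model["state"][state]
    # Step (d): coefficient for zero existing left-turn lanes, if included.
    if "exleft" in model:
        linear += model["exleft"][0]
    return math.exp(linear)


predicted = predict_annual_accidents(MODEL, maj_adt=12000, min_adt=1500, state="A")
print(f"predicted accidents per year = {predicted:.3f}")
```

A model without a state factor or without the left-turn-lane term would simply omit the corresponding key, and the function would skip that step.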
When no usable model was available, a simple proportional adjustment for traffic volume was made in the CG approach, and sites of that type were not used in the EB approach.
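The fallback rule above can be summarized as a small decision function. This is a sketch under stated assumptions: the adjustment is represented as a single ratio applied to an accident frequency, the proportional adjustment is taken to be the ratio of after-period to before-period traffic volume, and the function and argument names are hypothetical. The actual procedures are those described in section 5.

```python
def volume_adjustment_ratio(approach, adt_before, adt_after, model_ratio=None):
    """Return the traffic volume adjustment ratio for a site type.

    approach    -- "CG" or "EB"
    model_ratio -- ratio of regression predictions (after / before), or
                   None when no usable regression model is available
    Returns None when the site type should be dropped from the evaluation.
    """
    if approach not in ("CG", "EB"):
        raise ValueError(f"unknown approach: {approach!r}")
    if model_ratio is not None:
        return model_ratio              # regression-based adjustment
    if approach == "CG":
        return adt_after / adt_before   # simple proportional adjustment
    return None                         # EB: site type not used


print(volume_adjustment_ratio("CG", 10000, 12000))
print(volume_adjustment_ratio("EB", 10000, 12000))
print(volume_adjustment_ratio("EB", 10000, 12000, model_ratio=1.12))
```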