U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

Report
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-RD-02-089
Date: July 2002

Safety Effectiveness of Intersection Left- and Right-Turn Lanes


APPENDIX B. NEGATIVE BINOMIAL REGRESSION MODELS

In each of the three approaches to before-after evaluation discussed in Section 5, namely the yoked comparison (YC), comparison group (CG), and empirical Bayes (EB) approaches, an adjustment for differences in traffic volumes was made. In the YC approach, a simple proportional traffic volume adjustment was used. In the CG and EB approaches, an adjustment based on a regression relationship between accident frequencies and traffic volumes was used. This appendix discusses the development of these regression relationships through negative binomial modeling of accident frequencies as a function of traffic volumes and other variables. The application of these models is illustrated in Figures 5 and 6 in the main text of this report.

Statistical Approach

Accident counts at a given intersection are inherently discrete, nonnegative numbers, and are often small, as in the case of fatal and injury accidents. Furthermore, the distribution of accidents is often skewed: most sites experience few accidents, while a small number of sites experience many more. The Poisson distribution is the usual starting point for modeling rare discrete events such as accidents. The Poisson distribution has only one parameter, its mean, and its variance is, by definition, equal to that mean. This equality of mean and variance is often violated for accident counts because of inherent overdispersion in the data (i.e., the variance of accident counts typically exceeds the mean). A flexible distribution that can effectively model overdispersed count data is the negative binomial (NB) distribution, which has two parameters: the mean and a dispersion parameter. As the extra-Poisson component of the variance shrinks toward zero, the NB distribution approaches the Poisson distribution.
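A quick way to see whether a set of counts is overdispersed is to compare the sample variance with the sample mean. The sketch below uses hypothetical yearly counts for ten intersections (illustrative values only, not data from this study); a variance-to-mean ratio well above 1 is the pattern that motivates the NB rather than the Poisson distribution.

```python
from statistics import mean, pvariance

# Hypothetical yearly accident counts at ten intersections (illustrative only):
# most sites have few accidents, while two sites have many.
counts = [0, 1, 0, 2, 1, 0, 3, 9, 1, 12]

m = mean(counts)        # sample mean
v = pvariance(counts)   # variance dividing by n (population form)
ratio = v / m           # about 1 for Poisson-like data; well above 1 if overdispersed

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {ratio:.2f}")
# prints: mean = 2.90, variance = 15.69, variance/mean = 5.41
```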

The relationship between the expected number of accidents, $Y_i$, occurring at intersection $i$ and a set of $q$ intersection parameters, $X_{i1}, X_{i2}, \ldots, X_{iq}$, is

Figure B-1:

$$f(Y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_q X_{iq} \qquad \text{(B-1)}$$

where $\beta_0, \beta_1, \ldots, \beta_q$ are the regression coefficients, and where the number of accidents, $Y_i$, is assumed to follow a negative binomial distribution with parameters $\alpha$ and $\delta$ (with $0 \le \alpha \le 1$ and $\delta \ge 0$). That is, the probability that an intersection defined by a known set of predictor variables, $X_{i1}, X_{i2}, \ldots, X_{iq}$, experiences $Y_i = y_i$ accidents can be expressed as:

Figure B-2:

$$P(Y_i = y_i \mid \alpha, \delta) = \frac{(y_i + \delta - 1)!\,\alpha^{y_i}}{y_i!\,(\delta - 1)!\,(1 + \alpha)^{y_i + \delta}}, \qquad y_i = 0, 1, 2, \ldots \qquad \text{(B-2)}$$

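where $y_i!$ denotes the factorial of $y_i$. For non-integer $\delta$, the factorial ratio is evaluated with the gamma function, $(y_i + \delta - 1)!/(\delta - 1)! = \Gamma(y_i + \delta)/\Gamma(\delta)$.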

The mean and variance of the negative binomial distribution of accident counts can then be expressed in terms of the parameters $\alpha$ and $\delta$ as follows:

Figure B-3:

$$\text{Mean} = E(Y_i) = \mu_i = \delta\alpha \qquad \text{(B-3)}$$

Figure B-4:

$$\text{Variance} = \operatorname{Var}(Y_i) = \delta\alpha + \delta\alpha^2 = \mu_i + \frac{\mu_i^2}{\delta} \qquad \text{(B-4)}$$
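Equations (B-2) through (B-4) can be checked numerically. The sketch below evaluates the probability function of Equation (B-2), using the gamma-function form of the factorials so that non-integer $\delta$ is allowed, and confirms that the resulting mean and variance equal $\delta\alpha$ and $\delta\alpha + \delta\alpha^2$. The parameter values are arbitrary illustrations, and `nb_pmf` is a hypothetical helper name.

```python
import math

def nb_pmf(y: int, alpha: float, delta: float) -> float:
    """Hypothetical helper: P(Y = y) from Equation (B-2), for alpha > 0, delta > 0.

    (y + delta - 1)! / (delta - 1)! is computed as Gamma(y + delta) / Gamma(delta)
    on the log scale to avoid overflow.
    """
    log_p = (math.lgamma(y + delta) - math.lgamma(delta) - math.lgamma(y + 1)
             + y * math.log(alpha) - (y + delta) * math.log1p(alpha))
    return math.exp(log_p)

alpha, delta = 0.5, 4.0  # illustrative values only
probs = [nb_pmf(y, alpha, delta) for y in range(200)]  # tail beyond 200 is negligible here

total = sum(probs)
mean_y = sum(y * p for y, p in enumerate(probs))
var_y = sum((y - mean_y) ** 2 * p for y, p in enumerate(probs))

print(round(total, 6), round(mean_y, 6), round(var_y, 6))  # 1.0 2.0 3.0
```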

The term $\mu_i$ can be referred to as the Poisson variance function and $\mu_i^2/\delta$ as the extra component arising from combining the Poisson distribution with a gamma distribution for the mean to obtain the negative binomial distribution. Equivalently, writing $k = 1/\delta$, the variance is $\mu_i + k\mu_i^2$, so the NB distribution approaches the Poisson distribution as $k \to 0$ (i.e., as $\delta \to \infty$). The overdispersion parameter $\delta$ is not known a priori, but can be estimated so that the mean deviance becomes unity or the Pearson chi-square statistic equals its expectation (i.e., equals its degrees of freedom).(74)
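The Pearson criterion for estimating the overdispersion can be sketched as a one-dimensional root search: find $k = 1/\delta$ such that the Pearson chi-square statistic, computed with the NB variance $\mu_i + k\mu_i^2$, equals its degrees of freedom $n - p$. This is a minimal sketch that assumes fitted means $\mu_i$ are already available from a regression; the function names and the bisection search are illustrative choices, not the numerical procedure used in the study, and the mean-deviance criterion is not shown.

```python
def pearson_chi2(y, mu, k):
    """Pearson chi-square using the NB variance mu + k * mu**2 (k = 1/delta)."""
    return sum((yi - mi) ** 2 / (mi + k * mi ** 2) for yi, mi in zip(y, mu))

def estimate_k(y, mu, n_params, tol=1e-12):
    """Hypothetical helper: return k >= 0 so that Pearson chi-square equals n - p."""
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    if pearson_chi2(y, mu, 0.0) <= df:
        return 0.0                        # no overdispersion relative to Poisson
    lo, hi = 0.0, 1.0
    while pearson_chi2(y, mu, hi) > df:   # the statistic decreases as k grows
        hi *= 2.0
    while hi - lo > tol:                  # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if pearson_chi2(y, mu, mid) > df:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts with an intercept-only fit, so every fitted mean is the sample mean
y = [0, 1, 0, 2, 1, 0, 3, 9, 1, 12]
mu = [sum(y) / len(y)] * len(y)
k = estimate_k(y, mu, n_params=1)
delta = 1.0 / k
print(f"k = {k:.4f}, delta = {delta:.4f}")
```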

The model regression coefficients, $\beta_0, \beta_1, \ldots, \beta_q$, are estimated by the method of maximum likelihood. The asymptotic normality of maximum likelihood estimates is used to obtain tests of significance of the parameters and goodness-of-fit measures for the models.

The parameters $\alpha$ and $\delta$ of the negative binomial distribution can be estimated indirectly by fitting a generalized linear model to obtain the regression coefficients $\beta_0, \beta_1, \ldots, \beta_q$. The commercially available software SAS provides a procedure, PROC GENMOD (a generalized linear model procedure), that can be used to estimate the regression coefficients.(75)
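The quantity that maximum likelihood estimation maximizes can be written directly from Equation (B-2). The sketch below computes the NB regression log-likelihood, assuming a log link (consistent with the multiplicative model form used in this appendix) and the relation $\mu_i = \delta\alpha_i$ from Equation (B-3). It is not PROC GENMOD's implementation; the function names, coefficients, and traffic volumes are hypothetical. As a check, with a very large $\delta$ the log-likelihood approaches the Poisson log-likelihood.

```python
import math

def nb_loglik(beta, delta, X, y):
    """Hypothetical helper: NB regression log-likelihood with a log link.

    beta  : [b0, b1, ..., bq]; b0 is the intercept
    delta : NB parameter of Equation (B-2); variance is mu + mu**2 / delta
    X     : one row [x_i1, ..., x_iq] per observation
    y     : observed accident counts
    """
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        alpha = mu / delta  # Equation (B-3): mu = delta * alpha
        total += (math.lgamma(yi + delta) - math.lgamma(delta) - math.lgamma(yi + 1)
                  + yi * math.log(alpha) - (yi + delta) * math.log1p(alpha))
    return total

def poisson_loglik(beta, X, y):
    """Poisson log-likelihood with the same log link, for comparison."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total

# Hypothetical data: predictors are ln(major-road ADT) and ln(minor-road ADT)
adts = [(5000, 500), (8000, 1200), (12000, 300), (3000, 800)]
X = [[math.log(maj), math.log(mnr)] for maj, mnr in adts]
y = [1, 4, 2, 0]
beta = [-8.0, 0.6, 0.4]  # illustrative coefficients, not estimates from this study

print(nb_loglik(beta, 2.0, X, y))
print(nb_loglik(beta, 1e7, X, y), poisson_loglik(beta, X, y))  # nearly equal
```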

To assess the goodness of fit of a model, a number of statistics are available:

| Model statistic | Explanation |
|---|---|
| Deviance/(n - p) | The deviance of the model containing all the parameters (including the intercept) divided by its degrees of freedom, n - p, where n is the number of observations and p is the number of estimated parameters. This statistic (the mean deviance) provides a test for overdispersion and a measure of model fit. Asymptotically, for an adequately specified model, this value tends toward 1.(74) |
| Pearson chi-square/(n - p) | The Pearson chi-square statistic divided by its degrees of freedom, n - p. This statistic provides another measure of model fit.(74) |
| $R^2$ | A goodness-of-fit measure based on the ordinary multiple correlation coefficient. |
| $R^2_{FT}$ | A goodness-of-fit measure based on the Freeman-Tukey variance-stabilizing transformation of variables discussed in Fridstrøm et al.(76) |
| $R^2_k$ | A goodness-of-fit measure proposed by Miaou,(77) a function of the overdispersion parameter of the regression model and that of a means-only model. [This measure has not been estimated in this study, but is being considered for inclusion in the final report.] |
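The first two statistics in the table can be computed from observed counts, fitted means, and a dispersion estimate. The sketch below assumes the NB variance $\mu_i + k\mu_i^2$ with $k = 1/\delta$ and uses the standard NB deviance, $2\sum[y_i \ln(y_i/\mu_i) - (y_i + 1/k)\ln((1 + k y_i)/(1 + k\mu_i))]$, taking the first term as zero when $y_i = 0$. The data, fitted values, and function names are hypothetical, and $R^2$, $R^2_{FT}$, and $R^2_k$ are not computed here.

```python
import math

def nb_deviance(y, mu, k):
    """NB deviance for variance mu + k * mu**2, with k > 0."""
    dev = 0.0
    for yi, mi in zip(y, mu):
        # -(y + 1/k) * ln[(1 + k*y) / (1 + k*mu)], computed with log1p for accuracy
        term = -(yi + 1.0 / k) * (math.log1p(k * yi) - math.log1p(k * mi))
        if yi > 0:
            term += yi * math.log(yi / mi)  # y * ln(y / mu); zero when y == 0
        dev += 2.0 * term
    return dev

def pearson_chi2(y, mu, k):
    """Pearson chi-square with the same NB variance function."""
    return sum((yi - mi) ** 2 / (mi + k * mi ** 2) for yi, mi in zip(y, mu))

def fit_statistics(y, mu, k, n_params):
    """Hypothetical helper: (deviance / (n - p), Pearson chi-square / (n - p))."""
    df = len(y) - n_params
    return nb_deviance(y, mu, k) / df, pearson_chi2(y, mu, k) / df

# Hypothetical observed counts and fitted means from a 3-parameter model
y = [0, 2, 1, 5, 3, 0, 4, 1]
mu = [0.8, 1.9, 1.2, 3.5, 2.6, 0.7, 3.1, 1.4]
mean_dev, mean_pearson = fit_statistics(y, mu, k=0.3, n_params=3)
print(f"deviance/(n-p) = {mean_dev:.3f}, Pearson/(n-p) = {mean_pearson:.3f}")
```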

Selection of Independent and Dependent Variables in the Regression Model

Using the reference group data, yearly accident counts were modeled as a function of three independent or explanatory variables, as appropriate:

  • Major-road traffic volume in vehicles per day.
  • Minor-road traffic volume in vehicles per day.
  • State.

Based on experience in previous intersection modeling by Bauer and Harwood (20) and preliminary modeling in this study, the regression models included separate terms for major- and minor-road traffic volumes rather than a combined term for the total traffic volume entering the intersection. In all of the NB models developed in this study, the natural logarithms of the major-road and minor-road traffic volumes were used. Thus, in the NB model described in Equation (B-1), $X_{i1}$ and $X_{i2}$ generally represent $\ln(\text{MajADT})$ and $\ln(\text{MinADT})$, respectively, where MajADT and MinADT are the major-road and minor-road average daily traffic volumes.

A state factor was included because the multistate database assembled for the study exhibited large state-to-state variations, which needed to be accounted for in the CG and EB approaches to ensure that these state effects were not mistaken for treatment effects. In most cases, the negative binomial modeling was limited to intersections in the comparison and reference groups that had no existing turn lanes. For modeling of urban signalized intersections, a fourth independent variable, the number of existing left-turn lanes, was added because there were not enough such intersections without turn lanes in the comparison and reference groups for modeling.

In summary, the multiplicative model relating the expected accident counts and the selected independent variables can be rewritten as:

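$$Y_i = \exp\left[\beta_0 + \beta_1 \ln(\text{MajADT}_i) + \beta_2 \ln(\text{MinADT}_i) + \beta_3 + \beta_4\right] = e^{\beta_0 + \beta_3 + \beta_4}\,(\text{MajADT}_i)^{\beta_1}\,(\text{MinADT}_i)^{\beta_2} \qquad \text{(B-5)}$$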

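where $\text{MajADT}_i$ and $\text{MinADT}_i$ are the major-road and minor-road traffic volumes (vehicles per day) at intersection $i$, and $\beta_3$ and $\beta_4$, the coefficients for the categorical variables (state and number of existing left-turn lanes), vary with the levels of those variables. The $\beta_4$ term appears only in the models for urban signalized intersections.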

As discussed in section 5, eight dependent variables (safety measures) were considered for modeling:

  • Total intersection accidents.
  • Fatal and injury intersection accidents.
  • Project-related intersection accidents.
  • Fatal and injury project-related intersection accidents.
  • Total accidents for individual intersection approaches.
  • Fatal and injury accidents for individual intersection approaches.
  • Project-related accidents for individual intersection approaches.
  • Fatal and injury project-related accidents for individual intersection approaches.

Selection of Intersection Types

Regression relationships were developed for as many combinations of the following intersection characteristics as possible using the comparison and reference site data:

  • Area type (urban/rural).
  • Type of traffic control (signalized/unsignalized).
  • Number of intersection legs (three or four).
  • Number of lanes on major road (two-lane/multilane).

Negative Binomial Regression Results

The coefficients $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ of the negative binomial regression in Equation (B-5) and the dispersion parameter, k, were estimated by maximum likelihood using PROC GENMOD of SAS. In all cases, a 10 percent significance level was used. Of the 300 available sites in the reference group, 252 sites were used for model development, grouped as follows:

  • Rural, unsignalized, three- and four-leg, two-lane and multilane sites (only sites without existing left- or right-turn lanes were included): N = 120.
  • Urban, signalized, three- and four-leg, two-lane and multilane sites (including sites with up to four existing left-turn lanes; the number of existing right-turn lanes was not considered): N = 86.
  • Urban, unsignalized, three- and four-leg, two-lane and multilane sites (only sites without existing left- or right-turn lanes were included): N = 46.

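The eight types of safety measures discussed above were then considered as dependent variables in the NB modeling, resulting in 96 models to be estimated (8 safety measures × 12 site types; each of the three site groups above is subdivided into three- and four-leg intersections with two-lane or multilane major roads).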

In each case, variations of the model in Equation (B-5) were investigated, either including all applicable independent variables or excluding selected ones, to assess which model best fit the data. The following four models (Model Types 1 through 4) were investigated:

  • Model Type 1: Major-road traffic volume, minor-road traffic volume, and state—all intersection types.
  • Model Type 2: Major-road traffic volume and minor-road traffic volume—all intersection types.
  • Model Type 3: Major-road traffic volume, minor-road traffic volume, state, and number of existing left-turn lanes—urban, signalized intersections only.
  • Model Type 4: Major-road traffic volume, minor-road traffic volume, and number of existing left-turn lanes—urban, signalized intersections only.

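In summary, an attempt was made to estimate the regression coefficients and dispersion parameter of a total of 256 models (Model Types 1 and 2 for each of the 64 cases at rural unsignalized and urban unsignalized sites, plus Model Types 1 through 4 for each of the 32 cases at urban signalized sites) and, of these 256, to select the best model, if one was available, for each of the 96 cases.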
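The bookkeeping behind the 96 cases and 256 model-fitting attempts can be reproduced by enumerating the site types and the model types allowed for each. The group labels below are descriptive strings chosen for this sketch, and `model_types_for` is a hypothetical helper.

```python
from itertools import product

SAFETY_MEASURES = 8
GROUPS = ["rural unsignalized", "urban signalized", "urban unsignalized"]
LEGS = [3, 4]
LANES = ["two-lane", "multilane"]

site_types = list(product(GROUPS, LEGS, LANES))   # 3 x 2 x 2 = 12 site types
cases = SAFETY_MEASURES * len(site_types)         # 96 cases

def model_types_for(group):
    """Hypothetical helper: Model Types 3 and 4 (existing left-turn lanes)
    apply only to urban signalized sites."""
    return [1, 2, 3, 4] if group == "urban signalized" else [1, 2]

attempts = SAFETY_MEASURES * sum(len(model_types_for(g)) for g, _, _ in site_types)
print(len(site_types), cases, attempts)  # 12 96 256
```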

An investigation of the number of accidents in each of the eight safety measure categories found that, for some groups of intersections defined by area type, traffic control, number of lanes, and number of legs, the number of accidents of a given type was too small (fewer than 10 over the entire study period) to warrant modeling. This was true in the following situations:

  • Fatal and injury project-related intersection accidents at all types of sites (12 cases).
  • Fatal and injury project-related accidents for individual intersection approaches at all types of sites (12 cases).
  • Project-related intersection accidents at rural, unsignalized, three-leg, two-lane and multilane intersections (two cases).
  • Project-related intersection accidents at urban, signalized, three-leg, multilane intersections (one case).
  • Project-related intersection accidents at urban, unsignalized, three-leg, two-lane and multilane intersections (two cases).
  • Project-related accidents for individual intersection approaches at rural, unsignalized, three-leg, two-lane and multilane intersections (two cases).
  • Project-related accidents for individual intersection approaches at urban, signalized, three-leg, multilane intersections (one case).
  • Project-related accidents for individual intersection approaches at urban, unsignalized, three-leg, two-lane and multilane intersections (two cases).

Thus, no modeling was attempted in any of these 34 cases, leaving a total of 62 (96 - 34) models to estimate.

The significance of the model as a whole and of the individual regression coefficients, the magnitudes and signs of the measures of fit discussed above, whether the maximum likelihood algorithm converged, and whether the coefficients made engineering sense were all part of the decision process in choosing a model in a particular case. Using these criteria, each of the remaining 62 cases was evaluated, and the outcomes were rated as follows:

  • A statistically significant model could be estimated that satisfied engineering criteria, such as positive traffic volume coefficients of 1 or less.
  • The model developed had all the proper attributes (e.g., positive traffic volume coefficients of 1 or less) but was not statistically significant. In that case, the model was considered to provide the best available estimate of accident counts and was therefore selected. Generally, the two measures of model fit, $R^2$ and $R^2_{FT}$, are also low in these cases.
  • No model could be estimated.
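The rating rules above can be summarized as a small decision function. This is a simplified, hypothetical encoding: it reduces the judgment-based criteria to convergence, an overall model p-value compared against the 10 percent significance level, and the requirement that both traffic volume coefficients be positive and no greater than 1. It also assumes that a model failing the engineering check is treated as "no model". The actual selection additionally weighed the significance of individual coefficients and the measures of fit, and the p-values in the usage lines are invented.

```python
from typing import Optional

def rate_model(converged: bool,
               model_p_value: Optional[float],
               b_major: Optional[float],
               b_minor: Optional[float],
               significance_level: float = 0.10) -> int:
    """Hypothetical rating: 1 = significant and sensible,
    2 = sensible but not significant, 3 = no usable model."""
    if not converged or b_major is None or b_minor is None:
        return 3
    # Engineering check: both traffic volume exponents positive and at most 1
    sensible = 0.0 < b_major <= 1.0 and 0.0 < b_minor <= 1.0
    if not sensible:
        return 3  # assumption: a model failing the engineering check is not used
    if model_p_value is not None and model_p_value < significance_level:
        return 1
    return 2

print(rate_model(True, 0.01, 0.797, 0.868))  # 1
print(rate_model(True, 0.25, 0.078, 0.864))  # 2
print(rate_model(False, None, None, None))   # 3
```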

The negative binomial regression results are shown in tables B-1 through B-6 for six of the eight types of safety measures. No tables are shown for fatal and injury project-related accidents (for entire intersections or for individual approaches) because too few of these accidents occurred for any models to be developed. Each table includes the following statistics:

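  • Intersection type (area type: R = rural, U = urban; traffic control: S = signalized, U = unsignalized; number of legs: 3 or 4; major road: T = two-lane, M = multilane).
  • Model type and rating, shown as a pair. Model Types 1 through 4 were defined above, and Model Type 0 denotes that no model was available through regression analysis. The rating is 1 for a statistically significant model, 2 for a model that was used although it was not statistically significant, 3 when no model could be estimated, and 0 when modeling was not attempted because of sparse accident data.
  • The number of site-years or approach-years, depending on the type of safety measure.
  • The regression coefficients $\beta_0$, $\beta_1$, and $\beta_2$, representing the intercept and the exponents of major-road and minor-road traffic volumes, respectively.
  • The state coefficients, $\beta_3$.
  • The coefficients for the number of existing left-turn lanes, $\beta_4$ (applicable for Model Types 3 and 4 only).
  • The negative binomial dispersion parameter ($\delta$).
  • The two measures of model fit, $R^2$ and $R^2_{FT}$.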

Table B-1. Negative Binomial Regression Models for Total Intersection Accidents.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Site-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 2,2 | 63 | -6.523 | 0.078 | 0.864 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.079 | 18.26 | 12.74 |
| R | U | 3 | T | 1,1 | 579 | -12.153 | 1.000 | 0.633 | -2.232 | -1.145 | 0 | 0 | -0.406 | 0 | -1.242 | 0 | 0 | 0.506 | 32.32 | 28.37 |
| R | U | 4 | M | 2,1 | 80 | -12.493 | 0.797 | 0.868 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.197 | 57.55 | 50.32 |
| R | U | 4 | T | 2,1 | 662 | -8.136 | 0.298 | 0.856 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.354 | 34.53 | 32.78 |
| U | S | 3 | M | 0,3 | 34 | | | | | | | | | | | | | | | |
| U | S | 3 | T | 0,3 | 47 | | | | | | | | | | | | | | | |
| U | S | 4 | M | 3,1 | 747 | -6.749 | 0.692 | 0.178 | 0.921 | 0.772 | 1.552 | 0.905 | -0.788 | 0.444 | -0.098 | 0 | -0.123 | 0.371 | 32.65 | 31.10 |
| U | S | 4 | T | 1,1 | 177 | -12.231 | 0.835 | 0.811 | 0 | -1.030 | -0.908 | 0 | -1.718 | 0 | 0 | 0 | 0 | 0.220 | 35.27 | 37.42 |
| U | U | 3 | M | 0,3 | 25 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 1,1 | 195 | -8.887 | 0.745 | 0.293 | 0.815 | -0.029 | 1.385 | 0 | 0 | 0 | 0 | 0 | 0 | 0.460 | 11.11 | 30.87 |
| U | U | 4 | M | 1,2 | 121 | -1.426 | 0.061 | 0.184 | 1.434 | 0.438 | 0 | 0.066 | 0 | 0.609 | 0 | 0 | 0 | 0.184 | 43.92 | 36.04 |
| U | U | 4 | T | 1,1 | 200 | -7.740 | 0.641 | 0.194 | 1.108 | 1.481 | 0 | 1.269 | 0.587 | 0 | 0 | 0 | 0 | 0.408 | 32.22 | 22.90 |

 

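Table B-2. Negative Binomial Regression Models for Fatal and Injury Intersection Accidents.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Site-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 0,3 | 63 | | | | | | | | | | | | | | | |
| R | U | 3 | T | 0,3 | 579 | | | | | | | | | | | | | | | |
| R | U | 4 | M | 2,1 | 80 | -13.081 | 0.933 | 0.676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.432 | 25.95 | 31.38 |
| R | U | 4 | T | 2,1 | 662 | -8.365 | 0.233 | 0.877 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.377 | 24.89 | 22.33 |
| U | S | 3 | M | 0,3 | 34 | | | | | | | | | | | | | | | |
| U | S | 3 | T | 0,3 | 47 | | | | | | | | | | | | | | | |
| U | S | 4 | M | 3,1 | 747 | -6.055 | 0.521 | 0.178 | 1.060 | 0.752 | 1.498 | 0.904 | -0.299 | 0.414 | -0.154 | 0 | -0.219 | 0.295 | 27.48 | 24.39 |
| U | S | 4 | T | 1,1 | 177 | -8.899 | 0.533 | 0.633 | 0 | -1.228 | -0.835 | 0 | -1.380 | 0 | 0 | 0 | 0 | 0.162 | 10.62 | 13.29 |
| U | U | 3 | M | 0,3 | 25 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 1,2 | 195 | -8.073 | 0.715 | 0.051 | 1.087 | 0.013 | 1.426 | 0 | 0 | 0 | 0 | 0 | 0 | 0.060 | 32.49 | 33.34 |
| U | U | 4 | M | 0,3 | 121 | | | | | | | | | | | | | | | |
| U | U | 4 | T | 1,1 | 200 | -10.709 | 0.824 | 0.294 | 0.997 | 1.113 | 0 | 0.654 | 0.541 | 0 | 0 | 0 | 0 | 0.375 | 23.91 | 17.89 |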

 

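Table B-3. Negative Binomial Regression Models for Project-Related Intersection Accidents.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Site-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 0,0 | 63 | | | | | | | | | | | | | | | |
| R | U | 3 | T | 0,0 | 579 | | | | | | | | | | | | | | | |
| R | U | 4 | M | 2,2 | 80 | -10.732 | 0.652 | 0.539 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.010 | 3.68 | 3.88 |
| R | U | 4 | T | 2,1 | 662 | -11.201 | 0.648 | 0.462 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.263 | 3.07 | 1.66 |
| U | S | 3 | M | 0,0 | 34 | | | | | | | | | | | | | | | |
| U | S | 3 | T | 0,3 | 47 | | | | | | | | | | | | | | | |
| U | S | 4 | M | 0,3 | 747 | | | | | | | | | | | | | | | |
| U | S | 4 | T | 0,3 | 177 | | | | | | | | | | | | | | | |
| U | U | 3 | M | 0,0 | 25 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 0,0 | 195 | | | | | | | | | | | | | | | |
| U | U | 4 | M | 2,2 | 121 | -11.185 | 0.736 | 0.434 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.280 | 5.80 | 3.63 |
| U | U | 4 | T | 0,3 | 200 | | | | | | | | | | | | | | | |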

 

Table B-4. Negative Binomial Regression Models for Total Accidents on Individual Intersection Approaches.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Approach-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 0,3 | 126 | | | | | | | | | | | | | | | |
| R | U | 3 | T | 1,1 | 1,158 | -11.966 | 0.974 | 0.519 | -23.164 | -1.250 | 0 | 0 | 1 | 0 | -1.493 | 0 | 0 | 1.049 | 13.59 | 10.59 |
| R | U | 4 | M | 2,1 | 160 | -13.413 | 0.818 | 0.863 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.142 | 43.52 | 40.55 |
| R | U | 4 | T | 2,1 | 1,324 | -9.347 | 0.297 | 0.918 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.274 | 25.89 | 23.61 |
| U | S | 3 | M | 4,2 | 102 | -11.606 | 0.880 | 0.409 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.343 | 0.348 | 21.12 | 20.20 |
| U | S | 3 | T | 2,1 | 141 | -14.419 | 0.642 | 0.905 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.462 | 9.42 | 6.39 |
| U | S | 4 | M | 1,1 | 2,976 | -7.620 | 0.740 | 0.107 | 0.715 | 0.576 | 1.456 | 0.655 | -0.706 | 0.105 | -0.209 | 0 | 0 | 0.479 | 29.38 | 32.89 |
| U | S | 4 | T | 2,1 | 708 | -9.908 | 0.974 | 0.156 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.373 | 21.62 | 21.87 |
| U | U | 3 | M | 0,3 | 50 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 1,1 | 390 | -8.638 | 0.722 | 0.137 | 0.930 | 0.221 | 1.400 | 0 | 0 | 0 | 0 | 0 | 0 | 0.647 | 12.76 | 16.00 |
| U | U | 4 | M | 1,2 | 242 | -5.515 | 0.421 | 0.124 | 1.909 | 0.655 | 0 | 0.433 | 0 | 0.948 | 0 | 0 | 0 | 0.222 | 31.90 | 28.03 |
| U | U | 4 | T | 1,1 | 400 | -7.885 | 0.589 | 0.187 | 1.099 | 1.357 | 0 | 1.048 | 0.382 | 0 | 0 | 0 | 0 | 0.509 | 18.76 | 13.19 |

 

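Table B-5. Negative Binomial Regression Models for Total Fatal and Injury Accidents on Individual Intersection Approaches.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Approach-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 0,3 | 126 | | | | | | | | | | | | | | | |
| R | U | 3 | T | 0,3 | 1,158 | | | | | | | | | | | | | | | |
| R | U | 4 | M | 1,2 | 160 | -9.994 | 0.695 | 0.348 | 0 | 0.367 | 0 | 0 | 0 | 0.958 | 0 | 0 | 0 | 0.235 | 24.9 | 29.21 |
| R | U | 4 | T | 2,1 | 1,324 | -9.935 | 0.281 | 0.932 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.381 | 16.04 | 13.90 |
| U | S | 3 | M | 4,2 | 102 | -11.609 | 0.796 | 0.401 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.318 | 5.96 | 4.97 |
| U | S | 3 | T | 2,2 | 141 | -13.457 | 0.980 | 0.313 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.383 | 2.33 | 0.68 |
| U | S | 4 | M | 1,1 | 2,976 | -9.073 | 0.798 | 0.080 | 0.996 | 0.685 | 1.554 | 0.811 | 0.092 | 0.329 | 0.151 | 0 | 0 | 0.459 | 21.94 | 22.10 |
| U | S | 4 | T | 0,3 | 708 | | | | | | | | | | | | | | | |
| U | U | 3 | M | 0,3 | 50 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 0,3 | 390 | | | | | | | | | | | | | | | |
| U | U | 4 | M | 1,2 | 242 | -2.985 | 0.074 | 0.141 | 1.842 | 0.312 | 0 | 0.507 | 0 | 0.874 | 0 | 0 | 0 | 0.215 | 29.24 | 26.58 |
| U | U | 4 | T | 2,1 | 400 | -11.081 | 0.826 | 0.373 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.538 | 11.49 | 8.25 |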

 

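Table B-6. Negative Binomial Regression Models for Project-Related Accidents on Individual Intersection Approaches.

| Area type | Traffic control | Legs | Major-road lanes | Model type, rating | Approach-years | Intercept | LogMajADT | LogMinADT | IA | IL | LA | MN | NC | NE | OR | VA | Existing left-turn lanes (0 lanes) | Dispersion parameter ($\delta$) | $R^2$ (%) | $R^2_{FT}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | U | 3 | M | 0,0 | 126 | | | | | | | | | | | | | | | |
| R | U | 3 | T | 0,0 | 1,158 | | | | | | | | | | | | | | | |
| R | U | 4 | M | 2,2 | 160 | -12.004 | 0.745 | 0.492 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.816 | 2.73 | 1.03 |
| R | U | 4 | T | 2,1 | 1,324 | -12.162 | 0.679 | 0.466 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 | 1.67 | 0.85 |
| U | S | 3 | M | 0,0 | 102 | | | | | | | | | | | | | | | |
| U | S | 3 | T | 0,3 | 141 | | | | | | | | | | | | | | | |
| U | S | 4 | M | 1,1 | 2,976 | -13.191 | 0.827 | 0.279 | 2.081 | 1.859 | 3.220 | 2.467 | 1.194 | 1.292 | 2.309 | 0 | 0 | 2.661 | 9.54 | 6.99 |
| U | S | 4 | T | 0,3 | 708 | | | | | | | | | | | | | | | |
| U | U | 3 | M | 0,0 | 50 | | | | | | | | | | | | | | | |
| U | U | 3 | T | 0,0 | 390 | | | | | | | | | | | | | | | |
| U | U | 4 | M | 2,2 | 242 | -11.106 | 0.659 | 0.434 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.429 | 3.21 | 1.45 |
| U | U | 4 | T | 0,3 | 400 | | | | | | | | | | | | | | | |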

Overall Assessment of the Final Models

The combination of site types and safety measures required a total of 96 models to be estimated. Of these 96 models, 26 (27 percent) could be estimated with fully satisfactory results, and 13 (14 percent) could be developed but were not statistically significant. These latter models were used, despite the lack of statistical significance, because they represented the best available model. No models could be estimated in 23 cases (24 percent). In 34 cases (35 percent), models could not be developed because of sparse accident data over the entire study period. For the 26 statistically significant models, the $R^2$ and $R^2_{FT}$ values range from 1.7 to 57.6 percent and from 0.9 to 50.3 percent, respectively. For the 13 models that were not statistically significant but were still used, the $R^2$ and $R^2_{FT}$ values range from 2.33 to 43.9 percent and from 0.68 to 36.0 percent, respectively.

The types of models used in adjusting accident frequencies for traffic volumes, state effects, and, where applicable, the effect of existing left-turn lanes in the CG and EB approaches can be summarized as follows:

  • Type 1: 17 models (18 percent), including major-road traffic volume, minor-road traffic volume, and state.
  • Type 2: 18 models (19 percent), including major-road traffic volume and minor-road traffic volume.
  • Type 3: two models (2 percent), including major-road traffic volume, minor-road traffic volume, state, and number of existing left-turn lanes (urban signalized intersections only).
  • Type 4: two models (2 percent), including major-road traffic volume, minor-road traffic volume, and number of existing left-turn lanes (urban signalized intersections only).
  • Type 0: 57 cases (59 percent); in these cases, no usable model was available.
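The percentages above can be cross-checked against the tables. The sketch below transcribes the "model type, rating" pairs from Tables B-1 through B-6 (12 site types each, in table row order) and adds the 24 fatal and injury project-related cases that were not modeled. It assumes the second number in each pair is the rating (1 = significant, 2 = used but not significant, 3 = no model, 0 = not modeled because of sparse data); the `TABLES` layout is a choice made for this sketch.

```python
from collections import Counter

# "Model type, rating" pairs transcribed from Tables B-1 to B-6, in table row order
TABLES = {
    "B-1": "2,2 1,1 2,1 2,1 0,3 0,3 3,1 1,1 0,3 1,1 1,2 1,1",
    "B-2": "0,3 0,3 2,1 2,1 0,3 0,3 3,1 1,1 0,3 1,2 0,3 1,1",
    "B-3": "0,0 0,0 2,2 2,1 0,0 0,3 0,3 0,3 0,0 0,0 2,2 0,3",
    "B-4": "0,3 1,1 2,1 2,1 4,2 2,1 1,1 2,1 0,3 1,1 1,2 1,1",
    "B-5": "0,3 0,3 1,2 2,1 4,2 2,2 1,1 0,3 0,3 0,3 1,2 2,1",
    "B-6": "0,0 0,0 2,2 2,1 0,0 0,3 1,1 0,3 0,0 0,0 2,2 0,3",
}

pairs = [tuple(int(v) for v in cell.split(","))
         for row in TABLES.values() for cell in row.split()]
# Fatal and injury project-related measures (intersection and approach level):
# 2 measures x 12 site types, none modeled because of sparse data
pairs += [(0, 0)] * 24

n = len(pairs)
by_rating = Counter(rating for _, rating in pairs)
by_type = Counter(model_type for model_type, _ in pairs)

for label, counts in (("rating", by_rating), ("model type", by_type)):
    for key in sorted(counts):
        print(f"{label} {key}: {counts[key]} ({100 * counts[key] / n:.0f} percent)")
```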

Models like those in tables B-1 through B-6 are intended for predicting annual accident frequencies. Caution should be exercised in interpreting the individual coefficients in the model as representing the effect of an individual factor on safety. However, it is interesting to note that the coefficient of EXLEFT (the existing left-turn lane variable) for urban four-leg signalized intersections on multilane highways shown in table B-1, when evaluated with the average of the eight state effects shown in the table, represents an accident reduction effectiveness of 12 percent for installation of a left-turn lane on one intersection approach. This is in good agreement with the 10 percent effectiveness for this project type determined with the EB approach, as shown in table 47.

Use of the Negative Binomial Regression Models in the CG and EB Evaluation Approaches

The overall procedures for adjusting for traffic volume changes in the CG and EB evaluation approaches are discussed separately in section 5. To use any of the regression equations shown in tables B-1 through B-6, proceed as follows: (a) select the proper table (i.e., type of safety measure) and type of site within that table; (b) use the coefficients shown for the intercept and the major-road and minor-road traffic volumes; (c) select the coefficient for the appropriate state, if state is included in the model; and (d) select the coefficient for zero existing left-turn lanes, if that variable is included in the model (four cases only: Model Types 3 and 4). In these four cases, the number of existing left-turn lanes was set equal to zero because the models were applied to sites with no existing turn lanes.
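Steps (a) through (d) can be sketched as a lookup followed by evaluation of the multiplicative model, $\exp[\beta_0 + \beta_1\ln(\text{MajADT}) + \beta_2\ln(\text{MinADT}) + \beta_3 + \beta_4]$, assuming natural logarithms of daily traffic volumes as described for Equation (B-1). The three coefficient sets are transcribed from Table B-1 (total intersection accidents). The dictionary layout, the `predict_annual_accidents` name, and the traffic volumes in the usage lines are hypothetical, and the predictions apply only to sites without existing turn lanes.

```python
import math

# Coefficients from Table B-1 (total intersection accidents), keyed by
# (area type, traffic control, legs, major-road lanes). States whose table
# entry is 0 are omitted from "state", so they contribute nothing.
TABLE_B1 = {
    ("R", "U", 4, "M"): {"b0": -12.493, "b1": 0.797, "b2": 0.868,
                         "state": {}, "b4_zero_lanes": 0.0},       # Model Type 2
    ("R", "U", 3, "T"): {"b0": -12.153, "b1": 1.000, "b2": 0.633,
                         "state": {"IA": -2.232, "IL": -1.145, "NC": -0.406, "OR": -1.242},
                         "b4_zero_lanes": 0.0},                    # Model Type 1
    ("U", "S", 4, "M"): {"b0": -6.749, "b1": 0.692, "b2": 0.178,
                         "state": {"IA": 0.921, "IL": 0.772, "LA": 1.552, "MN": 0.905,
                                   "NC": -0.788, "NE": 0.444, "OR": -0.098},
                         "b4_zero_lanes": -0.123},                 # Model Type 3
}

def predict_annual_accidents(site_type, maj_adt, min_adt, state):
    """Hypothetical helper: expected accidents per year, no existing turn lanes."""
    m = TABLE_B1[site_type]                                       # step (a)
    log_mu = m["b0"] + m["b1"] * math.log(maj_adt) + m["b2"] * math.log(min_adt)  # (b)
    log_mu += m["state"].get(state, 0.0)                          # step (c)
    log_mu += m["b4_zero_lanes"]                                  # step (d)
    return math.exp(log_mu)

print(round(predict_annual_accidents(("R", "U", 4, "M"), 5000, 500, "NC"), 3))
print(round(predict_annual_accidents(("U", "S", 4, "M"), 25000, 8000, "MN"), 3))
```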

When no usable model was available, a simple proportional adjustment for traffic volume was made in the CG approach, and sites of that type were not used in the EB approach.
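Because the models are multiplicative, the intercept, state, and left-turn lane terms are the same for a given site in both periods and cancel when that site is compared under two sets of traffic volumes. A regression-based volume adjustment therefore reduces to $(\text{MajADT}_a/\text{MajADT}_b)^{\beta_1}(\text{MinADT}_a/\text{MinADT}_b)^{\beta_2}$. The sketch below contrasts that factor with a simple proportional factor. The proportional version assumes scaling by total entering volume (major plus minor ADT), which is an illustrative assumption; section 5 defines the adjustments actually used. The function names and volumes are hypothetical; the exponents are the Table B-1 values for rural, unsignalized, four-leg intersections on multilane roads.

```python
def regression_volume_factor(maj_before, min_before, maj_after, min_after, b1, b2):
    """Hypothetical helper: ratio of model-predicted accidents, after vs. before.

    In the multiplicative model the intercept, state, and left-turn lane terms
    are unchanged between periods and cancel, leaving only the volume terms.
    """
    return (maj_after / maj_before) ** b1 * (min_after / min_before) ** b2

def proportional_volume_factor(maj_before, min_before, maj_after, min_after):
    """Hypothetical helper: assumed proportional adjustment (total entering volume)."""
    return (maj_after + min_after) / (maj_before + min_before)

# Hypothetical volumes (vehicles per day); exponents from Table B-1, R U 4 M row
b1, b2 = 0.797, 0.868
before, after = (6000, 800), (6600, 880)  # both volumes grow by 10 percent

print(round(regression_volume_factor(*before, *after, b1, b2), 4))
print(round(proportional_volume_factor(*before, *after), 4))  # 1.1
```

With both volumes up 10 percent, the regression-based factor exceeds 1.1 here because the two exponents sum to more than 1; for models whose exponents sum to less than 1, it would fall below the proportional factor.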

Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101