Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

 This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-RD-98-133
Date: October 1998

# Accident Models for Two-Lane Rural Roads: Segment and Intersections

## 5. Modeling

Segment Models

In this section we develop models for segments. The models are of Poisson type, negative binomial type, and extended negative binomial type. We discuss the choice of variables and explain the steps that lead to the final models presented. The choice of variables to retain, and the form in which to use them, are to some extent arbitrary since not all possibilities can be examined and some are more or less equivalent. The decisions are guided by criteria of simplicity (use of variables that are easily understood), comprehensiveness (inclusion of as many types of variables as possible), and significance (coefficients that are significantly different from zero according to statistical tests in one or more models). Many models can be generated, and we present here only a selection of models that illustrate the main phenomena and/or show the significant interactions.

In general, we will exhibit a formula for the mean number of accidents on a segment as a generalized linear function of highway variables. This formula will show the estimated coefficient of each variable in the model. In addition, we show the estimated standard error of the coefficient estimate and its P-value. The P-value is the probability that the estimated coefficient would have the value shown or any value farther from zero when the true coefficient is zero. A P-value of less than 5% is usually considered ample confirmation that the true coefficient is non-zero and that the estimated coefficient is significant. Later on, for the intersection models, we will liberalize this criterion considerably.
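
As an illustration of how a P-value relates to a coefficient and its standard error, the following minimal sketch uses a two-sided normal (Wald) approximation. This approximation is an assumption made for illustration; the statistical software used for the report may compute its P-values somewhat differently. The function name `wald_p_value` is hypothetical.

```python
import math

def wald_p_value(coef: float, std_err: float) -> float:
    """Two-sided P-value for H0: true coefficient = 0 (normal approximation)."""
    z = coef / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), Phi being the standard normal CDF.
    return math.erfc(abs(z) / math.sqrt(2.0))

# AVGM in the Minnesota Poisson model of Table 22:
# coefficient .0128, estimated standard error .0090 (the table reports P = .1559).
p_avgm = wald_p_value(0.0128, 0.0090)
print(f"approximate P-value = {p_avgm:.4f}")
```

A coefficient that is small relative to its standard error, as here, yields a large P-value and would not be judged significant at the 5% level.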

The State Variable

The STATE variable (value 0 for Minnesota, 1 for Washington) is used in all models that combine the two States. In effect it allows the constant or intercept term in each State to be different while constraining the other coefficients to be the same. Including such a variable is equivalent to acknowledging that the accident experience of two different States is likely to be different on segments with the same traffic volumes and the same highway characteristics. The STATE variable represents the demographics and habits of a different population of drivers in a different region and perhaps in a different era. Law enforcement practices, driver ages, and lifestyles may be quite different. Although the extra degree of freedom makes it easier to develop a combined model, it is of some interest when the coefficient of the STATE variable is insignificant (as it is in a few of the models below).

The Exposure Variable

For the segment modeling it is natural to include both segment length (seg_lng) and ADT as explanatory variables, and to expect that the number of accidents will be roughly proportional to the product of these factors times the time in days (365 days per year times 5 years in Minnesota or 3 years in Washington). Poisson models in Minnesota (Table 15) support this rough proportionality. If total number of accidents is modeled as a function of segment length and ADT, we obtain the following:

Table 15. Minnesota Segments, Poisson Models with Exposure Variables

Mean No. of Accidents = 5 × (365/10^3) × exp{-.3916 + 1.0150 LSEG + .9765 LADT}

| | Intercept | LSEG | LADT |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .0448 | .0278 | .0344 |
| P-value | .0001 | .0001 | .0001 |

Mean No. of Accidents = EXPO × exp{-.3934 - .0040 AVGM}

| | Intercept | AVGM |
|---|---|---|
| Estimated standard error of coefficient estimates | .0382 | .0278 |
| P-value | .0001 | .6474 |

1 mile = 1.61 km

where LSEG is the log of the segment length and LADT is the log of AVGM (ADT in 1000's of vehicles per day). The Minnesota standard errors are consistent with the conclusion that the true coefficients of LSEG and LADT are 1. The second model shows the effect of using EXPO as an offset (i.e., as a multiplier) but retaining AVGM. The Minnesota data do not support the retention of AVGM.
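
The rough proportionality can be checked numerically. The sketch below evaluates the first Minnesota model of Table 15 and compares it with the same model when both exponents are forced to 1, in which case the leading factors collapse to an exposure measure. Two assumptions are made: that "log" means the natural logarithm, and that EXPO is measured in millions of vehicle-miles (ADT × 365 × years × length / 10^6), which is what the leading factor in Table 15 implies. The example segment (ADT of 2,000 and length of 1.5 miles) is hypothetical.

```python
import math

def expo(adt: float, seg_lng_mi: float, years: float) -> float:
    """Exposure in millions of vehicle-miles (assumed definition of EXPO)."""
    return adt * 365.0 * years * seg_lng_mi / 1.0e6

def mn_fitted_mean(adt: float, seg_lng_mi: float, years: float = 5.0) -> float:
    """First Minnesota model of Table 15, natural logs assumed."""
    lseg = math.log(seg_lng_mi)
    ladt = math.log(adt / 1000.0)  # LADT is the log of AVGM (ADT in 1000's)
    return years * (365.0 / 1.0e3) * math.exp(-0.3916 + 1.0150 * lseg + 0.9765 * ladt)

def mn_proportional_mean(adt: float, seg_lng_mi: float, years: float = 5.0) -> float:
    """Same intercept, but LSEG and LADT coefficients set exactly to 1."""
    return expo(adt, seg_lng_mi, years) * math.exp(-0.3916)

# Hypothetical segment: ADT 2,000 vehicles per day, 1.5 miles long.
fitted = mn_fitted_mean(2000.0, 1.5)
proportional = mn_proportional_mean(2000.0, 1.5)
print(f"fitted = {fitted:.3f}, proportional = {proportional:.3f}")
```

For this hypothetical segment the two predictions differ by only about one percent, which is the sense in which accidents are roughly proportional to exposure in Minnesota.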

Similar tables for Washington State and the combined data sets (Tables 16 and 17) indicate that LSEG and LADT have coefficients near 1 but still significantly different from 1 since the estimated standard errors are small. Also, if EXPO is taken as an offset and AVGM is retained, the latter is found to be significant. Although other choices could be made, the decision was made to use EXPO as an offset and exclude segment length as a separate variable, with the expectation that additional effects apparently due to segment length can be represented by other highway variables. AVGM was retained in some runs, although, as will be seen, it was not significant in the final model.
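
The contrast between the two States can be made concrete by testing each LSEG and LADT coefficient against 1 rather than against 0. The sketch below does this with a normal approximation, which is an assumption made for illustration and not necessarily the test used in the study; the coefficient estimates and standard errors are those of Tables 15 and 16.

```python
import math

def p_value_against_one(coef: float, std_err: float) -> float:
    """Two-sided P-value for H0: true coefficient = 1 (normal approximation)."""
    z = (coef - 1.0) / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))

# (coefficient, estimated standard error) from Tables 15 and 16
estimates = {
    "Minnesota LSEG": (1.0150, 0.0278),
    "Minnesota LADT": (0.9765, 0.0344),
    "Washington LSEG": (0.9121, 0.0310),
    "Washington LADT": (0.8918, 0.0299),
}

p_values = {name: p_value_against_one(c, se) for name, (c, se) in estimates.items()}
for name, p in p_values.items():
    verdict = "consistent with 1" if p > 0.05 else "differs from 1"
    print(f"{name}: P = {p:.4f} ({verdict})")
```

Under this approximation the Minnesota coefficients are well within sampling error of 1, while the Washington coefficients are not.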

Table 16. Washington Segments, Poisson Models with Exposure Variables

Mean No. of Accidents = 3 × (365/10^3) × exp{.1606 + .9121 LSEG + .8918 LADT}

| | Intercept | LSEG | LADT |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .0462 | .0310 | .0299 |
| P-value | .0001 | .0001 | .0001 |

Mean No. of Accidents = EXPO × exp{.1674 - .0269 AVGM}

| | Intercept | AVGM |
|---|---|---|
| Estimated standard error of coefficient estimates | .0390 | .0059 |
| P-value | .0001 | .0001 |

1 mile = 1.61 km

Table 17. Combined Segments, Poisson Models with Exposure Variables

Mean No. of Accidents = (5 or 3) × (365/10^3) × exp{-.3282 + .9685 LSEG + .9296 LADT + .4450 STATE}

| | Intercept | LSEG | LADT | STATE |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .0346 | .0206 | .0226 | .0366 |
| P-value | .0001 | .0001 | .0001 | .0001 |

Mean No. of Accidents = EXPO × exp{-.3405 - .0200 AVGM + .4719 STATE}

| | Intercept | AVGM | STATE |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .0291 | .0049 | .0357 |
| P-value | .0001 | .0001 | .0001 |

1 mile = 1.61 km

Lane Width and Shoulder Width

Wider lanes and wider shoulders should lower accidents. If we add these two variables to the Poisson models (Table 18), some notable differences are found between Minnesota and Washington. The lane width variable is seen to be of unexpected sign and insignificant in the Washington data.

Table 18. Poisson Models of Segments with Lane and Surface Width

MINNESOTA

Mean No. of Accidents = EXPO × exp{3.2115 + .0202 AVGM - .2501 LW - .1183 SHW}

| | Intercept | AVGM | LW | SHW |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .4172 | .0089 | .0354 | .0104 |
| P-value | .0001 | .0222 | .0001 | .0001 |

WASHINGTON

Mean No. of Accidents = EXPO × exp{-.0093 - .0157 AVGM + .0461 LW - .0759 SHW}

| | Intercept | AVGM | LW | SHW |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .5270 | .0063 | .0464 | .0110 |
| P-value | .9860 | .0123 | .3201 | .0001 |

COMBINED

Mean No. of Accidents = EXPO × exp{1.5393 - .0079 AVGM - .1117 LW - .0915 SHW + .2850 STATE}

| | Intercept | AVGM | LW | SHW | STATE |
|---|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .3236 | .0050 | .0277 | .0075 | .0606 |
| P-value | .0001 | .1108 | .0001 | .0001 | .0001 |

1 mile = 1.61 km, 1 ft = .3048 m
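
Because the models are log-linear, a coefficient b corresponds to a multiplicative change of exp(b) in the mean number of accidents per unit increase in the variable, other variables held fixed. The short sketch below converts the Minnesota lane width and shoulder width coefficients of Table 18 into percent changes per foot; the helper name is hypothetical.

```python
import math

def percent_change_per_unit(coef: float) -> float:
    """Percent change in the modeled mean for a one-unit increase in a variable."""
    return 100.0 * (math.exp(coef) - 1.0)

# Minnesota coefficients from Table 18 (widths in feet)
mn_lw = percent_change_per_unit(-0.2501)   # lane width LW
mn_shw = percent_change_per_unit(-0.1183)  # shoulder width SHW
print(f"per extra foot of lane width: {mn_lw:.1f}%")
print(f"per extra foot of shoulder width: {mn_shw:.1f}%")
```

The same conversion applied to the Washington lane width coefficient (+.0461) gives an increase, which is the unexpected sign noted above.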

In the last chapter we had already noted anomalies in the correlation between accidents and lane or shoulder width in Washington. Several factors contribute to this situation. One of them is the direct correlation between lane width and shoulder width that occurs in the Washington State data but not the Minnesota data. The correlation coefficients are given by:

| Lane Width LW versus Shoulder Width SHW | MINNESOTA SEGMENTS | WASHINGTON SEGMENTS | COMBINED SEGMENTS |
|---|---|---|---|
| Correlation coefficient | -.06313 | .11127 | .07047 |
| P-value | .1166 | .0029 | .0101 |

The P-values are estimated probabilities that the correlation coefficient estimates would have the values shown or values farther from zero if there were zero correlation between the variables on the populations from which the data sets are samples. Minnesota lane widths and shoulder widths have a slight but not especially significant negative correlation, while Washington lane widths and shoulder widths have a significant positive correlation. This is also reflected when we consider univariate statistics for LW, SHW, and TOTWIDTH:

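| State | Variable | Min | Max | Median | Mean |
|---|---|---|---|---|---|
| MN | Lane Width LW | 10 | 12 | 12 | 11.54 |
| MN | Shoulder Width SHW | 0 | 12 | 8 | 7.08 |
| MN | TOTWIDTH | 20 | 48 | 38 | 37.22 |
| WA | Lane Width LW | 9 | 12 | 11 | 11.37 |
| WA | Shoulder Width SHW | 0 | 10 | 5 | 5.01 |
| WA | TOTWIDTH | 18 | 44 | 32 | 32.77 |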

1 ft = .3048 m
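
The P-values quoted for the lane width versus shoulder width correlations can be approximately reproduced from the correlation coefficient and the number of segments. The sketch below uses the Fisher z-transformation with a normal approximation; this is a swapped-in technique chosen for illustration, not necessarily the test used in the study. The sample sizes (619 Minnesota segments, 712 Washington segments, 1,331 combined) are those reported elsewhere in this chapter.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Two-sided P-value for H0: zero correlation, via Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

p_mn = correlation_p_value(-0.06313, 619)    # reported P-value: .1166
p_wa = correlation_p_value(0.11127, 712)     # reported P-value: .0029
p_comb = correlation_p_value(0.07047, 1331)  # reported P-value: .0101
print(f"MN {p_mn:.4f}, WA {p_wa:.4f}, combined {p_comb:.4f}")
```

With samples of this size even a correlation of about .11 is clearly distinguishable from zero, which is why the Washington correlation is significant although it is small in magnitude.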

Another relevant fact is the shoulder composition in each State:

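| MINNESOTA SHOULDERS | Segments | Percent | WASHINGTON SHOULDERS | Segments | Percent |
|---|---|---|---|---|---|
| Mixed bituminous | 243 | 39.3% | asphalt | 402 | 56.5% |
| Gravel or stone | 335 | 54.1% | bituminous | 230 | 32.3% |
| Composite | 34 | 5.5% | gravel | 72 | 10.1% |
| Sod | 5 | .8% | curb | 1 | .1% |
| Missing | 2 | .3% | missing | 7 | 1.0% |
| Total | 619 | 100.0% | Total | 712 | 100.0% |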

Washington shoulders tend to resemble the road surface more than Minnesota shoulders. This suggests the possibility that a more appropriate variable than either lane width or shoulder width might be the variable TOTWIDTH, total width of road and shoulders. When the shoulder is paved, drivers may not make as much of a distinction between it and the road, and the combined width may be the only important variable. When variables are dependent, it is sometimes useful to replace them with one significant combination. Against this it can be argued that lane width and shoulder width have different types of effects on accidents and that it is inappropriate to treat them as one additive variable. Indeed, in the final models we do not.

Table 19 exhibits some models with only TOTWIDTH.

Table 19. Poisson Models of Segments with TOTWIDTH

MINNESOTA

Mean No. of Accidents = EXPO × exp{1.7994 + .0152 AVGM - .0614 TOTWIDTH}

| | Intercept | AVGM | TOTWIDTH |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .1828 | .0087 | .0051 |
| P-value | .0001 | .0816 | .0001 |

WASHINGTON

Mean No. of Accidents = EXPO × exp{1.2141 - .0192 AVGM - .0324 TOTWIDTH}

| | Intercept | AVGM | TOTWIDTH |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .1649 | .0061 | .0050 |
| P-value | .0001 | .0015 | .0001 |

COMBINED

Mean No. of Accidents = EXPO × exp{1.3310 - .0078 AVGM - .0464 TOTWIDTH + .2853 STATE}

| | Intercept | AVGM | TOTWIDTH | STATE |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1313 | .0050 | .0036 | .0386 |
| P-value | .0001 | .1191 | .0001 | .0001 |

COMBINED (WITHOUT AVGM)

Mean No. of Accidents = EXPO × exp{1.3480 - .0476 TOTWIDTH + .2650 STATE}

| | Intercept | TOTWIDTH | STATE |
|---|---|---|---|
| Estimated standard error of coefficient estimates | .1309 | .0035 | .0365 |
| P-value | .0001 | .0001 | .0001 |

1 mile = 1.61 km, 1 ft = .3048 m

Comparison of these models with those using LW and SHW suggests that replacing LW and SHW by TOTWIDTH plus an adjusted intercept yields similar explanatory value. However, because of the importance of these two geometric variables and the fact that in principle their values are independent, we retain both variables to the extent possible. In a few runs below TOTWIDTH is used instead to facilitate comparisons between the two States.
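
One way to see the similarity is to compare what the two combined models predict for the same change in cross-section. The sketch below assumes TOTWIDTH = 2 × (LW + SHW), i.e., two lanes and two shoulders; this relation is inferred from the minimum and maximum values in the univariate table above and is not stated explicitly in this section. Since AVGM and STATE are held fixed, the intercept and those terms cancel in the ratio of predicted means.

```python
import math

def totwidth(lw_ft: float, shw_ft: float) -> float:
    """Assumed relation: two lanes plus two shoulders."""
    return 2.0 * (lw_ft + shw_ft)

def ratio_lw_shw_model(lw0, shw0, lw1, shw1, b_lw=-0.1117, b_shw=-0.0915):
    """After/before ratio of mean accidents, combined model of Table 18."""
    return math.exp(b_lw * (lw1 - lw0) + b_shw * (shw1 - shw0))

def ratio_totwidth_model(lw0, shw0, lw1, shw1, b_tot=-0.0464):
    """After/before ratio of mean accidents, combined model of Table 19."""
    return math.exp(b_tot * (totwidth(lw1, shw1) - totwidth(lw0, shw0)))

# Hypothetical change 1: shoulders widened from 4 ft to 8 ft on 12-ft lanes.
shoulder_a = ratio_lw_shw_model(12, 4, 12, 8)
shoulder_b = ratio_totwidth_model(12, 4, 12, 8)
# Hypothetical change 2: lanes widened from 11 ft to 12 ft with 6-ft shoulders.
lane_a = ratio_lw_shw_model(11, 6, 12, 6)
lane_b = ratio_totwidth_model(11, 6, 12, 6)
print(f"shoulder widening: {shoulder_a:.3f} vs {shoulder_b:.3f}")
print(f"lane widening:     {lane_a:.3f} vs {lane_b:.3f}")
```

For these hypothetical changes the two combined models give similar ratios, which is consistent with the observation that TOTWIDTH carries much the same information as LW and SHW together.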

NOTE: Variables ACCRES = (Number of accidents minus predicted number from a Poisson model not using lane width LW) and LWRES = (LW minus predicted LW from a regression model using other highway variables) can be developed. Their correlation coefficients and associated P-values, not reproduced here, confirm that in Minnesota lane width has a significant independent negative effect on accident counts while in Washington lane width has an insignificant independent positive effect on accident counts.

Horizontal and Vertical Curve Variables

With the exception of the extended negative binomial models, in which individual horizontal and vertical curves were modeled, the horizontal variables used in this study have been the composites H, HM1, HM1.5, and HM2 and the vertical variables have been the composites VC, VM, VMC, and VMCC. All of these variables were found to be highly significant.

The only oddity is shown in Table 20 below and concerns the joint effect of H (average horizontal degree of curve) and VC (sum of crest % grade changes per hundred feet weighted by relative crest curve lengths).

In Table 20 the coefficients of the vertical and horizontal variables differ substantially between the two States and VC is insignificant in Washington with P-value .1854. If one replaces VC by VMC, an alternative measure of crest curves that sums the crest % grade changes per hundred feet over all crests and divides by segment length, the vertical variable becomes significant and its model coefficient stabilizes somewhat (but the horizontal variable H still shows dramatic change in its coefficient). See Table 21. There is of course strong correlation between the horizontal and vertical variables in both States.

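| Segment Variables | MINNESOTA | WASHINGTON | COMBINED |
|---|---|---|---|
| Horizontal Measure H versus Crest Measure VC: correlation coefficient | .21320 | .38635 | .33840 |
| Horizontal Measure H versus Crest Measure VC: P-value | .0001 | .0001 | .0001 |
| Horizontal Measure H versus Crest Measure VMC: correlation coefficient | .26423 | .36362 | .32581 |
| Horizontal Measure H versus Crest Measure VMC: P-value | .0001 | .0001 | .0001 |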

It is possible that unimportant reweighting is occurring among variables that measure essentially the same thing. In Washington 63.2% of the segments contain crest curves versus 83.5% of Minnesota's. However, the mean values of VC and VMC are higher in Washington and their standard deviations are much higher. It is perhaps not surprising that there would be differences between Washington and Minnesota in the coefficient estimates, but it is surprising that VC and VMC behave differently in Washington. VMC roughly measures the number of crests per mile (if one assumes that they all have about the same grade change per hundred feet), while VC measures the average grade change per hundred feet and assigns zero grade change to portions where no crest exists. VMC will be large if there are crests with large grade change per hundred feet, but VC will damp these down if they occur over short lengths (because they will be weighted by length).

Table 20. Poisson Models of Segments with TOTWIDTH, H, and VC

MINNESOTA

Mean No. of Accidents = EXPO × exp{.9330 - .0422 TOTWIDTH + .1849 H + 1.6051 VC}

| | Intercept | TOTWIDTH | H | VC |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1983 | .0052 | .0248 | .2376 |
| P-value | .0001 | .0001 | .0001 | .0001 |

WASHINGTON

Mean No. of Accidents = EXPO × exp{.7692 - .0257 TOTWIDTH + .0985 H + .2596 VC}

| | Intercept | TOTWIDTH | H | VC |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1731 | .0051 | .0082 | .1960 |
| P-value | .0001 | .00001 | .0001 | .1854 |

COMBINED

Mean No. of Accidents = EXPO × exp{.9169 - .0385 TOTWIDTH + .0954 H + .7770 VC + .2387 STATE}

| | Intercept | TOTWIDTH | H | VC | STATE |
|---|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1344 | .0036 | .0077 | .1345 | .0370 |
| P-value | .0001 | .0001 | .0001 | .0001 | .0001 |

1 mile = 1.61 km, 1 ft = .3048 m
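
The difference between the two crest measures can be illustrated with a small sketch based on the verbal definitions given above. It assumes that the "relative crest curve length" is the crest length divided by the segment length, and it uses hypothetical units and data; the precise definitions and units are those of the report's variable descriptions, which are not reproduced in this section.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Crest:
    rate: float       # crest % grade change per hundred feet
    length_mi: float  # length of the crest curve within the segment (miles)

def vc(crests: List[Crest], seg_lng_mi: float) -> float:
    """Crest rates weighted by relative crest length (zero where there is no crest)."""
    return sum(c.rate * c.length_mi / seg_lng_mi for c in crests)

def vmc(crests: List[Crest], seg_lng_mi: float) -> float:
    """Sum of crest rates over all crests, divided by segment length."""
    return sum(c.rate for c in crests) / seg_lng_mi

# Two hypothetical one-mile segments, each with a single crest of the same rate.
short_crest = [Crest(rate=2.0, length_mi=0.05)]
long_crest = [Crest(rate=2.0, length_mi=0.50)]

print("VMC:", vmc(short_crest, 1.0), vmc(long_crest, 1.0))
print("VC: ", vc(short_crest, 1.0), vc(long_crest, 1.0))
```

Under these assumptions the two segments have the same VMC, but the short, sharp crest contributes much less to VC, which is the damping effect described above.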

Because vertical and horizontal alignment are in principle independent and both are very important, we will retain both. We do this despite the fact that the correlation coefficients are considerably larger and more significant than those between lane width and shoulder width in Washington (which led us to introduce the combined variable TOTWIDTH). But in some runs we replace VC with VMC. The relationship between the vertical and horizontal measure will be reconsidered below when we use the extended negative binomial model, which takes into account individual curves on a segment.

Table 21. Poisson Models of Segments with TOTWIDTH, H, and VMC

MINNESOTA

Mean No. of Accidents = EXPO × exp{.9039 - .0397 TOTWIDTH + .1840 H + .0544 VMC}

| | Intercept | TOTWIDTH | H | VMC |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .2027 | .0054 | .0248 | .0081 |
| P-value | .0001 | .0001 | .0001 | .0001 |

WASHINGTON

Mean No. of Accidents = EXPO × exp{.6895 - .0240 TOTWIDTH + .0926 H + .0395 VMC}

| | Intercept | TOTWIDTH | H | VMC |
|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1743 | .0051 | .0085 | .0094 |
| P-value | .0001 | .00001 | .0001 | .0001 |

COMBINED

Mean No. of Accidents = EXPO × exp{.7478 - .0340 TOTWIDTH + .0928 H + .0538 VMC + .2503 STATE}

| | Intercept | TOTWIDTH | H | VMC | STATE |
|---|---|---|---|---|---|
| Estimated standard error of coefficient estimates | .1373 | .0036 | .0075 | .0059 | .0369 |
| P-value | .0001 | .0001 | .0001 | .0001 | .0001 |

1 mile = 1.61 km, 1 ft = .3048 m

Other variables systematically investigated in connection with model development include GR (average absolute straight-away grade), RHR (Roadside Hazard Rating), DD (driveway density), SPD (speed), T (commercial traffic %), and INTD (intersection density). Weather variables (NONDRYP and SNP) were also investigated in Minnesota.

The weather variables can be dismissed at once. Both NONDRYP and SNP had negative regression coefficients in models and were not significant. A higher percentage of bad weather tends to accompany a decreased number of accidents, but the P-values are large. In a few runs SNP is marginally significant. Because the weather variable was not local but pertained to a large Weather District in the State of Minnesota and because of its relative insignificance, it was dropped from the modeling and was not collected in Washington State. See Shankar et al. for a study of weather variables in Washington State that indicates sufficiently local weather can be significant.

Among the remaining variables, SPD is not significant in either State nor in the combined data set. This may in part reflect lack of variation in the speed data, as well as the quality of the speed data (speeds were not collected on some segments, but were later reconstructed from HSIS files).

GR is very significant in both States. The other variables are significant in one State or the other (but not both) and significant in the modeling of the combined data sets. One curiosity is that T has a negative coefficient in Minnesota and is not significant, but has a significant positive coefficient in Washington.

The P-values for these variables in Poisson runs on the combined data sets (with other variables LW, SHW, H, VC, and STATE; and with EXPO as an offset variable) are:

| VARIABLE | P-value |
|---|---|
| GR | .0001 |
| RHR | .0001 |
| DD | .0107 |
| INTD | .0563 |
| T | .0697 |
| SPD | .4118 |

Next we attempt to include combinations of these variables in a combined Poisson model for both States. When this is done, GR and RHR do well, as do GR and DD, and GR and T. GR, RHR, and DD do well together (although STATE gets a P-value of .1417 in this case); and GR, RHR, and INTD do well together.

Thus it is certainly appropriate to include GR and RHR in the model and at least one other variable. INTD measures intersection density. However, intersection accidents and intersection-related accidents are excluded from the accident variable in the segment models. For this reason, any effect of INTD will be indirect and INTD is not strictly comparable to DD (driveway density). This rules out a sum of DD and INTD as a measure. If GR, RHR, DD, and INTD are all included in the model, they have the respective P-values .0001, .0001, .0001, and .1863. We conclude that INTD does have an independent effect distinct from that of DD, but not sufficiently significant to include in the model.

The situation is similar with the commercial traffic variable T. It appears to be significant for the combined data set, but not sufficiently so, when other variables are present, for inclusion in the model.

Table 22 shows resultant Poisson models for Minnesota and Washington. The anomalous behavior of lane width and VC in Washington exhibited in Table 22 has already been discussed. However, we should note the insignificance of Roadside Hazard Rating RHR in Minnesota. An interesting set of correlations exists with a bearing on the insignificance of RHR in Minnesota and the peculiar behavior of lane width LW in Washington.

| Correlation coefficient and P-value | MINNESOTA SEGMENTS | WASHINGTON SEGMENTS | COMBINED SEGMENTS |
|---|---|---|---|
| Lane Width LW versus Roadside Hazard Rating RHR | -.01141, .7769 | .11555, .0020 | -.1202, .6613 |
| Shoulder Width SHW versus Roadside Hazard Rating RHR | -.23729, .0001 | -.14910, .0001 | -.33705, .0001 |
| TOTWIDTH versus Roadside Hazard Rating RHR | -.23563, .0001 | -.11560, .0001 | -.32559, .0001 |

RHR in Minnesota has a mean of 2.14 and a standard deviation of .97, while in Washington its mean is 3.67 and standard deviation 1.57. Roadside Hazard Rating is higher and more variable in Washington State. The insignificance of RHR in Minnesota in part relates to the absence of variation. The unexpected sign of the lane width coefficient in Washington likewise may be in part due to its correlation with the quite variable magnitudes of RHR in Washington. When the data from the two States are combined, this correlation becomes insignificant and the coefficients of LW and RHR both attain more plausible values.

In Table 22 most coefficients for the combined model are intermediate between those of the two States. The most prominent anomalies are the unexpected positive sign of lane width in Washington, the insignificance of Roadside Hazard Rating RHR in Minnesota, and the insignificance of the crest variable VC in Washington.

Table 22. Poisson Models for Segment Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined |
|---|---|---|---|
| Intercept | 2.0693 (.4371, .0001) | -.9719 (.5444, .0742) | .7064 (.3290, .0318) |
| AVGM (ADT/1,000) | .0128 (.0090, .1559) | -.0210 (.0067, .0017) | -.0112 (.0052, .0322) |
| Lane Width LW | -.1994 (.0359, .0001) | .0678 (.0480, .1577) | -.0869 (.0280, .0001) |
| Shoulder Width SHW | -.0792 (.0111, .0001) | -.0390 (.0117, .0008) | -.0599 (.0078, .0001) |
| Roadside Hazard Rating RHR | .0044 (.0273, .8706) | .0650 (.0171, .0001) | .0703 (.0141, .0001) |
| Driveway Rate DD | .0089 (.0033, .0075) | .0119 (.0023, .0001) | .0095 (.0019, .0001) |
| Degree of Curve H | .1363 (.0283, .0001) | .0783 (.0099, .0001) | .0711 (.0089, .0001) |
| Crest VC | 1.1905 (.2634, .0001) | .2090 (.2073, .3135) | .6843 (.1455, .0001) |
| Absolute Grade GR | .2459 (.0598, .0001) | .0779 (.0234, .0009) | .1009 (.0213, .0001) |
| State (MN = 0, WA = 1) | -- | -- | .0909 (.0453, .0447) |
| n, p | 619, 9 | 712, 9 | 1331, 10 |
| Dm/(n - p), χ²/(n - p) | 1.6827, 1.6596 | 1.6525, 1.7179 | 1.7135, 1.7422 |
| T1 | 13.55 | 12.04 | 22.71 |
| R², P², R²_P | .7379, .8890, .8300 | .6287, .8138, .7726 | .6611, .8610, .7778 |
| R²_W, P²_W, R²_PW | .8300, .8960, .9263 | .7641, .8609, .8875 | .7886, .8777, .8984 |
| R²_FT, P²_FT, R²_PFT | .6426, .7609, .8446 | .5846, .7049, .8293 | .5999, .7341, .8172 |

Table 23. Additional Poisson Models for Segment Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined |
|---|---|---|---|
| Intercept | 2.1930 (.4438, .0001) | .0378 (.2034, .8526) | .7048 (.3293, .0323) |
| AVGM (ADT/1,000) | -- | -.0252 (.0066, .0001) | -- |
| Lane Width LW | -.1856 (.0350, .0001) | -- | -.0918 (.0281, .0011) |
| Shoulder Width SHW | -.0757 (.0106, .0001) | -- | -.0664 (.0077, .0001) |
| TOTWIDTH (in place of LW and SHW) | -- | -.0135 (.0054, .0116) | -- |
| Roadside Hazard Rating RHR | -- | .0726 (.0169, .0001) | .0662 (.0143, .0001) |
| Driveway Rate DD | .0092 (.0033, .0050) | .0102 (.0024, .0001) | .0097 (.0019, .0001) |
| Degree of Curve H | .1445 (.0278, .0001) | .0701 (.0101, .0001) | .0720 (.0089, .0001) |
| Crest VC in MN, Combined; VMC in WA | 1.2257 (.2567, .0001) | .0378 (.0101, .0002) | .6999 (.1450, .0001) |
| Absolute Grade GR | .2438 (.0582, .0001) | .0740 (.0235, .0016) | .1077 (.0214, .0001) |
| SNP in MN; T in Combined | -.8851 (.5938, .1361) | -- | .0070 (.0029, .0153) |
| STATE | -- | -- | .0418 (.0448, .3500) |
| n, p | 619, 8 | 712, 8 | 1331, 10 |
| Dm/(n - p), χ²/(n - p) | 1.6796, 1.6361 | 1.6396, 1.6774 | 1.7126, 1.7592 |
| T1 | 14.54 | 12.04 | 22.55 |
| R², P², R²_P | .7297, .8890, .8208 | .6279, .8138, .7716 | .6607, .8610, .7673 |
| R²_W, P²_W, R²_PW | .8290, .8941, .9272 | .7685, .8604, .8932 | .7909, .8803, .8985 |
| R²_FT, P²_FT, R²_PFT | .6421, .7609, .8439 | .5859, .7049, .8311 | .6006, .7341, .8182 |

Table 23 shows a few variant Poisson models with characteristics of special interest. In Table 23 the insignificant variables from Table 22 are removed and other variables are introduced. In Minnesota AVGM and RHR have been removed, and SNP has been added (P-value = .1361). In Washington TOTWIDTH has replaced LW and SHW, and VMC has replaced VC. Also in Table 23 the combined data set is presented without AVGM but with the addition of T. The variable T is quite significant but STATE loses its significance (P-value = .3500).

Poisson versus Negative Binomial

For the models in Tables 22 and 23 the values of Dm/(n - p), χ²/(n - p), and T1 are computed, along with several measures of goodness-of-fit. The goodness-of-fit measures indicate that the models have a good deal of explanatory power. However, the other statistics in all cases strongly support the conclusion that the data are overdispersed. In particular, the large values of T1 establish this decisively. The sources of the overdispersion are presumably segment characteristics not included in the model. Some of these characteristics might be items not collected (e.g., sight distances, superelevations, local weather) that are possible to collect, but others are items well outside the scope of this study (e.g., driver characteristics).
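
For readers who wish to compute the first two of these statistics themselves, the following minimal sketch evaluates the scaled Poisson deviance and the scaled Pearson chi-square from observed counts and fitted means, using the standard textbook formulas. The T1 statistic is defined elsewhere in the report and is not reproduced here. The counts and the constant-mean "model" below are hypothetical and serve only to show ratios well above 1.

```python
import math
from typing import Sequence, Tuple

def poisson_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """Poisson deviance: 2 * sum[y ln(y/mu) - (y - mu)], with y ln y = 0 at y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (log_term - (yi - mi))
    return total

def pearson_chi2(y: Sequence[float], mu: Sequence[float]) -> float:
    """Pearson chi-square for a Poisson model: sum (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def dispersion_ratios(y, mu, p: int) -> Tuple[float, float]:
    """Deviance/(n - p) and chi-square/(n - p); values near 1 suggest no overdispersion."""
    dof = len(y) - p
    return poisson_deviance(y, mu) / dof, pearson_chi2(y, mu) / dof

# Hypothetical accident counts on ten segments, fitted by a single constant mean (p = 1).
counts = [0, 1, 0, 7, 2, 0, 12, 1, 0, 5]
fitted = [sum(counts) / len(counts)] * len(counts)
dev_ratio, chi2_ratio = dispersion_ratios(counts, fitted, p=1)
print(f"deviance ratio = {dev_ratio:.2f}, chi-square ratio = {chi2_ratio:.2f}")
```

Ratios of about 1.6 to 1.8, as in Tables 22 and 23, are milder than in this toy example but point in the same direction.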

Negative binomial models are a natural generalization of the Poisson that permit treatment of overdispersion. Such models can be developed with the software package LIMDEP or by trial and error with SAS and different choices of an overdispersion parameter. The negative binomial also has the advantage of lending itself nicely to application of empirical Bayesian techniques when past accident data are available at a site. An adjusted model can be developed with parameters partly derived from the past data and partly from the given negative binomial model. The new model makes use of the old but also allows the predictions of the old model to be tempered by actual experience on the roadway. See Hauer et al. (1988).
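
The empirical Bayesian adjustment can be sketched as follows. The sketch assumes the common parameterization in which the negative binomial variance is μ + Kμ², and the familiar weighting in which the model prediction receives weight 1/(1 + Kμ); both are assumptions stated for illustration, and the report's own formulation should be consulted for actual use. The model prediction and the observed count must refer to the same site and the same period. The numerical inputs are hypothetical, with K taken from the order of magnitude found in the tables of this chapter.

```python
def nb_variance(mu: float, k: float) -> float:
    """Negative binomial variance under the assumed form mu + K * mu^2."""
    return mu + k * mu ** 2

def empirical_bayes_estimate(mu: float, k: float, observed: float) -> float:
    """Blend the model prediction mu with the observed count at the site."""
    weight = 1.0 / (1.0 + k * mu)  # weight given to the model prediction
    return weight * mu + (1.0 - weight) * observed

# Hypothetical site: the model predicts 3.0 accidents, 6 were observed, K = 0.30.
estimate = empirical_bayes_estimate(mu=3.0, k=0.30, observed=6.0)
print(f"variance = {nb_variance(3.0, 0.30):.2f}, adjusted estimate = {estimate:.2f}")
```

The adjusted estimate always lies between the model prediction and the observed count, and it moves toward the observation as K or the predicted mean grows.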

The phenomena noted in the earlier Poisson models occur in the negative binomial setting: differences between the behavior of AVGM, lane width LW, VC and VMC, and RHR from one State to the other; and marginal significance of INTD and T. So the analysis is not repeated. In general the estimated coefficients of variables are similar to what they were under the Poisson models. However, we have an estimate for one additional parameter, the overdispersion parameter K.

Table 24 shows four representative negative binomial models. The overdispersion parameters vary from 0.26 to 0.30. Variables that are omitted are not significant, and some that are retained are not significant either: notably the intercept in three of the models, AVGM, and VC in the combined data set (and in Washington, not shown). AVGM is not at all significant in Minnesota, not very significant in Washington, and intermediate in the combined data set. Lane width has the wrong sign in Washington (not shown), and is less significant in the combined data set than it was in the Poisson runs. The goodness-of-fit measures, including the ordinary R², yield no dramatic conclusions. R²_K is systematically larger than the others. All the measures suggest that the Minnesota coefficients account for Minnesota accidents a bit better than the other models.

Table 24. Negative Binomial Models for Segment Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined | Combined Variant |
|---|---|---|---|---|
| Intercept | 1.9456 (.6992, .0054) | .0358 (.2719, .8953) | .6883 (.4779, .1492) | .4733 (.4796, .3356) |
| AVGM (ADT/1,000) | -- | -.0242 (.0137, .0787) | -.0109 (.0107, .3067) | -- |
| Lane Width LW | -.1821 (.0573, .0015) | -- | -.0857 (.0405, .0343) | -.0700 (.0404, .0833) |
| Shoulder Width SHW | -.0800 (.0158, .0001) | -- | -.0577 (.0106, .0001) | -.0569 (.0105, .0001) |
| TOTWIDTH (in place of LW and SHW) | -- | -.0127 (.0071, .0720) | -- | -- |
| Roadside Hazard Rating RHR | -- | .0642 (.0254, .0116) | .0622 (.0219, .0046) | .0609 (.0219, .0055) |
| Driveway Rate DD | .0079 (.0042, .0630) | .0100 (.0035, .0045) | .0091 (.0027, .0007) | .0072 (.0026, .0067) |
| Degree of Curve H | .1421 (.0545, .0092) | .0735 (.0154, .0001) | .0856 (.0126, .0001) | .0772 (.0140, .0001) |
| VC (MN, Combined); VMC (WA, Combined Variant) | 1.0495 (.4964, .0345) | .0333 (.0168, .0468) | .3748 (.2605, .1502) | .0394 (.0141, .0052) |
| Absolute Grade GR | .1990 (.0928, .0320) | .0800 (.0295, .0066) | .0976 (.0280, .0005) | .0941 (.0280, .0008) |
| State | -- | -- | .1420 (.0679, .0366) | .1427 (.0678, .0353) |
| n, p | 619, 7 | 712, 8 | 1331, 10 | 1331, 9 |
| Dm/(n - p - 1) | 1.4938 | 1.4767 | 1.4993 | 1.4922 |
| K | .2657 (.0385, .0001) | .2821 (.0385, .0001) | .3022 (.0285, .0001) | .2943 (.0281, .0001) |
| R²_K | .8609 | .8302 | .8310 | .8354 |
| R² | .7251 | .6268 | .6489 | .6669 |
| R²_D, P²_D, R²_PD | .3720, .5607, .6634 | .3455, .5300, .6518 | .3518, .5464, .6438 | .3548, .5477, .6478 |

Table 25. Negative Binomial Models for Segment Injury Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined |
|---|---|---|---|
| Intercept | 1.9998 (.8205, .0148) | -.2375 (.3511, .4988) | .1675 (.6108, .7839) |
| Lane Width LW | -.2458 (.0694, .0004) | -- | -.1155 (.0531, .0296) |
| Shoulder Width SHW | -.1053 (.0212, .0001) | -- | -.0740 (.0143, .0001) |
| TOTWIDTH (in place of LW and SHW) | -- | -.0279 (.0089, .0017) | -- |
| Roadside Hazard Rating RHR | -- | .0506 (.0314, .1077) | .0410 (.0272, .1315) |
| Driveway Rate DD | -- | .0065 (.0041, .1193) | .0054 (.0035, .1192) |
| Degree of Curve H | .2158 (.0667, .0012) | .0598 (.0194, .0020) | .0730 (.0161, .0001) |
| Crest VMC | -- | .0405 (.0219, .0648) | .0399 (.0177, .0239) |
| Absolute Grade GR | -- | .0725 (.0377, .0543) | .0574 (.0360, .1109) |
| State | -- | -- | .4149 (.0879, .0001) |
| n, p | 619, 4 | 712, 7 | 1331, 9 |
| Dm/(n - p - 1) | 1.0702 | 1.1593 | 1.1212 |
| K | .2398 (.0786, .0023) | .2751 (.0682, .0001) | .2710 (.0518, .0001) |
| R²_K | .8934 | .8444 | .8628 |
| R² | .5859 | .4824 | .5386 |
| R²_D, P²_D, R²_PD | .3483, .4468, .7795 | .3185, .4334, .7348 | .3303, .4399, .7509 |

Table 25 shows negative binomial models for serious accidents, based on the variable INJACC. Variables with little significance have been omitted and only those that are significant or marginally significant have been retained. The Minnesota model, with the fewest variables, once again has the highest goodness-of-fit. The coefficients are roughly comparable to those for the models for total number of accidents (TOTACC). Differences between the deviances Dm and R2 as one passes from Table 24 (TOTACC) to Table 25 (INJACC) are not of importance. Both measures tend to give smaller values when observed data are near zero, and larger values when the observations are away from zero: INJACC has small or zero values more often than TOTACC.

The Extended Negative Binomial

instead of (5.1). With respect to the j-th highway variable, segment number i is decomposed into Cij subsegments of relative lengths {wijc : c = 1, ..., Cij} where the variables xij take the respective putatively constant values {xijc : c = 1,..., Cij}. In effect this model slices up the segments into subsegments where each variable is constant. The weights wijc are the relative lengths of the subsegments and add to 1. The value Cij can be taken to be independent of i (and j) if the maximum number of subsegments in the data set is specified: for segments with fewer subsegments the extra weights can be set equal to zero. For some variables, all weights except one are set to zero, and the model behaves like an ordinary negative binomial model with respect to them.

An advantage of the extended negative binomial model is that it permits local variation along a roadway to be taken into account. Rather than summing local effects or averaging them, one in effect sums the accidents occurring on subsegments where conditions are constant. This gives the model form a scale independence: one may decompose segments into subsegments or aggregate adjacent segments without changing model form.
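
The behavior described above can be illustrated with a short sketch. The model equation itself is not reproduced in this section, so the sketch assumes, for illustration only, that each sliced variable contributes a factor equal to the length-weighted sum of exp(β × value) over its subsegments; the function name and the data are hypothetical, and the coefficient .0450 is simply borrowed from the degree-of-curve row of Table 27.

```python
import math
from typing import Sequence

def subsegment_factor(beta: float, weights: Sequence[float], values: Sequence[float]) -> float:
    """Assumed contribution of one variable: sum over subsegments of w * exp(beta * x)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("relative subsegment lengths must add to 1")
    return sum(w * math.exp(beta * x) for w, x in zip(weights, values))

beta = 0.0450  # illustrative coefficient (degree of curve)

# One subsegment only: behaves like an ordinary log-linear term.
ordinary = subsegment_factor(beta, [1.0], [4.0])

# Padding with zero-weight subsegments leaves the factor unchanged.
padded = subsegment_factor(beta, [1.0, 0.0, 0.0], [4.0, 0.0, 0.0])

# Splitting a constant stretch into two pieces leaves the factor unchanged.
coarse = subsegment_factor(beta, [0.6, 0.4], [0.0, 8.0])
fine = subsegment_factor(beta, [0.3, 0.3, 0.4], [0.0, 0.0, 8.0])

print(ordinary, padded, coarse, fine)
```

Under this assumed form, slicing a segment more finely where conditions are constant does not alter the prediction, which is the scale independence referred to above.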

Table 26. Extended Negative Binomial Models for Segment Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined |
|---|---|---|---|
| Intercept | 2.0168 (.6593, .0022) | .0846 (.2883, .7692) | .6287 (.4993, .2080) |
| AVGM (ADT/1,000) | -- | -.0239 (.0107, .0252) | -.0111 (.0897, .2099) |
| Lane Width LW | -.1843 (.0548, .0008) | -- | -.0829 (.0424, .0504) |
| Shoulder Width SHW | -.0812 (.0161, .0001) | -- | -.0560 (.0116, .0001) |
| TOTWIDTH (in place of LW and SHW) | -- | -.0142 (.0077, .0669) | -- |
| Roadside Hazard Rating RHR | -- | .0689 (.0245, .0049) | .0665 (.0210, .0016) |
| Driveway Rate DD | .0089 (.0044, .0423) | .0119 (.0033, .0003) | .0091 (.0026, .0005) |
| Degrees of Curve DEG{i} | .0474 (.0133, .0003) | .0521 (.0085, .0001) | .0445 (.0078, .0001) |
| Crest Curve Rates V{j} | .4834 (.1416, .0006) | -- | .4653 (.1255, .0002) |
| Absolute Grades GR{k} | .2404 (.0592, .0001) | .0894 (.0314, .0045) | .1047 (.0286, .0003) |
| State | -- | -- | .1585 (.0674, .0188) |
| n, p | 619, 6 | 712, 7 | 1331, 10 |
| Dm/(n - p - 1) | 1.4980 | 1.4877 | 1.5012 |
| K | .2722 (.0457, .0001) | .3055 (.0460, .0001) | .3034 (.0331, .0001) |
| R²_K | .8575 | .8161 | .8303 |
| R² | .7246 | .5720 | .6555 |

Table 27. Final Extended Negative Binomial Model for Segment Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Combined |
|---|---|
| Intercept | .6409 (.5008, .2006) |
| Lane Width LW | -.0846 (.0425, .0465) |
| Shoulder Width SHW | -.0591 (.0114, .0001) |
| Roadside Hazard Rating RHR | .0668 (.0211, .0015) |
| Driveway Rate DD | .0084 (.0026, .0011) |
| Degrees of Curve DEG{i} | .0450 (.0078, .0001) |
| Crest Curve Rates V{j} | .4652 (.1260, .0002) |
| Absolute Grades GR{k} | .1048 (.0287, .0003) |
| State | .1388 (.0659, .0351) |
| n, p | 1331, 9 |
| Dm/(n - p - 1) | 1.5012 |
| K | .3056 (.0331, .0001) |
| R²_K | .8291 |
| R² | .6547 |

Table 28. Extended Negative Binomial Models for Segment Injury Accidents

Regression Coefficients (Estimated Standard Error and P-value in parentheses)

| Variables (offset = exposure EXPO) | Minnesota 1985-89 | Washington 1993-95 | Combined |
|---|---|---|---|
| Intercept | 1.7147 (.8860, .0530) | -.1571 (.3657, .6675) | .3534 (.6546, .5893) |
| Lane Width LW | -.2233 (.0735, .0024) | -- | -.1306 (.0558, .0193) |
| Shoulder Width SHW | -.0996 (.0219, .0001) | -- | -.0784 (.0150, .0001) |
| TOTWIDTH (in place of LW and SHW) | -- | -.0302 (.0095, .0015) | -- |
| Roadside Hazard Rating RHR | -- | .0568 (.0309, .0659) | .0598 (.0261, .0217) |
| Driveway Rate DD | -- | .0085 (.0040, .0349) | .0062 (.0034, .0679) |
| Degrees of Curve DEG{i} | .0580 (.0116, .0001) | .0406 (.0107, .0001) | .0457 (.0091, .0001) |
| Crest Curve Rates V{j} | .5528 (.1364, .0001) | -- | .4694 (.1687, .0054) |
| Absolute Grades GR{k} | -- | .0823 (.0400, .0395) | -- |
| State | -- | -- | .4309 (.0852, .0001) |
| n, p | 619, 6 | 712, 6 | 1331, 9 |
| Dm/(n - p - 1) | 1.0763 | 1.3009 | 1.1308 |
| K | .2482 (.0751, .0010) | .2951 (.0699, .0001) | .2880 (.0523, .0001) |
| R²_K | .8899 | .8320 | .8542 |
| R² | .5926 | .4750 | .5277 |

As with the negative binomial the goal is to estimate the coefficient vector and the overdispersion parameter K. Shaw-Pin Miaou made available a program that uses maximum likelihood to estimate these quantities. In Table 26 we show the results of the modeling.

In Table 26 AVGM and Roadside Hazard Rating RHR are strongly insignificant in Minnesota and so were removed. In Washington the crest variable V{j}, although having the correct sign, is strongly insignificant in the presence of the other variables and so was removed. In the combined data set AVGM (and the Intercept variable) are insignificant. When AVGM was removed and the commercial percentage variable T added, the estimated coefficient for T was positive but had a significance level of about 20%. When the speed variable SPD is added instead, it has a negative coefficient and a P-value of 50%.

Table 27 represents our final model for segments. It contains a large number of variables, all of them significant, and it represents the combined characteristics of rural segments in two States with a reasonable amount of variation in all variables.
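
To show how the final model might be applied, the sketch below evaluates the Table 27 coefficients for a single segment. Several assumptions are made and should be checked against the model definition before any real use: EXPO is taken as millions of vehicle-miles (ADT × 365 × years × length / 10^6); the curve, crest, and grade variables are assumed to enter as length-weighted sums of exp(β × value) over subsegments; and the units of each input are assumed to match those used in the study. The segment data and function names are hypothetical.

```python
import math

# Coefficients of the combined model in Table 27.
COEF = {
    "intercept": 0.6409, "LW": -0.0846, "SHW": -0.0591, "RHR": 0.0668,
    "DD": 0.0084, "DEG": 0.0450, "V": 0.4652, "GR": 0.1048, "STATE": 0.1388,
}

def weighted_factor(beta, pieces):
    """Sum of w * exp(beta * x) over (relative length w, value x) pairs (assumed form)."""
    return sum(w * math.exp(beta * x) for w, x in pieces)

def predicted_accidents(adt, seg_lng_mi, years, lw, shw, rhr, dd, state,
                        curves, crests, grades):
    """Mean number of accidents on a segment under the stated assumptions."""
    expo = adt * 365.0 * years * seg_lng_mi / 1.0e6  # assumed exposure definition
    linear = (COEF["intercept"] + COEF["LW"] * lw + COEF["SHW"] * shw
              + COEF["RHR"] * rhr + COEF["DD"] * dd + COEF["STATE"] * state)
    return (expo * math.exp(linear)
            * weighted_factor(COEF["DEG"], curves)
            * weighted_factor(COEF["V"], crests)
            * weighted_factor(COEF["GR"], grades))

# Hypothetical 2-mile segment observed for 3 years.
segment = dict(adt=3000, seg_lng_mi=2.0, years=3, lw=11, shw=4, rhr=4, dd=5,
               curves=[(0.8, 0.0), (0.2, 6.0)],   # 20% of length on a 6-degree curve
               crests=[(0.9, 0.0), (0.1, 1.0)],   # 10% of length on a crest
               grades=[(0.5, 1.0), (0.5, 3.0)])   # half at 1% grade, half at 3%

mn = predicted_accidents(state=0, **segment)
wa = predicted_accidents(state=1, **segment)
wider = predicted_accidents(state=1, **{**segment, "shw": 8})
print(f"MN {mn:.2f}, WA {wa:.2f}, WA with 8-ft shoulders {wider:.2f}")
```

With everything else equal, the Washington prediction exceeds the Minnesota prediction by the factor exp(.1388), and widening the shoulders lowers the predicted mean, as the signs in Table 27 imply.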

Table 28 shows three extended negative binomial models for Injury Accidents. AVGM was insignificant in all three data sets. RHR and DD were insignificant in Minnesota. The straightaway grade variable GR{k} was not significant in Minnesota, and the crest vertical V{j} was not significant in Washington. Extended negative binomial runs with all variables present did not converge in the combined data set, but did when GR{k} was removed. A total of 36% of all reported segment accidents were Injury Accidents in Minnesota versus 46% in Washington, and this is reflected by the increase in the coefficient for State from Table 27 to Table 28.