U.S. Department of Transportation
Federal Highway Administration
This report is an archived publication and may contain dated technical, contact, and link information.

Publication Number: FHWA-RD-98-133
Date: October 1998 

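Accident Models for Two-Lane Rural Roads: Segment and Intersections

5. Modeling

Poisson and Negative Binomial Modeling Techniques

The Poisson and Negative Binomial Models

Poisson and negative binomial models, with parameters a generalized linear function of covariates, are by now a well-accepted method of modeling discrete rare events such as roadway accidents; see Miaou and Lum (1993). It is assumed that accidents occurring on a particular roadway or at a particular intersection are independent of one another, and that a certain mean number of accidents per unit time is characteristic of the given site and of other sites with the same properties. The mean itself is assumed to depend on highway variables. Since the mean must be greater than zero, it is taken to have a generalized linear (log-linear) form, and the accident count follows a Poisson distribution:

$$\lambda_i = \exp\left(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_n x_{ni}\right), \qquad P(y_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}$$

where $\lambda_i$ is the mean accident count at site $i$, $x_{1i}, \ldots, x_{ni}$ are its highway variables, and $P(y_i)$ is the probability of $y_i$ accidents at the given site.

The negative binomial distribution adds a quadratic term, representing overdispersion, to the variance: $\operatorname{Var}(y_i) = \lambda_i + K\lambda_i^2$, where $K \ge 0$ is the overdispersion parameter ($K = 0$ gives the Poisson model). The negative binomial model takes the form:

$$P(y_i) = \frac{\Gamma(y_i + 1/K)}{\Gamma(1/K)\, y_i!} \left(\frac{K\lambda_i}{1 + K\lambda_i}\right)^{y_i} \left(\frac{1}{1 + K\lambda_i}\right)^{1/K}$$

with $\lambda_i$ as above. The coefficients are estimated by maximizing the log-likelihood function $L(\beta)$ for the Poisson distribution:

$$L(\beta) = \sum_{i} \left[ y_i \ln \lambda_i - \lambda_i - \ln(y_i!) \right] \tag{5.2}$$

Here $\beta = (\beta_0, \beta_1, \ldots, \beta_n)$ is the vector of coefficients, $y_i$ is the observed accident count for segment $i$, and $\lambda_i$ is the corresponding mean given by the log-linear form above. For the negative binomial model, the log-likelihood is maximized jointly over $\beta$ and $K$:

$$L(\beta, K) = \sum_{i} \left[ \ln\Gamma(y_i + 1/K) - \ln\Gamma(1/K) - \ln(y_i!) + y_i \ln(K\lambda_i) - \left(y_i + \tfrac{1}{K}\right)\ln(1 + K\lambda_i) \right] \tag{5.3}$$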
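The log-linear mean and the log-likelihoods (5.2) and (5.3) can be evaluated directly, which makes the estimation step concrete. The sketch below is a minimal pure-Python illustration, not the SAS or LIMDEP procedures used in the study; the function names and the toy data are hypothetical. In practice a numerical optimizer would maximize these functions over $\beta$ (and $K$) to obtain the estimates.

```python
import math


def mean_count(beta, x):
    """Log-linear mean: lambda = exp(beta_0 + beta_1*x_1 + ... + beta_n*x_n)."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return math.exp(eta)


def poisson_loglik(y, lam):
    """Poisson log-likelihood (5.2): sum of y*ln(lam) - lam - ln(y!)."""
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1)
               for yi, li in zip(y, lam))


def negbin_loglik(y, lam, k):
    """Negative binomial log-likelihood (5.3) with overdispersion k > 0.

    Under this parameterization Var(y_i) = lam_i + k * lam_i**2.
    """
    if k <= 0:
        raise ValueError("k must be positive; k -> 0 recovers the Poisson case")
    inv_k = 1.0 / k
    return sum(
        math.lgamma(yi + inv_k) - math.lgamma(inv_k) - math.lgamma(yi + 1)
        + yi * math.log(k * li) - (yi + inv_k) * math.log1p(k * li)
        for yi, li in zip(y, lam)
    )


# Hypothetical toy data: four sites, one covariate each.
y = [0, 2, 1, 4]
x = [[1.0], [2.0], [1.5], [3.0]]
beta = [-0.5, 0.6]
lam = [mean_count(beta, xi) for xi in x]

print(poisson_loglik(y, lam))
print(negbin_loglik(y, lam, 0.3))
```

As $K$ shrinks toward zero the negative binomial log-likelihood approaches the Poisson one, which is a quick sanity check on any implementation.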
For convenience, the same letters will often be used for both the parameters and their estimated values, i.e., hats (^) will be omitted.

Model Evaluation: Overdispersion

A decision about whether the Poisson form is appropriate can be based on one of several statistics. As noted in SAS Technical Report P-243, the deviance of a model $m$ is:

$$D^m = 2\left(L^f - L^m\right)$$

where $L^f$ is the log-likelihood (5.2) that would be achieved if the model gave a perfect fit ($\lambda_i = y_i$ for each $i$, and $K = 0$) and $L^m$ is the log-likelihood (5.2 or 5.3) of the model under consideration ($\lambda_i = \hat{\lambda}_i$). If the latter model is correct, $D^m$ is approximately a chi-squared random variable with degrees of freedom equal to the number $n$ of observations minus the number $p$ of parameters. A deviance greatly in excess of $n - p$ suggests that the model is overdispersed because of missing variables and/or a non-Poisson form. Thus, when the deviance divided by its degrees of freedom is significantly larger than 1, overdispersion is indicated.

Likewise, the Pearson chi-square statistic, defined by

$$\chi^2 = \sum_i \frac{(y_i - \hat{\lambda}_i)^2}{\hat{\lambda}_i}$$

is an approximately chi-squared random variable with mean $n - p$ for a valid Poisson model. If $\chi^2/(n - p)$ is significantly larger than 1, overdispersion is also indicated.

On the assumption that the basic form of the model is correct, Dean and Lawless (1989) recommend yet another statistic, $T_1$, to test the hypothesis that the model is a Poisson model against the alternative that it is overdispersed:

$$T_1 = \frac{\sum_i \left[(y_i - \hat{\lambda}_i)^2 - y_i\right]}{\sqrt{2\sum_i \hat{\lambda}_i^2}}$$

When the null hypothesis $K = 0$ is true and the number of observations is large, $T_1$ is approximately a standard normal random variable. If $T_1$ is large and positive, the hypothesis $K = 0$ is rejected, the data are considered to be overdispersed, and a negative binomial model with positive $K$ is an alternative candidate model.
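The three overdispersion diagnostics (deviance, Pearson chi-square, and the Dean and Lawless $T_1$) need only the observed counts and the fitted Poisson means. In this minimal sketch the Poisson deviance is written out as $2(L^f - L^m) = 2\sum_i [y_i \ln(y_i/\hat{\lambda}_i) - (y_i - \hat{\lambda}_i)]$, using the convention $y \ln y = 0$ for $y = 0$; the function names and example numbers are illustrative only.

```python
import math


def poisson_deviance(y, lam):
    """Poisson deviance 2*(L^f - L^m) = 2*sum[y*ln(y/lam) - (y - lam)]."""
    total = 0.0
    for yi, li in zip(y, lam):
        log_term = yi * math.log(yi / li) if yi > 0 else 0.0  # y*ln(y) -> 0 at y = 0
        total += log_term - (yi - li)
    return 2.0 * total


def pearson_chi2(y, lam):
    """Pearson chi-square for a Poisson model: sum of (y - lam)^2 / lam."""
    return sum((yi - li) ** 2 / li for yi, li in zip(y, lam))


def dean_lawless_t1(y, lam):
    """Dean-Lawless T1: approximately N(0, 1) under the Poisson null (K = 0)."""
    numerator = sum((yi - li) ** 2 - yi for yi, li in zip(y, lam))
    return numerator / math.sqrt(2.0 * sum(li ** 2 for li in lam))


def dispersion_summary(y, lam, p):
    """Ratios near 1 are consistent with a Poisson model; p = number of parameters."""
    dof = len(y) - p
    return {
        "deviance/dof": poisson_deviance(y, lam) / dof,
        "pearson/dof": pearson_chi2(y, lam) / dof,
        "T1": dean_lawless_t1(y, lam),
    }


# Hypothetical observed counts and fitted Poisson means.
y = [0, 3, 1, 7, 2, 0, 5]
lam = [1.2, 2.0, 1.5, 3.1, 2.2, 0.8, 2.6]
print(dispersion_summary(y, lam, p=2))
```

A large positive `T1` together with ratios well above 1 would point toward the negative binomial form.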
Model Evaluation: Goodness of Fit

In addition to a plausible basis for the underlying distributional assumptions, an acceptable model should pass three important tests:

· The estimated regression coefficient for each covariate should be statistically significant, i.e., one should be able to reject the null hypothesis that the coefficient is zero;
· Engineering and intuitive judgment should be able to confirm the validity and practicality of the sign and rough magnitude of each estimated coefficient; and
· Goodness-of-fit measures and statistics, such as R-squared (the coefficient of determination), the deviance, and the Pearson chi-square, should indicate that the variables do have explanatory and predictive power.

The modeling of the data in this study was done using SAS and LIMDEP software. Along with approximate maximum likelihood estimates of the regression coefficients, these packages yield estimates of the standard error of each coefficient. From these, P-values can be computed for the null hypothesis that the true value of a given regression coefficient is zero. The z-score of an estimated coefficient is the estimate minus zero, divided by its estimated standard error. The P-value is the probability that a standard normal random variable has an absolute value larger than the absolute value of the z-score obtained. A small P-value is good evidence that the corresponding variable is significant, i.e., that the difference between the coefficient estimate and zero arises not from chance but from a systematic effect.

Goodness-of-fit measures for Poisson-type models have been introduced and reviewed by Fridstrøm et al. (1995) and Miaou (1996). R-squared measures, which estimate the percentage of variation explained by a regression model, are somewhat controversial: different R-squared measures may yield substantially different answers, or even answers larger than 1, particularly for models that are not linear.
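The z-score and two-sided P-value for a coefficient follow directly from its estimate and standard error. A minimal standard-library sketch, using $P = \operatorname{erfc}(|z|/\sqrt{2})$, which equals the probability that a standard normal variable exceeds $|z|$ in absolute value; the covariate names and values are hypothetical.

```python
import math


def z_score(estimate, std_error):
    """z-score for the null hypothesis that the coefficient is zero."""
    return (estimate - 0.0) / std_error


def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical coefficient estimates and standard errors.
estimates = {"covariate_a": (-0.12, 0.05), "covariate_b": (0.03, 0.04)}
for name, (b, se) in estimates.items():
    z = z_score(b, se)
    print(f"{name}: z = {z:.2f}, P = {two_sided_p_value(z):.3f}")
```

Using `erfc` rather than `1 - erf` keeps small P-values accurate for large z-scores.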
On the pitfalls of R-squared measures for nonlinear models, see Kvalseth (1985). Until recently, R-squared measures appropriate for Poisson or negative binomial models had not been established. Fridstrøm et al. (1995) developed several alternative goodness-of-fit methodologies for generalized Poisson regression models, four of which are used here to evaluate goodness of fit.

The first approach is based on the ordinary R-squared, or coefficient of determination, used in linear regression models:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{\lambda}_i)^2}{\sum_i (y_i - \bar{y})^2} \tag{5.4}$$

where
$y_i$ = observed accident count for highway segment or intersection no. $i$,
$\bar{y}$ = average accident count for the sample, and
$\hat{\lambda}_i$ = estimated mean accident count for observation no. $i$.

The numerator of the second term in (5.4) is the variation not explained by the model. Even in a perfectly specified and estimated Poisson model (variance equal to mean), the unexplained variation is expected to be about $\sum_i \hat{\lambda}_i$, so the most that can be explained of the given data is expected to be $P^2$, where

$$P^2 = 1 - \frac{\sum_i \hat{\lambda}_i}{\sum_i (y_i - \bar{y})^2} \tag{5.5}$$

The first measure normalizes $R^2$ by this maximum:

$$R_P^2 = \frac{R^2}{P^2} \tag{5.6}$$
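The first of Fridstrøm et al.'s measures, as read here, combines the ordinary $R^2 = 1 - \sum_i (y_i - \hat{\lambda}_i)^2 / \sum_i (y_i - \bar{y})^2$, the Poisson ceiling $P^2 = 1 - \sum_i \hat{\lambda}_i / \sum_i (y_i - \bar{y})^2$, and their ratio $R_P^2 = R^2/P^2$. This sketch assumes the fitted means are already available from a Poisson or negative binomial fit; the data are made up. Because $R_P^2$ is a ratio, it can exceed 1 when the residual variation happens to fall below the Poisson noise level $\sum_i \hat{\lambda}_i$.

```python
def r2_poisson_measures(y, lam):
    """Return (R^2, P^2, R_P^2) for observed counts y and fitted means lam."""
    n = len(y)
    y_bar = sum(y) / n
    total_var = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    unexplained = sum((yi - li) ** 2 for yi, li in zip(y, lam))  # residual variation
    r2 = 1.0 - unexplained / total_var
    # Residual variation expected from pure Poisson noise is sum(lam).
    p2 = 1.0 - sum(lam) / total_var
    return r2, p2, r2 / p2


# Hypothetical counts and fitted means.
y = [0, 6, 1, 9]
lam = [1.0, 3.0, 2.0, 6.0]
r2, p2, r2_p = r2_poisson_measures(y, lam)
print(f"R^2 = {r2:.3f}, P^2 = {p2:.3f}, R_P^2 = {r2_p:.3f}")
```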
The Freeman–Tukey deviate, $f_i = \sqrt{y_i} + \sqrt{y_i + 1} - \sqrt{4\hat{\lambda}_i + 1}$, is approximately a standard normal random variable (at least for $y_i$ larger than 1). The three measures introduced so far, $R_P^2$, $R_{PW}^2$, and $R_{PFT}^2$, are strongly oriented toward Poisson models. Indeed, because they do not explicitly include an overdispersion parameter, they seem inappropriate for negative binomial models. A fourth approach, however, is tailored to the negative binomial.

The fourth approach, the log-likelihood R-squared, is based on the deviance $D^m$ of the model. Fridstrøm et al. propose three deviance-based measures, given as equations (5.13), (5.14), and (5.15). Here $D^0$ is the deviance of a model with only two parameters, the constant term (intercept) and the overdispersion parameter; $k$ is the number of parameters of the model $m$ under consideration (not including the overdispersion parameter); and $D_E^m$ is the expected value of the deviance when a Poisson model with the same means $\hat{\lambda}_i$ as model $m$ is the correct one. Roughly speaking, $R_D^2$ indicates how much explanatory power results from adding the highway characteristics, and $R_{PD}^2$ expresses this as a fraction of the highest possible expected explanatory power of any model with the same means as $m$.

For negative binomial and Poisson models, Fridstrøm et al. regard $R_{PD}^2$ and $R_{PFT}^2$ with favor. They express reservations about $R_P^2$ and $R_{PW}^2$: the first, being unnormalized, makes observations with large predicted means more influential, while the second tends to exaggerate the estimation errors associated with small predicted means.

Yet another goodness-of-fit measure, advocated by Miaou (1996), is based explicitly on the overdispersion parameter:

$$R_K^2 = 1 - \frac{K}{K_{\max}} \tag{5.16}$$

Here $K$ is the overdispersion parameter estimated in the model, and $K_{\max}$ is the overdispersion parameter estimated in the negative binomial model discussed above, namely, the model with only a constant term and an overdispersion parameter.
Based on simulations, Miaou concluded that this measure shows promise. It is simple to calculate; it yields a value between 0 and 1; it has the proportionate-increase property (Miaou proposes as a criterion that independent variables of equal importance, when added to a model, increase the value of the measure by the same absolute amount regardless of the order in which they are added); and it is independent of the choice of intercept term in the model.
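Miaou's overdispersion-based measure, $1 - K/K_{\max}$, needs only two estimated overdispersion parameters. A minimal sketch, assuming both values come from negative binomial fits to the same data, with $K_{\max}$ taken from the intercept-only model; the numbers are hypothetical.

```python
def miaou_r2(k_model, k_max):
    """Miaou's overdispersion-based measure: 1 - K / K_max."""
    if k_max <= 0:
        raise ValueError("K_max must be positive")
    return 1.0 - k_model / k_max


# Hypothetical estimates: intercept-only model and a model with covariates.
k_max = 0.80
k_model = 0.20
print(miaou_r2(k_model, k_max))  # 0.75
```

The result lies between 0 and 1 whenever $0 \le K \le K_{\max}$; a model that removes all overdispersion ($K = 0$) scores 1, and one that removes none scores 0.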
