 This report is an archived publication and may contain dated technical, contact, and link information
 Publication Number: FHWA-RD-98-096 Date: September 1997

Modeling Intersection Crash Counts and Traffic Volume - Final Report

7. CONCLUSIONS ON MODELING INTERSECTION CRASHES IN RELATION TO TRAFFIC VOLUMES AS EXPOSURE MEASURES

7.1 Relations between crash counts and traffic volumes at four–leg signalized intersections

7.2 What can currently be done?

7.3 Substantive research needed

7.4 Methodological research needs

7.1 Relations between crash counts and traffic volumes at four–leg signalized intersections

The exposure measures currently available for intersections are the average traffic volumes on the intersecting roads. Intuitively, one expects intersection crashes to result from the interaction of the two traffic streams. The simplest mathematical function expressing an interaction is $z = a\,x\,y$, where $z$ is the crash count and $x$ and $y$ are the traffic volumes on the two roads. This expression is too rigid because it has only one parameter, $a$. A simple generalization is $z = a\,x^{b}\,y^{c}$. It is often used, usually in the logarithmic form $\ln z = \ln a + b \ln x + c \ln y$, as a log–linear model.
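As a minimal sketch (not the estimation procedure used in this study), the parameters of $z = a\,x^{b}\,y^{c}$ can be estimated by ordinary least squares on the logarithmic form. The function name `fit_log_linear` is hypothetical. Least squares on logarithms requires strictly positive counts; for crash counts, which can be zero, a Poisson or negative binomial regression with a log link is the more common choice.

```python
import numpy as np


def fit_log_linear(crashes, major_vol, minor_vol):
    """Fit z = a * x**b * y**c by least squares on ln z = ln a + b ln x + c ln y.

    All inputs are 1-D sequences of strictly positive numbers
    (crash counts, major-road volume, minor-road volume).
    Returns the tuple (a, b, c).
    """
    z = np.asarray(crashes, dtype=float)
    x = np.asarray(major_vol, dtype=float)
    y = np.asarray(minor_vol, dtype=float)
    # Columns: intercept (ln a), ln x (coefficient b), ln y (coefficient c).
    design = np.column_stack([np.ones_like(x), np.log(x), np.log(y)])
    coef, *_ = np.linalg.lstsq(design, np.log(z), rcond=None)
    ln_a, b, c = coef
    return float(np.exp(ln_a)), float(b), float(c)


# Usage on noise-free synthetic values generated from known parameters
# (illustrative only, not data from this study).
rng = np.random.default_rng(0)
x = rng.uniform(5_000, 50_000, size=50)
y = rng.uniform(1_000, 20_000, size=50)
z = 0.002 * x**0.6 * y**0.4
a, b, c = fit_log_linear(z, x, y)
```

On exact synthetic values the fit recovers the generating parameters; with real crash counts, the question raised in this chapter is how systematically the data deviate from such a fitted surface.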

A critical question is whether this form adequately represents the actual relation between crashes and traffic volumes, or whether better relationships exist. The simplest way to approximate the "true" relationship between crashes and traffic volumes is to smooth crash counts over the two traffic volumes. We did this for three data sets: four–leg signalized and stop–controlled intersections in Washtenaw County, Michigan; four–leg signalized intersections in California; and four–leg signalized intersections in Minnesota.
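The smoothing procedure is not specified in this chapter. As an illustration only, the sketch below uses a Gaussian-kernel (Nadaraya–Watson) smoother, one common choice, to compute a kernel-weighted mean crash count at each node of a grid of major and minor volumes. The function name `smooth_counts` and the bandwidths `h_major` and `h_minor` (in vpd) are assumptions.

```python
import numpy as np


def smooth_counts(crashes, major_vol, minor_vol, grid_major, grid_minor, h_major, h_minor):
    """Gaussian-kernel (Nadaraya-Watson) smooth of crash counts over two volumes.

    Returns an array of shape (len(grid_minor), len(grid_major)) holding the
    kernel-weighted mean crash count at each grid node.
    """
    c = np.asarray(crashes, dtype=float)
    x = np.asarray(major_vol, dtype=float)
    y = np.asarray(minor_vol, dtype=float)
    gx, gy = np.meshgrid(np.asarray(grid_major, float), np.asarray(grid_minor, float))
    # Scaled squared distance from each grid node to each intersection: shape (ny, nx, n).
    d2 = ((gx[..., None] - x) / h_major) ** 2 + ((gy[..., None] - y) / h_minor) ** 2
    w = np.exp(-0.5 * d2)
    # Weighted mean; nodes far from every intersection (all weights ~0) yield NaN.
    return (w * c).sum(axis=-1) / w.sum(axis=-1)
```

Because each smoothed value is a weighted average of observed counts, the surface never leaves the range of the data and makes no assumption about functional form; the bandwidths control how much detail survives.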

The Washtenaw County data suffered from a selection bias because only intersections with more than a certain number of crashes were included. Such data are rarely selected in practice. For this data set, crash counts increased with increasing major volume as well as with increasing minor volume. The conventional log–linear model appeared to be an acceptable qualitative representation. Quantitatively, however, there were complex systematic deviations between the data and the model. There was practically no relationship between crash counts and traffic volumes for crashes at stop–controlled intersections in this data set.
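The size of this bias is not quantified here. As an illustration, assume (this is not stated in the report) that the crash count at an intersection is Poisson distributed and that only intersections with more than a threshold number of crashes are retained. The mean count among retained intersections is then $E[X \mid X > t]$, computed by the hypothetical function below. Retained low-exposure sites are those with unusually high counts, which compresses the apparent effect of volume.

```python
import math


def truncated_poisson_mean(mu, threshold, k_max=1000):
    """E[X | X > threshold] for X ~ Poisson(mu), by direct summation up to k_max.

    Suitable for moderate mu (math.exp(-mu) underflows to 0 above about mu = 745).
    """
    p = math.exp(-mu)  # P(X = 0)
    num = den = 0.0
    for k in range(k_max + 1):
        if k > 0:
            p *= mu / k  # P(X = k) from P(X = k - 1)
        if k > threshold:
            num += k * p
            den += p
    return num / den


# Expected counts differ by a factor of 4; means of the retained sites do not.
low = truncated_poisson_mean(2.0, threshold=5)
high = truncated_poisson_mean(8.0, threshold=5)
```

Here `low` is about 6.4 and `high` about 8.9, so a fourfold difference in expected crashes appears as a ratio of roughly 1.4 among the selected intersections.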

The large number of intersections in the California data set allowed detailed analysis. A relationship that was visually fairly simple, but analytically complex, was apparent. The most obvious feature of the surface was a "ridge" at a minor road volume of 20,000 vpd. Up to that value, crash counts increased nearly linearly with the minor volume. Beyond it, they first dropped rapidly and then leveled off. The relationship with the major volume was less pronounced: crash counts increased fairly strongly at low volumes, moderately or not at all at middle volumes, and decreased irregularly at high volumes.

Crashes within the intersection itself showed a similar but much less pronounced pattern, with weaker variation with the volumes. Crashes on the major approach showed a pattern very similar to that of all crashes, while those on the minor approach showed a distinctly different pattern. For minor–approach crashes there was, however, again a "ridge" beyond which crashes declined, together with a very strong increase with minor volume and relatively little variation with major volume. Clearly, a log–linear model could not approximate the actual surface even roughly.

One possible reason for a deviation from the expected pattern is that intersections that would otherwise have very high crash counts have been "improved" to reduce the crash risk. However, none of the intersection characteristics in the data file that might reduce the crash risk appeared more frequently in the regions of the diagrams where the unexpected decline in crash counts occurred. Thus, either other intersection features not available in our data file affect crash counts, or the relationship between crash counts and traffic volumes is far from log–linear.

The analysis of Minnesota intersections was limited by their small number. They showed a complex pattern. The relationship between major volume and crash counts was nearly a step function: crash counts were approximately constant at low volumes, and even more nearly constant at high volumes, with a "ramp" connecting the two levels. Strong smoothing, which came close to fitting a plane to the data points, produced a surface that increased with both volumes.

Crashes within the intersection itself showed a slightly different pattern. The pattern for the major volume again had two levels connected by a ramp, but there was a fairly strong increase with minor volume. Crashes on the approaches showed no clear pattern. Strong smoothing revealed only a weak increase with major volume and a stronger increase with minor volume.

A log–linear function was qualitatively similar to the more strongly smoothed surface, but deviated quantitatively. It could not represent the less strongly smoothed surface showing two levels and a connecting ramp.

If the crash risk of each specific intersection maneuver, such as turning left, turning right, or going straight, were constant across the range of volumes (independent of them) but differed between maneuvers, then the frequencies of crashes involving each maneuver would be proportional to the frequencies of that maneuver. An intersection with many high–risk maneuvers would then be expected to have more crashes than intersections with comparable volumes but fewer such maneuvers. Therefore, we also explored possible relationships between the frequencies of crash types, and of total crashes, in the Minnesota and California data sets. We found none.
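In symbols, with notation introduced here only for illustration, let $N_{mj}$ be the number of maneuvers of type $m$ at intersection $j$, $r_m$ the crash risk per maneuver of that type (assumed not to depend on the volumes), and $C_{mj}$ the number of crashes involving that maneuver. The hypothesis then reads:

```latex
\mathrm{E}[C_{mj}] = r_m \, N_{mj},
\qquad
\mathrm{E}[C_j] = \sum_{m} \mathrm{E}[C_{mj}] = \sum_{m} r_m \, N_{mj}
```

Two intersections with equal volumes but different mixes of maneuvers would therefore have different expected totals, with the larger total at the intersection having more high-risk maneuvers.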

Our three data sets showed very different relationships between crash counts at four–leg signalized intersections and the traffic volumes on the intersecting roads. None of them could be adequately represented by the conventional log–linear model. Either other intersection characteristics that were not readily available in our data sets had a strong influence on crash counts, or average annual daily traffic is not an adequate exposure measure.


7.2 What can currently be done?

Given our negative conclusions about the usefulness of conventional mathematical models for representing the relationship between crash counts at signalized intersections and traffic volumes, what can be done? Smoothing is a promising alternative because it can fit even complicated surfaces by a simple process and avoids arbitrary assumptions about the functional form. Smoothing over two volumes, as done in this study, is relatively simple. If more than two volumes, or other variables (especially categorical variables), are to be included, the procedures have to be extended and refined, as discussed in subsequent sections.

We cannot rule out the possibility that someone may find a manageable and not too ad hoc mathematical expression for the relationship between traffic volumes and crash counts. By ad hoc we mean mathematical expressions selected specifically to fit the data sets studied, without considering whether they could plausibly be extended to other data sets. However, to be acceptable substitutes for smoothed surfaces, such expressions have to be validated by more detailed criteria than correlation coefficients, likelihood ratios, or similar aggregate measures.
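A single aggregate statistic can look acceptable while a model is systematically off in parts of the volume plane. One more detailed criterion, offered here as an illustration rather than a procedure from the report, is to tabulate the mean residual (observed minus fitted crashes) in cells of major and minor volume and look for cells with consistent positive or negative deviations. The function name and cell edges are hypothetical.

```python
import numpy as np


def binned_residuals(observed, fitted, major_vol, minor_vol, major_edges, minor_edges):
    """Mean residual (observed - fitted) and number of intersections per volume cell.

    Both returned arrays have shape (len(minor_edges) - 1, len(major_edges) - 1);
    cells containing no intersections get a mean residual of NaN.
    """
    resid = np.asarray(observed, float) - np.asarray(fitted, float)
    bins = [np.asarray(minor_edges, float), np.asarray(major_edges, float)]
    # First histogram axis is minor volume, second is major volume.
    sums, _, _ = np.histogram2d(minor_vol, major_vol, bins=bins, weights=resid)
    counts, _, _ = np.histogram2d(minor_vol, major_vol, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_resid = sums / counts
    return mean_resid, counts
```

Reporting the point count alongside each mean residual matters, because cells with only one or two intersections will show large residuals by chance alone.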

What can be done in practice with such smoothed (or validated analytical) relationships? They can be used to compare the crash experience of an individual intersection with that expected from the relationship. If the difference is sufficiently large, the crash experience of that intersection should be studied in detail; criteria for what counts as sufficiently large still have to be selected. Such a study might find an explanation that suggests either which countermeasure should be applied to that intersection or which of its features might have the beneficial effect of a crash countermeasure.
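Because the criteria are still open, the following is only a sketch under stated assumptions: the expected count at each intersection is a leave-one-out Gaussian-kernel smooth of the other intersections' counts, the observed count is treated as Poisson with that mean, and an intersection is flagged when the upper-tail probability of its count falls below a hypothetical significance level `alpha = 0.01`. All function names are hypothetical. Crash counts often vary more than a Poisson model allows, so a Poisson criterion will tend to flag too many intersections.

```python
import math

import numpy as np


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), computed by summing the lower terms."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def loo_expected(crashes, major_vol, minor_vol, h_major, h_minor):
    """Leave-one-out Gaussian-kernel estimate of expected crashes at each intersection."""
    c = np.asarray(crashes, float)
    x = np.asarray(major_vol, float)
    y = np.asarray(minor_vol, float)
    d2 = ((x[:, None] - x[None, :]) / h_major) ** 2 + ((y[:, None] - y[None, :]) / h_minor) ** 2
    w = np.exp(-0.5 * d2)
    np.fill_diagonal(w, 0.0)  # exclude each intersection from its own estimate
    return (w @ c) / w.sum(axis=1)


def flag_intersections(crashes, major_vol, minor_vol, h_major, h_minor, alpha=0.01):
    """Indices of intersections whose count is improbably high under Poisson(expected)."""
    expected = loo_expected(crashes, major_vol, minor_vol, h_major, h_minor)
    return [i for i, (k, mu) in enumerate(zip(crashes, expected))
            if poisson_upper_tail(int(k), mu) < alpha]
```

Leaving each intersection out of its own expected value keeps a high count from raising the benchmark it is compared against.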

This approach has its limitations. It can work only for intersections in an "area," defined by combinations of the two volumes, that contains enough data points for the individual peculiarities of the intersections to average out. It will not work for relatively isolated intersections near the boundary of the area covered by intersections. There, the smoothed surface will be "pulled" toward the value of each individual intersection. Even if the actual crash count at such an intersection is much higher than would be expected from the "true" relationship between crashes and volumes, this deviation may not be recognizable.
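The pull toward isolated intersections can be quantified. Assuming a Gaussian-kernel smoother (an illustrative choice; the kernel is not specified here), the share of the total kernel weight that an intersection contributes to the smoothed value at its own volumes is the diagonal of the smoother's weight matrix. Values near 1 mark intersections whose smoothed value is essentially their own count, so no deviation from the surface can be detected there. The function name and bandwidths are hypothetical.

```python
import numpy as np


def self_weight_share(major_vol, minor_vol, h_major, h_minor):
    """Share of kernel weight each intersection gives to itself at its own location.

    Equals the diagonal of the Nadaraya-Watson smoother matrix evaluated at the data points.
    """
    x = np.asarray(major_vol, float)
    y = np.asarray(minor_vol, float)
    d2 = ((x[:, None] - x[None, :]) / h_major) ** 2 + ((y[:, None] - y[None, :]) / h_minor) ** 2
    w = np.exp(-0.5 * d2)
    # Diagonal entries are exp(0) = 1: each intersection's weight on itself.
    return np.diag(w) / w.sum(axis=1)
```

In a dense cluster of n similar intersections the share is close to 1/n; for an isolated intersection near the boundary it approaches 1.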


7.3 Substantive research needed

Before one can realistically think about modeling intersection crash counts, one needs to develop a more realistic logical and functional structure for such models. A first step is to re–think the concept of exposure. As already discussed, traffic volumes on the intersecting roads are conceptually unsatisfactory. Only in the simplest case of uncontrolled intersections can one expect crash counts to be log–linear or similar functions of the volumes. An exposure measure should count the opportunities for collisions. These depend heavily on the type and characteristics of traffic control provided. Promising steps to develop more meaningful exposure measures have been taken. However, much more work on the problem using different perspectives is needed.

Another aspect is that intersection crashes, and even more so intersection–related crashes, are very inhomogeneous. A single model cannot be expected to describe their frequency in a manner that reflects crash causation. Therefore, intersection crash types should be examined more closely and classes suitable for meaningful modeling identified. This might require a nearly "clinical" analysis of individual crashes.

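In reality, many intersections have certain features precisely because they had a history of many crashes. This creates relationships that make the standard statistical models uninterpretable; for example, a safety feature installed mainly at high–crash intersections can appear in a cross–sectional model to be associated with more crashes. Either much more sophisticated models have to be developed, or different techniques used.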

One alternative to the conventional approach, that of using a large set of intersections and including many variables in a complicated model, is to select intersections that are matched in many respects as closely as practical, and differ only in one or very few characteristics to be studied. This is much more likely to isolate any effect of such characteristics. If this is done, in turn, with many different subsets of intersections, a realistic model may be built in a stepwise fashion.
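A minimal sketch of the selection step, assuming each intersection is stored as a record (a dict) of categorical characteristics and that continuous variables such as volumes have already been banded into categories. The function `matched_sets` and all field names are hypothetical. It groups intersections that agree exactly on every matching characteristic and keeps only the groups in which the characteristic under study takes more than one value.

```python
from collections import defaultdict


def matched_sets(intersections, match_on, study_var):
    """Group records identical on every key in match_on; keep groups where study_var varies."""
    groups = defaultdict(list)
    for rec in intersections:
        key = tuple(rec[k] for k in match_on)
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items()
            if len({r[study_var] for r in recs}) > 1}


# Hypothetical records: signal type and a banded volume class are matched,
# presence of a left-turn lane is the characteristic under study.
sites = [
    {"id": 1, "signal": "fixed", "volume_band": "mid", "left_lane": 1},
    {"id": 2, "signal": "fixed", "volume_band": "mid", "left_lane": 0},
    {"id": 3, "signal": "actuated", "volume_band": "mid", "left_lane": 1},
    {"id": 4, "signal": "actuated", "volume_band": "mid", "left_lane": 1},
]
sets = matched_sets(sites, match_on=["signal", "volume_band"], study_var="left_lane")
# Only the ("fixed", "mid") group, ids 1 and 2, differs in left_lane.
```

Exact matching on many characteristics quickly leaves few usable groups, which is one reason such comparisons would have to be repeated over many different subsets.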


7.4 Methodological research needs

Before smoothing can be used routinely to model relationships between crash counts and exposure measures and other intersection characteristics, additional research needs to be done.

A realistic model will contain one or several exposure measures that are continuous variables (or counts that can be treated as continuous variables), intersection characteristics that will usually be described by 0/1 categorical variables, and possibly other continuous variables, such as travel speeds. In principle, one can smooth over all continuous variables simultaneously, but one cannot smooth over the categorical variables. They have to be accommodated by either additive or multiplicative terms, or the entire data set may have to be split according to a categorical variable, or combinations of several categorical variables, and each part modeled separately. Criteria have to be developed to decide when each of these treatments is appropriate.
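Of the treatments listed, splitting the data set by a categorical variable is the simplest to illustrate. The sketch below, with hypothetical function names and a Gaussian kernel chosen only for illustration, smooths crash counts over the two volumes separately within each level of a 0/1 variable, so that intersections with and without the characteristic never borrow strength from each other.

```python
import numpy as np


def kernel_smooth(c, x, y, px, py, hx, hy):
    """Gaussian-kernel estimate at points (px, py) from data points (x, y) with counts c."""
    d2 = ((px[:, None] - x[None, :]) / hx) ** 2 + ((py[:, None] - y[None, :]) / hy) ** 2
    w = np.exp(-0.5 * d2)
    return (w @ c) / w.sum(axis=1)


def smooth_by_category(crashes, major_vol, minor_vol, category, hx, hy):
    """Fitted value at each intersection, smoothed only within its own category."""
    c = np.asarray(crashes, float)
    x = np.asarray(major_vol, float)
    y = np.asarray(minor_vol, float)
    g = np.asarray(category)
    fitted = np.empty_like(c)
    for level in np.unique(g):
        m = g == level  # intersections sharing this categorical value
        fitted[m] = kernel_smooth(c[m], x[m], y[m], x[m], y[m], hx, hy)
    return fitted
```

Splitting halves the data available to each smooth, which is why criteria are needed for when it is preferable to an additive or multiplicative term.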

Though it is possible to smooth data sets with a large number of continuous independent variables, it is not very useful. The data can be stored in a computer, or in hardcopy tables, and the smoothed value can be calculated for any combination of the independent variables. However, if the number of variables is greater than two, or at most three, the smoothed surface cannot be visualized or intuitively assessed for overall shape and smoothness. As a practical matter, one wants to separate the model into additive or multiplicative components, each of which can be studied and assessed separately. Indeed, this is the same approach used in analytical modeling, where one uses additive or multiplicative terms. If interactions have to be considered, they are introduced as additional additive or multiplicative terms. Since one can easily visualize a surface smoothed over two variables, one only needs to determine how to separate a model into components representing main effects, or interactions of any two variables. Research is needed to learn how to do this best and how to assess the adequacy of such additive or multiplicative models.
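One established way to separate a smoothed model into components is backfitting, the procedure commonly used to fit generalized additive models; it is named here as an illustration, not as a method prescribed by the report. The sketch fits $z \approx \alpha + f_1(x) + f_2(y)$ by alternately smoothing partial residuals against each variable with a one-dimensional Gaussian-kernel smoother. Applied to logarithms of positive values (for example, smoothed counts), the same fit gives the multiplicative form $e^{\alpha} e^{f_1(x)} e^{f_2(y)}$. Function names and bandwidths are hypothetical.

```python
import numpy as np


def smooth_1d(target, v, h):
    """Gaussian-kernel smooth of target against one variable v, evaluated at the data points."""
    w = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    return (w @ target) / w.sum(axis=1)


def backfit_two(z, x, y, hx, hy, n_iter=50, tol=1e-10):
    """Backfitting for z ~ alpha + f1(x) + f2(y) with 1-D kernel smoothers.

    Returns (alpha, f1, f2); f1 and f2 are evaluated at the data points and
    centered to mean zero so that alpha carries the overall level.
    """
    z = np.asarray(z, float)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    alpha = z.mean()
    f1 = np.zeros_like(z)
    f2 = np.zeros_like(z)
    for _ in range(n_iter):
        # Smooth the partial residual for each component in turn.
        f1_new = smooth_1d(z - alpha - f2, x, hx)
        f1_new -= f1_new.mean()
        f2_new = smooth_1d(z - alpha - f1_new, y, hy)
        f2_new -= f2_new.mean()
        change = max(np.abs(f1_new - f1).max(), np.abs(f2_new - f2).max())
        f1, f2 = f1_new, f2_new
        if change < tol:
            break
    return alpha, f1, f2
```

Each component is a function of one variable and can be plotted and judged on its own; an interaction of two variables could be added as a further component smoothed over both, which is still easy to visualize.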

If one deals with experimental data where by design the data points can cover the range of the variables of interest more or less uniformly, smoothing by standard methods can give a good representation of the relationship, and the deviations of the individual points from the surface can give a good idea of the random variability of the data points.

In the case of intersections, and probably also other highway locations, the situation is different. Most observations are concentrated in only part of the area covered by observations. Toward the edges of this area, observations become sparser and may be isolated. This poses a dilemma for smoothing. In the areas densely covered with observations, a narrow smoothing window may be appropriate, because it can represent a complex relationship well and still provide adequate smoothing. Where the points are more isolated, such a narrow window is no longer appropriate, because in extreme cases it may produce a perfect, or at least very close, fit to a single point or to a combination of only a few points. This can make the smoothed surface behave erratically toward its boundaries. To avoid this, one might enlarge the smoothing window. While this yields a smoother surface near the boundaries of the covered area, it can smooth out important details in the area well covered with points. Techniques should be developed that avoid this, for instance, by using a window of adaptive size, or by identifying parts of the smoothed surface that depend on only a few data points.
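One common form of adaptive window is a nearest-neighbor bandwidth: at each point where the surface is evaluated, the window width is set to the distance to the k-th nearest intersection, so it is narrow where intersections are dense and wide near the sparse boundary. The sketch assumes a Gaussian kernel, volumes rescaled by user-chosen factors before distances are computed, and k = 10 by default; the function name and all parameter values are illustrative, not taken from the report. It also returns the bandwidths, which show where the surface rests on a few distant points.

```python
import numpy as np


def adaptive_smooth(crashes, major_vol, minor_vol, px, py, scale_major, scale_minor, k=10):
    """Gaussian-kernel smoother with a k-nearest-neighbor bandwidth at each evaluation point.

    Volumes are divided by scale_major / scale_minor before distances are computed.
    Returns (estimates, bandwidths) for the evaluation points (px, py).
    """
    c = np.asarray(crashes, float)
    x = np.asarray(major_vol, float) / scale_major
    y = np.asarray(minor_vol, float) / scale_minor
    qx = np.asarray(px, float) / scale_major
    qy = np.asarray(py, float) / scale_minor
    d = np.hypot(qx[:, None] - x[None, :], qy[:, None] - y[None, :])
    k = min(k, len(c))
    # Bandwidth = k-th smallest distance in each row (index k - 1 after partitioning).
    h = np.partition(d, k - 1, axis=1)[:, k - 1]
    h = np.maximum(h, 1e-12)  # guard against zero width when k points coincide with the query
    w = np.exp(-0.5 * (d / h[:, None]) ** 2)
    return (w @ c) / w.sum(axis=1), h
```

Large returned bandwidths identify the parts of the surface that depend on only a few data points, which addresses the second technique suggested in the text.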

