U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590


Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-10-024
Date: April 2010

Development of a Speeding-Related Crash Typology









The goal of this study is to determine which crash–, vehicle–, and driver–related factors are more likely to be found in speeding–related (SR) crashes. The list of possible variables (e.g., crash type) and combinations of variables (e.g., crash type by urban versus rural roadway class by speed limit) is almost limitless. Identifying the most important variables is difficult because importance depends in part on the interest of the user or the type of treatment program being considered. For example, roadway–based treatments (e.g., traffic–calming measures) might be better identified or targeted by location–type analyses, while enforcement or educational treatments would be more related to driver variables. It was also difficult to combine vehicle– and driver–related factors in the same analyses as the broader crash factors because deciding how to code each driver in a crash was complex. For example, while a crash can be classified as SR if one or more vehicles is SR, not all drivers in that crash should be thought of as SR. Indeed, in many multivehicle collisions, only one of the drivers would be speeding; thus, a comparison of driver age for speeding and nonspeeding drivers must be done on a vehicle basis rather than a crash basis.
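
The difference between crash–based and vehicle–based coding can be illustrated with a minimal Python sketch. The records, field layout, and function names below are hypothetical and are not taken from the study files; the sketch only shows why a crash–level SR flag should not be assigned to every driver in the crash.

```python
from collections import defaultdict

# Hypothetical vehicle-level records: (crash_id, driver_age, vehicle_is_sr).
vehicles = [
    (1, 19, True),
    (1, 45, False),
    (2, 33, False),
    (2, 52, False),
    (3, 24, True),
]

def crash_level_flags(records):
    """Flag a crash as SR if at least one involved vehicle is SR."""
    flags = defaultdict(bool)
    for crash_id, _age, is_sr in records:
        flags[crash_id] = flags[crash_id] or is_sr
    return dict(flags)

def mean_age_by_vehicle_flag(records):
    """Mean driver age for speeding and nonspeeding drivers (vehicle basis)."""
    ages = defaultdict(list)
    for _crash_id, age, is_sr in records:
        ages[is_sr].append(age)
    return {flag: sum(values) / len(values) for flag, values in ages.items()}

crash_flags = crash_level_flags(vehicles)
age_means = mean_age_by_vehicle_flag(vehicles)
print(crash_flags)
print(age_means)
```

In this made–up example, crash 1 is an SR crash, but only the 19–year–old driver is counted as speeding; the 45–year–old driver in the same crash contributes to the nonspeeding group.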

Given the nearly unlimited number of possible factors of interest, a decision was made to conduct a two–part analysis. In the first part, a series of single–variable tables was produced for key crash, vehicle, and driver variables. Each variable–specific table was examined to determine which categories of that variable had the highest number and percentage of SR crashes, vehicles, or drivers. Such single–variable tables provide valuable information on SR crashes; however, they do not show which variables are most important in terms of speeding, nor do they provide information on combinations of variables or on the interactions between variables. The second part attempted to address this through the use of classification trees as produced by the classification and regression tree (CART) software that is available in SAS®.(10)



As indicated previously, single–variable tables were created from each dataset/definition for a large number of variables. The choice of variables to be examined was based to some extent on the results of past studies of SR issues, particularly on the earlier study by Bowie and Walz.(4) The factors describing the overall nature of each crash (e.g., crash type and crash location) were examined using a crash–based file where any involved vehicle was speeding, and the vehicle– and driver–based factors were examined in a vehicle–based file where each vehicle was classified as speeding or not. In the results section, three tables are presented for each variable: the first contains GES and FARS results if both are available, and the other two contain results for both definitions for each of the two States. In general, a category is defined as over–represented if it is characterized by a high percentage of SR crashes, drivers, or vehicles. Whether this is the most helpful way to characterize these findings if they are to be used in treatment development or targeting is discussed below in the interpretation of results section. A brief discussion describing the consistency of findings across the databases and definitions is included below each table.
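
As an illustration of what a single–variable table contains, the following Python sketch tabulates the total count, SR count, and SR percentage for each category of one variable. The records and the variable (road surface condition) are invented for illustration and do not reproduce any table in this report.

```python
from collections import Counter

# Hypothetical crash records: (category of the variable, crash_is_sr).
crashes = [
    ("dry", False), ("dry", False), ("dry", True), ("dry", False),
    ("wet", True), ("wet", False),
    ("snow", True), ("snow", True), ("snow", False),
]

def single_variable_table(records):
    """Return (category, total, sr_count, sr_percent) rows, highest SR percent first."""
    totals, sr_counts = Counter(), Counter()
    for category, is_sr in records:
        totals[category] += 1
        sr_counts[category] += int(is_sr)
    rows = [
        (category, total, sr_counts[category], 100.0 * sr_counts[category] / total)
        for category, total in totals.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

table = single_variable_table(crashes)
for category, total, sr_count, sr_percent in table:
    print(f"{category:<6} total={total:>3} SR={sr_count:>3} SR%={sr_percent:5.1f}")
```

Sorting by SR percentage places the candidate over–represented categories at the top, while the total and SR counts remain visible so that a high percentage based on few cases can be recognized.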





Although the analyses of single–variable tables provide useful information about SR crashes and vehicles/drivers in crashes, they do not automatically indicate which factors/variables are the most critical with regard to SR crashes or speeding drivers. They also do not indicate which combinations of variables are the most important. One way to identify the critical roadway, vehicle, and driver factors associated with an increased likelihood of an SR crash is to estimate a logistic regression model with the roadway, vehicle, and driver factors as independent variables and then to identify the statistically significant factors. Logistic regression is a parametric approach that is based on assumptions about error distributions. The CART methodology is nonparametric and does not require any such assumptions. In addition, CART is able to include a relatively large number of independent variables and to identify complex interactions between these variables more efficiently than logistic regression. For example, CART is able to determine not only the most important variable and the categories within that variable in terms of the risk of an SR crash, but also the most important second–level variable within the most important categories of the first–level variable, and so on. That is, given the most important variable with respect to the proportion of SR crashes (e.g., manner of collision) and the subgroup of categories within that variable with the highest proportion of SR crashes (e.g., run–off–road crashes), CART is able to determine the next most important variable within these high–risk categories (e.g., road surface condition) and the categories of that variable that are most important (e.g., snow and sleet). It is hoped that these variables and categories are helpful in determining needed treatments. For these reasons, it was decided that classification trees would be used as the second type of analysis in this project.

Thus, the goals of the CART analysis are as follows: (1) to determine which of the variables available for examination is most important in terms of predicting SR crashes, (2) to determine which categories within that variable predict the highest risk/proportion of SR crashes, (3) to determine the second most important variable and subset of categories in terms of predicting SR crashes within this highest risk subset of categories of the first variable, and (4) to repeat the process to determine the third, fourth, and subsequent variables. This produces a tree with multiple branches that can be traced down to determine the most important combinations (or subsets) of variable categories in terms of predicting SR crashes. In the simplest terms, the CART procedure splits the categories of each variable in the database into all possible binary (two–category) combinations (nodes), calculates the SR risk within each part (node) of each pair, and determines which pair (i.e., which two sets of categories) produces the largest difference in SR risk within that variable. By repeating this process for each variable in the database, CART determines the two sets of categories producing the largest difference in risk of SR crash within each variable. This largest difference in risk is then compared across all variables to determine the one variable (and the set of categories) that produces the largest of all differences. This variable is the top of the tree, and the two sets of categories within that variable are the first two branches of the tree. This process is then repeated within each of the two branches of the first variable to identify the second, third, and subsequent variables.
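
The simplified split search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS® procedure: the records, variable names, and function names are hypothetical, and the sketch uses the plain difference in SR proportion between the two nodes, as in the simplified description, whereas the splitting criterion applied by the actual software may differ.

```python
from itertools import combinations

# Hypothetical crash records; field names and values are illustrative only.
crashes = [
    {"collision": "run-off-road", "surface": "snow", "sr": True},
    {"collision": "run-off-road", "surface": "snow", "sr": True},
    {"collision": "run-off-road", "surface": "dry", "sr": True},
    {"collision": "run-off-road", "surface": "dry", "sr": False},
    {"collision": "rear-end", "surface": "dry", "sr": False},
    {"collision": "rear-end", "surface": "snow", "sr": False},
    {"collision": "angle", "surface": "dry", "sr": False},
    {"collision": "angle", "surface": "dry", "sr": False},
]

def sr_rate(rows):
    """Proportion of SR crashes in a node."""
    return sum(r["sr"] for r in rows) / len(rows)

def binary_splits(categories):
    """Yield each two-way partition of the categories exactly once."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    for k in range(len(rest)):  # the right-hand node is never empty
        for extra in combinations(rest, k):
            left = {first, *extra}
            yield left, set(cats) - left

def best_split(rows, variables):
    """Return (difference, variable, left_set, right_set) with the largest SR difference."""
    best = None
    for var in variables:
        cats = {r[var] for r in rows}
        if len(cats) < 2:
            continue  # nothing to split on within this node
        for left, right in binary_splits(cats):
            left_rows = [r for r in rows if r[var] in left]
            right_rows = [r for r in rows if r[var] in right]
            diff = abs(sr_rate(left_rows) - sr_rate(right_rows))
            if best is None or diff > best[0]:
                best = (diff, var, left, right)
    return best

variables = ["collision", "surface"]
top = best_split(crashes, variables)

# Repeat the search inside the right-hand branch of the first split.
branch = [r for r in crashes if r[top[1]] in top[3]]
second = best_split(branch, variables)
print(top)
print(second)
```

With these invented records, the first split separates run–off–road crashes from all other collision types, and the second split, made within the run–off–road branch, is on road surface condition.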

For a categorical variable (e.g., manner of collision, month of crash, etc.), all possible binary combinations of categories are compared (e.g., category 1 versus categories 2–5, categories 1 and 2 versus categories 3–5, categories 1 and 3 versus categories 2, 4, and 5, etc.). For ordinal variables (e.g., speed limit), all cases with the value of that variable smaller than or equal to a certain value go to one node, and all other cases go to the other node (e.g., speed limit ≤ 30 mi/h versus speed limit ≥ 35 mi/h; speed limit ≤ 35 mi/h versus speed limit ≥ 40 mi/h, etc.).
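
The difference between the two kinds of candidate splits can be made concrete with a short Python sketch. The speed limit values and function names are hypothetical. An ordinal variable with k distinct values has k − 1 threshold splits, whereas an unordered categorical variable with k categories has 2^(k−1) − 1 possible two–way partitions.

```python
def ordinal_splits(values):
    """Threshold splits: values <= cut go to one node, all others to the other node."""
    levels = sorted(set(values))
    return [(levels[: i + 1], levels[i + 1 :]) for i in range(len(levels) - 1)]

def count_categorical_splits(n_categories):
    """Number of distinct two-way partitions of n unordered categories."""
    return 2 ** (n_categories - 1) - 1

speed_limits = [25, 30, 35, 40, 45]  # hypothetical posted limits, mi/h
splits = ordinal_splits(speed_limits)
for low, high in splits:
    print(f"speed limit <= {low[-1]} mi/h versus speed limit >= {high[0]} mi/h")

print(count_categorical_splits(5))  # partitions for a five-category variable
```

The much larger number of partitions for categorical variables is one reason a variable with many unordered categories (e.g., month of crash) requires considerably more comparisons than an ordinal variable with the same number of values.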

CART then outputs a tree showing all branches (i.e., both high and low SR branches). This report shows the section of the tree illustrating up to the first four levels of branches with the highest percentage of SR crashes. Note that CART divides the database being analyzed into a training subset and a validation subset to refine the final output. The results of the training subset are presented in this report, meaning that the total frequency at the top of each tree only shows approximately 2/3 of the total case count shown in the single–variable tables. A description of the results of the CART analysis is provided in the results of CART analyses section.
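
The effect of the training/validation division on the case counts can be sketched as follows. How the software assigns cases to the two subsets is not described here, so the simple random assignment, the seed, and the function name below are assumptions made only for illustration.

```python
import random

def split_training_validation(case_ids, training_fraction=2 / 3, seed=0):
    """Randomly assign cases to a training subset and a validation subset."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]

# 900 hypothetical cases: about two-thirds end up in the training subset.
training, validation = split_training_validation(range(900))
print(len(training), len(validation))
```

Under this assumption, a variable with 900 cases in a single–variable table would show about 600 cases at the top of the corresponding tree.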

Further information about CART is available in Breiman et al.(11) For applications of these trees in road safety research, see Stewart, and Yan and Radwan.(12,13) Additional statistical details are provided in the appendix of this report.





To examine the question of what occurs in an SR crash, the following crash characteristics are of interest:

  • Manner of collision indicates the orientation of the vehicles in a collision.

  • First harmful event indicates the first instance of property damage or injury–producing event in the crash.

  • Number of nonmotorists involved captures the presence of pedestrians and cyclists in the crash.

To answer the question concerning where SR crashes mostly occur, the following variables are of interest:

  • Location of first harmful event indicates if the crash is an on– or off–roadway crash.

  • Relation to junction indicates if the first harmful event is located within a junction or interchange area.

  • Roadway functional class indicates the functional class of the road where the crash occurs.

  • Speed limit indicates the posted speed limit in miles per hour.

  • Number of lanes indicates the number of lanes of travel. In GES and FARS, if the roadway is a divided trafficway, the number of travel lanes counts only lanes in the direction of travel of the first harmful event. If the roadway is an undivided trafficway, the number of travel lanes includes all lanes regardless of their direction of travel. Since this could produce misleading results (e.g., comparing undivided two–lane, two–way roads to two lanes of a divided road), the number of lanes was doubled for physically divided trafficways in all the analyses.

  • Annual average daily traffic (AADT) per lane is produced by dividing the annual average daily flow by the number of lanes, giving an approximate indication of traffic density on the roadway where the crash occurred.

  • Roadway alignment indicates the horizontal alignment of roadway.

  • Roadway profile indicates the vertical alignment of roadway.

  • Work zone identifies first harmful events that were related to, but did not necessarily occur in, a construction or work zone.
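
The lane adjustment and the AADT–per–lane calculation described in the list above can be sketched in Python. The function names and the example values are hypothetical. The sketch also assumes that AADT is a two–way total and that the adjusted (both–direction) lane count is the divisor, which the text does not state explicitly.

```python
def total_lanes(coded_lanes, divided):
    """Lanes in both directions.

    For divided trafficways the coded count covers one direction only,
    so it is doubled; for undivided trafficways it already covers both.
    """
    return coded_lanes * 2 if divided else coded_lanes

def aadt_per_lane(aadt, coded_lanes, divided):
    """AADT divided by the adjusted lane count (assumes AADT is a two-way total)."""
    return aadt / total_lanes(coded_lanes, divided)

# Undivided two-lane, two-way road versus a divided road coded with two lanes.
print(total_lanes(2, divided=False), aadt_per_lane(8000, 2, divided=False))
print(total_lanes(2, divided=True), aadt_per_lane(40000, 2, divided=True))
```

Without the doubling, both roads in this example would be treated as two–lane roads even though the divided road has four lanes in total.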

The variables that might be helpful in determining when SR crashes mostly occur are as follows:

  • The light condition at the time of the crash.

  • The condition of roadway surface at the time of the crash.

  • The atmospheric condition at the time of the crash.

  • The season in which the crash occurred.

  • The day of the week on which the crash occurred.

The question concerning who is most likely to be involved in SR crashes is related to the following variables:

  • The age of the driver.

  • The gender of the driver.

  • Police–reported driver use of available vehicle restraints.

  • Distraction which may have influenced driver performance and contributed to the cause of the crash.

  • Physical impairments for all drivers and nonmotorists which may have contributed to the cause of the crash.

  • Police–reported alcohol involvement indicating that the driver had consumed an alcoholic beverage.

  • Police–reported drug involvement indicating that the driver had taken drugs.

  • Number of previous speeding convictions of the driver (FARS only).

  • License type and compliance with license restrictions (FARS only).

Since the "who" question could also involve the vehicle being driven, vehicle characteristics that might be of interest include the following:

  • Body type of the vehicle (i.e., vehicle type).

  • Hazardous cargo involvement for buses and trucks over 9,909 lb gross vehicle weight rating.





