U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

Report
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-10-024
Date: April 2010

Development of a Speeding-Related Crash Typology


 

RESULTS OF CART ANALYSES

 

 

NORTH CAROLINA AND OHIO SR VEHICLES/DRIVERS

North Carolina Vehicle-Based CART Results

The initial analysis with the North Carolina vehicle file using the combined SR definition included airbag availability/deployment as the second-level branch variable. Because this variable is difficult to interpret (i.e., it indicates both vehicle age and crash severity), it was removed, and a second CART analysis was conducted. As shown in figure 10, in the revised analysis the most important SR predictive variable is driver age. The category with the highest SR percentage is 16–25-year-old drivers (14 percent). Within that branch, the next most important variable is gender, and the category with the highest SR percentage is male. Within that branch, the most important variable is driver distraction, and the categories with the highest SR percentage are not distracted and sleepy/fell asleep. Finally, the most important fourth-level branch variable is vehicle body type, and the categories with the highest SR percentage are automobiles, utility vehicles, motorcycles, and other vehicles except trucks and buses. This final level provides little additional information. Thus, in general, this analysis using the liberal combined SR definition indicates that the driver/vehicle subset most likely to be involved in SR crashes consists of young male drivers who are either not distracted or sleepy/fatigued. Approximately 18 percent of the vehicles in this final subset are SR, and the subset includes approximately 9,200 of the 492,000 vehicles in the North Carolina training set (approximately 2 percent of the total North Carolina vehicle sample and 21 percent of the SR sample).

As shown in figure 11, when the North Carolina vehicle sample is analyzed using the conservative over speed limit definition, the most important SR predictive variable is alcohol involvement, with the highest SR category being yes (12 percent SR). Within that branch, the next variable is restraint system use, with the highest SR categories being none, motorcycle helmet, and shoulder belt only. The motorcycle helmet category is effectively a vehicle-type indicator (i.e., it identifies motorcyclists). Thus, in general, this analysis using the over speed limit SR definition indicates that the driver/vehicle subset most likely to be involved in SR crashes consists of drivers who have been drinking and who do not use restraints. Approximately 20 percent of the vehicles in this final subset are SR, and the subset includes approximately 500 of the 492,000 vehicles in the North Carolina training set (less than 0.1 percent of the total vehicle sample and approximately 5 percent of the SR sample), both of which are small percentages.
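Each branch described above groups the categories of one variable into two sets (for example, 16–25-year-old drivers versus all other ages). The following is a minimal Python sketch of how a CART-style search can find such a grouping for a binary SR/non-SR outcome. It assumes Gini impurity as the splitting criterion (the report does not state which criterion its CART software used), and the driver-age counts are hypothetical, not the North Carolina data. For a two-class outcome, sorting the categories by SR rate and testing only the cut points along that order is sufficient to find the best Gini split, which keeps the search linear in the number of categories.

```python
def gini(sr: int, total: int) -> float:
    """Gini impurity of a node holding `sr` SR vehicles out of `total` vehicles."""
    if total == 0:
        return 0.0
    p = sr / total
    return 2 * p * (1 - p)


def best_category_split(counts: dict[str, tuple[int, int]]):
    """Return (impurity decrease, high-SR category group, SR rate of that group).

    `counts` maps each category to (sr_count, total_count).
    """
    # Order categories from highest to lowest SR rate.
    cats = sorted(counts, key=lambda c: counts[c][0] / counts[c][1], reverse=True)
    sr_all = sum(sr for sr, _ in counts.values())
    n_all = sum(n for _, n in counts.values())
    parent = gini(sr_all, n_all)

    best = None
    sr_left = n_left = 0
    # Try each cut point along the sorted order; left = the high-SR categories.
    for i, cat in enumerate(cats[:-1]):
        sr_left += counts[cat][0]
        n_left += counts[cat][1]
        sr_right, n_right = sr_all - sr_left, n_all - n_left
        weighted = (n_left * gini(sr_left, n_left) + n_right * gini(sr_right, n_right)) / n_all
        gain = parent - weighted
        if best is None or gain > best[0]:
            best = (gain, set(cats[: i + 1]), sr_left / n_left)
    return best


# Hypothetical (SR, total) counts by driver age group.
age_counts = {
    "16-25": (140, 1000),
    "26-45": (80, 1200),
    "46-64": (50, 1100),
    "65+": (20, 700),
}
gain, group, rate = best_category_split(age_counts)
print(group, f"{rate:.0%}")  # prints {'16-25'} 14%
```

With these made-up counts the youngest group is split off on its own, mirroring the shape (not the values) of the first branch in figure 10.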

This figure shows part of a classification and regression tree (CART) with data from the North Carolina database, with the top node showing the number of total speeding-related (SR) drivers/vehicles in North Carolina using the combined SR definition (44,257) and the percentage of total North Carolina crash-involved drivers/vehicles that are SR using this definition (9.0 percent). The tree then branches into four levels. The most important SR predictive variable (top tree branch) is age of driver, which has two branches. The category with the highest SR percentage is 16–25-year-old drivers (13.6 percent). Within that branch, the most important variable is gender, which has two branches. The category with the highest SR percentage is male (15.7 percent). Within that branch, the most important variable is driver distraction, which has two branches. The categories with the highest SR percentage include not distracted and sleepy/fell asleep (but not including inattentive or driver distracted) (16.7 percent). Finally, within that branch, the most important variable is vehicle body type, which has two branches. The categories with the highest SR percentage are automobiles, utility vehicles, motorcycles, and other vehicles except trucks and buses (17.5 percent).

Figure 10. Chart. CART results for North Carolina vehicle-based variables using the combined SR definition (2002–2004).

 

This figure shows part of a classification and regression tree (CART) with data from the North Carolina database, with the top node showing the number of total speeding-related (SR) drivers/vehicles in North Carolina using the over speed limit definition (9,019) and the percentage of total North Carolina crash-involved drivers/vehicles that are SR using this definition (1.8 percent). The tree then branches into two levels. The most important SR predictive variable (top tree branch) is alcohol involvement, which has two branches. The category with the highest SR percentage indicates alcohol is involved (12.2 percent). Within that branch, the most important variable is restraint system use, which has two branches. The categories with the highest SR percentage include none, motorcycle helmet, and shoulder belt only (19.9 percent).

Figure 11. Chart. CART results for North Carolina vehicle-based variables using the over speed limit SR definition (2002–2004).

 


 

Ohio Vehicle-Based CART Results

The initial output from the Ohio vehicle file using the combined SR definition included airbag availability/deployment as the second-level branch variable. Because this variable was difficult to interpret (i.e., it indicated both vehicle age and crash severity), it was removed, and a second CART analysis was conducted. In this second analysis (figure 12), with the airbag variable removed, the most important SR predictive variable is driver age. The category with the highest SR percentage is 16–25-year-old drivers (10 percent SR). Within that branch, the most important variable is gender, and the category with the highest SR percentage is male. Within that branch, the most important variable is vehicle body type, and the categories with the highest SR percentage are automobiles and motorcycles (i.e., no utility vehicles, trucks, etc.). Finally, within this branch, the fourth-level variable is again driver age, but the difference in SR percentage between 16–19-year-old and 20–25-year-old drivers is small. Thus, in general, this analysis using the liberal combined SR definition indicates that the driver/vehicle subset most likely to be involved in SR crashes consists of young male drivers (up to age 25) driving either automobiles or motorcycles. Approximately 12 percent of the vehicles in this final subset are SR, and the subset includes approximately 6,400 of the 547,000 vehicles in the Ohio training set (approximately 1 percent of the total vehicle sample and 17 percent of the SR sample).

When the Ohio vehicle file was analyzed using the over speed limit definition, the initial run also included the difficult-to-interpret airbag availability/deployment variable. This variable was then removed, and a second CART analysis was conducted. In figure 13, the most important SR predictive variable is driver age. The category with the highest SR percentage is 16–25-year-old drivers (7 percent). Within that branch, the next most important variable is gender, and the category with the highest SR percentage is male. Within that branch, the most important variable is driver restraint system use, and the categories with the highest SR percentage are none, motorcycle helmet, and lap belt only; the none category dominates this grouping. Finally, within this branch, the fourth-level variable is vehicle body type, and the categories with the highest SR percentage are automobiles, utility vehicles, van-based light trucks, and motorcycles, a mix of types that provides little useful information. Thus, in general, this analysis using the conservative over speed limit SR definition indicates that the driver/vehicle subset most likely to be involved in SR crashes consists of young male drivers (up to age 25) who do not use restraints and who drive a variety of vehicle types. Approximately 20 percent of the vehicles in this final subset are SR, and the subset includes approximately 615 of the 547,000 vehicles in the Ohio training set (approximately 0.1 percent of the full sample and 3 percent of the SR sample), both of which are small percentages.
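Both Ohio runs (and the North Carolina combined-definition run) handled the hard-to-interpret airbag availability/deployment variable by removing it and rerunning CART. The sketch below shows the effect of that step at a single node. It again assumes a Gini criterion, and all counts and names (`choose_split`, `exclude`, the variable keys) are hypothetical: with the airbag variable available it wins the split, and once it is excluded the next-best variable (driver age here) takes its place.

```python
def gini(sr: int, total: int) -> float:
    p = sr / total if total else 0.0
    return 2 * p * (1 - p)


def best_category_split(counts):
    """Best binary grouping of categories by Gini decrease; returns (gain, group)."""
    cats = sorted(counts, key=lambda c: counts[c][0] / counts[c][1], reverse=True)
    sr_all = sum(sr for sr, _ in counts.values())
    n_all = sum(n for _, n in counts.values())
    best = (0.0, set())
    sr_left = n_left = 0
    for i, cat in enumerate(cats[:-1]):
        sr_left += counts[cat][0]
        n_left += counts[cat][1]
        weighted = (n_left * gini(sr_left, n_left)
                    + (n_all - n_left) * gini(sr_all - sr_left, n_all - n_left)) / n_all
        gain = gini(sr_all, n_all) - weighted
        if gain > best[0]:
            best = (gain, set(cats[: i + 1]))
    return best


def choose_split(variables, exclude=frozenset()):
    """Pick the variable whose best split most reduces impurity, skipping `exclude`."""
    best = None
    for name, counts in variables.items():
        if name in exclude:
            continue
        gain, group = best_category_split(counts)
        if best is None or gain > best[1]:
            best = (name, gain, group)
    return best


# Hypothetical (SR, total) counts for the same 4,000 vehicles, tabulated two ways.
variables = {
    "airbag": {"deployed": (200, 1000), "not deployed": (70, 2500), "not available": (20, 500)},
    "driver_age": {"16-25": (140, 1000), "26-45": (80, 1200), "46-64": (50, 1100), "65+": (20, 700)},
}
print(choose_split(variables)[0])                     # airbag
print(choose_split(variables, exclude={"airbag"})[0])  # driver_age
```

Excluding a variable does not change the data, only the set of candidates; that is why the rerun can surface a different, more interpretable branch at the same level.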

 

This figure shows part of a classification and regression tree (CART) with data from the Ohio database, with the top node showing the number of total speeding-related (SR) drivers/vehicles in Ohio using the combined SR definition (38,001) and the percentage of total Ohio crash-involved drivers/vehicles that are SR using this definition (6.9 percent). The tree then branches into four levels. The most important SR predictive variable (top tree branch) is age of driver, which has two branches. The category with the highest SR percentage is 16–25-year-old drivers (10.4 percent). Within that branch, the most important variable is gender, which has two branches. The category with the highest SR percentage is male (11.7 percent). Within that branch, the most important variable is vehicle body type, which has two branches. The categories with the highest SR percentage are automobiles and motorcycles (no utility vehicles, trucks, etc.) (12.4 percent). Finally, within this branch, the most important variable is age of driver, which has two branches. The difference in SR percentage between the two branches is small: 12.4 percent for 20–25-year-old drivers and 12.3 percent for 16–19-year-old drivers.

Figure 12. Chart. CART results for Ohio vehicle-based variables using the combined SR definition (2003–2005).

 

This figure shows part of a classification and regression tree (CART) with data from the Ohio database, with the top node showing the number of total speeding-related (SR) drivers/vehicles in Ohio using the over speed limit definition (23,018) and the percentage of total Ohio crash-involved drivers/vehicles that are SR using this definition (4.2 percent). The tree then branches into four levels. The most important SR predictive variable (top tree branch) is age of driver, which has two branches. The category with the highest SR percentage is 16–25-year-old drivers (6.5 percent). Within that branch, the next most important variable is gender, which has two branches. The category with the highest SR percentage is male (7.7 percent). Within that branch, the most important variable is driver restraint system use, which has two branches. The categories with the highest SR percentage are none, motorcycle helmet, and lap-belt only (18.0 percent). Finally, within this branch, the most important variable is vehicle body type, which has two branches. The categories with the highest SR percentage include automobiles, utility vehicles, van-based light trucks, and motorcycles (19.8 percent).

Figure 13. Chart. CART results for Ohio vehicle-based variables using the over speed limit SR definition (2003–2005).

 


 

SUMMARY CONCLUSIONS ON VEHICLE/DRIVER-RELATED CART ANALYSES

The findings from the vehicle-based analyses differed across databases and definitions. Using the liberal combined SR definition, the FARS analysis identified driver alcohol use as the most important descriptor, the GES analysis identified distracted drivers, and the North Carolina and Ohio analyses identified young drivers. Using the over speed limit definition, the North Carolina analysis identified young drivers up to age 35 who are drinking while not using restraint systems as the most important descriptors, while the Ohio analysis identified young males not using restraints. The one theme common to most (but not all) of the results is the involvement of young male drivers.

 


 

CART Results for Data Subset Analyses

As described in the single-variable results, additional analyses were conducted for SR crashes in five subsets of the FARS and GES data: (1) pedestrian SR crashes, (2) intersection-related SR crashes, (3) lane departure SR crashes, (4) rural SR crashes, and (5) urban SR crashes. In addition to the previously described single-variable analyses, CART analyses were conducted for the FARS and GES crash-related variables in the first three of these subsets. Unlike the previous CART analyses, in which the CART software examined all crash-related variables, these analyses examined only a selected group of variables for each subset. These included the following:

  • Pedestrian crashes: rural/urban, functional class, day/night (using light condition), speed limit, relationship to junction, and number of travel lanes.

  • Intersection crashes: rural/urban, functional class, day/night, speed limit, traffic control device, number of travel lanes, alignment, and grade.

  • Lane departure crashes: rural/urban, day/night, speed limit, number of travel lanes, alignment, grade, and surface condition.

Note that the data set CART used in each case was the full subset of pedestrian, intersection, or lane departure crashes (both SR and non-SR). CART then examined each of the selected variables and determined which was the best SR predictor at the initial step. The results of these limited CART analyses are described below. By definition, these analyses consider only a restricted set of variables; CART analyses of the full set of crash-related variables within each subset could produce different findings.
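The restriction described above amounts to giving CART a different list of candidate variables for each subset. A minimal Python sketch follows, assuming the variable labels from the lists above stand in for the underlying FARS/GES fields (they are not the databases' coded variable names) and that the per-variable data are held in a dictionary keyed by those labels. `LIMITED_VARIABLES` and `restrict_candidates` are hypothetical names.

```python
# Candidate variables per subset, copied from the lists above.
LIMITED_VARIABLES = {
    "pedestrian": ("rural/urban", "functional class", "day/night", "speed limit",
                   "relationship to junction", "number of travel lanes"),
    "intersection": ("rural/urban", "functional class", "day/night", "speed limit",
                     "traffic control device", "number of travel lanes",
                     "alignment", "grade"),
    "lane departure": ("rural/urban", "day/night", "speed limit",
                       "number of travel lanes", "alignment", "grade",
                       "surface condition"),
}


def restrict_candidates(subset: str, available: dict) -> dict:
    """Keep only the variables CART may split on for this subset.

    Raises KeyError for a subset without a limited variable list.
    """
    allowed = LIMITED_VARIABLES[subset]
    return {name: data for name, data in available.items() if name in allowed}


# Placeholder data: only the keys matter for this illustration.
available = dict.fromkeys(["speed limit", "surface condition", "driver age"])
print(list(restrict_candidates("pedestrian", available)))      # ['speed limit']
print(list(restrict_candidates("lane departure", available)))  # ['speed limit', 'surface condition']
```

The filtered dictionary would then be handed to whatever split search is in use, so a variable outside the list (such as driver age) can never appear as a branch.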

 


 

Pedestrian Crashes

The CART analysis of fatal FARS pedestrian crashes using the limited set of crash-related variables produced only one level (see figure 14). The most important SR predictive variable is speed limit, and the categories with the highest SR percentage are speed limits of 65–75 mi/h. Approximately 16 percent of the pedestrian crashes in this subset are SR, and the subset accounts for approximately 2 percent of the total fatal pedestrian crashes and approximately 17 percent of the total SR sample, which is a relatively small subset.

 

This figure shows part of a classification and regression tree (CART) with data from the Fatality Analysis Reporting System (FARS), with the top node showing the number of fatal speeding-related (SR) crashes involving pedestrians (273) and the percentage of total fatal pedestrian crashes that are SR (10.0 percent). The tree then branches into only one level. The most important SR predictive variable is speed limit, which has two branches. The category with the highest SR percentage is speed limits of 65–75 mi/h (15.5 percent).

Figure 14. Chart. CART results for FARS pedestrian subset using limited crash-based variables.

A CART analysis was also conducted for the companion pedestrian crash subset in the GES database, but it did not produce any branches. Thus, the analyses of this limited set of variables indicate that fatal SR pedestrian crashes are more likely on roads with the highest speed limits, but that none of the variables predicts higher SR involvement in the full crash data set represented by GES.
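A tree with no branches, as in the GES pedestrian analysis, is what a CART-style procedure returns when no candidate split improves the node enough to be kept. The report does not state which stopping or pruning rule its CART software applied. The sketch below assumes a simple minimum Gini-decrease threshold (`min_gain`, a hypothetical parameter) and hypothetical counts, to show how the same data can yield a one-level tree or no tree at all depending on that threshold.

```python
def gini(sr: int, total: int) -> float:
    p = sr / total if total else 0.0
    return 2 * p * (1 - p)


def split_gain(left, right):
    """Gini decrease from splitting a node into two groups, each given as (sr, total)."""
    sr, n = left[0] + right[0], left[1] + right[1]
    weighted = (left[1] * gini(*left) + right[1] * gini(*right)) / n
    return gini(sr, n) - weighted


def grow_root(candidates, min_gain):
    """Return (variable, gain) for the best split if it clears min_gain, else None.

    `candidates` maps a variable name to its best binary split ((sr, n), (sr, n)).
    Returning None means the root stays a leaf: the tree has no branches.
    """
    scored = [(name, split_gain(*groups)) for name, groups in candidates.items()]
    best = max(scored, key=lambda item: item[1], default=None)
    if best is None or best[1] < min_gain:
        return None
    return best


# Hypothetical pedestrian-crash splits: 200 SR crashes out of 2,000.
candidates = {
    "speed limit": ((40, 250), (160, 1750)),
    "light condition": ((110, 1000), (90, 1000)),
}
print(grow_root(candidates, min_gain=0.0005))  # speed limit kept as a one-level tree
print(grow_root(candidates, min_gain=0.005))   # None: no branches
```

In practice, CART implementations usually grow a larger tree and then prune it by cost-complexity, but the outcome for a weak predictor set is similar: every candidate split is discarded.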

 


 

Intersection-Related Crashes

The CART analysis of fatal FARS intersection crashes using the limited set of variables produced a two-level tree (see figure 15). The most important SR predictive variable is light condition, and the categories with the highest SR percentage are dark, dark but lighted, and dawn (28 percent). Within that branch, the next variable is roadway alignment, with the highest SR category being curves. This final subcategory is 53 percent SR and accounts for approximately 2 percent of the total fatal intersection-related sample and approximately 10 percent of the total SR intersection sample.

The companion CART analysis of the GES intersection crashes using the limited set of variables produced a three-level tree (see figure 16). The most important SR predictive variable is light condition, and the categories with the highest SR percentage are dark, dark but lighted, and dawn (the same as in the FARS results). Within that branch, the most important variable is speed limit, with the highest SR category being zero to 35 mi/h. Within that branch, the most important variable is traffic control, with the highest SR category being no control (as opposed to traffic signals and stop/yield). This final subset is 30 percent SR. It includes approximately 0.8 percent of the total GES intersection crashes in this training set and approximately 5 percent of the total SR intersection crashes.

The FARS and GES results were consistent with respect to the most important predictor variable, which shows that SR intersection crashes are more likely at night (dark, dark but lighted, or dawn conditions). There is less consistency after that. The fatal (FARS) results point to intersections on curves (which might imply rural intersections, where curves are more common), while the total (GES) results point to intersections with lower speed limits (likely urban) and no traffic control (i.e., neither a stop/yield sign nor a traffic signal).
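Each terminal node in these trees corresponds to a conjunction of the conditions along its path, which makes the FARS/GES comparison easier to read side by side. The sketch below restates the FARS (figure 15) and GES (figure 16) intersection paths as rules, using the branch categories and SR percentages given in the figure descriptions. The `Branch` class and `path_to_rule` function are illustrative names, not part of any CART package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    variable: str
    categories: tuple[str, ...]
    sr_percent: float  # SR percentage of the node reached by taking this branch


def path_to_rule(path: list[Branch]) -> str:
    """Join the branch conditions on a root-to-leaf path into one IF ... THEN rule."""
    conditions = " AND ".join(
        f"{b.variable} in {{{', '.join(b.categories)}}}" for b in path
    )
    return f"IF {conditions} THEN SR = {path[-1].sr_percent:.1f}%"


# Paths taken from the figure 15 and figure 16 descriptions.
FARS_PATH = [
    Branch("light condition", ("dark", "dark but lighted", "dawn"), 28.0),
    Branch("roadway alignment", ("curves",), 53.1),
]
GES_PATH = [
    Branch("light condition", ("dark", "dark but lighted", "dawn"), 18.1),
    Branch("speed limit", ("0-35 mi/h",), 20.1),
    Branch("traffic control", ("no control",), 29.8),
]

for path in (FARS_PATH, GES_PATH):
    print(path_to_rule(path))
```

Only the first condition is shared between the two rules, which is the point made in the paragraph above.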

 

This figure shows part of a classification and regression tree (CART) with data from the Fatality Analysis Reporting System (FARS), with the top node showing the number of fatal speeding-related (SR) crashes occurring at intersections (1,088) and the percentage of total fatal intersection crashes that are SR (19.2 percent). The tree then branches into two levels. The most important SR predictive variable (the top branch) is light condition, which has two branches. The categories with the highest SR percentage include dark, dark but lighted, and dawn (28.0 percent). Within that branch, the most important variable is roadway alignment, which has two branches. The category with the highest SR percentage is curves (53.1 percent).

Figure 15. Chart. CART results for FARS intersection subset using limited crash-based variables.

 

This figure shows part of a classification and regression tree (CART) with data from the General Estimates System (GES), with the top node showing the number of speeding-related (SR) crashes occurring at intersections (266,601) and the percentage of total intersection crashes that are SR (16.3 percent). The tree then branches into three levels. The most important SR predictive variable (the top tree branch) is light condition, which has two branches. The categories with the highest SR percentage include dark, dark but lighted, and dawn (18.1 percent). Then within that branch, the most important variable is speed limit, which has two branches. The category with the highest SR percentage is zero to 35 mi/h (20.1 percent). Within that branch, the most important variable is traffic control, which has two branches. The category with the highest SR percentage is no control, as opposed to traffic signals and stop/yield (29.8 percent).

Figure 16. Chart. CART results for GES intersection subset using limited crash-based variables.

 


 

Lane Departure Crashes

The CART analysis of fatal FARS lane departure crashes using the limited set of variables produced a four-level tree (see figure 17). The most important SR predictive variable is the lane departure variable produced for this analysis, and the category with the highest SR percentage is single-vehicle run-off-road. Within that branch, the most important variable is light condition, and the categories with the highest SR percentage are dark, dark but lighted, and dusk. Within that branch, the next variable is roadway alignment, and the category with the highest SR percentage is curves. Finally, within the curve branch, the last variable is speed limit, and the category with the highest SR percentage includes all speed limits from zero to 50 mi/h. This final subcategory is 63 percent SR and includes approximately 6 percent of the fatal lane departure crashes and approximately 16 percent of the fatal SR lane departure crashes.

The companion CART analysis of the GES lane departure crashes using the limited set of variables produced a two-level tree (see figure 18). The most important SR predictive variable is surface condition, and the categories with the highest SR percentage are wet, snow, slush, sand, dirt, oil, and other. Within that branch, the next variable is speed limit, with the highest SR percentage for speed limits of 65–75 mi/h (the highest speed limit categories). This final subcategory is 62 percent SR and includes approximately 3 percent of the total lane departure crashes in this training set and approximately 10 percent of the SR lane departure crashes.

The results from the FARS and GES analyses of fatal and total lane departure crashes are not consistent. The fatal lane departure crashes most associated with SR are those at night, on curves, and on roads with speed limits of 50 mi/h or less. The total lane departure crashes most associated with SR are those on wet, snowy, or otherwise poor road surfaces (typically associated with bad weather) on roads with the highest speed limits (typically interstates).
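To see which crashes a terminal node covers, its path can be written as a membership test. The sketch below encodes the FARS lane departure path from figure 17 (single-vehicle run-off-road, dark/dark but lighted/dusk, curve, speed limit of 50 mi/h or less) and computes the SR percentage among the matching records. The record fields and the five sample crashes are hypothetical; they are not FARS data or FARS variable names.

```python
# Hypothetical field names; FARS codes these variables differently.
DARK_OR_DUSK = {"dark", "dark but lighted", "dusk"}


def in_terminal_subset(crash: dict) -> bool:
    """True if the crash follows every branch on the figure 17 path."""
    return (
        crash["lane_departure"] == "single-vehicle run-off-road"
        and crash["light"] in DARK_OR_DUSK
        and crash["alignment"] == "curve"
        and 0 <= crash["speed_limit_mph"] <= 50
    )


def node_sr_percent(crashes: list[dict]):
    """SR percentage among crashes in the terminal subset (None if it is empty)."""
    members = [c for c in crashes if in_terminal_subset(c)]
    if not members:
        return None
    return 100 * sum(c["sr"] for c in members) / len(members)


def crash(light, alignment, speed, sr, kind="single-vehicle run-off-road"):
    return {"lane_departure": kind, "light": light, "alignment": alignment,
            "speed_limit_mph": speed, "sr": sr}


sample = [
    crash("dark", "curve", 45, True),
    crash("dusk", "curve", 35, False),
    crash("daylight", "curve", 45, True),              # fails the light-condition branch
    crash("dark but lighted", "straight", 40, True),   # fails the alignment branch
    crash("dark", "curve", 55, True),                  # fails the speed-limit branch
]
print(node_sr_percent(sample))  # 50.0
```

Applied to the full fatal lane departure data, the same test would select the subset whose SR share the report gives as 63 percent.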

 

This figure shows part of a classification and regression tree (CART) with data from the Fatality Analysis Reporting System (FARS), with the top node showing the number of fatal speeding-related (SR) crashes involving a lane departure (5,457) and the percentage of total fatal lane-departure crashes that are SR (38.3 percent). The tree then branches into four levels. The most important SR predictive variable (the top branch) is run off road, which has two branches. The category with the highest SR percentage is single vehicle run-off-road (44.7 percent). Within that branch, the most important variable is light condition, which has two branches. The categories with the highest SR percentage include dark, dark but lighted, and dusk (49.7 percent). Within that branch, the most important variable is roadway alignment, which has two branches. The category with the highest SR percentage is curves (58.4 percent). Finally, within that branch, the most important variable is speed limit, which has two branches. The category with the highest SR percentage is zero to 50 mi/h (63.3 percent).

Figure 17. Chart. CART results for FARS lane departure subset using limited crash-based variables.

 

This figure shows part of a classification and regression tree (CART) with data from the General Estimates System (GES), with the top node showing the number of speeding-related (SR) crashes involving a lane departure (291,635) and the percentage of total lane-departure crashes that are SR (34.1 percent). The tree then branches into two levels. The most important SR predictive variable (the top tree branch) is surface condition, which has two branches. The categories with the highest SR percentage are wet, snow, slush, sand, dirt, oil, and other (47.4 percent). Within that branch, the most important variable is speed limit, which has two branches. The category with the highest SR percentage is 65–75 mi/h (62.0 percent).

Figure 18. Chart. CART results for GES lane departure subset using limited crash-based variables.

 

Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101