U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

 
REPORT
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number:  FHWA-HRT-14-081    Date:  November 2014

 

Enhancing Statistical Methodologies For Highway Safety Research – Impetus From FHWA

APPENDIX B. OVERVIEW OF COMMONLY APPLIED METHODS IN ESTIMATING CMFS

The aim of this section is not to go in depth into the most commonly applied methodologies but rather to illustrate how they are applied in road safety research. Selected references are also provided where further information may be found.

EB Before-After Studies

The EB methodology for observational before-after studies is considered rigorous in that it accounts for regression-to-the-mean (RTM). The procedure relies on SPFs, whose role is described below.

In the EB approach, the change in safety for a given crash type at a site is given by the following:

$$\Delta\,\text{Safety} = \lambda - \pi$$

Figure 7. Equation. Change in safety using the EB approach.

Where:

λ = expected number of crashes that would have occurred in the after period without the strategy.

π = number of reported crashes in the after period.

In estimating λ, the effects of RTM and changes in traffic volume are explicitly accounted for using SPFs, relating crashes of different types to traffic flow and other relevant factors for each jurisdiction based on untreated sites (reference sites). Annual SPF multipliers are calibrated to account for temporal effects on safety (e.g., variation in weather, demography, and crash reporting).

In the EB procedure, the SPF is used to first estimate the number of crashes that would be expected in each year of the before period at locations with traffic volumes and other characteristics similar to the one being analyzed (i.e., reference sites). The sum of these annual SPF estimates (P) is then combined with the count of crashes (x) in the before period at a strategy site to obtain an estimate of the expected number of crashes (m) before strategy. This estimate of m is computed as follows:

$$m = wP + (1 - w)\,x$$

Figure 8. Equation. Expected number of crashes before strategy implementation using the EB approach.

Where w is estimated from the mean and variance of the SPF estimate as follows:

$$w = \frac{1}{1 + kP}$$

Figure 9. Equation. Estimated weight using the EB approach.

Where k is a constant for a given model, estimated during the SPF calibration process with a maximum likelihood procedure. That process assumes a negative binomial error structure, with k being the overdispersion parameter of this distribution.

A factor is then applied to m to account for the length of the after period and differences in traffic volumes between the before and after periods. This factor is the sum of the annual SPF predictions for the after period divided by P, the sum of these predictions for the before period. The result, after applying this factor, is an estimate of λ. The procedure also produces an estimate of the variance of λ.
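The site-level steps above can be sketched in Python. This is a minimal illustration, not the report's implementation: the function name, argument names, and input values are hypothetical. The variance expression is an assumption as well, since the text does not give one; the sketch uses Var(m) = (1 - w)m, scaled by the square of the adjustment factor, which is a form commonly used in EB before-after work.

```python
def eb_expected_after(spf_before, spf_after, x_before, k):
    """EB estimate of lambda for one treated site.

    spf_before / spf_after: annual SPF predictions for the before and after periods.
    x_before: observed crash count in the before period.
    k: overdispersion parameter from the SPF calibration.
    """
    P = sum(spf_before)                # sum of annual SPF estimates, before period
    w = 1.0 / (1.0 + k * P)            # weight (figure 9)
    m = w * P + (1.0 - w) * x_before   # expected crashes before strategy (figure 8)
    ratio = sum(spf_after) / P         # accounts for after-period length and traffic changes
    lam = m * ratio                    # expected after-period crashes without the strategy
    # ASSUMPTION (not stated in the text): Var(m) = (1 - w) * m, scaled by ratio squared.
    var_lam = m * (1.0 - w) * ratio ** 2
    return lam, var_lam, m, w


# Hypothetical site: 3 before years, 2 after years, 12 crashes observed before.
lam, var_lam, m, w = eb_expected_after(
    spf_before=[2.0, 2.0, 2.0], spf_after=[2.5, 2.5], x_before=12, k=0.5
)
print(f"w={w:.3f}  m={m:.3f}  lambda={lam:.3f}  Var(lambda)={var_lam:.3f}")
```

With a small weight w, the estimate m stays close to the observed count x. As k or P shrinks, w approaches 1 and m is pulled toward the SPF prediction, which is how the procedure corrects for RTM.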

The estimate of λ is then summed over all sites in a strategy group of interest (to obtain $\lambda_{sum}$) and compared with the count of crashes observed during the after period in that group ($\pi_{sum}$). The variance of λ is also summed over all sites in the strategy group.

The CMF (θ) is estimated as follows:

$$\theta = \frac{\pi_{sum} / \lambda_{sum}}{1 + \dfrac{\text{Var}(\lambda_{sum})}{\lambda_{sum}^{2}}}$$

Figure 10. Equation. CMF estimate using the EB approach.

The standard deviation of θ is given by the following:

$$\text{StDev}(\theta) = \sqrt{\frac{\theta^{2}\left[\dfrac{\text{Var}(\pi_{sum})}{\pi_{sum}^{2}} + \dfrac{\text{Var}(\lambda_{sum})}{\lambda_{sum}^{2}}\right]}{\left[1 + \dfrac{\text{Var}(\lambda_{sum})}{\lambda_{sum}^{2}}\right]^{2}}}$$

Figure 11. Equation. Standard deviation of the CMF estimate using the EB approach.

The percent change in crashes is calculated as 100(1 - θ); thus a value of θ = 0.7 with a standard deviation of 0.12 indicates a 30 percent reduction in crashes with a standard deviation of 12 percent.
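As a rough illustration of the group-level calculation, the following Python sketch evaluates the CMF and its standard deviation from the summed quantities. The function name and the input values are hypothetical. The sketch also assumes that the after-period count is Poisson distributed, so that Var(π_sum) is approximated by π_sum itself; the text does not state how this variance is obtained.

```python
import math


def eb_cmf(pi_sum, lam_sum, var_lam_sum, var_pi_sum=None):
    """CMF (theta) and its standard deviation for a strategy group."""
    if var_pi_sum is None:
        # ASSUMPTION: Poisson counts, so the variance equals the observed count.
        var_pi_sum = pi_sum
    adj = 1.0 + var_lam_sum / lam_sum ** 2     # denominator term in figure 10
    theta = (pi_sum / lam_sum) / adj           # CMF estimate (figure 10)
    rel_var = var_pi_sum / pi_sum ** 2 + var_lam_sum / lam_sum ** 2
    sd = math.sqrt(theta ** 2 * rel_var / adj ** 2)   # figure 11
    return theta, sd


# Hypothetical group totals: 70 observed crashes against 100 expected.
theta, sd = eb_cmf(pi_sum=70, lam_sum=100.0, var_lam_sum=25.0)
percent_change = 100 * (1 - theta)
print(f"theta={theta:.3f}  sd={sd:.3f}  percent reduction={percent_change:.1f}")
```

The function divides by π_sum, so a group with no observed crashes in the after period would need separate handling.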

Cross-Sectional Studies

Cross-sectional studies are particularly useful for estimating CMFs when the treatment has been applied at too few sites to support a before-after study. For example, there may be few or no projects where the shoulder was widened from 4 to 6 ft, yet there would be many road segments with 4-ft shoulders and many with 6-ft shoulders. In such cases, there are too few before-after situations to yield credible results.

In practice, it is difficult to collect data for enough locations that are alike in all factors affecting crash risk. Hence, cross-sectional analyses are often accomplished through multiple variable regression models. In these models an attempt is made to account for all variables that affect safety. If such attempts are successful, the models can be used to estimate the change in crashes that results from a unit change in a specific variable. The CMF is derived from the model parameters. The regression approach for estimating a CMF is consistent with the belief that the CMF is a function of the traits of the treated unit. A cross-sectional approach can be used to develop a CMFunction, and is preferable if the cause-effect relationship with crashes can be determined with confidence.
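One common way a CMF is derived from model parameters can be sketched as follows. The sketch assumes a log-linear crash prediction model, in which predicted crashes are proportional to exp(βx) for the variable of interest, so the ratio of predictions for two values of x is exp(β(x_to - x_from)). The functional form, the function name, and the coefficient value are all assumptions made for illustration and are not taken from the report.

```python
import math


def cmf_from_coefficient(beta, x_from, x_to):
    """CMF implied by a log-linear model when one variable changes from x_from to x_to.

    ASSUMPTION: predicted crashes are proportional to exp(beta * x), with all
    other variables held constant.
    """
    return math.exp(beta * (x_to - x_from))


# Hypothetical coefficient of -0.05 per ft of shoulder width; shoulder widened from 4 to 6 ft.
cmf = cmf_from_coefficient(beta=-0.05, x_from=4.0, x_to=6.0)
print(f"CMF = {cmf:.3f}")
```

A CMF obtained this way is only as reliable as the fitted coefficient, and is subject to the limitations of functional form, omitted variables, and correlated variables that affect cross-sectional models generally.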

CMFs estimated from cross-sectional studies could be inaccurate for a number of reasons, including inappropriate functional form, omitted variable bias, or correlation of variables. It is common practice to use generalized linear modeling techniques, assuming a negative binomial error structure, to estimate multivariate crash prediction models. However, it is difficult to account for all factors that affect safety using such modeling techniques. For example, intersections with left-turn lanes also tend to have illumination. If a crash prediction model is used to estimate a CMF for left-turn lanes, and the presence of illumination is not accounted for in the model, the difference in model predictions with and without left-turn lanes could be partly due to illumination differences. Ironically, it is precisely because a variable is found to be correlated with another variable that it may be omitted during the model fitting exercise. Including correlated variables could in fact lead to estimated effects that are counterintuitive (e.g., illumination increases nighttime crashes).

Overview of Case-Control Studies

Case-control methods have been used in certain areas of highway safety, but few studies have focused on the effects of geometric design elements. For example, case-control studies have been applied to investigate the effectiveness of motorcycle-helmet use and the crash risk associated with hours of service for truck drivers. More recently, the case-control method was employed to estimate CMFs for geometric design elements, including lane and shoulder width. Case-control studies assess whether exposure to a potential risk factor is disproportionately distributed between the cases and controls, thereby indicating the likelihood of an actual risk factor.

The likelihood of an actual risk factor is expressed as the odds ratio between two levels of a variable. For example, it may be found that the odds of a crash occurring on horizontal curves with a degree of curvature greater than 15 degrees are 1.5 times the odds of a crash occurring on curves with a degree of curvature less than 15 degrees. The odds ratio is a direct estimate of the CMF. Risk factors may take the form of binary variables (e.g., median barrier, roadway lighting, or guiderail) or multi-level variables such as lane width (e.g., 9-, 10-, 11-, and 12-ft lanes). The sample is summarized by risk factor and case-control status to calculate the odds ratio. To illustrate the concept of the odds ratio, consider the data in table 2.

Table 2. Tabulation for simple case-control analysis.
Risk Factor    Number of Cases    Number of Controls
With           A                  B
Without        C                  D

The odds ratio (CMF) is expressed as the expected increase or decrease in the outcome in question due to the presence of the risk factor. An odds ratio greater than 1.0 suggests that the presence of the risk factor increases risk, while a value less than 1.0 suggests a decrease in risk. Using the notation in table 2, the odds ratio (OR) is calculated as:

$$\text{CMF} = \text{OR} = \frac{A/C}{B/D} = \frac{AD}{BC}$$

Figure 12. Equation. Odds ratio calculation.
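The odds ratio calculation can be illustrated with a short Python sketch that uses the A, B, C, D notation of table 2. The function name and the counts are hypothetical and were chosen only so that the result matches the 1.5 odds ratio mentioned in the curvature example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a simple case-control tabulation.

    a: cases with the risk factor       b: controls with the risk factor
    c: cases without the risk factor    d: controls without the risk factor
    """
    return (a / c) / (b / d)


# Hypothetical counts: 30 cases and 100 controls with the risk factor,
# 20 cases and 100 controls without it.
cmf = odds_ratio(a=30, b=100, c=20, d=100)
print(f"OR = {cmf:.2f}")   # (30/20) / (100/100) = 1.50
```

This is the unadjusted odds ratio. Controlling for other factors requires a regression approach such as the multiple logistic regression mentioned in the text.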

Case-control studies cannot be used to measure the probability of an event (e.g., crash, severe injury, etc.) in terms of expected frequency. They are more often used to show the relative effects of risk factors. Statistical analyses, such as multiple logistic regression techniques, are commonly used to clarify these relationships because they are able to examine the risk associated with one factor while controlling for other factors.

Finally, the case-control method cannot demonstrate causality because there is no time sequence of events in the analysis. Instead, the odds ratio indicates the increased likelihood of a crash occurring when a risk factor (e.g., a roadway characteristic) is present. It does not, however, distinguish between locations with many crashes and locations with a single crash. This is a loss of potentially important information, and thus the true increase in risk could be underestimated.

 

 

Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101