U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590


Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

This summary report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-14-077
Date: July 2014


The Exploratory Advanced Research Program Fact Sheet

Utilizing Various Data Sources for Surface Transportation Human Factors Research

WORKSHOP SUMMARY REPORT   •   November 6-7, 2013

Day Two: Discussion and Summary

Following the presentations, the second day of the workshop used expert panel and small group discussion to identify research gaps and recommendations.

Expert Panel Discussion

Moderated by Dr. Donald Fisher
University of Massachusetts Amherst

Day two began with an expert panel discussion session, in which the seven presenters served as panelists. The discussion focused on three examples of where researchers need to take advantage of the information provided by multiple datasets. The panel moderator stated that the goal is to link risk, prevalence, and ecology of behaviors to crashes, an effort that may require resolving contradictions (contradictory datasets), creating linkages among the datasets (complementary datasets), and generating entirely new datasets (comprehensive datasets). These datasets are outlined below.

Contradictory Datasets

In some cases, different datasets lead to different conclusions. One example is the information available on the increase in crash risk caused by cell phone use: simulator studies lead to one conclusion, naturalistic studies to another, and retrospective studies of cell phone records to contradictory conclusions. The moderator asked the panelists to provide examples of contradictory datasets and methods one might undertake to resolve the controversies.

Complementary Datasets

In some cases, information is available on driver and other road-user behaviors in different datasets that appears to be complementary but that is not formally linked. These complementary datasets can radically expand the ability to understand increases in risk tied to particular behaviors in a given scenario, the likelihood of those behaviors in the selected scenarios, and the prevalence of the scenarios. For example, information can be gathered on advanced yield markings at marked midblock crosswalks from the glance and yielding behaviors of drivers on the simulator. In addition, information can be gathered in the field using semi-controlled studies and naturalistic studies, and also from field observational studies.

Panelists noted that simulator studies are well suited to providing information on the increase in risky behaviors in particular scenarios, but not on the likelihood of such behaviors in these scenarios or on the prevalence of the scenarios. Naturalistic studies can provide information on the prevalence of particular scenarios but cannot be used so easily to identify the increase in risk in those scenarios that can be attributed to particular behaviors. With the rapid increase over the last decade in multiple complementary datasets, it is now possible to provide information on the increase in risk that a particular behavior creates in a given scenario, the likelihood that the driver engages in the behavior, and the prevalence of the scenario.
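To make the three quantities concrete, the following is a minimal sketch of how risk, likelihood, and prevalence estimates might be combined once each has been drawn from a different dataset. It is an illustration only, not a method described by the panelists. The class name, field names, and all numbers are hypothetical, and the sketch assumes that the three estimates describe the same scenario and can simply be multiplied.

```python
from dataclasses import dataclass


@dataclass
class ScenarioEstimate:
    """Hypothetical inputs, each imagined as coming from a different dataset."""
    exposures_per_year: float         # prevalence of the scenario
    p_behavior: float                 # likelihood of the risky behavior in the scenario
    p_crash_with_behavior: float      # crash risk per exposure when the behavior occurs
    p_crash_without_behavior: float   # crash risk per exposure when it does not occur


def expected_crashes(est: ScenarioEstimate) -> float:
    """Expected crashes per year in the scenario, with and without the behavior."""
    with_behavior = est.exposures_per_year * est.p_behavior * est.p_crash_with_behavior
    without_behavior = (
        est.exposures_per_year * (1.0 - est.p_behavior) * est.p_crash_without_behavior
    )
    return with_behavior + without_behavior


def attributable_crashes(est: ScenarioEstimate) -> float:
    """Expected crashes per year attributable to the behavior (excess over baseline)."""
    excess_risk = est.p_crash_with_behavior - est.p_crash_without_behavior
    return est.exposures_per_year * est.p_behavior * excess_risk


if __name__ == "__main__":
    # Made-up values chosen only to show the arithmetic.
    example = ScenarioEstimate(
        exposures_per_year=10_000,
        p_behavior=0.2,
        p_crash_with_behavior=0.001,
        p_crash_without_behavior=0.0002,
    )
    print("expected crashes per year:", round(expected_crashes(example), 3))
    print("attributable to behavior:", round(attributable_crashes(example), 3))
```

The sketch shows why no single dataset suffices: dropping any one of the three inputs leaves the expected crash count undetermined.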

The moderator asked panelists to identify which datasets are best suited to providing information about the risk of particular behaviors, the likelihood that drivers engage in those behaviors in particular scenarios, and the prevalence of the scenarios.

Comprehensive Datasets

Panelists noted an ever-increasing ability to predict the incidence of crashes and near-crashes from knowledge gained about the risk and prevalence of driver behaviors. A comprehensive dataset contains information on both behaviors and crashes at a particular location. Such datasets do not yet exist, but they could in the near future. One example would be a dataset collected at a busy intersection that provides information on a range of risky driver behaviors, the prevalence of those behaviors, and the frequency of crashes. An overarching model of driver behavior is required that is sensitive to factors such as driver state and the roadway environment. This model would not only predict when drivers engage in risky behaviors but also predict the likelihood of a crash or near-crash when the driver is engaging in a particular behavior. The datasets will need to include locations where actual crashes are recorded, and panelists were asked how they might go about creating such datasets.

In summary, the moderator asked panelists to discuss what sorts of issues they were studying that required the use of contradictory, complementary, or comprehensive datasets. Panelists gave several examples of these issues encountered in their research, as outlined below.

Contradictory Datasets—Examples from Panelists

The moderator highlighted the following examples of contradictory datasets:

The panelists noted several reasons for these contradictions:

Complementary Datasets—Examples from Panelists

The moderator highlighted the following examples of complementary datasets:

Comprehensive Datasets—Examples from Panelists

The moderator highlighted the following examples of comprehensive datasets:

The panelists noted that researchers need to understand when data are corrupted. For example, a top-down approach can help when the bottom-up approach is in error.

General Discussion

Following the panelist session, the group discussed the following topics:

Current Datasets

Future Datasets

In summary, the panelists noted that there are two different ways of viewing how best to deal with multiple, contradictory datasets as follows:


In conclusion, the panel members considered what type of study would be needed to (1) understand how to resolve long-standing contradictions among different datasets; (2) allow for the use of complementary datasets to generate information on the risk of different behaviors, their likelihood, and the prevalence of the scenarios in which they occur; and (3) generate a comprehensive dataset that links behaviors and crashes. Panelists were unanimous in recommending that there should be an attempt to understand how to use the different types of data in a study, which includes the following components:

The panelists suggested intersections, road departures, and connected vehicles as possible areas of focus for the study. They noted that intersections have many characteristics suitable to the study, are one of the best places to study vehicle-to-vehicle (V2V) communications, and perhaps are one of the only places to study vehicle-to-pedestrian communications. Moreover, the three major issues surrounding multiple datasets can be studied at intersections. In particular, the data from naturalistic and simulator studies, which often lead to different estimates of risk, can easily be compared at intersections to determine why the various contradictions among datasets exist. Information on the risk of a particular behavior (e.g., the risk that failing to take a secondary glance has on crashing, given that a vehicle materializes when the driver fails to take the glance), the likelihood of a particular behavior in a given scenario (e.g., the likelihood of taking a secondary glance), and the prevalence of the scenario (e.g., the prevalence of situations in which the driver fails to take a secondary look and a vehicle materializes) can be used to create complementary datasets. Finally, given the high incidence of crashes at selected intersections, the behavioral data can be combined with the crash and near-crash data to generate comprehensive datasets.

Group Session Summary


For the final session of the workshop, participants gathered into groups to discuss three key topics relating to data needs for human factors research: (1) driver–driver and other road user data, (2) driver–vehicle data, and (3) driver–infrastructure and roadway data. Following extensive group discussion, involving multidisciplinary experts from government, academia, and industry, the participants reconvened, summarized their findings, and made recommendations. This section presents the overall findings of the breakout groups.


Driver–Driver and Other Road Users' Data for Human Factors Research

Need for Pedestrian and Bicyclist Exposure Data

Normative Variations in Pedestrian Behavior

Hazards and Solutions in Roadway Design and Setup

Technology for Pedestrian and Bicyclist Safety

Several ITS technologies have the potential to improve pedestrian and bicyclist safety. DSRC devices that transmit information through mobile devices may be an option in the future. At present, the ITS Joint Program Office's Connected Vehicle program is exploring the feasibility of introducing pedestrian DSRC by using a smartphone. In addition, Volvo has recently introduced a pedestrian detection system, and bicycle manufacturers are starting to introduce auto brakes for bicycles, in addition to external airbags.

Driver–Vehicle Data for Human Factors Research

Participants in this group illustrated how to sequence methodologies to conduct research using multiple data sources. The process to address driver–vehicle research is detailed as follows:

Naturalistic Scenario Sampling and Problem Definition

Many methods can be used to monitor interactions among road users, such as TMC, traffic cameras, and social media. Traffic cameras are attractive because they are widely available. They can reveal possible navigation issues, traffic movements, and conflicts, and they can be used to recognize trends in traffic. Participants also suggested that social media offers an alternate source of naturalistic data. For example, applications such as Twitter and Waze can provide traffic and road information.


Simulation is useful to examine an issue in detail and to explore conflict situations not readily detected from observation. With simulation, it is possible to vary the frequency of the driver's exposure to an intervention and to analyze the resultant driving behavior. For example, simulation is useful to study gap acceptance. Because of a disproportionate number of fatal accidents at rural intersections, Wisconsin DOT used simulation as an initial tool to test whether different types of signage would affect gap rejection and encourage the acceptance of safer gaps, prior to using the more expensive on-road testing of alternative signage.
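As an illustration of the kind of comparison a gap-acceptance simulator study might support, the sketch below summarizes accepted gaps under two signage conditions. It is not the Wisconsin DOT analysis. The function name, the 6-second threshold for a "safe" gap, and the trial data are all hypothetical and chosen only to show the calculation.

```python
from statistics import mean


def summarize_gaps(trials, safe_gap_s=6.0):
    """Summarize gap-acceptance trials for one signage condition.

    trials: list of (gap_seconds, accepted) pairs, one per gap offered to a driver.
    safe_gap_s: hypothetical threshold below which an accepted gap counts as short.
    """
    accepted = [gap for gap, was_accepted in trials if was_accepted]
    if not accepted:
        raise ValueError("no accepted gaps to summarize")
    short = [gap for gap in accepted if gap < safe_gap_s]
    return {
        "n_accepted": len(accepted),
        "mean_accepted_gap_s": mean(accepted),
        "share_short_accepted": len(short) / len(accepted),
    }


if __name__ == "__main__":
    # Made-up simulator trials: (gap length in seconds, whether the driver accepted it).
    baseline = [(3.0, False), (4.0, True), (5.0, True), (7.0, True), (8.0, True)]
    with_sign = [(3.0, False), (4.0, False), (5.0, True), (7.0, True), (9.0, True)]

    for label, trials in (("baseline", baseline), ("with sign", with_sign)):
        print(label, summarize_gaps(trials))
```

A lower share of short accepted gaps under a signage condition would be the kind of trend that could justify moving that sign to on-road testing.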

For other issues, the utility of simulation is constrained by a lack of exposure. For example, roadway departure crashes account for about half of all vehicular fatalities, but these scenarios are challenging to replicate in simulation (see note 12). It is very difficult to replicate the factors that contribute to roadway departure, such as fatigue. Simulation gives useful null results, but it is not possible to get a good understanding of roadway departure because it is impossible to replicate the scenarios, and the simulation process lacks sufficient exposure data.

Intervention Development and Evaluation

By using response data obtained from simulators and information culled from naturalistic data, it is possible to design and develop interventions to modify behavior and to meet safety needs. Although the simulator results in Wisconsin on gap acceptance were not definitive, they did provide trend information, which became the basis for subsequent road testing of signage most likely to foster acceptance of safer gaps.

Field Operational Test Data: Data from the field can provide complementary information about the effects of a new or modified intervention on drivers and vehicles.

Model-Based Benefit Estimation: Societal benefits can be estimated based on the effectiveness of interventions evaluated in experiments. Short-term benefits provided by treatments that have a significant impact (e.g., crashworthiness) can be identified and measured after implementation. Long-term benefits will change and evolve as the population of users adapts to the intervention.

Policy Design: Transportation policies are outlined based on evaluation of interventions, in addition to societal benefits.

Bayesian and Model-Based Surveillance: Model-based analysis provides a measure for evaluating scenarios after an intervention has been implemented. Continuous surveillance can also serve as a way to collect new naturalistic data on a newly identified problem. In that case, this step feeds back into the start of the process and informs a new strategy.
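One simple way to picture Bayesian surveillance after an intervention is a conjugate Gamma-Poisson update of a site's crash rate as new observation periods arrive. The sketch below is an assumption for illustration, not the model the participants had in mind. The function names, the prior, and the crash counts are hypothetical.

```python
def update_crash_rate(prior_shape, prior_rate, crashes_observed, months_observed):
    """Gamma-Poisson conjugate update for a monthly crash rate.

    The prior belief about the rate is Gamma(prior_shape, prior_rate). After
    observing crashes_observed crashes over months_observed months, the
    posterior is Gamma(prior_shape + crashes, prior_rate + months).
    """
    return prior_shape + crashes_observed, prior_rate + months_observed


def posterior_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution, in crashes per month."""
    return shape / rate


if __name__ == "__main__":
    # Hypothetical prior from pre-intervention data: about 2 crashes per month.
    shape, rate = 4.0, 2.0
    print("prior mean:", posterior_mean(shape, rate))

    # Made-up surveillance periods after the intervention: (crashes, months).
    for crashes, months in [(6, 6), (9, 12)]:
        shape, rate = update_crash_rate(shape, rate, crashes, months)
        print("updated mean:", round(posterior_mean(shape, rate), 3))
```

Each new surveillance period shifts the estimate toward the observed post-intervention rate, which is the feedback role described above.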


Driver–Infrastructure and Roadway Data for Human Factors Research

Connected Vehicle Technology and Driver Behavior

One of the key topics that participants raised during discussion was the importance of researching how connected-vehicle technology will impact driver behavior. For example, connected-vehicle technology has the ability to warn following vehicles if they are not slowing down in response to a slowing lead vehicle, even if the lead vehicle is several vehicles ahead. It is important to understand whether drivers believe that the warning is specific to them, because if they do, they will be more likely to acknowledge it. There is also an issue of whether the warning should be placed in the infrastructure or in the vehicle and how the alternative venues might affect compliance. Finally, there is a need to explore connected-vehicle signage options, in terms of alternative methods of presentation and where the signage should reside in the infrastructure.

Operation and Safety

Participants identified several areas where operation and safety are linked. The operational simulation models, which include car following and lane modeling, raise human factors issues. It is important to calibrate these models correctly in the simulator to obtain the surrogate measurements to be applied to the real-world behaviors.

Research Priorities

Participants identified the following areas of priority concerning safety: roadway departure, urban intersections, vehicle, pedestrian, and bicyclist interaction, and data analysis. Participants suggested that a good synthesis project for roadway departures could ask a basic question such as, “When does the driver begin reacting to the curve?” There is also a need to evaluate the effectiveness of current signage used on roadways. Specifically, researchers want to know if and how current signs affect driving behavior, for example, when approaching a curve, at urban intersections where all types of roadway users meet, and in left-turn conflicts. It was noted that it is difficult to run scenarios such as speed perception in simulators because of the inability to measure lateral acceleration in these settings.

Data Sources

Existing data sources, such as TMC data, can be mined at low cost to provide insight. Researchers need to look into how to combine different data sources to gain more powerful insights than any one dataset can provide. Simulation was suggested as a good tool for identifying issues.

Workshop Recommendations

Participants identified many areas of priority for human factors research that could make use of the expanding datasets now available and soon to be available. These include modeling; safety; roadway departure; urban intersections; vehicle, pedestrian, and bicyclist interaction; and data analysis. Several items were suggested for further research, as follows:

To advance understanding and use of multiple data types, the participants recommended a study, possibly focused at intersections, that includes multiple sites, multiple data types gathered at each site, multiple user types, and multiple methods of analysis. This study could provide critical information on how to resolve contradictions among datasets, how to put together complementary datasets that describe risky behaviors, and how to generate comprehensive datasets that link behaviors and crashes.

12 Federal Highway Administration Safety Program. Retrieved June 13, 2014, from http://safety.fhwa.dot.gov/roadway_dept/.


Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101