SUMMARY REPORT

This summary report is an archived publication and may contain dated technical, contact, and link information

Publication Number: FHWA-HRT-14-077 Date: July 2014


The Exploratory Advanced Research Program Fact Sheet

Utilizing Various Data Sources for Surface Transportation Human Factors Research

WORKSHOP SUMMARY REPORT • November 6-7, 2013

Day Two: Discussion and Summary

Following the presentations, the second day of the workshop used expert panel and small group discussion to identify research gaps and recommendations.

Expert Panel Discussion

Moderated by Dr. Donald Fisher
University of Massachusetts Amherst

Day two began with an expert panel discussion session, in which the seven presenters served as panelists. The discussion focused on three examples of where researchers need to take advantage of the information provided by multiple datasets. The panel moderator stated that the goal is to link risk, prevalence, and ecology of behaviors to crashes, an effort that may require resolving contradictions (contradictory datasets), creating linkages among the datasets (complementary datasets), and generating entirely new datasets (comprehensive datasets). These datasets are outlined below.

Contradictory Datasets

In some cases, different datasets lead to different conclusions. One example is the information available on the increase in crash risk caused by cell phones: simulator studies lead to one conclusion, naturalistic studies to another, and retrospective studies of cell phone records to yet another. The moderator asked the panelists to provide examples of contradictory datasets and methods one might use to resolve the controversies.

Complementary Datasets

In some cases, information is available on driver and other road-user behaviors in different datasets that appears to be complementary but that is not formally linked. These complementary datasets can radically expand the ability to understand increases in risk tied to particular behaviors in a given scenario, the likelihood of those behaviors in the selected scenarios, and the prevalence of the scenarios. For example, information can be gathered on advanced yield markings at marked midblock crosswalks from the glance and yielding behaviors of drivers on the simulator. In addition, information can be gathered in the field using semi-controlled studies and naturalistic studies, and also from field observational studies.

Panelists noted that simulator studies are well suited to providing information on the increase in risk associated with behaviors in particular scenarios but not on the likelihood of such behaviors in these scenarios or on the prevalence of the scenarios. Naturalistic studies can provide information on the prevalence of particular scenarios but cannot be so easily used to identify the increase in risk in the scenarios that can be attributed to particular behaviors. With the rapid increase over the last decade in multiple complementary datasets, it is now possible to provide information on the increase in risk that a particular behavior creates in a given scenario, the likelihood that the driver engages in the behavior, and the prevalence of the scenario.
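One simple way to see how the three quantities named above fit together is to multiply them. The sketch below is an illustration only: the multiplicative form, the function name `expected_crashes`, and every number in it are assumptions made for this example, not figures or methods reported by the panel.

```python
def expected_crashes(trips, scenario_prevalence, behavior_likelihood, crash_risk):
    """Combine three estimates that typically come from different datasets.

    scenario_prevalence: share of trips that contain the scenario
                         (e.g., from naturalistic data).
    behavior_likelihood: probability the driver shows the risky behavior
                         when the scenario occurs (e.g., from field studies).
    crash_risk:          probability of a crash given the behavior in that
                         scenario (e.g., from simulator studies).
    """
    for name, p in [("scenario_prevalence", scenario_prevalence),
                    ("behavior_likelihood", behavior_likelihood),
                    ("crash_risk", crash_risk)]:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    return trips * scenario_prevalence * behavior_likelihood * crash_risk


# Hypothetical values, chosen only to show the arithmetic.
estimate = expected_crashes(
    trips=1_000_000,
    scenario_prevalence=0.02,
    behavior_likelihood=0.10,
    crash_risk=0.001,
)
print(f"Expected crashes per million trips: {estimate:.1f}")
```

If any one of the three inputs is missing, the product cannot be formed, which is why the panel's point about linking complementary datasets matters.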

The moderator asked panelists to identify which datasets are best suited to providing information about the risk of particular behaviors, the likelihood that drivers engage in those behaviors in particular scenarios, and the prevalence of the scenarios.

Comprehensive Datasets

Panelists noted an ever-increasing ability to predict the incidence of crashes and near-crashes from knowledge gained about the risk and prevalence of driver behaviors. A comprehensive dataset contains information on behaviors and crashes at a particular location. Such datasets do not yet exist, but they could in the near future. One example would be a dataset at a busy intersection that provides information on a range of risky driver behaviors, the prevalence of those behaviors, and the frequency of crashes. An overarching model of driver behavior is required that is sensitive to factors such as driver state and the roadway environment. This model would not only predict when drivers engage in risky behaviors but also predict the likelihood of a crash or near-crash when the driver is engaging in a particular behavior. The datasets will need to include where actual crashes are recorded, and panelists were asked how they might go about creating such datasets.

In summary, the moderator asked panelists to discuss what sorts of issues they were studying that required the use of contradictory, complementary, or comprehensive datasets. Panelists gave several examples of these issues encountered in their research, as outlined below.

Contradictory Datasets—Examples from Panelists

The moderator highlighted the following examples of contradictory datasets:

Effects of driver interaction with warning signs in cooperative intersections in the field often contradict data from simulated experiments.
Roadway departures at night with one person in the car are practically impossible to replicate in the simulator.
Research that studies driver behavior with pavement markings and delineator posts in roadway curves can lead to contradictory datasets. In particular, brighter road markers lead to overcompensation in one setting but not the other.
Two major NHTSA crash databases, the National Automotive Sampling System General Estimates System and the Fatality Analysis Reporting System, often yield apparently contradictory data and different results depending on the context of the crash.

The panelists noted several reasons for these contradictions:

Exposure—Relatively frequent in the simulator and relatively infrequent on the open road.
Feedback loops—Drivers change their behavior based on feedback from the environment.
Abstract reality—Scenarios are abstractions of reality and do not allow for real-world interactions.
Sampling bias—Bias in the sampling of laboratory studies (often college-age students) can produce different effects depending on the driving population.

Complementary Datasets—Examples from Panelists

The moderator highlighted the following examples of complementary datasets:

Different datasets can sometimes be combined to produce better estimates of high-risk periods. For example, the number of bicycle crashes reported is highest for noon and midnight; however, bringing crash reports and exposure data together shows that the real risk of bicycle crashes is highest very early in the morning.
Different datasets may need to be kept separate to understand different aspects of a problem. For example, understanding the cause of a crash can frequently come from naturalistic data, but finding the pattern of where crashes occur may come from crash data.
Contradictory datasets can become complementary if the methods used to collect the data are changed in ways that potentially account for the contradiction. For example, roadway departure studies conducted in the field complement simulator studies in that drivers will depart from the simulated roadway if the drive is long enough.
When the issues are multifaceted, the best combination of complementary methods and datasets for any given set of issues can be identified by using the table of methods and issues shown in figure 15.
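The bicycle example in the list above, where crash counts and exposure data point to different high-risk periods, can be sketched as follows. The period labels, counts, and the helper name `crashes_per_exposure` are invented for illustration and are not data from the workshop.

```python
def crashes_per_exposure(crashes, exposure):
    """Return crashes per unit of exposure for each time period."""
    rates = {}
    for period, count in crashes.items():
        if exposure[period] <= 0:
            raise ValueError(f"no exposure recorded for {period}")
        rates[period] = count / exposure[period]
    return rates


# Hypothetical crash reports (counts) and exposure data (riders observed).
crashes = {"early_morning": 5, "noon": 40, "midnight": 30}
riders = {"early_morning": 50, "noon": 4000, "midnight": 1500}

rates = crashes_per_exposure(crashes, riders)

most_crashes = max(crashes, key=crashes.get)   # period with the most reports
highest_risk = max(rates, key=rates.get)       # period with the highest rate

print("Most reported crashes:", most_crashes)
print("Highest risk per rider:", highest_risk)
```

With these made-up numbers, the raw counts peak at noon while the rate per rider peaks in the early morning, which mirrors the pattern the panelists described.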

Comprehensive Datasets—Examples from Panelists

The moderator highlighted the following examples of comprehensive datasets:

Single bicycle crashes: Datasets do not capture many of the potential causes of these crashes; thus, comprehensive datasets are still a long way off.
Pedestrian and vehicle crashes: Police crash databases in Japan are not useful for understanding causation but can still be used to generate possible countermeasures; however, comprehensive datasets are not a near-term possibility.

The panelists noted that researchers need to understand when data are corrupted; for example, a top-down approach can help when the bottom-up approach is in error.

General Discussion

Following the panelist session, the group discussed the following topics:

Current Datasets

Need for theory: Researchers require a theory of behavior to inform data mining, serve as a framework, and make different types of data coherent.
Need for careful problem identification: Problem identification, by using data to identify that there is a problem, can lead to the wrong conclusion if the research community is not careful in choosing the database or methodology used.
Need for understanding limitations: Data can be easily misinterpreted, not only in analysis but also in collection. For example, SHRP 2 data were collected for a specific purpose, and researchers need to recognize that there are limitations to these data.

Future Datasets

Standardization: There is a strong need for standardization. Human factors researchers and traffic engineers use terminology differently: The former define headway as front bumper to front bumper, but the latter define headway as rear bumper to rear bumper. Another example is crash reports, which differ across municipalities. Missing data can create a gray area for researchers.
Broad goals: To take full advantage of the data that are collected in the future, researchers should remember that their research questions are not the only questions that need to be answered in regard to the data collected.

In summary, the panelists noted that there are two different ways of viewing how best to deal with multiple, contradictory datasets as follows:

Bottom up—It is possible to take various known instances in which there are contradictions across datasets and identify why these inconsistencies arise and what can be done to avoid them in the future.
Top down—A study across multiple sites would allow for the collection of various different types of data. It would then be possible to look for inconsistencies across sites in the same dataset and inconsistencies within sites across datasets.

Recommendations

In conclusion, the panel members considered what type of study would be needed to (1) understand how to resolve long-standing contradictions among different datasets; (2) allow for the use of complementary datasets to generate information on the risk of different behaviors, their likelihood, and the prevalence of the scenarios in which they occur; and (3) generate a comprehensive dataset that links behaviors and crashes. Panelists were unanimous in recommending that there should be an attempt to understand how to use the different types of data in a study, which includes the following components:

Multiple sites (e.g., locations, geometries, traffic density, and environment).
Multiple types of data gathered at each site (e.g., survey, simulator, and field).
Multiple users (e.g., bicyclists, pedestrians, motorists, and drivers).
Multiple methods of analysis (e.g., descriptive and inferential statistics, and quantitative behavioral models).

The panelists suggested intersections, road departures, and connected vehicles as possible areas of focus for the study. They noted that intersections have many characteristics suitable to the study, are one of the best places to study V2V communications, and perhaps are one of the only places to study vehicle-to-pedestrian communications. Moreover, the three major issues surrounding multiple datasets can be studied at intersections. In particular, the data from naturalistic and simulator studies, which often lead to different estimates of risk, can easily be compared at intersections to determine why the various contradictions among datasets exist. Information on the risk of a particular behavior (e.g., the risk that failing to take a secondary glance has on crashing, given that a vehicle materializes when the driver fails to take the glance), the likelihood of a particular behavior in a given scenario (e.g., the likelihood of taking a secondary glance), and the prevalence of the scenario (e.g., the prevalence of situations in which the driver fails to take a secondary look and a vehicle materializes) can be used to create complementary datasets. Finally, given the high incidence of crashes at selected intersections, the behavioral data can be combined with the crash and near-crash data to generate comprehensive datasets.

Group Session Summary

Overview

For the final session of the workshop, participants gathered into groups to discuss three key topics relating to data needs for human factors research: (1) driver–driver and other road user data, (2) driver–vehicle data, and (3) driver–infrastructure and roadway data. Following extensive group discussion, involving multidisciplinary experts from government, academia, and industry, the participants reconvened, summarized their findings, and made recommendations. This section presents the overall findings of the breakout groups.

Driver–Driver and Other Road Users' Data for Human Factors Research

Need for Pedestrian and Bicyclist Exposure Data

Researchers do not have ways to measure the exposure of pedestrians and bicyclists. For example, although there are automatic counting systems for vehicles at intersections, there is no analogous system to record the numbers of cyclists and pedestrians passing through.
The lack of exposure data makes it impossible to calculate a risk ratio for pedestrians or bicyclists, although the numbers of fatalities are known. For this reason, it is impossible to compare the United States' pedestrian and bicyclist fatalities with the risks to pedestrians and bicyclists in other countries.
Technology is becoming more available to gather pedestrian and bicyclist data. For example, in Gothenburg, Sweden, researchers installed several boxes with sensor equipment that can detect and record the number of bicyclists that pass each of these boxes. This current technology is rudimentary; for example, it is not able to distinguish between a bicycle and a stroller and is subject to erroneous input because of people intruding in bike lanes where they should not be.

Normative Variations in Pedestrian Behavior

Participants noted that pedestrian behavior varies depending on culture and education. For example, in Australia, pedestrians do not have the right of way, resulting in more cautious pedestrian behavior. In Sweden, pedestrians do not jaywalk, unlike places such as Boston or Washington, DC, where jaywalking is a huge problem resulting in many pedestrian–vehicle strikes.
Participants also discussed who is at fault for incidents in which pedestrians are struck, injured, or killed and referred to the largest study of fault attribution conducted to date in the United States. This showed that blame fell equally on pedestrians and motorists. The assignment of fault is different in Japan where drivers are found to be at fault 90 percent of the time, and 10 percent of the fault falls to the pedestrians.
Pedestrians in the vision-impaired community had specific concerns with the practice of permitting right turns on red. Although FHWA has looked into this issue, the subset of data is small, and police reports detailing these incidents lack sufficient detail and consistency to permit a fuller understanding of the risks of this type of vehicle behavior. Other issues affecting pedestrian and bicyclist safety include visibility and distraction at intersections.

Hazards and Solutions in Roadway Design and Setup

Participants noted that many intersections lack sufficient visibility for pedestrians and bicyclists because intersections were designed to give vehicles a good line of sight, rather than to give all road users that good line of sight. For example, the High-intensity Activated crossWalK (HAWK) signal provides a protected pedestrian crossing as a way to increase safety. It is used only for pedestrian crossings and does not control traffic on side streets. The HAWK signal for pedestrians at larger crossings on a multilane-divided highway may pose risks to pedestrians—drivers, once they come to a stop, may be cleared to go but cannot see the pedestrian.
Improved reflectivity was another suggestion made to improve roadway safety for pedestrians and bicyclists. Both groups need to be more visible and could wear or carry more reflective material. At present, bicycles must have one reflector, but pedestrians have no obligation to wear anything reflective. It was recommended that pedestrians be encouraged to wear clothing with at least one reflective element to increase their visibility to drivers.
Infrastructure could be changed to separate cars and bicycles so that cars and bicycles do not cross paths.
Participants agreed that if the roadway was set up so that speed was controlled, and if that speed was slower at intersections, then this would reduce many incidents. The group showed participants an example of a traffic-calming roadway setup in Japan, where intersections are compact so a driver cannot increase his or her speed. The participants suggested that future studies investigate roadway setups that make vehicles or bicycles speed up or slow down, especially in roundabouts, which are not considered safe for pedestrians or bicyclists.

Technology for Pedestrian and Bicyclist Safety

There are potential ITS technologies that can improve pedestrian and bicycle safety. DSRC devices that transmit information through mobile devices may be an option in the future to improve pedestrian and bicyclist safety. At present, the ITS Joint Program Office's Connected Vehicle program is exploring the feasibility of introducing pedestrian DSRC by using a smartphone. In addition, Volvo has recently introduced a pedestrian detection system, and bicycle manufacturers are starting to introduce auto brakes for bicycles, in addition to external airbags.

Driver–Vehicle Data for Human Factors Research

Participants in this group illustrated how to sequence methodologies to conduct research, by using multiple data sources. The process to address driver–vehicle research is detailed as follows:

Naturalistic Scenario Sampling and Problem Definition

There are many methods that can be used to monitor interactions among road users, such as TMC, traffic cameras, and social media. Traffic cameras can be used because of their availability; they can reveal possible navigation issues, traffic movements, and conflicts; and they can recognize trends in traffic. Participants also suggested that social media offers an alternate source of naturalistic data; for example, applications such as Twitter and Waze can provide traffic and road information.

Simulation

Simulation is useful to examine an issue in detail and to explore conflict situations not readily detected from observation. With simulation, it is possible to vary the frequency of the driver's exposure to an intervention and to analyze the resultant driving behavior. For example, simulation is useful to study gap acceptance. Because of a disproportionate number of fatal accidents at rural intersections, Wisconsin DOT used simulation as an initial tool to test whether different types of signage would affect gap rejection and encourage the acceptance of safer gaps, prior to using the more expensive on-road testing of alternative signage.

There are other issues in which the utility of simulation is constrained because of the lack of exposure. For example, roadway departure crashes account for about half of all vehicular fatalities but these scenarios are challenging to replicate in simulation.¹² It is very difficult to replicate the contributing factors, such as fatigue, to roadway departure. Simulation gives useful null results, but it is not possible to get a good understanding of roadway departure because it is impossible to replicate the scenarios, and the simulation process lacks sufficient exposure data.

Intervention Development and Evaluation

By using response data obtained from simulators and information culled from naturalistic data, it is possible to design and develop interventions to modify behavior and to meet safety needs. Although the simulator results in Wisconsin on gap acceptance were not definitive, they did provide trend information, which became the basis for subsequent road testing of signage most likely to foster acceptance of safer gaps.

Field Operational Test Data: Data from the field can provide complementary information about the effects of a new or modified intervention on drivers and vehicles.

Model-Based Benefit Estimation: Societal benefits can be estimated based on the effectiveness of interventions evaluated in experiments. Short-term benefits provided by treatments that have a significant impact (e.g., crash worthiness) can be identified and measured after implementation. Long-term benefits will change and evolve as the population of users adapt to the intervention.

Policy Design: Transportation policies are outlined based on evaluation of interventions, in addition to societal benefits.

Bayesian and Model-Based Surveillance: Model-based analysis provides a measurement to evaluate scenarios after an intervention has been implemented. Continuous surveillance can serve as a method to simultaneously collect new naturalistic data for a newly identified problem. In this case, this step will provide feedback to the procedure for a new strategy.
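The surveillance step described above can be illustrated with a very small Bayesian update. The report does not specify a model, so the Beta-Binomial form used here, the function names, and all numbers are assumptions chosen for illustration: a prior belief about the rate of unsafe gap acceptances is updated with hypothetical field counts collected after an intervention.

```python
def update_beta(alpha, beta, events, trials):
    """Update a Beta(alpha, beta) prior with binomial observations.

    events: number of observed unsafe events
    trials: number of observed opportunities (events <= trials)
    """
    if not 0 <= events <= trials:
        raise ValueError("events must be between 0 and trials")
    return alpha + events, beta + (trials - events)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: roughly 2 unsafe acceptances per 100 crossings.
prior = (2, 98)

# Hypothetical post-intervention observations: 3 unsafe in 400 crossings.
posterior = update_beta(*prior, events=3, trials=400)

print("Prior mean rate:    ", beta_mean(*prior))
print("Posterior mean rate:", beta_mean(*posterior))
```

Because the posterior can serve as the prior for the next batch of observations, the same update can be repeated as new surveillance data arrive, which matches the feedback role this step plays in the sequence.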

Driver–Infrastructure and Roadway Data for Human Factors Research

Connected Vehicle Technology and Driver Behavior

One of the key topics that participants raised during discussion was the importance of researching how connected-vehicle technology will impact driver behavior. For example, connected-vehicle technology has the ability to warn following vehicles if they are not slowing down in response to a slowing lead vehicle, even if the lead vehicle is several vehicles ahead. It is important to understand whether drivers believe that the warning is specific to them, because they will be more likely to acknowledge it. There is also an issue of whether the warning should be placed in the infrastructure or in the vehicle and how the alternative venues might affect compliance. Finally, there is also a need to explore connected-vehicle signage options, in terms of alternative methods of presentation and where the signage should reside in the infrastructure.

Operation and Safety

Participants identified several areas where operation and safety are linked. The operational simulation models, which include car following and lane modeling, raise human factors issues. It is important to calibrate these models correctly in the simulator to obtain the surrogate measurements to be applied to the real-world behaviors.

Research Priorities

Participants identified the following areas of priority concerning safety: roadway departure, urban intersections, vehicle and pedestrian to bicyclist interaction, and data analysis. Participants suggested a good synthesis project for roadway departures could ask a basic question such as, “When does the driver begin reacting to the curve?” There is also a need to evaluate the effectiveness of current signage used on roadways. Specifically, researchers want to know if and how current signs affect driving behavior, for example, when approaching a curve or in urban intersections, where all types of roadway users meet, and in left-turn conflicts. It was noted that there is difficulty running scenarios, such as speed perception, in simulators because of the inability to measure lateral acceleration in these settings.

Data Sources

There are existing data sources that can be mined at low cost to provide insight, such as TMC data. Researchers need to look into how to combine different data sources to gain more powerful insights than any one dataset can provide. Simulation was suggested as a good tool for identifying issues.

Workshop Recommendations

Participants identified many areas of priority for human factors research that could make use of the expanding datasets now available and soon to be available. These include modeling; safety; roadway departure; urban intersections; vehicle, pedestrian, and bicyclist interaction; and data analysis. Several items were suggested for further research, as follows:

Evaluation of the effectiveness of current signage used on roadways.
Research on speed perception.
Solutions to improve roadway safety for pedestrians and bicyclists.
Evaluation of current ITS technologies for pedestrian and bicycle safety.
Developing a methodology to conduct research that uses multiple data sources.
Methods to measure exposure of pedestrians and bicyclists.

To advance understanding and use of multiple data types, the participants recommended a study, possibly focused at intersections, that includes multiple sites, multiple data types gathered at each site, multiple user types, and multiple methods of analysis. This study could provide critical information on how to resolve contradictions among datasets, how to put together complementary datasets that describe risky behaviors, and how to generate comprehensive datasets that link behaviors and crashes.

¹² Federal Highway Administration Safety Program. Retrieved June 13, 2014, from http://safety.fhwa.dot.gov/roadway_dept/.

Page Owner: Office of Research, Development, and Technology, Office of Corporate Research, Technology, and Innovation Management


This page last modified on 09/18/2017