United States Department of Transportation - Federal Highway Administration
SUMMARY REPORT
This summary report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-13-037
Date: December 2012

 

The Exploratory Advanced Research Program

Automated Video Feature Extraction Workshop Summary Report

October 10–11, 2012

PART ONE: SPEAKER PRESENTATIONS

SHRP2 Safety: Making a Significant Improvement in Highway Safety

Ken Campbell, Chief Program Officer, National Academy of Sciences
Transportation Research Board, SHRP2

SHRP2 Background

Ken Campbell, Chief Program Officer at the National Academy of Sciences, began with background on the second Strategic Highway Research Program (SHRP2). SHRP2 was authorized by Congress to address some of the most pressing needs related to the Nation's highway system. SHRP2 is administered by the Transportation Research Board (TRB) of the National Academies under a Memorandum of Understanding with FHWA and the American Association of State Highway and Transportation Officials (AASHTO). Contractors for the safety area include Virginia Tech Transportation Institute (VTTI), the Iowa State University Center for Transportation Research and Education (CTRE), Fugro, CUBRC, Battelle, the University of South Florida, Westat, Penn State, and Indiana University.

Why Study Naturalistic Driving?

Campbell explained that the safety area of SHRP2 is conducting the largest ever naturalistic driving study (NDS) to better understand the interaction among various factors involved in highway crashes—driver, vehicle, and infrastructure—so that better safety countermeasures can be developed and applied to save lives. The study focuses on driver behavior and is premised on the idea that it is possible to obtain more and better information on what people do when they drive—not just in the moments before they get into a collision but on a day-to-day basis.

The NDS helps provide the solid baseline reference required to put behavior in context—to assess what is high-risk behavior and what is considered normal. For example, when analyzing what a person does every day, it may become apparent that a perceived risk factor simply forms part of the subject's normal driving habits.

NDS Design

There are two major databases forming the SHRP2 data. Campbell informed workshop participants that the NDS ultimately will involve 2,800 primary drivers of all age and gender groups in passenger cars, minivans, SUVs, and pickup trucks. By the end of the study, this will add up to a database containing an estimated 5 million trip files that will prove invaluable to the next generation of researchers.

The data being collected by instrumentation within the participating vehicles include information from multiple videos; lane trackers; accelerometers; global positioning systems (GPS); radar; cell phone records; alcohol sensors; turn-signal status; and specific vehicle data on accelerator and brake pedal inputs, gear selection, steering angle, speed, seat belt information, and more, as illustrated in figure 1.(1)

Figure 1. Diagram. A birds-eye view of the outline of a car provides an overview of the data acquisition system. At the front of the vehicle, labels point to the radar unit, radar interface box, and front turn signals. A two-way arrow, labeled "Bluetooth," runs from the front of the car to the windshield. Here, text boxes point to the head unit, sub-head unit, and OBD connector. At the back of the vehicle text boxes point to the DAS main unit, GPRS/Wi-Fi antenna, and rear camera.

Figure 1. Diagram. Overview of the data acquisition system.

The Instrumentation Package

Campbell explained that a video stream is recorded at 15 Hz and 640 x 320 pixels. This stream is compressed before it is stored, so only the compressed video data are available for analysis. The images are placed into a single grid, with most of the pixels allocated to the forward camera view for maximum clarity, as shown in figure 2.(1)

Figure 2. Photo. A frame from video footage showing four camera angles in one grid. The main view shows the road ahead. To the right is a view of the driver. Below is the view looking down from the rear-view mirror and bottom right shows the rear view.

Figure 2. Photo. Footage is placed into a single grid.
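The single-grid layout can be sketched in code. The following is a minimal illustration of compositing four camera views into one frame, with the forward view given the largest region; the proportions, layout, and function names are assumptions for illustration, not the actual DAS specification.

```python
# Illustrative sketch: compositing four camera views into one frame grid.
# The layout (forward view on the left two-thirds, three minor views
# stacked on the right) is an assumed arrangement, not the SHRP2 spec.

def compose_grid(forward, driver, down, rear, width=640, height=320):
    """Place four single-channel frames into one width x height mosaic.

    `forward` gets the large left region for maximum clarity; the three
    minor views share the right-hand column. Each input frame is a
    2D list of pixel values already scaled to its target size.
    """
    mosaic = [[0] * width for _ in range(height)]

    def paste(frame, top, left):
        for r, row in enumerate(frame):
            for c, px in enumerate(row):
                mosaic[top + r][left + c] = px

    main_w = width * 2 // 3            # forward view: left two-thirds
    paste(forward, 0, 0)               # assumes forward is height x main_w
    minor_h = height // 3              # three stacked minor views
    paste(driver, 0, main_w)
    paste(down, minor_h, main_w)
    paste(rear, 2 * minor_h, main_w)
    return mosaic
```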

In addition to multiple video camera angles, an intentionally blurred still image of the inside of the car is taken once every several seconds to show if there are other occupants in the vehicle, as shown in figure 3.(1)

Figure 3. Photo. An intentionally blurred interior view of a vehicle looking back at the driver from the rear-view mirror.

Figure 3. Photo. An intentionally blurred still image of the inside of the car.

In addition to vehicle data, roadway data are formed from a combination of three data sources. Researchers sent vans, as shown in figure 4, to measure 12,500 centerline miles (20,100 km) (the length of a highway regardless of the pavement width or the number of lanes) of roads across the test sites with a focus on data needed for lane departure and intersections. These include variables such as curvature location, grade, cross slope, lane, shoulder type, speed-limit signs, medians, rumble strips, lighting, intersection location, and number of approaches. The researchers also added supplemental data from State inventory and data from each of the sites—including crash history, information from work zones, weather and traffic information, and other specific topics.

Figure 4. Photo. A view of a heavily-modified van with various equipment attached to the outside.

Figure 4. Photo. An instrumented van.

The roadway data collection also includes a traditional photo log where still pictures are taken every 21 ft (6 m) to produce a sequence of photographs of where the vans are driven, as shown in figure 5.

Figure 5. Photo. A frame from a sequence of photographs shows an intersection ahead with overhead stoplights.

Figure 5. Photo. The van produces a sequence of photographs for a photo log of its journey.

Next Steps

Campbell stated that four separate analysis projects have started based on the collected data. These projects focus on testing analysis methods on rural two-lane curves; offset left-turn bays; driver inattention; and crashes on congested freeways. Future applications could lead to more cost-effective roadway measures to prevent crashes, cost-effective intersection design, new vehicle technology to track driver attention, and effective methods to warn drivers of congestion ahead.

In the long term, there will be many applications to address significant safety issues using the NDS and roadway data, which will help gain insights into driver behavior that cannot be obtained any other way. Short-term goals include identifying who or what the users will be and assisting them in defining research questions and accessing the databases.

The research team will produce specific user tools and data files to support the analysis activities. These will include trip summary files based around trips of interest for easy access—containing information on trip, roadway, vehicle, and driver variables. Specific event files could also help identify particular areas of interest (e.g., collision data) and suggestions for events and triggers are being solicited. The possibility of reduced datasets for individual users is also under development.

Questions

Following the presentation, workshop participants had an opportunity to ask questions. One question focused on how the vehicle instrumentation and roadside image data could be linked together. Campbell explained that this is one of the challenging problems of the study. Although the vehicle instrumentation does include inexpensive GPS sensors (recording location and heading once per second) and the location of the roadway data is stored in a spatial database, how to actually link them together is yet to be resolved.
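One conceivable starting point for that linking problem is nearest-segment matching of each 1-Hz GPS fix against the roadway centerline database. The sketch below is purely illustrative and is not the study's method: it assumes a local planar coordinate frame and roadway segments stored as straight centerline pieces keyed by hypothetical segment ids.

```python
# Hypothetical sketch: match a GPS fix to the nearest roadway segment.
# The flat-earth distance model and segment representation are
# simplifying assumptions for illustration only.

import math

def point_segment_dist(p, a, b):
    """Distance from point p to segment a-b in a local planar frame."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_fix(fix, segments):
    """Return the id of the roadway segment closest to a GPS fix.

    `segments` maps a segment id to a (start_point, end_point) pair.
    """
    return min(segments, key=lambda sid: point_segment_dist(fix, *segments[sid]))
```

A production map-matcher would also use heading, speed, and trajectory continuity rather than matching each fix independently.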

Behavioral Safety and Driver Distraction

Richard Compton, Director, Office of Behavioral Safety Research
National Highway Traffic Safety Administration

Analyzing Crash Data

Richard Compton, Director of the National Highway Traffic Safety Administration (NHTSA) Office of Behavioral Safety Research, began by informing workshop participants that studying naturalistic driving data offers NHTSA a valuable opportunity to greatly improve the understanding of how crashes occur. He also noted that there are many potential uses for the data in the future. Participants were told that most of the knowledge of the past 50 years comes from post-crash analysis; although information gathered from the scene of a crash is very useful, there is a lot of information leading up to a crash that cannot be gathered using this method.

Compton stated that most crashes are not due to roadway or vehicle defects; 85 to 95 percent of crashes are due to operator behavior, yet it is very hard to reconstruct how a person was behaving in the lead-up to a crash after the event. Measurements such as pre-crash speed are extremely difficult for a police officer to calculate at the scene after a crash has occurred. Visual indicators such as skid marks used to provide valuable information, but most cars today have anti-lock braking systems, which take such clues away.

It was noted that modern cars typically contain an event data recorder (EDR); however, these recorders operate on a first-in first-out policy and are usually triggered by an airbag deployment, saving just a couple of seconds of extremely limited data. EDR systems do not perform like the more sophisticated black boxes installed in airplanes and trains that provide detailed information on the last hour of travel.


Assessing Driver Distraction

Compton informed participants that this naturalistic behavior study will supply that critical information and also provide an opportunity to study and analyze what normal behaviors are and assess abnormal behavior.

Previously, a 100-car study operated as a pilot study for the NDS, and part of the preliminary study aimed to get a sense of driver distraction factors. Many hours were spent performing manual coding of video to allow researchers to see exactly what drivers were doing in their cars on a normal basis and leading up to a crash. One of the key factors to emerge from this study was the importance of keeping eyes on the forward roadway: eyes that strayed off the forward roadway for more than 2 seconds showed a clear link to the crash data. For the first time, this provided objective evidence of a clear correlation between eyes leaving the forward roadway and the occurrence of a crash event.
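The 2-second eyes-off-road finding illustrates the kind of measure automated extraction would need to produce. Below is a minimal sketch, assuming per-frame gaze labels are already available from video coding at the study's 15-Hz video rate; the label representation is an assumption for illustration.

```python
# Sketch: flag glances away from the forward roadway that exceed
# 2 seconds, given per-frame on-road/off-road gaze labels at 15 Hz.

FRAME_RATE = 15          # frames per second, per the SHRP2 video spec
THRESHOLD_S = 2.0        # eyes-off-road duration linked to crash risk

def long_off_road_glances(gaze_labels):
    """Return (start_frame, duration_s) for each off-road glance > 2 s.

    `gaze_labels` is a per-frame sequence where True means the driver's
    eyes are on the forward roadway and False means they are not.
    """
    events, run_start = [], None
    # Append a True sentinel so a trailing off-road run is closed out.
    for i, on_road in enumerate(list(gaze_labels) + [True]):
        if not on_road and run_start is None:
            run_start = i
        elif on_road and run_start is not None:
            duration = (i - run_start) / FRAME_RATE
            if duration > THRESHOLD_S:
                events.append((run_start, duration))
            run_start = None
    return events
```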

Compton added that NHTSA has a great interest in getting involved in different types of driver distraction issues—whether they are manual, visual, or cognitive distractions. For example, many drivers now use electronic navigation systems in their vehicles but there is no clear understanding of exactly how people are using these devices. This type of naturalistic driving data gives insight into those uses.

Assessing a Driver's Perception of Safety

Compton highlighted that naturalistic behavior data are also extremely useful for studying the use of seat belts. He noted that, although NHTSA performs an annual survey to assess how many people use their seat belts and conducts self-report surveys, it is difficult to get a real and accurate understanding of seat belt use without objective data to work from. Researchers for the original 100-car study recorded the seat belt use of participants over a year of driving and were able to classify people into three categories: those who always wore their seat belts, those who never wore them, and those who wore them only occasionally.

The occasional group was of particular interest to researchers because it provided an opportunity to assess when and why driver behavior changed. One hypothesis for the variation in seat belt use between journeys was that use increased when driving was considered more risky (e.g., at night, or in adverse weather conditions); however, people's perceptions are not always accurate. For example, a driver is much more likely to get into a crash on a low-speed road than on the interstate, yet the occasional group's belt use was significantly higher on high-speed, limited-access roads than on low-speed roads. When it is known how and why drivers behave the way they do, the safety community can do a much better job of designing countermeasures to encourage seat belt use.

New Insights Into Driver Behavior

Compton stated that naturalistic driving data offer tremendous potential to provide new information and insights into how driver behavior and inattention are contributing to crashes. NHTSA wants to know what people do when they are driving and when and why this behavior changes. For example, the behaviors associated with speeding, aggressive driving, and drowsy driving are specific areas of interest that could tremendously benefit from the insight offered by naturalistic driving data. There is also an invaluable opportunity to help refine the metrics for vehicle safety systems. Additionally, safety researchers are able to gain new information on how people understand and interact with automated vehicles and collision avoidance systems.

Questions

One question addressed whether the study assesses how drivers respond to advanced avoidance warning systems. Compton confirmed that there was an initial intent to examine this in the original study design, but the study had trouble recruiting people with those systems within their vehicles.

Data Collection

Jon Hankey, Senior Associate Director for Research and Development
Virginia Tech Transportation Institute

Video Examples

Jon Hankey, Senior Associate Director for Research and Development at VTTI, provided several video examples of the VTTI data collection system. The video footage included various in-vehicle camera angles, including driving during the day and at night (using infrared cameras). One video example showed a forward-facing video of a distracted driver leaving a lane and striking a tire on the curb. Examples of these videos can be viewed at http://forums.shrp2nds.us/.

Securing Approval

Due to Institutional Review Board (IRB) regulations concerning personally identifiable information (PII), a researcher needs to obtain IRB approval to access the SHRP2 data. Study participants signed a consent form for future use of the data; however, the form states that the data will always be used under IRB approval and a data-sharing agreement. To date, nobody has had to undergo a full review. In addition to the video of the driver, GPS information has also been deemed personally identifiable under certain situations. Data access differs according to how identifiable the data are, and VTTI is therefore trying to make as much of the data unidentifiable as possible. It was noted that part of the workshop's mission is to move toward enabling a computer to perform automated analysis of the data and bypass human involvement altogether.

Improved Data Access

Workshop participants were informed that it is hoped the workshop discussion will help develop a better idea of research needs and a way to make the data available while still meeting all the privacy requirements. Currently, researchers may use the sensitive SHRP2 NDS data only by physically traveling to a secure facility at VTTI; however, FHWA and SHRP2 are exploring the potential of "remote secure enclaves," which would offer all of the essential privacy protections of the current arrangement. VTTI is also conducting a 24-driver experiment in which each driver is behind the wheel for 45 minutes using the same equipment and installation procedures as the larger study. These drivers will sign especially broad releases, providing easier access to the data than is permissible with the SHRP2 data. Although this is a small sample relative to the SHRP2 data, it is substantially larger than the samples generally used in published academic journal articles. Although there was substantial interest among the academics present in accessing the data, questions arose about whether models based on small datasets would scale to the analysis of thousands, or millions, of hours of video data. It was highlighted that appropriate validation will be essential.

Final Video Examples

Another demonstration video showed a rear-end collision. Researchers asked whether it would be possible to estimate how fast the striking car was traveling when it ran into the back of the research vehicle; the ability to calculate that sort of information would be very useful. Finally, the blurred photos of the vehicle interior, taken every 10 seconds, were shown to attendees. It was noted that it would be useful to have an automated way to go through the blurred photos and count the number of passengers, identify which seats were being used, and determine whether seat belts were used, whether a child seat was installed, and whether occupants were male or female. Hankey also noted that the images were blurred intentionally in real time because IRB permission was obtained for drivers, not passengers.

Video Analytics: Putting Driving in Context

John D. Lee, Professor, Department of Industrial and Systems Engineering
University of Wisconsin–Madison

Video Analytics

John Lee, a Professor at the University of Wisconsin–Madison's Department of Industrial and Systems Engineering, began by highlighting that video analytics has the potential to be an amazing tool within the realm of naturalistic data. Importantly, it could help put driving into context more than any other analysis technique, such as simulators, can. Lee noted that naturalistic driving data are interesting, but making them scientifically meaningful and useful is hugely challenging. He also noted that although video analytics can code driver "state," its bigger value may lie in coding driving "context"—something that would provide great value to the safety community.


Overcoming a Data Deluge

Lee stated that naturalistic data are creating a deluge of data that is challenging to deal with. This issue is not specific to the driving community and is being witnessed in all domains with access to lots of data. Accordingly, there is much to be learned from others involved in these "big data" projects. For example, many of the big data techniques that have been developed for the Large Hadron Collider project in Switzerland would be equally applicable to driving data analysis.

Lee also noted that there is a need to move beyond the "event centric" approach to obtaining data. Naturalistic data acquisition has mostly been examined from the perspective of event-triggered analysis, but this provides only a narrow view of what is going on around an event such as a rear-end collision. The data are there, but extracting them manually would take centuries. This is, therefore, a coding challenge for researchers to develop ways to extract what is meaningful.
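The event-triggered approach can be illustrated with a toy trigger detector of the kind naturalistic studies commonly use. The deceleration threshold and suppression window below are invented for illustration and are not SHRP2 parameters.

```python
# Illustrative sketch of an "event trigger": flag samples where
# longitudinal deceleration exceeds a hard-braking threshold.
# Threshold and window values are assumptions for illustration.

HARD_BRAKE_G = -0.45   # assumed trigger threshold, in g

def find_triggers(accel_g, window=5):
    """Return indices where a hard-braking trigger fires.

    `accel_g` is a sampled longitudinal acceleration trace in g;
    samples within `window` indices of a fired trigger are suppressed
    so one braking event produces one trigger.
    """
    triggers, last = [], -window
    for i, a in enumerate(accel_g):
        if a <= HARD_BRAKE_G and i - last >= window:
            triggers.append(i)
            last = i
    return triggers
```

Such triggers capture the crash moment but, as Lee argued, discard the surrounding context that continuous analysis could provide.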

An example of where this has already been performed successfully can be seen in the work of Rob Radwin at the University of Wisconsin–Madison. Radwin developed video analysis capable of carrying out automatic measurement of hand activity in the workspace and producing, in real time, the risk posed to people doing manual labor tasks. This represents a prime example of a human factors expert collaborating with a video analytics expert to develop an application that works very well—a successful partnership that could also be applied to driving analysis.

Driver Distraction

Driver distraction is an issue of increasing concern as technology continually moves into cars and understanding distraction is a huge challenge with major implications. Lee explained that driver distraction has previously been defined as "the diversion of attention away from activities critical to safe driving toward a competing activity."

Looking at the road environment and the driver state are equally important, and video analytics provides a lot of potential for coding the driver state. Lee provided a list of requirements (a "wish list") that would help characterize driver state and lead to a better understanding of distraction.

Lee highlighted that video data could offer a lot of value to researchers but to understand distraction it is necessary to look at roadway demands in context of the competing demands of the task.

Distraction Analysis

Lee stated that the SHRP2 distraction analysis uses a continuous severity measure, where severity is defined by safety margin and injury risk within the field of safe travel. Video analytics can view the driving environment from the driver's perspective and recreate some of these elements to better understand the driving environment and the context that might combine with distraction to reduce safety. Lee outlined several elements of the road context that can be understood.

Computational Models

Lee noted that computational models of road context, in terms of visual attention, are capable of predicting where people are going to look based on the characteristics of various scenes. It is possible to predict where the eyes are likely to fall, what the driver sees, and where the driver is likely to look in a scene. This sort of analysis could be applied to video data that are coming out of SHRP2.

In addition to analyzing the facial expression of drivers, Lee noted that cars also give clues to their driver's behavior through external "facial" expression. Drivers communicate through their cars using turn signals, lights, and horns. If video analytics can tap into that communication it may be possible to gain a deeper understanding of what is going on, as opposed to just looking at the driver's face.

Disciplinary Myopia

Lee informed workshop participants that there are thousands of potential projects to undertake using video analytics, some offering huge safety benefits, some with very small safety benefits. Some projects may be interesting from a purely video analytics perspective, some may be of interest only from a safety perspective. He noted that because the really interesting driving safety problems may be uninteresting from a video analytics perspective, this is a challenge that needs to be addressed. Video analytics needs to be considered as one of several approaches to deal with the naturalistic data deluge. For example, an exclusive focus on video analytics to identify driver state may miss the bigger opportunity to look at the driver context. Therefore, efforts to avoid disciplinary myopia need multidisciplinary teams to include driver behavior, unstructured data design experts, and video analytics experts.

Questions

During the questions that followed the presentation, the importance of context was raised with the example of how the demand on a driver traveling a familiar road will be completely different from that on someone tackling a road for the first time. It was also noted that there is a hierarchy of needs required from video analytics, from the simplest to the most complex. The simplest need is establishing where the person is looking at any given time. This can be extended to whether the driver is happy or sad—things that are not definable from an engineering perspective. Manufacturers and safety experts also have needs that lie in different places from those of video analytics researchers, whose interest in serving those needs may be very small. Effective matchmaking is therefore critical.

Participants also discussed the possibility of developing an avatar that could be placed over drivers' faces while retaining head pose and eye direction; this would solve many problems and minimize data-access and privacy issues.

Challenges of Video-Based Human-Computer Interfaces and Experiences in Analyzing Videos of Driver Faces

Margrit Betke, Professor, Computer Science Department
Boston University

Video Analytics Experience

Margrit Betke, a Professor at Boston University's Computer Science Department, provided a brief background on human–computer interaction (HCI) and tracking experiences, specifically in the fields of multiobject tracking and video-based HCI. The concept of the camera mouse was then explained. This is a video-based interface for non-verbal people with severe motion impairments. The system tracks body features and movements and converts them into mouse pointer movements. It is then possible to use this to interact with on-screen keyboards and other communication interfaces. Development of this technology involved a lot of facial analysis but it has proved highly successful. Since 2007, there have been over 1 million downloads of the free mouse software from www.cameramouse.org.

Developing Computer Vision

In 2000, Betke first examined active computer vision in vehicles, specifically studying drivers and traffic in realistic conditions. The goal at that time was to develop a system that locates the face and eyes of a driver in realistic conditions and in real time. Betke made driving videos with students, but several challenges emerged, including operating at night, adjusting cameras after leaving a garage, and handling light-blooming effects, as shown in figure 6. There were also considerable difficulties producing results in real time with the technology available in 2000, as shown in figure 7.

Figure 6. Photo. A close-up view of a driver's face with bright sunlight shining in from behind causing glare. Red boxes are superimposed over the driver's eyes, a red cross marks the forehead, a blue cross marks the neck, a green cross marks the camera glare, and a red circle is on the driver's right shoulder.

Figure 6. Photo. Light "blooming" can impede facial recognition.

Figure 7. Screen Capture. A screen capture of computer vision detecting a face. A text box indicates a face has been detected. Red lines show the estimated top and bottom, green lines show the estimated left and right sides.

Figure 7. Screen Capture. A facial recognition system from earlier research.

Research Challenges

Betke informed workshop participants that there was a long pause in this research, from 2000 to 2011, for several reasons: in particular, a lack of funding opportunities supporting basic science approaches and difficulty reproducing the results of previous papers, due in part to privacy restrictions. It was also difficult to obtain the large amounts of data required for such research.

Betke noted the benefit of adopting multiple cameras with views in different spatial domains for creating three-dimensional (3D) gesture analysis. This involves using stereoscopy (also known as 3D imaging) to see what users are doing. Betke highlighted that a lot of motion can be lost using a single camera; with 3D analysis, test subjects were able to move a pointer around the screen with their facial gestures using 3D-trajectory analysis, as illustrated in figure 8.

Figure 8. Chart. A 3D chart plots trajectory of the right eye, nose, and left eye. The X-axis measures cm from 30 to 45, the Y-axis measures cm from 15 to 30, and the Z-axis measures depth in cm from -12 to 2.

Figure 8. Chart. 3D trajectory analysis.
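The stereo idea can be sketched with the standard rectified two-camera depth relation, depth = focal length x baseline / disparity. The focal length and baseline below are illustrative assumptions, not the parameters of Betke's system.

```python
# Minimal sketch of recovering a 3D feature position from two
# calibrated, rectified cameras. Camera parameters are assumed
# values for illustration only.

def triangulate(x_left, x_right, y, focal_px=800.0, baseline_cm=12.0):
    """Return (X, Y, Z) in cm for a feature seen in both rectified views.

    `x_left`/`x_right` are the feature's horizontal pixel coordinates
    in the left and right images; `y` is its (shared) vertical pixel
    coordinate, all measured from the principal point.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("feature must have positive disparity")
    z = focal_px * baseline_cm / disparity          # depth from disparity
    return (x_left * z / focal_px, y * z / focal_px, z)
```

Applying this frame by frame to tracked facial features yields 3D trajectories like those plotted in figure 8.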

Recommendations

Betke concluded with a number of recommendations. Recording the driver's face with two or three calibrated cameras was suggested as a potentially good idea for future naturalistic driving studies. This would enable researchers to create a 3D reconstruction and also make the implementation of an avatar much easier. Such a multi-camera system would also offer a level of redundancy in case of occlusion of the driver's facial features.

Another recommendation was to ensure that all cameras used in a vehicle are calibrated spatially and synchronized temporally. This includes the cameras with the fields of view of the road behind and in front of the vehicle, and the cameras pointing to the steering wheel, the driver's face, and the passenger seats. Spatial and temporal camera calibration will enable research of correlations of events inside and outside the vehicle.
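The temporal half of that recommendation can be sketched as nearest-timestamp pairing between two camera streams. The tolerance value below is an assumed figure for illustration; real synchronization would also model clock drift between devices.

```python
# Sketch: pair frames from two camera streams by nearest capture
# timestamp. The 20-ms tolerance is an illustrative assumption.

import bisect

def align_frames(ts_a, ts_b, tol_s=0.02):
    """Pair each frame time in `ts_a` with its nearest frame in `ts_b`.

    Both inputs are sorted lists of capture timestamps in seconds.
    Returns (index_a, index_b) pairs whose timestamps differ by at
    most `tol_s`; unmatched frames in `ts_a` are skipped.
    """
    pairs = []
    for i, t in enumerate(ts_a):
        j = bisect.bisect_left(ts_b, t)
        # Candidate neighbors: the frame just before and just after t.
        best = min(
            (j2 for j2 in (j - 1, j) if 0 <= j2 < len(ts_b)),
            key=lambda j2: abs(ts_b[j2] - t),
            default=None,
        )
        if best is not None and abs(ts_b[best] - t) <= tol_s:
            pairs.append((i, best))
    return pairs
```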

Potential database characteristics were also discussed. Although including basic factors such as age, gender, and hair type are important, other features such as personality should also be covered. For example, some people are very still behind the wheel, others move a lot, so tracking these varying parameters will be very different and should be catered for. Other suggested factors to consider include what the car interior is like, the lighting and weather conditions outside the vehicle, and various passenger configurations within the vehicle.

Automatic Video-Based Driver State Analysis and Recognition

Qiang Ji, Professor, Department of Electrical, Computer, and Systems Engineering
Rensselaer Polytechnic Institute

System Overview

Qiang Ji, a Professor at Rensselaer Polytechnic Institute's Department of Electrical, Computer, and Systems Engineering, said that his research has been underway for 10 years, with a focus on real-time driver-state monitoring and recognition systems. The current system uses multiple cameras to monitor the driver and acquire video from different angles: narrow-view cameras focus on the eyes, and wide-view cameras focus on the face and upper body. These video data are then processed to recognize different driver behaviors, particularly facial and body behavior. The behaviors are categorized and fed into a probabilistic model, which integrates them with contextual information to provide a comprehensive and robust characterization of the driver's state, as illustrated in figure 9. The system is able to continually monitor the driver and decide what information is best to provide to keep the driver safe and productive.

Figure 9. Diagram. A flow chart illustrates the system approach to understanding driver state. The driver is videoed, visual sensing is used to monitor facial and body behaviors and then infer driver state. At the top, contextual data feeds into the driver state inference. At this point intervention is taken and the chart returns to videoing the driver, or it progresses to the final box marked "driver state."

Figure 9. Diagram. System approach.
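The probabilistic fusion step can be illustrated with a toy naive-Bayes model that combines observed behaviors with context to infer driver state. The states, cues, and probability values below are invented for illustration; the actual system uses a far richer model.

```python
# Toy naive-Bayes sketch of combining visual cues and context into a
# driver-state estimate. All states, cues, and probabilities here are
# illustrative assumptions, not values from Ji's system.

STATES = ("alert", "fatigued")
PRIOR = {"alert": 0.9, "fatigued": 0.1}
# P(cue observed | state) for each visual/contextual cue:
LIKELIHOOD = {
    "slow_eyelid_closure": {"alert": 0.05, "fatigued": 0.70},
    "head_nodding":        {"alert": 0.02, "fatigued": 0.40},
    "night_driving":       {"alert": 0.20, "fatigued": 0.60},
}

def infer_state(observed_cues):
    """Return P(state | cues) for the cues observed this instant."""
    score = {s: PRIOR[s] for s in STATES}
    for cue in observed_cues:
        for s in STATES:
            score[s] *= LIKELIHOOD[cue][s]
    total = sum(score.values())
    return {s: score[s] / total for s in STATES}
```

With no cues observed, the posterior equals the prior; as fatigue cues accumulate, the posterior shifts toward the fatigued state.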

Monitoring Multiple Behaviors

Ji noted that it is important to monitor multiple behaviors. Accordingly, several visual behaviors are monitored—including eyelid movement, head movement, eye gaze, facial expression, and upper body movement. Computer vision methods include multiview face detection and tracking, eye detection and tracking, facial feature point tracking, head-pose estimation and tracking, facial expression analysis and recognition, eye-gaze tracking, and upper-body tracking and gesture recognition. Multiview face detection and tracking is considered to be very important because the face of the driver is not always front facing and therefore needs to be tracked from different angles.

Detecting and Tracking a Range of Movement

Video demonstrations of face detection and tracking using different face angles showed the system working in real time and successfully detecting and tracking faces with significant movement and expression changes. Another video showed that the analysis can also be applied outdoors to detect faces and facial orientation as people walk by.

Eye detection and tracking form a key part of this research, and two specific techniques are employed by the system. One technique detects and tracks eyes during the day under normal sunlight conditions; another detects and tracks eyes under poor illumination at night using infrared cameras. The system has proved capable of automatically detecting more than 99 percent of eyes from the detected face, day or night.

Facial feature point detection and tracking are another key component of this research. Using a system of 28 points located around each major facial component—near the eyes, mouth, and eyebrows—the system is able to detect and then track specific movements. It automatically locates the face, normalizes the image, and then superimposes facial feature locations onto the live face image, as shown in figure 10. The 28 points also enable the system to estimate head pose, and a technique has been developed to compute the 3D head-pose angle in real time.

Figure 10. Photo. Twenty-six red dots mark the outline of a face, two green dots pinpoint the eyes.

Figure 10. Photo. Facial feature detection automatically superimposes 28 points onto a face.
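As a toy illustration of how tracked landmarks can yield a head-pose angle, the sketch below estimates yaw from the horizontal asymmetry of the eye landmarks relative to the nose tip. The landmark choice and the small-angle spherical-head model are assumptions for illustration, not Ji's actual 3D technique.

```python
import numpy as np

# Toy illustration: estimating head yaw from the horizontal offset of the
# nose tip relative to the midpoint of the two eye landmarks. The geometry
# here is a simplifying assumption, not the actual head-pose algorithm.

def estimate_yaw(left_eye_x, right_eye_x, nose_x):
    """Return approximate yaw in degrees: 0 when the nose bisects the eyes."""
    midpoint = (left_eye_x + right_eye_x) / 2.0
    half_span = (right_eye_x - left_eye_x) / 2.0
    # Nose offset normalized by half the eye span; arcsin maps it to an
    # angle under a simple spherical-head model.
    offset = np.clip((nose_x - midpoint) / half_span, -1.0, 1.0)
    return float(np.degrees(np.arcsin(offset)))

print(estimate_yaw(100.0, 200.0, 150.0))  # frontal face -> 0.0 degrees
```

A production system would instead fit all 28 tracked points to a 3D face model to recover yaw, pitch, and roll together, but the same principle applies: 2D landmark geometry constrains the 3D pose.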

Characterizing Driver State

Another component of the system involves facial expression analysis—enabling the system to specifically characterize the state of the driver. Ji explained that much work has been done in this area, specifically focusing on two areas for analysis: (1) global facial expression analysis—recognizing six basic facial expressions (e.g., happy or sad); and (2) local facial expression recognition—capturing each muscle movement around the eyebrows, mouth, and cheeks to indicate the state of the person.

Ji noted that it is a challenge to perform this subtle local recognition; however, to mitigate this challenge a probabilistic model has been developed for inference. As a result, the system is able to automatically detect a person's expression in real time and indicate the orientation of the face.

After facial expression analysis, the next step is eye-gaze tracking. A driver's eye gaze can reveal intent and indicates the focus of attention. The eye-gaze tracking is designed to operate under natural head movement with minimal personal calibration. The goal is to determine the visual axis of the person and use that to determine the line of sight. To do this, a narrow-angle camera focuses on the eyes, and infrared illumination produces corneal reflections that reveal the center of the cornea, as shown in figure 11. This can then be connected with the center of the pupil to determine the optical axis. The visual axis, which is offset from the optical axis by a small person-specific angle, is then used to produce the gaze point.

Figure 11. Photo. A close up black and white photo shows a white box highlighting the right eye. This expands to indicate the location of two glints.

Figure 11. Photo. Infrared is used to detect cornea reflection.
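The gaze geometry described above can be sketched in a few lines. All coordinates and the 5-degree person-specific ("kappa") offset below are made-up illustrative values; the sketch shows only the final geometric step of turning a cornea center and pupil center into a gaze point on a plane.

```python
import numpy as np

# Geometric sketch of the gaze pipeline: the corneal glints localize the
# cornea center, the pupil center gives the optical axis, and a small
# person-specific ("kappa") rotation yields the visual axis, which is then
# intersected with a plane (e.g., the windshield). Values are illustrative.

def gaze_point(cornea_center, pupil_center, kappa_deg=5.0, plane_z=0.0):
    """Intersect the kappa-corrected visual axis with the plane z = plane_z."""
    optical_axis = pupil_center - cornea_center
    optical_axis /= np.linalg.norm(optical_axis)
    # Rotate the optical axis about the y-axis by kappa to approximate the
    # visual axis (real calibration estimates this offset per person).
    k = np.radians(kappa_deg)
    rot_y = np.array([[np.cos(k), 0.0, np.sin(k)],
                      [0.0,       1.0, 0.0      ],
                      [-np.sin(k), 0.0, np.cos(k)]])
    visual_axis = rot_y @ optical_axis
    t = (plane_z - cornea_center[2]) / visual_axis[2]  # ray-plane intersection
    return cornea_center + t * visual_axis

cornea = np.array([0.0, 0.0, 60.0])  # cm; hypothetical eye position
pupil = np.array([0.0, 0.0, 59.5])   # pupil slightly in front of cornea center
print(gaze_point(cornea, pupil))
```

The essential point is that two measurable image features (glint and pupil) plus one calibrated angle suffice to reconstruct a 3D line of sight.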

 

Monitoring Driving Behavior

In addition to monitoring facial expressions, Ji explained that by using cameras to track arm and hand movement the system could also recognize upper-body gestures. It can use this information to recognize whether the driver is performing normal driving behaviors or is eating, using the phone, applying makeup, texting, or adjusting the radio. So far the system has been successfully applied in several real-world experiments, including those investigating fatigue, stress, workload, and distraction factors.

Our Roadmap of Automatic Feature Extraction for Driving Safety

Yuanqing Lin, Department Head, Department of Media Analytics
NEC Laboratories America

Background

Yuanqing Lin, Department Head of Media Analytics at NEC Laboratories America, began by explaining that the NEC Laboratories America research team is relatively new to the field of driver safety, having been initially inspired by the Google automated car project. Lin explained there are currently nine full-time researchers in the media analytics lab with a mission to solve fundamental problems in computer vision and develop state-of-the-art systems. A strong collaboration with universities also exists, and the major research direction is investigating recognition and 3D reconstruction. Lin noted that driving safety is considered an interesting project because it specifically integrates many technologies from both of these research areas.

Sensing Levels

Lin said that sensing is a two-stage problem that requires low-level sensing and high-level understanding, both inside and outside the car. For low-level sensing, many different sources of data are available. These can come from onboard sensors monitoring inside elements such as the gas or brake pedals, or from monitoring driver features such as the eyes, head movement, or foot pose. Another factor involves looking at the outside surrounding environment and using 3D reconstruction to detect elements such as the lane, road, and nearby pedestrians, bicycles, or cars.

Lin explained that with the low-level sensing complete it is then possible to move to high-level understanding to establish inside factors such as a driver's mental state and behavior, and also understand outside factors such as the road scene. Although low-level sensing can be performed without driver video data, the data could still prove to be very helpful. Video from naturalistic driver study data can play a critical role in high-level understanding of behaviors.

Automatic Feature Extraction

Lin said that there is a need to determine which low-level features are useful for understanding certain types of behavior. Not much work has so far been conducted by NEC Laboratories America on high-level understanding—partly because of a lack of access to the data. Accordingly, the current focus is on ongoing work in low-level sensing, specifically event detection, image recognition, and object detection.

NEC Laboratories America has conducted research on using automated detection to pick out people performing tasks, such as operating a cell phone or pointing. A key feature behind NEC Laboratories America's success is feature extraction, which is performed in an unsupervised manner using local coordinate coding or super-vector coding.
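The general codebook-coding idea behind such methods can be sketched simply. The code below is a simplified soft-assignment relative of local coordinate coding, with a random codebook and random descriptors standing in for unsupervised-learned codewords and real image features; it is not NEC's actual implementation.

```python
import numpy as np

# Simplified sketch of codebook-based feature coding in the spirit of local
# coordinate coding: each local descriptor is soft-assigned to its nearest
# codewords, and the per-descriptor codes are pooled into one image-level
# feature vector. The codebook and descriptors are random stand-ins.

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))     # 16 codewords (in practice learned unsupervised, e.g., by k-means)
descriptors = rng.normal(size=(50, 8))  # 50 local descriptors from one image

def encode(descriptor, codebook, k=3):
    """Soft-assign a descriptor to its k nearest codewords."""
    dists = np.linalg.norm(codebook - descriptor, axis=1)
    nearest = np.argsort(dists)[:k]
    code = np.zeros(len(codebook))
    weights = np.exp(-dists[nearest])        # closer codewords get larger weight
    code[nearest] = weights / weights.sum()  # weights sum to 1
    return code

# Max-pool the per-descriptor codes into a single image representation.
image_feature = np.max([encode(d, codebook) for d in descriptors], axis=0)
print(image_feature.shape)  # one fixed-length vector per image
```

The resulting fixed-length vector can then be fed to a linear classifier, which is what makes these coding-plus-pooling pipelines efficient at scale.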

The existing state-of-the-art method for detecting objects such as cars, bikes, and pedestrians is to use a deformable part-based model—a class of detection algorithm in which various parts of an image are used separately to determine if and where an object of interest exists. This method of detection has had a lower object detection success rate than NEC Laboratories America's object-centric pooling method.

Once objects have been successfully detected, Lin noted that the SHRP2 naturalistic driving study data will be very important for achieving a high level of understanding. It may then be possible to identify a real danger from the low-level sensing data. For example, high-level understanding could potentially establish that an accident or near accident happened in a similar case and automatically extract a danger scene from the available information.

3D Reconstruction

Lin highlighted that one particularly important ingredient required to put detected objects into 3D worlds is the concept of 3D reconstruction. Participants were shown a single-camera video demonstration of real-time structure-from-motion being used for 3D reconstruction. The technique is evaluated on a challenging real-world dataset, the KITTI Vision Benchmark Suite (an open-access benchmark dataset from the Karlsruhe Institute of Technology), which includes interference from pedestrians, other cars, large illumination changes, large speed variations, and other environmental factors. This enables comprehensive evaluation of rotation and translation errors.
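The rotation and translation error metrics mentioned above can be sketched for a single relative pose. This is a simplified, generic version of KITTI-style odometry evaluation, assuming estimated and ground-truth poses are given as rotation matrices and translation vectors.

```python
import numpy as np

# Sketch of rotation/translation error metrics for evaluating an estimated
# camera pose against ground truth (simplified from trajectory-level
# KITTI-style odometry evaluation to a single pose pair).

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (rotation error in degrees, translation error as a distance)."""
    R_err = R_est.T @ R_gt  # relative rotation between estimate and truth
    # The rotation angle of R_err follows from its trace: tr(R) = 1 + 2*cos(angle).
    angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    return np.degrees(angle), float(np.linalg.norm(t_est - t_gt))

# Example: the estimate is off by a 2-degree yaw and 0.1 units of drift.
a = np.radians(2.0)
R_gt = np.eye(3)
R_est = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
rot_err, trans_err = pose_errors(R_est, np.array([0.0, 0.0, 0.1]), R_gt, np.zeros(3))
print(rot_err, trans_err)
```

Benchmark suites typically aggregate such per-pose errors over trajectory segments of varying length and speed, which is what makes the evaluation comprehensive.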

Vision for Driver Assistance: Looking at People in a Vehicle

Mohan M. Trivedi, Professor, Laboratory for Intelligent and Safe Automobiles
University of California at San Diego

Opening Remarks

Mohan Trivedi, a Professor at the Laboratory for Intelligent and Safe Automobiles (LISA) at the University of California at San Diego, began with a few key remarks. These included that successful computer vision is difficult to realize, that researchers need to recognize that vision is purposeful, and that it is important to understand a picture in context.

A Holistic Approach to Driving

Trivedi highlighted that driving is very complex and is the most dangerous act the average person performs every day. Capturing naturalistic driving is a very different task from capturing driving behavior in laboratory or simulator conditions. Trivedi noted that driving is not just made up of one act at one time but is made up of three different types of tasks:

Robustness and reliability are considered the hardest part of such a system because it needs to perform critical tasks day in and day out. Addressing this is a research problem, and metrics are needed to establish reliability measures and prove exactly how good something is. This raises the question of how to define the metrics that are associated with the cameras and sensors put in a car. For example, what are the performance parameters for face recognition?

Research Focus

Trivedi noted that very specific experiment design and data requirements need to be established. Clarifying the experiment objectives is an important first step in the design. For example, driving consists of a lot more than just lane changes, but lane changing is still a critical issue and driving component that needs to be examined in detail. It illustrates the point that researchers must sometimes narrow their focus and home in on one element.

According to Trivedi, a simple search of papers published in the Institute of Electrical and Electronics Engineers Transactions on Intelligent Transportation Systems revealed that papers and citations referencing "driver, eyes, and cameras" produced 11 papers per year from 2000 to 2004; 35 per year from 2005 to 2009; and 80 per year from 2010 onwards. This represents a clear trend and growing interest in the area.

LISA's Research Agenda

Trivedi explained that the LISA research agenda adopts a multidisciplinary focus on development of a complete driving context capture system. Research includes robust computational algorithms for context and intent analysis, detailed behavioral analysis of driver and driving tasks, mental models for attention and multitasking, and multimodal interfaces for driver attention management.

LISA's first project, in 2000, was to examine automobiles as a context-aware space. Researchers placed a camera and a cellphone in the vehicle to inform a caller whether it was a good time to call. This experiment addressed how to use a camera to determine what the surrounding information is indicating. The second project, with Volkswagen, required the development of smart airbags. A robust real-time vision system was created for sensing occupant body posture in vehicles and ensuring safe airbag deployment.

Trivedi stated that many previous studies of attention and drowsiness focused on the eyes. This is generally considered a hard problem; an alternative approach suggested by Trivedi is to solve the easy problems first. For example, the eyes are part of the head, so the head is a good starting point for cameras to focus on. If focusing on the head cannot provide everything needed to detect drowsiness, then it may be necessary to move on to the eyes.

 
