U.S. Department of Transportation / Federal Highway Administration

Office of Planning, Environment, & Realty (HEP)

Driver Visual Behavior in the Presence of Commercial Electronic Variable Message Signs (CEVMS)

Summary of Peer Review Comments - May 16, 2011

INSTRUCTIONS TO REVIEWERS

Reviewer Rating Form for Technical Rigor

The Department of Transportation relies in part on high-quality information from research it conducts and commissions for policy-related decisions and actions. These decisions are intended to improve U.S. transportation but will likely be viewed within the context of the constituencies they affect. For this reason, and because driver distraction is a central concern of the Department of Transportation, studies that will lead to significant policy decisions require rigorous peer review. You have been identified as a professional who can contribute deep and broad knowledge to this review.

The review consists of two components: rating scales and open-ended questions for free-form responses. Please use both components; the scales will help the Review Team assess the strength or importance of particular ratings, while the open-ended responses will allow you to provide specific details and expand on items you find particularly positive or troublesome.

General Instructions

You are asked to evaluate this report on writing quality (i.e., the ability to clearly articulate results) and quality of research (including whether the theoretical underpinnings appropriately reflect current thinking in the field, the appropriateness of the experimental design and the statistical treatment of the data, and whether the conclusions are logical and defensible). Please use the guidelines below to rate the report using a 7-point scale (1 = lowest, 7 = highest). Provide comments supporting your ratings in the space provided.

REVIEWER 1

Written Communication

Does the report communicate its ideas, actions and results in a clear and understandable way? Consider these questions as you evaluate the report: Does the report reference and utilize the latest available theory, research and experimental design and analysis procedures appropriate for this application? Would you be able to closely replicate the study from the details provided? Does the report present a logical flow from introduction to methodology to results and conclusions? Would someone not expert on this topic be able to understand the conclusions clearly?

Written Communication: 5

Written Communication Free-Form Response (Please elaborate on the questions posed above as well as other noteworthy criterion-related positive or concerning elements, including justification for the provided rating):

The paper is well-written and easy to follow.

The authors provide a good review of existing studies and nicely articulate the gaps in our knowledge as well as the shortcomings of this earlier work. That said, I think the paper would benefit from being buttressed with a stronger theoretical framework (currently, theory receives very little mention). The authors do invoke Land's work in support of some ideas, but there are many relevant theories of top-down and bottom-up control of attention that are worthy of mention (including models of supervisory control and the like). I recognize that they do not want to go overboard with too much theory, but it would be nice if at least some of these broader theoretical constructs were incorporated in a cogent and succinct way. This applies to both the Introduction and Discussion sections.

The analysis would greatly benefit from a clearly articulated analysis plan, one driven by the specific research hypotheses. As it stands, the results section includes many analyses that, although not incorrect, do not necessarily relate to the primary questions being asked or fail to clearly advance our knowledge. There is also a lot of redundancy. Please see the section below for further comments.

Some more specific comments:

p.9, it would be helpful if the authors provided a few more details regarding the previous work in this area. For example, what kind of protocol or task instructions have been employed in previous work (e.g., are participants told to drive as they normally would or are these studies carried out under the auspices of some other research question)?

p.12, the paragraph on the progression of eye tracking technology seems out of place and disrupts the flow. Is it even necessary?

I'm not sure what the reporting standards are for such a report, but the scientific standard is usually to report in metric units.

Research Quality

Does the study utilize appropriate experimental methodology and/or statistical analyses for the nature of the topic and data, and specifically to address the stated goals? Were the samples adequate? Are the glance duration comparison thresholds reasonable and appropriate? Does the report sufficiently and appropriately explain the practical implications of the results?

Research Quality: 3

Research Quality Free-Form Response (Please elaborate on the questions posed above as well as other noteworthy criterion-related positive or concerning elements, including justification for the provided rating):

Determination of the factors that drive momentary eye glance behavior can be a challenge in the laboratory. For real world applications, consideration and (where possible) control of extraneous factors is tremendously problematic. As such, we certainly recognize the difficulties and challenges facing the investigators here and I appreciate their efforts. In some cases, the critical comments below are more a reflection of just how difficult this type of research is.

I lead with what I see as the most pressing concern with the report: the glance durations that are being reported. It is normal for distributions of glance durations to be positively skewed (Figures 15 and 29), but of particular concern is the loading of very short (<100 ms) glances. The vast majority of glances captured in Figures 15 and 29 are so short that they would not generally meet the criterion for a fixation (let alone a glance, which can consist of multiple fixations). Not all, but many researchers use a minimum threshold of 100 ms to define a fixation when reducing eye data (i.e., consecutive samples where the point of gaze falls within an x-degree region of space, lasting at least 100 ms). This threshold value can vary, but it is intended to remove from consideration any potential saccades, transitions, or irregularities in the data. The current report adopts a different approach without such a threshold and, consequently, considers every measured frame (~40 ms) a viable data point (p.30). While, as the authors argue, this does not necessarily detract from the analysis of percent time looking, the pattern of results certainly raises questions about the quality and legitimacy of the underlying data. This issue is compounded by the fact that some of the major reported outcomes are driven by these very short fixations (e.g., average glances of 50 to 80 ms to the billboards). Even if the investigators can argue convincingly that these are indeed legitimate fixations (or glances/gazes, in their terminology), they will still need to reconcile the fact that (a) the glances to billboards reported here are on the order of 10 times shorter than values reported elsewhere, and (b) basic research on visual information processing (e.g., reading, scene viewing) suggests that observers generally do not process information within a fixated region unless the fixation on that region lasts at least 200 ms. While this number is debatable, I do not know that many researchers in this field would readily accept claims that billboard information has been processed through the 50-80 ms fixations (glances) reported here. All of this could be a result of the manner in which the eye data were coded, calculated, or operationally defined, or could be indicative of issues with the equipment or measurement. It is possible that this issue is correctable with some data checking and, if appropriate, recoding and reanalysis. As it stands, however, this issue undermines my confidence in the data.
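For concreteness, a minimal sketch of the kind of dispersion-based fixation filter described above, assuming gaze samples expressed in degrees at roughly the report's 25 Hz frame rate; the 1-degree dispersion window is an illustrative choice, not a value from the report.

    import math

    def _dispersion(points):
        """Spread of a set of (x, y) gaze points, in degrees (I-DT metric)."""
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    def detect_fixations(samples, hz=25, max_dispersion_deg=1.0, min_dur_ms=100):
        """Dispersion-threshold (I-DT) filter: consecutive samples confined to
        a small spatial window for at least min_dur_ms form one fixation; all
        other samples (saccades, transitions, noise) are discarded.
        samples: sequence of (x_deg, y_deg). Returns (start, end, duration_ms)."""
        min_len = math.ceil(min_dur_ms * hz / 1000)  # samples spanning 100 ms
        fixations = []
        i = 0
        while i + min_len <= len(samples):
            j = i + min_len
            if _dispersion(samples[i:j]) <= max_dispersion_deg:
                # Grow the window while the points stay inside the threshold.
                while j < len(samples) and _dispersion(samples[i:j + 1]) <= max_dispersion_deg:
                    j += 1
                fixations.append((i, j - 1, (j - i) * 1000 / hz))
                i = j
            else:
                i += 1  # no fixation starts here; slide one sample forward
        return fixations

Any sample not absorbed into a fixation under such a rule would be excluded before glance durations are computed, which is precisely what the report's frame-by-frame approach does not do.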

The stated aim to measure and account for variations in luminance, location, size, and other variables (p.12) is laudable. I admire the investigators' efforts to measure a variety of photometric properties of the signs and scenes. That said, a number of details regarding the photometric evaluation need to be reported, and there are questions about the adequacy and success of this approach. This is all the more relevant given that this information about the signs does not appear to be well integrated into the analysis of the data; instead, a subset is incorporated into vaguely described correlation analyses.

To elaborate:

-First, under what lighting conditions were the daytime photometric measurements taken (cloudy, sunny, shadowy, or whatever conditions prevailed at the time)? Given even these few factors, it seems plausible that the photometric properties of a given sign/location can vary significantly. In light of this, have the investigators examined whether the photometric results are stable and representative of what was encountered during the experiment? If not, then one can wonder how informative or useful these data are. A clearly articulated analysis plan could help offset this. (Case in point: I could be mistaken, but Figure 6 seems to depict the same billboard, albeit with very different contrast ratios depending on the viewing angle and the location of the trees in the background.)

-Why didn't the measurement periods for both day and night correspond with the actual session times (12:45- /19:00- )?

-Why were only lateral adjacent areas used for the contrast ratio calculation? Would the vertical adjacent areas offer different values?

-How were the measurement distances (p.18) determined?

-It would be nice if the authors could provide references in support of this technique. Has it been successfully applied elsewhere? If so, this might be sufficient for justification.

-Why was only a single frame used in the analysis of subband entropy? How was it selected: randomly or by some rule? Did the investigators examine whether a single frame was representative of the entire data collection zone (or whether subband entropy fluctuated)? Perhaps Rosenholtz et al. can inform here; a simple stability check is sketched below.
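A minimal stability check along these lines, assuming frames are available as grayscale arrays. The wavelet decomposition (via PyWavelets) is a simplified stand-in for the steerable-pyramid subband entropy of Rosenholtz et al., and the frame loader is hypothetical.

    import numpy as np
    import pywt

    def subband_entropy(gray_frame, wavelet="db4", levels=3, bins=64):
        """Mean Shannon entropy (bits) of wavelet subband coefficients;
        a simplified proxy for the Rosenholtz et al. clutter measure."""
        coeffs = pywt.wavedec2(gray_frame, wavelet, level=levels)
        bands = [coeffs[0]] + [b for triple in coeffs[1:] for b in triple]
        entropies = []
        for band in bands:
            hist, _ = np.histogram(np.ravel(band), bins=bins)
            p = hist / max(hist.sum(), 1)
            p = p[p > 0]
            entropies.append(float(-(p * np.log2(p)).sum()))
        return float(np.mean(entropies))

    # Stability across a zone (load_gray is a hypothetical loader):
    # values = [subband_entropy(load_gray(path)) for path in zone_frame_paths]
    # print(np.mean(values), np.std(values), np.min(values), np.max(values))

If the spread across frames is large relative to the ~0.3 range separating the report's categories, a single frame cannot be considered representative.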

As noted above, the paper would benefit from a clear analysis plan that relates to the specific hypotheses. To go further, the current analyses, while not incorrect, provide a rather coarse and perhaps insufficient assessment. Perhaps more importantly, they do not rule out the possibility that other differences account for the pattern of results. Obviously one cannot control the signs in the real world; they differ along a number of dimensions, including size, location, height, color, luminance, and relative contrast (to name a few). It might be argued that the latter two variables reflect aspects that would clearly distinguish CEVMS from standard signs, given their function (though even the measures of luminance and contrast, in some cases, seem to capture a wide range of [overlapping] values). Ideally, the statistical comparison of CEVMS to standard signs would account for as many of the other properties as possible (i.e., those not related to a sign being a CEVMS, such as size or location). The current analysis does not really consider these other factors. An analysis of covariance or a regression approach (e.g., GEE) might offer a better means of accounting for the potential influence of extraneous factors. This would allow the investigators to conclude something to the effect that "irrespective of differences in size, location, and so on, CEVMS have this, this, and this impact on eye scanning behavior." Note also that some of the (null) correlation results involving luminance and relative contrast might indicate that differences across billboard types are due to some other property of the signs or scenes that co-varies with whether or not a sign is a CEVMS.
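To illustrate the suggested regression route, a minimal GEE sketch using statsmodels; the data frame and every column name are hypothetical stand-ins, not variables from the report.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical data: one row per sign encounter (columns illustrative).
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "participant": rng.integers(0, 25, n),
        "sign_type": rng.choice(["CEVMS", "standard"], n),
        "size_deg": rng.uniform(0.5, 3.0, n),
        "luminance": rng.uniform(50, 5000, n),
        "contrast_ratio": rng.uniform(1, 20, n),
        "pct_time_on_sign": rng.uniform(0, 10, n),
    })

    # GEE: test the sign-type effect while adjusting for extraneous sign
    # properties, with repeated encounters clustered within drivers.
    model = smf.gee(
        "pct_time_on_sign ~ sign_type + size_deg + luminance + contrast_ratio",
        groups="participant",
        data=df,
        family=sm.families.Gaussian(),
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    print(model.fit().summary())

The coefficient on sign type in such a model would estimate the CEVMS effect after adjustment for the extraneous properties, which is the conclusion the report currently cannot support.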

Some specific questions or comments:

The accuracy of the eye tracking equipment is twice reported as <2-deg, but it is not clear whether this is based on the manufacturer specifications or whether this was measured and assessed in the current study. There is often a discrepancy between what is promised and what is real.

p.14, Statement check: can Google Earth really identify the locations of CEVMS, or is this a typo/slip?

Please provide a more extensive description of the category "no off-premises advertising". Does this mean no advertising at all, inclusive of on-premises advertising, or some mix? Related, I am unclear on the relevance of the on- versus off-premises distinction. Is there an a priori or theoretical reason that these are distinct? Would drivers, in their natural scanning, respond differently to these types of signs?

p.21, "third column of Table 2" should be "first column of Table 2"?

It is not clear what criteria are used to define low or high complexity in the scenes. For example, in some places this appears to be determined by the entropy measurement, but in others it seems to be a subjective judgment (driven by taxonomies). The same can be said for Exp.2. Based on some of the descriptions, why are only background features used in the determination? Traffic and other elements would certainly impact scene complexity (consider Figure 26, which is classified as low complexity but has a lot of traffic that could enhance complexity). Further details would be appreciated.

Table 2: upon first encounter, it is unclear how the lengths of the data collection zones were determined. This is covered later (p.28), but questions remain. For example, I understand that the tracking system imposes necessary constraints; however, the resolution of the equipment does not map onto the resolving abilities of human observers. At the least, this should be acknowledged as a limitation. The description of the rules for determining zones is a little difficult to follow as well. Finally, is it important that the 3 exceptions to the normal distance occur in the categories that have the fewest samples (standard, CEVMS complex)?

pp.25-26, the implications of the subband entropy analysis are unclear. As shown in Figure 12, there are clearly differences between Times Square and a desert scene, though the magnitude of that difference is only 1.4 points on the scale. How are readers to interpret the ~0.3 range across the different categories? Does this mean that the areas are all about equal, or is this interpreted as validation of the "complex" classification?

Was there any assessment of the driver's familiarity with the selected routes? One might expect that drivers would be less compelled to look at a billboard that they have seen umpteen times before (versus a completely novel one).

p.27, what criterion did experimenters use in determining near misses or driving errors? Later, this is elaborated (p.31), but the language is still too vague to be useful to would-be replicators of the work (e.g., "felt slightly uncomfortable"). Since no errors were reported, it might be easier to downplay this aspect of the methods/data.

Check for reverse text "bolding" (i.e., text bold, header not) for some sections (e.g., p.27).

p.28, twice on this page it is implied that reading is a fundamental aspect of glances to billboards (i.e., drivers could not be expected to read billboards at very long distances; using guidelines for letter legibility in setting zone limits). To me, this seems at odds with the stated concerns over how billboards divert or draw visual attention. Diversion of attention is not necessarily predicated on reading in any way. (As noted above, the short glance durations would not be indicative of reading either.)

It is assumed that the eye tracking system can accommodate the high degree of parallax between the position of the eyes and that of the cameras. A few words to state so might be appreciated by readers.

Many of the figures are redundant with what is presented in the respective data table.

The analysis of long-duration glances is problematic because (a) a 1 s threshold is not generally considered a criterion for an extreme or long-duration glance, based on previous literature, and (b) the analysis rests on such a small number of data points (Exp.1: 5; Exp.2: 7) that one could not reasonably draw any firm conclusions. I would suggest deleting both of these sections. Reevaluation of the eye data itself (noted above) might change the underlying distribution and render this type of analysis more compelling.

Group-level comparisons on the unknown category of glances (essentially a missing-data category) are not all that informative without further explanation. As I see it, this analysis could be used to identify disproportionate problems with the tracking equipment across conditions (which would not be ideal; see Table 12, high complexity), rather than to offer insight into different scan strategies.

Table 7 and the ensuing analyses are largely replications of the previous analyses, albeit with the breakdown of CEVMS complexity. If complexity is a justifiable variable at the outset, why not simply perform one set of analyses with its inclusion?

The correlation analyses are not well-described. What data are being used to generate the correlations? What information is gleaned from this analysis?

p.49, were the data collection zones "specified" rather than "selected"?

In some places, particularly Exp.2, the discussion section is mostly a repetition of the results. I would be interested in seeing how the current results fit into the literature/knowledge base, how they can be interpreted in light of different theories of visual attention, speculation on the practical importance of the findings and a frank account of the study limitations.

Overall Quality

Please provide your overall assessment of the quality of this report.

Overall Quality: 4

Overall Quality Free-Form Response (Please outline specific concerns, if any, that you feel must be addressed prior to release of the report):

I am hopeful that the comments can be used to enhance the paper or, in the least, point out areas where limitations of the study can be expressed.

Please rate your level of expertise on this report's topic or methodology: 6

REVIEWER 2

Written Communication

Does the report communicate its ideas, actions and results in a clear and understandable way? Consider these questions as you evaluate the report: Does the report reference and utilize the latest available theory, research and experimental design and analysis procedures appropriate for this application? Would you be able to closely replicate the study from the details provided? Does the report present a logical flow from introduction to methodology to results and conclusions? Would someone not expert on this topic be able to understand the conclusions clearly?

Written Communication: 6

Written Communication Free-Form Response (Please elaborate on the questions posed above as well as other noteworthy criterion-related positive or concerning elements, including justification for the provided rating):

The report addresses a challenging topic and the authors should be commended for their clear and direct explanation of the approach and results. The writing is nearly flawless and the report is written in a way that is largely free of technical jargon. Where technical terms are used they are generally defined clearly. The report is both technically precise and accessible to a broad audience.

The introduction provides a good summary of related research concerning the distraction potential of billboards, adequately covering the relatively limited research on this topic. The introduction might benefit from a discussion of billboard properties relative to the cognitive processes underlying visual attention and glance behavior. Not including this information is likely a strategic decision of the authors and might be outside the scope of this report; however, such a review has important implications for selecting measures and describing the billboards used in this study. For example, exogenous cues, such as abrupt onsets, are known to capture attention. The periodic change of a CEVMS might generate such an abrupt-onset cue that could capture drivers' attention in a way that conventional billboards do not. Likewise, the content of a billboard might act as an endogenous cue that influences how long drivers attend to the billboard. A billboard that includes a sentence that drivers read might lead to more and longer glances than a billboard with a picture.

The research questions might have been operationalized differently if the report had placed greater emphasis on the psychology of attention. Considering this property of visual attention might motivate a different statistical analysis that could reveal a potential of these signs to distract that might otherwise go undetected. In addition, a deeper discussion of billboard characteristics relative to the properties of visual attention would have important implications for how the results of this study can be generalized. Placing the contents of this report in the context of theories of visual attention and visual sampling could enhance its value, but at the risk of making it less accessible to a broad audience.

The report follows a generally logical flow that is oriented around three clearly defined questions. The introduction, method, results and discussion are all clearly linked to these questions. The only gap in this flow concerns the labeling and linking of experiments 1 and 2. From the introduction, it seems that experiments 1 and 2 simply represent two samples to assess the generalizability of the results. This could be clearer in the stated objectives of experiment 2 and in the associated discussion. One could even argue that the two data sets could be merged and the city where the data were collected treated as a random variable. If the current structure is maintained, I suggest that "Experiment 1" and "Experiment 2" be replaced with more descriptive titles, perhaps the cities where the data were collected or an indication that experiment 2 is a replication. Slightly more emphasis in the discussion of experiment 2 results and in the general discussion should be placed on the concept of generalization and the degree to which experiment 2 replicated experiment 1.

Some minor issues include:

  1. Change rate is mentioned in the methods (e.g., Table 2 page 21) but not discussed in the introduction or considered in the analysis. This could be a critical variable that merits more discussion and analysis.
  2. The sign description in Table 2 should include its maximum visual angle rather than just physical dimensions and distance from the road (see the geometry sketch after this list). This information might be included in Table 3 and combined with these data and the onset of changes in the sign to describe sign salience, which one would expect to be associated with the likelihood and duration of a glance to the sign.
  3. The report describes "scaling" (page 25) data collection zones in terms of visual complexity, but this is not actually documented in the report. Does this refer to using subband entropy as a covariate? Maybe use "described" rather than "scaled". It would also be more effective to include these data in the table and report the min, max, and mean.
  4. Procedures should include the level of service of the traffic on the roads traversed.
  5. Data collection should describe the precise degree of counterbalancing achieved, rather than "roughly one half…" Maybe a table with the number of participants in each condition?
  6. The analysis on page 28 to justify the resolution of the eye tracker is not quite appropriate. Rather than the visual angle of the billboard, it seems that the pertinent measurement is the visual angle of the distance between the billboard and the forward roadway (see the geometry sketch after this list). Both will give you a similar answer, but the issue seems to be differentiating between glances to the road and glances to the billboard.
  7. The list of glance locations on page 29 should match the glance locations reported in the results.
  8. Page 30, Gaze calculation paragraph and elsewhere, the title needs to be bold and the paragraph not.
  9. The basis of the standard error bars is not clear. These should use the appropriate error term from the ANOVA rather than the standard deviation divided by the square root of n. A sentence could help clarify.
  10. Table 5 reports total gazes, which are confounded by the number of sign occurrences. This clouds the true contribution of CEVMS and might unnecessarily confuse the reader.
  11. Page 37, "did not significantly influence" should be "did not differentially influence"; both types did seem to influence to a similar degree.
  12. Page 42, the mention of "these are small percentages…" seems unnecessary and undermines the focus of the statistical comparison, which is not whether they are different from zero, but whether they are different from each other. The difference between them is approximately 75%, which is not small.
  13. Page 43, "Clutter was defined in terms…" this sentence is not consistent with the definition of clutter in terms of subband entropy. This should be reconciled here and in the method section.
  14. Page 48, visual complexity, as reflected by subband entropy, is based on a single measure for the entire zone. This merits justification because I would expect the measure to vary considerably across the zone.
  15. Page 49, "a large number of participants". It seems that most of the participants were recruited from the university, given that the mean age was 22. The discussion should mention that this population is substantially different from that in experiment 2, where the mean age was above 40. This represents a major confound.
  16. Page 59, first line: This statement is not clear because it seems that complexity was considered in both studies in a similar way. If this statement is true then it should be stated as an objective of Experiment 2. The statement that subband entropy correlated well with the categorization of data collection zones does not seem to be supported by the data. A formal analysis is needed to justify this statement and to support use of complexity as an independent variable in experiment 2.
  17. Page 61, I suggest using each question as a heading for the paragraph that directly addresses it. The current discussion is more of a summary of results rather than a direct response to the questions.
  18. Page 62, the argument that visual clutter affects search performance does not address why it was included in this study. The issue in the study seems to be not whether visual complexity might cause drivers to have a hard time finding the billboard; rather, it seems to be whether some element of the billboards will draw drivers' attention from the road. The statement that drivers might have difficulty finding street signs in a highly cluttered scene seems reasonable, but this is not clearly relevant to the issue in this study. This paragraph needs to be clearer in stating that billboards can contribute to visual clutter and undermine search.
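Regarding items 2 and 6 above, the relevant geometry is straightforward to sketch; all sizes and distances below are illustrative, not values from the report.

    import math

    def visual_angle_deg(size_m, distance_m):
        """Visual angle subtended by an object of a given size at a distance."""
        return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

    # Item 2: a 4 m-tall sign face viewed from 150 m subtends ~1.5 degrees.
    print(visual_angle_deg(4.0, 150.0))

    # Item 6: the eccentricity of the sign from the forward view. A sign
    # offset 10 m laterally at 150 m ahead sits ~3.8 degrees off-axis,
    # which is the angle a glance must travel away from the road.
    print(math.degrees(math.atan2(10.0, 150.0)))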

Research Quality

Does the study utilize appropriate experimental methodology and/or statistical analyses for the nature of the topic and data, and specifically to address the stated goals? Were the samples adequate? Are the glance duration comparison thresholds reasonable and appropriate? Does the report sufficiently and appropriately explain the practical implications of the results?

Research Quality: 2

Research Quality Free-Form Response (Please elaborate on the questions posed above as well as other noteworthy criterion-related positive or concerning elements, including justification for the provided rating):

The authors provide a clear rationale for the choice of experimental approach: a controlled field study with a high-resolution eye tracker. This approach addresses the limits of previous research and represents a very reasonable tradeoff between the experimental control of a simulator and the ecological validity of the actual roadway. This method is clearly appropriate for the research questions.

A field study of eye glance behavior to naturally occurring billboards presents substantial logistical and experimental design challenges. The report is impressive in how expertly these challenges have been addressed. Arguably, some measures that might have addressed the questions more directly could have been collected, but were not. For example, the importance of attending to the road depends on whether there is a vehicle ahead of the driver. A likely failure mode associated with billboard-related distraction would be rear-end crashes, and so the degree to which different types of billboards lead to less attention to the road ahead when there is a vehicle ahead might be more relevant than overall attention to the road ahead. If dynamic changes in the sign act as an exogenous cue that involuntarily attracts drivers' attention, then CEVMS might pose a greater distraction than would a conventional billboard. The study should mention this possibility and the level of service of the roadways under study.

The report does not describe the power of the statistical tests nor the rationale for the sample size. This seems an important omission because some will certainly read this report with the aim of gathering evidence for the null hypothesis: to show that CEVMS are no different from conventional billboards and do not diminish attention to the forward roadway. Including appropriate statistics in the results section and expanding the discussion to address this issue seems critical to answering the questions that motivated the study and supporting the conclusions of the report.
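As one example of the kind of power reporting asked for here, a minimal sketch using statsmodels for a simple between-groups contrast; the effect size, alpha, and group size are illustrative, and the study's actual within-subject designs would require a different calculation.

    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower()
    # Power achieved with 25 per group to detect a medium effect (d = 0.5):
    print(power.power(effect_size=0.5, nobs1=25, alpha=0.05))
    # Sample size per group needed to reach 80% power:
    print(power.solve_power(effect_size=0.5, alpha=0.05, power=0.8))

Reporting achieved power for the key null comparisons would let readers judge whether "no difference" reflects equivalence or simply an underpowered test.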

The definition of visual complexity is interesting, but also problematic, particularly in experiment 2. Figure 27 shows the measures of complexity, as defined by subband entropy, that define the high and low levels of this factor in the analysis of variance. This figure suggests several problems with this variable. First, the differences between sign conditions are not uniform, which confounds complexity and sign condition. In addition, the low-complexity level in the no-advertising condition is approximately equal to the high-complexity level in the CEVMS condition. Either subband entropy does not characterize the complexity of these conditions well, or the actual levels of complexity do not represent distinct experimental conditions. Furthermore, even if the mean values of complexity differed, it seems that the complexity associated with each sign differed (although the data are not presented), leading to overlapping distributions of complexity for the high- and low-complexity conditions. Defining complexity as a discrete experimental condition seems inappropriate. Considering complexity as a covariate might be more appropriate.

Given the similarity of experiments 1 and 2, it seems that complexity should be used as a covariate in both experiments. This, along with follow-up analyses, is particularly important in experiment 1, where natural and built-up conditions are compared with the sign conditions. It is hard to know how to interpret these differences if the surroundings of the signs are not described. The conclusions that might be drawn would be quite different if the conditions were built up versus natural.
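A minimal ANCOVA sketch of the covariate treatment suggested above, again using statsmodels; the data and column names are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical per-zone data (columns illustrative).
    rng = np.random.default_rng(1)
    n = 120
    df = pd.DataFrame({
        "sign_type": rng.choice(["CEVMS", "standard", "none"], n),
        "subband_entropy": rng.uniform(2.5, 4.0, n),
        "pct_time_on_road": rng.uniform(70, 95, n),
    })

    # ANCOVA: sign-type effect on attention to the road, adjusted for
    # measured clutter instead of a two-level complexity factor.
    fit = smf.ols("pct_time_on_road ~ sign_type + subband_entropy", data=df).fit()
    print(sm.stats.anova_lm(fit, typ=2))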

Beyond the issue of how complexity is included in the statistical analysis, it seems the introduction or method sections should include more detail concerning the choice of subband entropy. Given the characteristics of the driving scene and the nature of driving-related search tasks, the subband entropy metric seems less appropriate than the feature congestion metric. This is a minor detail, but one that merits a few sentences of discussion in the introduction. Aside from the specific measure of visual complexity that is considered, it would be useful to elaborate on why such a measure is important. The paper mentions several times that complexity is confounded with hazard likelihood, but does not provide a complete description of what this measure adds to the analysis.

I would expect a more systematic mapping between the general research questions and the measures used to address them. Specifically, "do drivers look more at…" could be addressed in terms of glance frequency per sign, average glance duration, and total glance duration. Likewise, "are there long glances" could be addressed in terms of the frequency per sign of glances longer than 1.6 and 2.0 s, the 95th percentile glance duration, and kurtosis. The tradeoff between road and billboard should be more precisely defined in terms of the conditional probability of a glance away from the road given a vehicle ahead, and the frequency of glances to the mirrors. Mirror checking seems like a useful measure of a potential tradeoff with attention to the billboards and also seems more safety critical than gauge checking.
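A minimal sketch of this question-to-measure mapping, computed from a hypothetical per-glance table; all column names and values are illustrative, not data from the report.

    import pandas as pd

    # Hypothetical per-glance table (one row per coded glance).
    glances = pd.DataFrame({
        "location":     ["road_ahead", "billboard", "mirror", "billboard", "road_ahead"],
        "duration_s":   [2.5, 0.6, 0.4, 1.8, 3.0],
        "lead_vehicle": [True, True, False, True, False],
    })

    off_road = glances["location"] != "road_ahead"
    board = glances["location"] == "billboard"
    measures = {
        "glances_to_billboards": int(board.sum()),
        "mean_billboard_glance_s": glances.loc[board, "duration_s"].mean(),
        "pct95_glance_s": glances["duration_s"].quantile(0.95),
        "long_offroad_glances_1_6s": int((glances.loc[off_road, "duration_s"] > 1.6).sum()),
        "long_offroad_glances_2_0s": int((glances.loc[off_road, "duration_s"] > 2.0).sum()),
        # P(glance away from road | vehicle ahead):
        "p_offroad_given_lead": off_road[glances["lead_vehicle"]].mean(),
        "mirror_glances": int((glances["location"] == "mirror").sum()),
    }
    print(measures)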

The most concerning issue in the report is the analysis and reporting of the eye glance data. The data reduction (downsampling from 60 Hz to 25 Hz, page 28) and the coding of gaze location without respect to saccade and fixation durations (page 30) seem problematic. Downsampling certainly undermines the accuracy of gaze duration and might introduce artifacts. The reference used to justify coding without considering fixations and saccades is focused on different measures (i.e., percent road center) and does not justify such an analysis when glance duration is to be interpreted as it is in this study. This is important because the data are being compared to other studies that have coded glances differently. For such a comparison, it seems that the SAE standard should be followed in defining gaze/glance duration.

The details of gaze definition and data reduction used to estimate gaze duration might be trivial concerns of only theoretical importance; however, this does not seem to be the case in this study. The gaze measures reported in this study do not seem plausible and do not seem appropriate for addressing the questions of the study.

I may have missed the discussion, but it seems the report does not define or calculate gaze duration in a way that can be used to address the questions. Considering gaze duration directed to a billboard, I assume that this is the time that a driver's eyes are fixated on the sign. I would expect a gaze to comprise at least one fixation and likely more than one. The measure should also include either the saccade to or away from the sign, and ideally both, because during that time the driver is not picking up information from the road. If so, the minimum gaze duration would be one fixation plus two saccades. Saccade duration is proportional to visual angle; for a five-degree saccade it would be approximately 40 ms. Fixation duration is typically 300 ms with a lower limit of 200 ms. The saccades and fixations of a glance/gaze to a sign would therefore yield a minimum off-road glance to a billboard of approximately 280 to 380 ms, substantially longer than the 40 ms minimum mentioned on page 30 or the mean glance duration of 50 to 80 ms. Assuming a glance is often comprised of several fixations, the average glance duration should be closer to 600-900 ms. The data reported as average glance durations are not plausible. Consequently, conclusions based on the glance duration distributions, such as Figure 29, are inconsistent with previous research, do not support comparison to the 1.6 and 2.0 s metrics of glance duration associated with increased risk, and are unlikely to be replicated.
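Restating the arithmetic behind this bound, using the saccade and fixation figures cited above:

    t_{glance}^{min} = t_{sacc} + t_{fix}^{min} + t_{sacc} \approx 40 + 200 + 40 = 280\ \text{ms}
    t_{glance}^{typ} = t_{sacc} + t_{fix}^{typ} + t_{sacc} \approx 40 + 300 + 40 = 380\ \text{ms}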

[Figure: a typical report of fixation durations, which are necessarily shorter than glance durations, from two on-road studies on highway and non-highway roads under no-task, verbal-task, and imagery-task conditions. Fixation durations ranged from roughly 300 ms (no task) to 400 ms (imagery task), with imagery-task fixations the longest in every case.]

Findings from a comparable study show mean glances that are an order of magnitude longer than those reported in this study. Both this figure and the preceding one are from experiments using a similar methodology: controlled on-road experiments.

[Figure: mean single glance durations from a comparable on-road study: baseline 0.63 s (lowest), conventional billboard 0.73 s, comparison event 0.87 s, digital billboard 0.92 s (highest).]

Overall Quality

Please provide your overall assessment of the quality of this report.

Overall Quality: 4

Overall Quality Free-Form Response (Please outline specific concerns, if any, that you feel must be addressed prior to release of the report):

The greatest strengths of the report are the clear research questions and the very effective writing. The experimental design and data collection address the considerable challenges of this research issue very nicely.

I have minor concerns that the theoretical framing (e.g., no discussion of the underlying mechanisms of visual attention) and the associated failure to consider potentially important variables limit the generalizability of the results and undermine the report's practical and scientific contribution. As an example, it would be very useful to consider the potential for CEVMS changes to act as an exogenous cue that is more likely to pull drivers' attention away from the road when there are vehicles ahead, compared to conventional billboards.

The data concerning gaze duration should not be included in the report without careful assessment. Either the definition of gaze duration in this study is considerably different from that used in other studies, or there has been an error in data reduction that makes the gaze duration variable meaningless relative to the 1.6 and 2.0 second thresholds that have been used in the past. Previous studies report mean glance durations collected in similar situations that exceed those in this report by an order of magnitude. More generally, average glance durations of less than 250 ms are not plausible and undermine the credibility of the other results.

REVIEWER 3

[Note: Reviewer 3 only sent comments and did not use the provided feedback form, so there are no scores.]

This document details the findings of a review of FHWA report FHWA-HEP-11-014. The first section provides overall impressions of the research and the report, and details major concerns. The remaining sections provide critique of sections of the report, offering suggestions for improvement or clarification and raising questions for the authors to consider in a revision.

Reviewers' overall impression of report

The research seems to have been conducted very carefully and cautiously. The report is generally well written and is detailed in explaining the equipment, instrumentation, sampling, and methods used. The empirical study aims to address its objective, which is to investigate the relationship between various types of billboards and driver behavior, specifically glance behavior.

There is room for improvement, however, as well as for documentation of potential limitations. First, the relationship really being sought is between crash risk and billboard type, not between glance behavior and billboard type. The relationship between glance behavior and crash risk rests on the results of at most three studies, is vaguely and somewhat clumsily described, yet entirely dictates the conclusions. For example, it does not seem intuitive that the relationship between glance behavior and crash risk is the same for all drivers; younger and older drivers may have significantly increased risk with 1-second glances, for example. Perhaps for these cohorts 2-second glances are 10 times as likely to result in a crash. Importantly, how were these relationships determined in prior studies? Because this relation is so important here, more detail needs to be provided as to how reliable it is. It also needs to be treated as continuous.

Limitations associated with not collecting additional data should be described. It is not known, for example, whether other measures of distraction (deceleration rates, following distances, lane drift, etc.) differed across signs. Might these other unobserved measures also inform the crash risk question?

The question of cognitive load has not been addressed. Do CEVMS present cognitive loads equivalent to those of static billboards? For example, we know that cell phone conversations yield different cognitive engagement than conversations with a passenger. Is it possible that dynamic messages require more cognitive resources to process, beyond what is measured by differences in glance duration? There might be research to inform this debate, but this discussion and literature search is missing from the manuscript.

While this reviewer would agree that glance duration appears for the most part statistically indistinguishable across sign types based on these results (although always higher for CEVMS than for static signs), it is not clear that the underlying relationship between crash risk and CEVMS has been revealed or articulated. It is a larger leap than I am willing to take, given the discussion and evidence provided in the research report, to trust the vague relationship between crash risk and glance duration that underpins the entire effort. The authors may be able to rectify this; however, it will require spending more time developing this relationship and ruling out cognitive load increases from CEVMS.

Moreover, in cases where differences were significant, the authors should calculate the increased risk using a straight-line function rather than the step function assumed in this research (e.g., a 2-second threshold). This would enable enumeration of the increased risk given the assumed relationship. If there is uncertainty in this relationship (between glance duration and crash risk), then a sensitivity analysis could be performed (a rough sketch follows) to establish a range of increased risks for the sample. This approach would serve to acknowledge the uncertainty in the underlying relationship that drives the entire research.
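A minimal sketch of this suggestion: replace the 2-second step with an assumed linear relative-risk ramp and average it over a glance-duration sample, varying the onset as a sensitivity analysis. The ramp parameters and the sample values are purely illustrative; the true risk function is exactly what this review argues is unknown.

    import numpy as np

    def relative_risk(durations_s, onset=1.0, double_at=2.0):
        """Linear ramp: RR = 1 below onset, 2 at double_at, rising beyond."""
        d = np.asarray(durations_s, dtype=float)
        return 1.0 + np.clip((d - onset) / (double_at - onset), 0.0, None)

    def step_risk(durations_s, threshold=2.0):
        """The step function assumed in the report: RR doubles at the threshold."""
        d = np.asarray(durations_s, dtype=float)
        return np.where(d >= threshold, 2.0, 1.0)

    glances = np.array([0.4, 0.8, 1.2, 1.5, 1.9, 2.3])  # illustrative sample
    print(relative_risk(glances).mean())  # sample-average RR, linear ramp
    print(step_risk(glances).mean())      # sample-average RR, 2 s step

    # Sensitivity analysis: vary the onset over a plausible range.
    for onset in (0.5, 1.0, 1.5):
        print(onset, relative_risk(glances, onset=onset).mean())

Because the ramp assigns elevated risk to glances between the onset and the threshold, it will generally yield a higher sample-average risk than the step, illustrating how the step function understates aggregate risk.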

Introduction

The report is focused on the evaluation of CEVMS in light of current federal guidance that limits the frequency of message change to once per 8 seconds. Concerns have been raised as to whether a changing sign will distract drivers from the driving task and increase crash risk. The study is an empirical evaluation of LED billboards and their impact on driving behavior.

Literature review

The authors report that the literature is mixed; however, it appears to be mixed between neutral and negative effects of CEVMS, and so tends to favor an outcome of increased glance duration when taken in aggregate (e.g., in a meta-analysis). Moreover, the authors did not mention whether anyone has examined the cognitive load of static versus dynamic signs; glance behavior may be only part of the inattention prompted by the signs, the other part being increased cognitive load. Thus, glance behavior may not be the sole measure of 'distraction' induced by the signs. The authors might wish to discuss this and identify research that might inform the question.

Study approach

It is not entirely clear why the instrumented vehicle approach is 'most effective' for the purposes of the study, although it clearly will inform the topic. While the behavior exhibited by drivers may be an indicator of risk, it is an indirect measure of risk, with a possibly complicated relationship between glance behavior and crash risk. A more direct observation might be a carefully controlled before-after study at sites with and without CEVMS, where crash risk could be measured directly instead of indirectly. The point of this comment is not to argue which approach is superior, but to suggest that some empirical results based on crash statistics could inform this research. While it is recognized that a large portion of PDO crashes are not reported, this is not the case for injury and fatal crashes, the ones of greatest impact. It is not suggested that the authors do anything about this in particular; it is simply a comment intended to illuminate the murky relationship between glance behavior and crashes. The authors acknowledge this on page 12, first paragraph. It is also important to note that if Klauer's conclusion that glances lasting more than 2 seconds double the crash risk is taken at face value, then glances shorter than this will also increase crash risk, and the function between crash risk and glance duration is a linear, smooth one. Important questions are at what glance length crash risk begins to increase (0.5 seconds, 1 second, 1.5 seconds, ...), how this relationship was determined, and how it differs across driving cohorts.

It in fact appears that the entire study design is based on the results of Klauer, as described above. The study design therefore rests on the fundamental assumption that long glances are sufficient for estimating the driving risk associated with CEVMS. As described later in the report, the main outcome measure is glance duration. This study would be significantly improved if more studies could be cited to support the relationship between glance duration and crash risk. This fundamental relationship is pivotal to all conclusions resulting from a study design that exploits it; thus, the more support there is for the relationship, and the more detail that is known about it, the more credible the results will be.

It is a bit surprising that the researchers did not consider adding a camera to the instrumented vehicle to observe driver actions during glancing, or a black box to monitor vehicle inputs during glancing. For example, were decelerations greater during glances to CEVMS than during glances to static billboards, perhaps indicating greater levels of distraction? Did drivers veer out of their travel lanes more when looking at CEVMS compared to static signs? Unfortunately, these kinds of questions cannot be answered by the study. It is of course too late to improve the study, but perhaps this could be identified as a limitation.

Results and Discussion

Table 4 has limited value without some measure of the standard deviation of these estimates. Is a 1% difference in "road ahead" glances between CEVMS and standard billboards a meaningful difference or within the noise? I would suggest adding standard errors to the table (these could be obtained via resampling if necessary; see the sketch below). Figure 14 is an improvement, and shows, as suspected, that many of the differences are not statistically significant.
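A minimal sketch of the resampling route to standard errors, applied to illustrative per-participant values (not data from the report):

    import numpy as np

    rng = np.random.default_rng(0)
    # Illustrative per-participant percent-time-on-road values for one cell:
    values = rng.normal(loc=78.0, scale=6.0, size=24)

    # Bootstrap: resample participants with replacement, recompute the mean.
    boot_means = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(10_000)
    ])
    print(values.mean(), boot_means.std(ddof=1))  # point estimate, bootstrap SE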

Again, on page 35 (first paragraph) the authors refer to a single, non-peer-reviewed study as the sole basis for establishing the link between crash risk and glance duration. Again, at 2 seconds the 'risk' is presumably doubled. This 'step' relationship appears inadequate for assessing risk here. Risk, of course, is not constant up to 1.9999-second glances and then doubled once 2.0 seconds is reached. Risk increases (again, assuming the one published study is accurate) as a smooth function starting at some lower value (e.g., 1.0 second) and continues to increase. If this relationship were known (or postulated), one could calculate the increased risk of the entire sample of drivers much more accurately. Currently, a discrete threshold of 2 seconds is being used, which thus underestimates the increased risk of the entire sample.

Billboard complexity appears to affect the glance duration of drivers, although the standard deviation in percentage time is large. The variation is also larger for complex billboards, suggesting that some drivers have relatively long glance durations compared to static billboards.

The same comment applies to Tables 12 and 13 as to Table 4.

General Discussion

The authors finally cite more references establishing the relationship between glance duration and risk on page 61. Why these sources were not mentioned previously or delved into more deeply early on is a bit of a mystery. The authors should spend some time trying to articulate this very important relationship, on which all the conclusions rest. In this discussion section the authors now mention 1.6 seconds as a critical value, in addition to 2 seconds.

This section seems devoid of a credible self-critique of the study. No study limitations are identified with respect to the sample, the methods, the assumptions, or the conclusions. The authors should provide realistic limitations of the work and are encouraged to consider some of the comments within this review when expanding this section.
