U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-03-065
Date: September 2004
In-Vehicle Display Icons and Other Information Elements: Volume I
Introduction: Augmenting icons with auditory information refers to including some type of auditory signal with an icon to make the message clearer or more salient. Almost all of the literature suggests that operator performance can be improved by combining auditory and visual messages. These channels can be used together to provide either redundant or complementary cues to the driver.
Priority is a function of the urgency of a response and the consequences of failing to make a response.
Complexity is a function of how much information is being provided and how difficult it is to process. The phrase "information units" is used to describe the amount of information presented in terms of key nouns and adjectives contained within a message. The design guideline entitled "Design of Speech Messages" on page 6-14 provides a tool for determining the number of information units.
Discussion: It is widely believed that combining an auditory and visual presentation of information could improve operator performance. Reference 1 recommends that the auditory modality be used as: (1) an auditory prompt to look at a visual display, or (2) supplemental information for a visual display. Providing information in this redundant fashion will lessen the need for a driver to scan the visual display and allow him or her to review the information if it is not fully understood or remembered. Reference 2 emphasizes the importance of redundant coding by stating that presenting information in the auditory and visual modalities will accommodate transient shifts in noise within the processing environment (e.g., visual glare, background noise, verbal distractions), which may influence one format or another. Display format redundancy also accommodates the strengths and abilities of different population groups (e.g., high spatial ability vs. high verbal ability).
Design Issues: Reference 3 suggests that, to determine the most appropriate display modality for presenting a particular information element, it is extremely important to predict whether the driver will need the information predrive or in-transit. Then, based upon other issues such as the complexity and urgency of the information, a decision can be made regarding which modality will accomplish the goal with the least amount of compromise to driver safety.
In reference 4, a driving simulator was used to study the benefits of multimodal displays (both auditory and visual). The multimodal displays were associated with better driving performance than auditory-only or visual-only displays, as well as better performance on a navigation task. Both the multimodal and auditory-only displays were associated with better emergency responses than the visual-only display.
Cross References: Conveying Urgency with Icons, p. 5-14; Determining the Appropriate Auditory Signal, p. 6-4; Design of Speech Messages, p. 6-14
Introduction: To determine the appropriate auditory signal means to choose the type of signal (simple tone, earcon, auditory icon, or speech message) that will best augment the visual message presented to the driver. The following auditory signals represent the most frequently used options:
Discussion: According to reference 3, there are a limited number of tones (five to six) that are absolutely recognizable; therefore, they are not a good choice for presenting quantitative information. Also, unless they are presented in close temporal sequence, it is difficult to make qualitative judgments regarding deviations. They are good, however, for gaining the attention of the driver, whether it be simply for the purpose of getting him or her to attend to information being presented or to warn of an impending danger.
Like tones, earcons are also limited because it is difficult to make qualitative judgments regarding deviations from a desired state or value. It is also difficult to obtain accurate quantitative information from earcons. Earcons are most effective when presenting a family of related sounds (see reference 4). One powerful feature associated with the use of earcons is that "related information can be given related sounds and hierarchies of information can be represented" (reference 5). Earcons are extremely flexible. However, their meaning is not apparent and must be learned. Therefore, they are not a good choice for presenting critical, time-dependent information to the driver.
Auditory icons are most effective when they can be mapped to everyday, naturally occurring sounds (see reference 2). When this is the case, they are extremely easy for the user to both learn and remember. They have been shown to be successful in collision warning applications (see references 6 and 7) in reducing reaction times to collision events. The problem with auditory icons, however, is that not all information items to be presented in an IVIS can be mapped to a naturally occurring sound. In these instances, the designer has to create metaphors for the icons, which can end up being just as abstract as a pure tone or earcon.
Speech messages are most effective for rapid, but not automatic, communication of complex, multidimensional information; the meaning of the message is intrinsic in the signal and context, and minimum learning is required. However, speech messages can be inefficient, more easily masked, and have problems associated with repeatability and confusions with other sounds in the automobile such as conversations and noise from the radio.
Design Issues: Some advantages and disadvantages associated with the use of each of the methods for presenting auditory information are given above. This is by no means an exhaustive literature review associated with the use of the auditory modality, but a tool for aiming the designer in the most appropriate direction. It should also be mentioned that the auditory signals discussed are being presented as a method for augmenting visual messages or to act as a redundant cue, not as a sole means for presenting in-vehicle information to the driver.
Cross References: Chapter 6: The Auditory Presentation of In-Vehicle Information
Introduction: Simple tones are auditory signals that convey information through the use of single or grouped frequencies presented simultaneously. For the purposes of this guideline document, simple tones are discussed as a means for augmenting the visual presentation of in-vehicle messages and are not meant to be used as the only means for presenting in-vehicle messages.
Discussion: Simple tones are similar to arbitrary symbols and only become meaningful through learning. Their main function is to alert the driver to a situation or event. The event could be an impending collision, or it could be simply a display of additional information via text, voice messages, or even in-dash indicators. There are many instances for which a simple tone may be appropriate; however, it is important to limit their use to no more than six per display (reference 1).
Design Issues: One of the central problems associated with a simple auditory tone is loudness. In the vehicle, the noise level is constantly changing; the driver may be speeding down the interstate with the windows down, chatting with a passenger, listening to the radio, or sitting quietly at a stoplight. In each of these situations, the appropriate level for presenting auditory information varies. Warnings that are too loud can: (1) be shut off; (2) cause the driver to be attending to the warning when he/she should be attending to the situation it is warning of; (3) distract from the main task; or (4) startle the driver, causing an inappropriate response. However, warnings that are not loud enough are likely to be missed. Therefore, determining the appropriate auditory threshold is extremely important. See references 2 and 3 for guidelines regarding the range for predicting thresholds and constructing auditory warning systems.
Cross References: Determining the Appropriate Auditory Signal, p. 6-4
Introduction: Complex tones are auditory signals that present information through the use of a hierarchical nesting of pulses and bursts that combine to form the signal or sound. The parameters of complex tones can have an important effect on perceived urgency, annoyance, and appropriateness.
Figure 6-1. Temporal Parameters of Auditory Signals - Pulse, Burst, and Sound Parameters Defined Graphically
Temporal parameters of the pulse, burst, and sound:
Discussion: Pulses combine to form bursts and bursts combine to form the overall auditory signal. Pulses, bursts, and the sound all have temporal sound parameters that affect confusion, urgency, and annoyance. This hierarchy of sound parameters is defined by the timescale, where the timescale of the pulse is from 100 to 300 ms and the timescale of the burst is 500 to 2,000 ms. The complete warning signal ranges from 2,500 ms to tens of seconds.
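As a rough illustration of the hierarchy described above, the following Python sketch checks whether the durations chosen for a candidate signal fall within the quoted timescales. The names `TIMESCALES_MS` and `check_level` and the candidate durations are hypothetical. Because the text gives no upper limit for the complete signal beyond "tens of seconds," the sketch assumes that no upper limit is checked at that level.

```python
# Timescales quoted in the discussion above, in milliseconds.
# The complete signal has no stated upper bound ("tens of seconds"),
# so None is used as an assumption meaning "no upper limit checked".
TIMESCALES_MS = {
    "pulse": (100, 300),
    "burst": (500, 2000),
    "sound": (2500, None),
}


def check_level(level, duration_ms):
    """Return True if duration_ms lies within the quoted range for the level."""
    low, high = TIMESCALES_MS[level]
    if duration_ms < low:
        return False
    return high is None or duration_ms <= high


# Hypothetical candidate signal: 200 ms pulses, 1,200 ms bursts, 4,000 ms overall.
candidate = {"pulse": 200, "burst": 1200, "sound": 4000}
report = {level: check_level(level, ms) for level, ms in candidate.items()}
print(report)
```

A check of this kind only confirms that a design sits inside the quoted ranges; it says nothing about how urgent or annoying the resulting sound will be perceived to be.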
The harmonic content that defines the timbre or formant also has a powerful effect on the perception of the alert (reference 1). A signal composed of a harmonic frequency series is substantially less urgent than one composed of a random or partially random frequency series (reference 1). The formant determines the characteristic quality of vowel sounds and is composed of several frequency regions of relatively great intensity. More specifically, a formant is a resonant peak in the frequency spectrum of a voice and is a critical acoustic feature of most speech sounds. Formants are different than more common combinations of frequency components, such as harmonic series. Rather than being equally spaced across the frequency spectrum, formants are distributed in an apparently random distribution. The critical attribute of a formant is that it determines the characteristic quality of vowel sounds produced by humans. Formants reflect anatomical properties of the vocal tract and are fundamental characteristics of natural speech sounds and seem more likely to influence emotional content of a sound compared to artificial frequency combinations, such as octaves. Research has shown that formants affect the perception of sound characteristics related to urgency and annoyance (reference 2).
Design Issues: The parameters that affect perception at the level of the sound may not have the same effect at the level of the burst. For example, pulse onset has a different effect than the onset of the entire signal. A slow onset at the level of the pulse increases urgency, whereas longer onsets at the burst or signal level decrease urgency. This result suggests that empirical findings regarding the effect of temporal parameters on pulse perception may not generalize to burst perception.
Cross References: The Auditory Presentation of In-Vehicle Information, p. 6-1; Perceived Urgency of Auditory Signals, p. 6-16; Perceived Annoyance of Auditory Signals, p. 6-22
Introduction: Earcons refer to auditory signals that present information through abstract musical tones that can be used in structured combinations to create auditory messages (reference 1). Earcons are also sometimes referred to in the literature as complex tones. Earcons have five parameters that can be modified to create different messages: rhythm, pitch, timbre, register, and dynamics.
A motive, the building block of an earcon, is defined as "a rhythmicized sequence of pitches. Rhythm and pitch are the fixed parameters of a motive, while timbre, register, and dynamics are the variable parameters of motives" (reference 2).
Discussion: Earcons are said to be a powerful and flexible means for creating auditory messages (see references 2, 3, 4, and 5). Reference 3 argues that the advantages associated with the use of earcons are clear: (1) they are easily constructed on any workstation or personal computer; (2) the sounds do not have to correspond to the objects they represent, so objects that either make no sound or that make an unpleasant sound can be represented; and (3) studies have shown that they are preferred over other types of auditory communication (see reference 6). The main disadvantage, however, is that, like simple tones, earcons must be learned. Their meaning is not inherent in the signal.
Design Issues: Reference 4 presents extensive, very specific guidelines for developing earcons, examining such issues as the psychoacoustical characteristics of sound, the formal arrangement of sounds into earcons, and the meaning and interpretation of earcons. Reference 5 presents slightly more general information. However, the concepts discussed require some knowledge of the parameters associated with sound. References 4 and 5 argue that experts, such as professional composers, should be included on any design team that is attempting to construct earcons. "The science of sound is a highly technical, diverse, and complicated discipline. Only an expert in this field understands the existence, importance, implications, and consequences of and the means of dealing with, the many perceptual problems and intricacies of sound" (reference 2).
For the purposes of this guideline document, earcons are discussed as a means of augmenting the visual presentation of in-vehicle messages and are not meant to be used as the only way to present in-vehicle messages.
Cross References: Determining the Appropriate Auditory Signal, p. 6-4; Perceived Urgency of Auditory Signals, p. 6-16
Introduction: Auditory icons are familiar environmental sounds that intuitively convey information about the object or action that they represent (reference 1). They are sometimes also referred to in the literature as naturalistic sounds or earcons. The three types of auditory icon are iconic, metaphorical, and symbolic.
The figure below (from reference 2) demonstrates some of the performance improvements that might be obtained when using an auditory icon for a warning component.
Figure 6-2. Brake Reaction Times for Different Warning Sounds (from Reference 2)
Discussion: The goal of auditory icon design is to map the attributes of a computer event to some everyday sound-producing event (see references 1 and 3). This makes auditory icons extremely easy for users to learn and remember, as their meaning is inherent. Perhaps this is why they have been examined for use in collision warning applications. One could argue that drivers' responses would be based on experiences in which they have heard these sounds occur naturally, and thus their responses will be faster. This has, in fact, been shown to be the case. Reference 2 describes a study in which drivers were required to carry out a tracking task while at the same time attending to a road scene interspersed with imminent collisions. They were asked to respond to each collision warning they were given and determine the appropriate braking response. Four collision warnings were tested (a simple tone; a speech warning "ahead"; the sound of a car horn; and the sound of skidding tires). Results of the study showed that braking reaction times were faster for the auditory icons than for the more traditional warning sounds. Another study, described in reference 4, found similar results. Braking reaction times for collision warnings using auditory icons were shown to be significantly less than for conventional collision warnings (tones).
Design Issues: While the experiments described above show an improved braking reaction time associated with the use of auditory icons in collision warning applications, the icons are not necessarily the best choice for presenting this type of information. In addition to producing faster braking reaction times, they also produced a higher number of inappropriate reactions due to startle effects (e.g., slamming on brakes for a low-level warning; see reference 2). This type of reaction could actually negate any benefits of having a collision warning system and potentially put the driver's safety at risk. Ensuring that the appropriate level of urgency is projected to the driver is a very important design issue. References 5 and 6 suggest that factors such as the frequency, amplitude, envelope shape, and melodic structure of a warning can all affect perceived urgency. Thus, altering certain sound parameters may allow a designer to reduce the startling effect of these types of warnings.
For the purposes of this guideline document, auditory icons are discussed as a means for augmenting the visual presentation of in-vehicle messages and are not meant to be used as a sole means for presenting in-vehicle messages.
Cross References: Determining the Appropriate Auditory Signal, p. 6-4; Perceived Urgency of Auditory Signals, p. 6-16
Introduction: Speech messages refer to auditory signals that present information through voice messages that add information beyond pure sound. For the purposes of this guideline document, speech is discussed as a means for augmenting the visual presentation of in-vehicle messages and is not meant to be used as the only means of presenting in-vehicle messages.
Discussion: Speech displays are an effective means for communicating information to the driver. In addition to warning, they can be used to provide responses to user queries and feedback from control inputs. Warnings, however, have received the majority of attention in speech display research. They are effective in that they not only alert the driver to an emergency situation, but they also provide additional information about the nature of the problem (reference 1). However, the added length of the message can increase the driver's response time. Therefore, an important tradeoff exists between comprehension and clarity (i.e., message length) and driver response times. The guidelines given on the previous page should aid designers in making this tradeoff.
Design Issues: When presenting messages that do not require immediate action, reference 2 suggests several options exist for helping the driver use the information: (1) present the information in the order of importance or relevance to the driver; (2) present the most important information at either the beginning or the end of the message because it is easiest to recall; (3) highlight the most important parts of the message; (4) provide a means for repeating the message, which is especially helpful for older drivers; and (5) provide a redundant visual presentation of the information, which is also helpful for older drivers.
One important design decision is whether to include an alerting tone before presenting a voice message. References 3, 4, and 5 found that voice warnings preceded by an alerting tone did not produce faster response times than the voice warning by itself. In fact, in one study, an alerting tone actually increased response time (see reference 3). Reference 6 supports the notion that synthesized speech is distinctive from human speech and can perform an alerting function in addition to transferring the pertinent information to the driver. This is another reason for making sure that we do not try to make the speech warnings sound too human. A machine-like voice will better cue the driver to its identity.
Another important consideration when determining whether to use speech displays is driver acceptance. Existing research indicates that speech displays should be used sparingly because the auditory channel can quickly become cluttered or overloaded with stimuli (references 7, 8, and 9). Speech displays are inherently intrusive and have a tendency to annoy the user if they are presented too frequently. In fact, speech displays used in certain aircraft applications have even been disabled so that the pilots would not have to listen to the chatter of redundant or irrelevant messages. Because of the potential problems of acceptance, speech displays should only be used when the visual modality is overloaded, and they should always be accompanied by a visual representation so that the information can be referred to again at a later time (reference 9).
Cross References: Determining the Appropriate Auditory Signal, p. 6-4
Introduction: Perceived urgency of auditory signals refers to the subjective impression of urgency that a signal gives to the person hearing it. The goal is to match the urgency of the auditory signal to the urgency of the situation to which it pertains. This is called "urgency mapping." Reference 1 has shown that the perceived urgency of an auditory signal can be directly manipulated by changing certain temporal or melodic parameters, such as speed, rhythm, number of units, speed change, fundamental frequency, pitch range, pitch contour, and musical structure.
Figure 6-3. Example of Using Stevens' Power Law for Producing Urgency Exponents
(see references 2 and 3 for more detailed explanations and examples)
Discussion: The perceived urgency of auditory signals has been researched to some extent in the past 10 years. The results have shown that varying certain acoustical parameters has a strong and consistent effect on a person's subjective impression of the urgency of the warning. Reference 1 provides designers with a database concerning the subjective ratings and rankings of the perceived urgency of many of the temporal and melodic parameters. Designers can use this information to produce warnings with the appropriate levels of perceived urgency. This is extremely important, as new research has discovered that increases in the perceived urgency of a warning correlate with faster reaction times (see references 4 and 5). Therefore, if auditory signals are designed with urgency mapping in mind, more effective warnings can be developed.
Design Issues: References 2 and 3 show how, using Stevens' power law, certain quantifiable sound parameters such as speed, number of repetitions, and frequency can be scaled and compared directly. In reference 3, urgency exponents for speed, number of repetitions, and frequency were calculated to be 1.35, 0.5, and 0.38, respectively. Because speed has the highest urgency exponent, the subjective assessment of urgency changes more rapidly with changes in speed than with changes in the other two parameters. Reference 6 states that these results imply that a small change in the speed of a warning increases its urgency considerably, whereas a much larger change in the number of repetitions would be required to produce the same change. The ability to quantify this subjective assessment allows designers of IVIS to develop a set of auditory signals that would sound different but, through a manipulation of certain parameters, would have the same urgency.
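To make the comparison of exponents concrete, the sketch below assumes the standard form of Stevens' power law, U = k * x^m, where x is the physical sound parameter, m is the urgency exponent, and k is a scaling constant that cancels when ratios are compared. Under that assumption, it computes the factor by which each parameter would have to be multiplied to double perceived urgency, using the exponents quoted above. The function name and the choice of a doubling target are illustrative only and do not come from the referenced studies.

```python
# Urgency exponents reported in reference 3, as quoted in the text above.
URGENCY_EXPONENTS = {
    "speed": 1.35,
    "repetitions": 0.5,
    "frequency": 0.38,
}


def parameter_factor(urgency_ratio, exponent):
    """Factor by which a parameter must change to scale urgency by urgency_ratio.

    Assumes U = k * x**m, so U2/U1 = (x2/x1)**m and x2/x1 = (U2/U1)**(1/m).
    """
    return urgency_ratio ** (1.0 / exponent)


for name, m in URGENCY_EXPONENTS.items():
    factor = parameter_factor(2.0, m)
    print(f"{name}: multiply by about {factor:.2f} to double perceived urgency")
```

Under these assumptions, doubling urgency takes a factor of roughly 1.7 in speed, 4 in number of repetitions, and 6.2 in frequency, which is consistent with the observation that small changes in speed go a long way.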
Reference 7 suggests that urgency mapping tests be carried out on sets of auditory signals being used as alarms, especially if they are abstract alarm sounds. The first step in this process is to get a group of people who have a good working knowledge of the environments in which the alarms will be used to rate the situational urgency of the referents on a scale from 1 to 5. The second step is to have a different group of people rate on a scale from 1 to 5 the psychoacoustical urgency of the auditory warnings, without any knowledge of the referents. The next step is to correlate the two measures. If there is a significant correlation, then the designer may decide to make few or no changes to the alarm. However, if there is no correlation or a negative correlation, then the designer should modify the alarm. The list on the previous page may help a designer choose which parameters to alter and how to alter them to make an alarm more or less urgent.
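The correlation step in this procedure can be sketched as follows. The text does not say which correlation statistic reference 7 uses, so the Pearson coefficient here is an assumption, as are the function name and the rating values, which are invented purely for illustration. The sketch also stops short of a significance test, which a real evaluation would need.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical mean ratings (scale of 1 to 5) for five alarms.
# situational: urgency of each referent, rated by people who know the environment.
# psychoacoustic: urgency of each sound, rated by a different group without
# knowledge of the referents.
situational = [4.8, 3.9, 2.5, 1.7, 3.1]
psychoacoustic = [4.5, 3.2, 2.9, 1.4, 3.6]

r = pearson_r(situational, psychoacoustic)
print(f"urgency mapping correlation: r = {r:.2f}")
```

A clearly positive value would suggest the alarm set is already reasonably well mapped; a value near zero or below would point toward modifying the alarms.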
Cross References: Design of Earcons, p. 6-10; Design of Auditory Icons, p. 6-12
Introduction: Automatic Speech Recognition (ASR) devices recognize human speech and, in an in-vehicle context, treat speech commands as inputs to an IVIS device. Currently, ASR is viewed as an enabling technology in intelligent transportation system (ITS) development. Many of the state-of-the-art advances associated with ITS involve complex in-vehicle devices that include a host of functions including: navigation, e-mail, motorist services, internet access, cellular phone capabilities, "infotainment", and fax. Such broad functionality is associated with greater perceptual, information processing, and psychomotor demands on the driver, and presents a crucial challenge to IVIS developers. ASR is viewed as a means to allow the driver to interact with the IVIS device, while maintaining his/her eyes on the road and hands on the wheel. Moreover, recent advances in ASR techniques (speaker independent devices, increased vocabulary size, reduced processing time, noise-filtering and word-matching algorithms) suggest that it may be an attractive alternative to traditional approaches to the driver-vehicle interface (DVI).
Discussion: Reference 1 discusses a variety of issues and research results associated with speech controls, and some of the guidelines above have been adapted from the design principles presented in reference 1 and, to a lesser extent, reference 2.
Reference 3 investigated ASR performance using a recorded, multispeaker database and seven candidate microphone positions. Evaluation criteria included signal-to-noise ratio (SNR) and recognition rate. A range of locations (e.g., center dashboard, ceiling near rearview mirror, visor on headliner in front of the driver, over driver head, and on the steering wheel) were investigated. Although all microphone positions were roughly near the driver, great variability was reported (between 0 percent and 10 percent) in error rates across the seven positions. The location on the forward portion of the vehicle headliner, right in front of the driver, gave the best combined results for both SNR and recognition rates. However, this issue has not been extensively studied and the optimum microphone location may vary across different in-vehicle applications.
Reference 4 investigated six options for providing feedback with an ASR device. The options included auditory or visual feedback, delay prior to feedback, and no feedback. In the study, subjects entered fields of data into an ASR device. These data consisted of alphanumeric characters, words, or numbers plus words, and were of varying length. With no feedback, only 70 percent of the entered fields were error-free; the average number of correctly entered fields across the feedback conditions (any feedback) was 97 percent. The authors noted some tradeoffs between visual and auditory feedback, with visual word feedback being optimal when a large visual display is available to the user. However, in situations with small displays or when visual overload of the user is a concern (such as in the in-vehicle environment), auditory feedback is recommended.
Design Issues: As noted in reference 1, key issues in the design and implementation of ASR systems include:
In addition, use of ASR does not ensure safe operation of an in-vehicle device. Even simple speech requires cognitive operations by the user (e.g., recalling phone numbers). In general, ASR is best used as a redundant source of input by the driver (i.e., an alternate manual means should be provided to the driver as well).
Cross References: Chapter 10: Sensory Modality Design Tool
Introduction: The timing of auditory navigation information refers to the time or distance at which the in-vehicle navigation system should present an auditory instruction to the driver before an approaching navigation maneuver (e.g., a required turn).
Figure 6-4. Equations for Determining the Appropriate Timing of an Instruction
Discussion: In reference 1, subjects were asked to give a subjective rating of the timeliness of auditory navigation instructions (1 = much too early to 6 = much too late). From the subjects' ratings, regression lines were plotted. Three separate equations were developed for calculating the distance at which navigation information should be given regarding an approaching turn onto a side road, while traveling at different speeds.
Reference 2 conducted a similar study aimed at determining the last possible moment at which a subject would feel comfortable hearing an auditory navigational instruction. The results of this study indicated that, traveling at speeds of 65 kph (40 mph), the recommended distance for giving navigational instructions before a turn is 137 meters (450 feet). However, it is necessary to make adjustments for other speeds (15 feet for each mile per hour/4.58 meters for each kilometer per hour); age of driver (up to 36 meters (119 feet)); the direction of turn, left or right (left turns require more warning distance); and gender of the driver. The results of this study are similar to those found in reference 1 but were determined to be more difficult to apply to the general driver population.
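The speed adjustment described above can be illustrated with a short sketch. It uses the U.S. customary figures quoted in the text (450 feet at 40 mph, 15 feet for each mile per hour) and assumes that the adjustment is applied linearly relative to the 40 mph baseline, which the text implies but does not state outright. The adjustments for driver age, turn direction, and gender are left out because the text does not give enough detail to model them. The function and constant names are hypothetical.

```python
# Baseline and speed adjustment as quoted in the text (reference 2).
BASE_SPEED_MPH = 40.0
BASE_DISTANCE_FT = 450.0
FT_PER_MPH = 15.0

FT_PER_METER = 3.28084  # unit conversion only


def instruction_distance_ft(speed_mph):
    """Distance before the turn at which to give the instruction, in feet.

    Assumes a linear adjustment around the 40 mph baseline; age, turn
    direction, and gender adjustments are deliberately omitted.
    """
    return BASE_DISTANCE_FT + FT_PER_MPH * (speed_mph - BASE_SPEED_MPH)


for speed in (30, 40, 55):
    d = instruction_distance_ft(speed)
    print(f"{speed} mph: {d:.0f} ft ({d / FT_PER_METER:.0f} m)")
```

With these assumptions the sketch yields 300 feet at 30 mph, 450 feet at 40 mph, and 675 feet at 55 mph.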
Design Issues: The applicability of these guidelines to visual guidance messages is uncertain. Since visual information (with no accompanying auditory alert) is likely to be perceived later than auditory messages, the distances recommended above may have to be increased somewhat to account for this delay. Turning off the current route is only one type of maneuver. Many other types (e.g., turning at a T-intersection or exiting a freeway) should be studied separately to determine which factors will affect them. The results of these studies could then be combined with the above guidelines to determine the appropriate timings for any possible type or combination of maneuvers.
In reference 3, it was recommended that if two maneuvers are less than 10 seconds apart, the two instructions should be given together, before the first maneuver. This is referred to as "stacking" the messages. Reference 1 gave a similar recommendation, stating that when the distance between two subsequent maneuvers is less than the minimum preferred distance for that speed, the instructions should be stacked.
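The 10-second stacking rule from reference 3 can be expressed as a simple check. The sketch below assumes the vehicle travels at a constant speed between the two maneuvers, which is a simplification not made in the text; the function name, its parameters, and the example distances are hypothetical.

```python
def should_stack(gap_distance_m, speed_kph, threshold_s=10.0):
    """Return True if two maneuvers are close enough in time to stack.

    gap_distance_m: road distance between the two maneuvers, in meters.
    speed_kph: travel speed, assumed constant between the maneuvers.
    threshold_s: the 10-second criterion quoted from reference 3.
    """
    if speed_kph <= 0:
        raise ValueError("speed must be positive")
    speed_mps = speed_kph / 3.6
    return gap_distance_m / speed_mps < threshold_s


# At 65 kph the vehicle covers about 18 meters per second.
print(should_stack(150, 65))  # about 8.3 s apart
print(should_stack(300, 65))  # about 16.6 s apart
```

The distance-based criterion from reference 1 could be checked in the same way by comparing the gap against the minimum preferred instruction distance for the current speed instead of against a time threshold.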
Introduction: Perceived annoyance of auditory signals refers to the subjective annoyance associated with particular signal characteristics. Although many sound parameters that increase urgency also increase annoyance, careful design can create highly urgent sounds that are not overly annoying. The goal is to minimize the annoyance associated with a warning, balanced by the need to match the urgency of the signal to the urgency of the situation. This is called "annoyance tradeoff" and should be considered in signal design.
Discussion: Substantial research has shown that sound parameters can affect perceived urgency (reference 1). The urgency mapping principle states that the urgency of the sound should match the urgency of its referent. Like urgency, annoyance is systematically affected by sound parameters (references 2 and 3). Recently, research has shown that perceived annoyance of a sound is not completely dependent on the parameters that affect urgency (reference 4). Some sound parameters, such as those listed in the table in the guideline, can increase urgency substantially, while increasing annoyance relatively little. Using these parameters, it is possible to design a highly urgent sound that is less annoying than other sounds with the same perceived urgency. In general, design of highly urgent signals involves a tradeoff between urgency and annoyance, but some sound parameters can help minimize the annoyance of highly urgent sounds.
Recent results also show that the importance of annoyance is greater when designing sounds for benign alerts (reference 4). Perceived annoyance is a strong predictor of perceived appropriateness for auditory signals for benign events, whereas perceived urgency is a strong predictor of perceived appropriateness for auditory signals for critical events. The figure in the guideline shows that this relationship is quite robust, with perceived annoyance accounting for 67 percent of the variance of perceived appropriateness for email alerts and only 9 percent of the variance for a collision avoidance warning. Conversely, perceived urgency accounts for almost 90 percent of the variance of perceived appropriateness of collision avoidance warnings. This relationship shows that designing to minimize annoyance can be as critical as designing to map urgency.
Design Issues: Sound perception and perceived urgency and annoyance are somewhat dependent on the context and intended message of the signal (reference 5). This makes it critically important to evaluate the sounds generated using these guidelines in the driving context. Even using imagined driving scenarios in a laboratory situation affected the perceived urgency and annoyance of sounds (reference 4). In addition, urgency mapping and the annoyance tradeoff are only two considerations in creating useful auditory alerts. A critical consideration for a situation that contains multiple alerts is the ability of drivers to discriminate and recognize multiple auditory signals, as described earlier in the section Design of Complex Tones.
Cross References: The Auditory Presentation of In-Vehicle Information, p. 6-1; Design of Complex Tones, p. 6-8; Perceived Urgency of Auditory Signals, p. 6-16