This report is an archived publication and may contain dated technical, contact, and link information

In-Vehicle Display Icons and Other Information Elements: Volume I

Publication Number: FHWA-RD-03-065
Date: September 2004


AUGMENTING ICONS WITH AUDITORY INFORMATION

Introduction: Augmenting icons with auditory information refers to including some type of auditory signal with an icon to make the message clearer or more salient. Almost all of the literature suggests that operator performance can be improved by combining auditory and visual messages. These channels can be used together to provide either redundant or complementary cues to the driver.

Design Guidelines
Use the auditory modality for presenting high priority alerts and warnings; present additional contextual information visually.
Use auditory prompts when a previously static visual display changes.
Use auditory prompts when high priority information is automatically displayed.
Use a combination of visual and auditory prompts for repeating low complexity messages.

Table 6-1. Heuristics for Assessing Priority

Priority is a function of the urgency of a response and the consequences of failing to make a response.

High Priority	Low Priority
Fast response needed (0-5 minutes)	No response needed (5 min +)
Serious consequences (death or injury)	No immediate consequences
Examples: Notification of serious traffic conditions that may affect the safety of the driver or mechanical problems that could impact the safety of the driver or the condition of the vehicle	Examples: Vehicle maintenance schedules, or weather information

Complexity is a function of how much information is being provided and how difficult it is to process. The phrase "information units" is used to describe the amount of information presented in terms of key nouns and adjectives contained within a message. The design guideline entitled "Design of Speech Messages" on page 6-14 provides a tool for determining the number of information units.

Table 6-2. Heuristics for Assessing Complexity

High Complexity	Low Complexity
>9 information units	3-5 information units
Processing time >5 s	Processing time <5 s
Examples: Transit schedules in area along route or routing restrictions for specific vehicle cargos	Examples: Directions of turns or estimates of travel costs
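
The heuristics in Tables 6-1 and 6-2 can be read together with the design guidelines above as a rough decision aid. The following Python sketch is illustrative only and is not part of the original guideline. The names (Message, assess_priority, assess_complexity, suggest_modalities) are hypothetical, and three rules are assumptions: that either priority criterion alone is enough to rate a message as high priority, that cases the tables do not cover (for example, 6 to 9 information units) are flagged as "unspecified", and that anything else defaults to a visual-only presentation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical description of an in-vehicle message (not from the report)."""
    response_window_min: float  # how soon the driver must respond, in minutes
    serious_consequences: bool  # could a missed response cause death or injury?
    info_units: int             # key nouns and adjectives in the message
    processing_time_s: float    # estimated time needed to process the message


def assess_priority(msg: Message) -> str:
    # Table 6-1: fast response (0-5 minutes) or serious consequences.
    # Assumption: either criterion alone is enough to rate the message "high".
    if msg.response_window_min <= 5 or msg.serious_consequences:
        return "high"
    return "low"


def assess_complexity(msg: Message) -> str:
    # Table 6-2: high = more than 9 units or more than 5 s of processing.
    if msg.info_units > 9 or msg.processing_time_s > 5:
        return "high"
    # Table 6-2 lists 3-5 units and under 5 s as low complexity.
    # Assumption: messages with fewer than 3 units are also treated as low.
    if msg.info_units <= 5 and msg.processing_time_s < 5:
        return "low"
    # Assumption: the tables do not cover this range, so flag it for review.
    return "unspecified"


def suggest_modalities(msg: Message) -> list[str]:
    if assess_priority(msg) == "high":
        # Guideline: auditory alert, with contextual detail presented visually.
        return ["auditory alert", "visual detail"]
    if assess_complexity(msg) == "low":
        # Guideline: combined visual and auditory prompts for low complexity.
        return ["visual", "auditory prompt"]
    # Assumption: everything else defaults to a visual-only presentation.
    return ["visual"]
```

A collision warning (immediate response, serious consequences) would come back as an auditory alert with visual detail, while a long transit schedule with no time pressure would fall through to the visual-only default.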

Discussion: It is widely believed that combining an auditory and visual presentation of information could improve operator performance. Reference 1 recommends that the auditory modality be used as: (1) an auditory prompt to look at a visual display, or (2) supplemental information for a visual display. Providing information in this redundant fashion will lessen the need for a driver to scan the visual display and allow him or her to review the information if it is not fully understood or remembered. Reference 2 emphasizes the importance of redundant coding by stating that presenting information in the auditory and visual modalities will accommodate transient shifts in noise within the processing environment (e.g., visual glare, background noise, verbal distractions), which may influence one format or another. Display format redundancy also accommodates the strengths and abilities of different population groups (e.g., high spatial ability vs. high verbal ability).

Design Issues: Reference 3 suggests that, to determine the most appropriate display modality for presenting a particular information element, it is extremely important to predict whether the driver will need the information predrive or in-transit. Then, based upon other issues such as the complexity and urgency of the information, a decision can be made regarding which modality will accomplish the goal with the least amount of compromise to driver safety.

In reference 4, a driving simulator was used to study the benefits of multimodal displays (both auditory and visual). The multimodal displays were associated with better driving performance than auditory-only or visual-only displays, as well as better performance on a navigation task. Both the multimodal and auditory-only displays were associated with better emergency responses than the visual-only display.

Cross References:

Conveying Urgency with Icons, p. 5-14; Determining the Appropriate Auditory Signal, p. 6-4; Design of Speech Messages, p. 6-14

References:

1. Dingus, T. A., and Hulse, M. C. (1993). Some human factors design issues and recommendations for automobile navigation information systems. Transportation Research, 1C(2), 119-131.
2. Wickens, C. D. (1987). Information processing, decision-making, and cognition. In G. Salvendy (Ed.), Handbook of human factors (pp. 549-574). New York: J. Wiley & Sons.
3. Mollenhauer, M. A., Dingus, T. A., and Hulse, M. C. (1995). Recommendations for sensory mode selection for ATIS displays. Proceedings of the Institute of Transportation Engineers 65th Annual Meeting: A Compendium of Technical Papers, 667-672.
4. Liu, Y., and Dingus, T. A. (1997). Development of human factors guidelines for advanced traveler information systems and commercial vehicle operations: Human factors evaluation of the effectiveness of multi-modality displays in advanced traveler information systems (FHWA-RD-150). Washington, DC: Federal Highway Administration.

DETERMINING THE APPROPRIATE AUDITORY SIGNAL

Introduction: To determine the appropriate auditory signal means to choose the type of signal (simple tone, earcon, auditory icon, or speech message) that will best augment the visual message presented to the driver. The following auditory signals represent the most frequently used options:

Simple tones:	Single or grouped frequencies presented simultaneously.
Earcons:	Musical tones that can be used in structured combinations to create auditory messages (reference 1). These are sometimes referred to as complex tones.
Auditory icons:	Familiar environmental sounds that intuitively convey information about the object or action they represent (reference 2). These are sometimes referred to as naturalistic sounds or earcons, and are intuitively recognizable.
Speech messages:	Voice messages that add information beyond pure sound.

Design Guidelines

Use simple tones and auditory icons when an immediate response is required.
Earcons should be used when it is important for the driver to know that pieces of information are related.
Auditory icons are effective for use in collision-warning applications (e.g., horn or skidding tires).
Use speech messages when a high degree of message flexibility is required.
Use speech messages when a high degree of message detail is required.
Use speech messages when the meaning of tones or other sounds may be forgotten under stress.
Use speech messages when the auditory message deals with a future point in time for which there must be some preparation (e.g., time or distance to turn).
Speech message displays should not be used for time-critical tasks.

Table 6-3. Ratings of Audio Signals for Various Functions.

Functions	Example Message	Simple Tones	Earcons	Auditory Icons	Speech Messages
Status indication	Navigation system on and functioning	Good	Good	Fair	Poor
Alerting (attentional)	Generic warning indicator (to divert attention to a display)	Good	Fair	Poor	Poor
Warning (informational)	Rear-end collision-avoidance warning indicator	Fair	Poor	Good	Fair
Presentation of qualitative information	Location of next available lodging	Poor	Poor	Poor	Good
Presentation of quantitative information	Cost of upcoming toll bridge	Poor	Poor	Poor-Fair	Good
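
The ratings in Table 6-3 lend themselves to a simple lookup. The Python sketch below transcribes the table and returns the best-rated signal type or types for a given function. It is an illustration rather than part of the guideline: the numeric scores (including 0.5 for "Poor-Fair"), the shortened function names used as keys, and the helper best_signals are assumptions introduced only so the ratings can be compared.

```python
# Ratings transcribed from Table 6-3. The numeric scores are an assumption
# added only so that the verbal ratings can be ranked.
SCORE = {"Poor": 0.0, "Poor-Fair": 0.5, "Fair": 1.0, "Good": 2.0}

# Column order of Table 6-3.
SIGNAL_TYPES = ("simple tone", "earcon", "auditory icon", "speech message")

# Keys are shortened versions of the table's function names.
RATINGS = {
    "status indication": ("Good", "Good", "Fair", "Poor"),
    "alerting": ("Good", "Fair", "Poor", "Poor"),
    "warning": ("Fair", "Poor", "Good", "Fair"),
    "qualitative information": ("Poor", "Poor", "Poor", "Good"),
    "quantitative information": ("Poor", "Poor", "Poor-Fair", "Good"),
}


def best_signals(function: str) -> list[str]:
    """Return the signal type(s) with the highest rating for a function."""
    ratings = RATINGS[function]
    top = max(SCORE[rating] for rating in ratings)
    return [signal for signal, rating in zip(SIGNAL_TYPES, ratings)
            if SCORE[rating] == top]


print(best_signals("warning"))            # ['auditory icon']
print(best_signals("status indication"))  # ['simple tone', 'earcon']
```

A lookup like this only narrows the choice; the guidelines above (for example, avoiding speech for time-critical tasks) still apply.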

Discussion: According to reference 3, only a limited number of tones (five to six) can be absolutely recognized; therefore, simple tones are not a good choice for presenting quantitative information. Also, unless tones are presented in close temporal sequence, it is difficult to make qualitative judgments regarding deviations. Tones are good, however, for gaining the attention of the driver, whether simply to get him or her to attend to information being presented or to warn of an impending danger.

Like tones, earcons are limited because it is difficult to make qualitative judgments regarding deviations from a desired state or value. It is also difficult to obtain accurate quantitative information from earcons. Earcons are most effective when presenting a family of related sounds (see reference 4). One powerful feature associated with the use of earcons is that "related information can be given related sounds and hierarchies of information can be represented" (reference 5). Earcons are extremely flexible. However, their meaning is not apparent and must be learned. Therefore, they are not a good choice for presenting critical, time-dependent information to the driver.

Auditory icons are most effective when they can be mapped to everyday, naturally occurring sounds (see reference 2). When this is the case, they are extremely easy for the user to both learn and remember. They have been shown to reduce reaction times to collision events in collision warning applications (see references 6 and 7). The problem with auditory icons, however, is that not all information items to be presented in IVIS systems can be mapped to a naturally occurring sound. In these instances, the designer has to create metaphors for the icons, which can end up being just as abstract as a pure tone or earcon.

Speech messages are most effective for rapid, but not automatic, communication of complex, multidimensional information; the meaning of the message is intrinsic in the signal and context, and minimum learning is required. However, speech messages can be inefficient, are more easily masked, and have problems associated with repeatability and confusion with other sounds in the automobile, such as conversations and noise from the radio.

Design Issues: Some advantages and disadvantages associated with the use of each of the methods for presenting auditory information are given above. This is by no means an exhaustive literature review associated with the use of the auditory modality, but a tool for aiming the designer in the most appropriate direction. It should also be mentioned that the auditory signals discussed are being presented as a method for augmenting visual messages or to act as a redundant cue, not as a sole means for presenting in-vehicle information to the driver.

Cross References: Chapter 6: The Auditory Presentation of In-Vehicle Information

References:

1. Brewster, S. A., Wright, P. C., and Edwards, A. D. (1993). An evaluation of earcons for use in auditory human-computer interfaces. INTERCHI '93, 222-227.
2. Gaver, W. W. (1986). Auditory icons: Using sound in computer interfaces. Human-Computer Interaction, 2(2), 167-177.
3. Advanced Systems Technology Branch. (1993). Preliminary human factors design standards for airway facilities (ACD-350). Atlantic City International Airport, NJ: Federal Aviation Administration Technical Center.
4. Blattner, M. M., Sumikawa, D. A., and Greenberg, R. M. (1989). Earcons and icons: Their structure and common design principles. Human-Computer Interaction, 4, 11-44.
5. Brewster, S. A., Wright, P. C., and Edwards, A. D. N. (1994). A detailed investigation into the effectiveness of earcons. In G. Kramer (Ed.), Auditory display: Sonification, audification, and auditory interfaces, Volume XVII. Reading, MA: Addison Wesley.
6. Graham, R., Hirst, S. J., and Carter, C. (1995). Auditory icons for collision avoidance warnings. Proceedings of the ITS America 1995 Annual Meeting, 1057-1063.
7. Belz, S. M., Robinson, G. S., and Casali, J. G. (1998). Auditory icons as impending collision system warning signals in commercial motor vehicles. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, 1127-1131.

DESIGN OF SIMPLE TONES

Introduction: Simple tones are auditory signals that convey information through the use of single or grouped frequencies presented simultaneously. For the purposes of this guideline document, simple tones are discussed as a means for augmenting the visual presentation of in-vehicle messages and are not meant to be used as the only means for presenting in-vehicle messages.

Design Guidelines

Appropriate loudness levels are 15-25 decibels (dB) above the predicted masked threshold.
Auditory warning signals should be less than 30 dB above the masked threshold to minimize operator annoyance and the disruption of communication.
The pitch of warning sounds should be between 150 and 1000 Hertz (Hz).
Continuous tones should be avoided because they are usually high pitched and aversive, prevent communication if they are loud, and are easy to habituate to because they never change.

When more than one tone is used:

Avoid tones with the same on/off ratio.
Avoid tones that share the same temporal pattern.
Avoid tones that begin in the same way (e.g., with a long tone).
No more than 6 simple tones should be used.
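
The numeric limits in the guidelines above can be checked mechanically once a masked threshold has been predicted for the listening conditions. The Python sketch below is a minimal illustration under that assumption; the Tone class and check_tone_set function are hypothetical names, and predicting the masked threshold itself is outside the scope of the sketch. The temporal-pattern guidelines (on/off ratio, shared patterns, similar beginnings) are not checked here.

```python
from dataclasses import dataclass


@dataclass
class Tone:
    """Hypothetical description of one simple warning tone."""
    name: str
    level_db: float  # presentation level, in dB
    pitch_hz: float  # pitch of the tone, in Hz


def check_tone_set(tones: list[Tone], masked_threshold_db: float) -> list[str]:
    """Return a list of guideline violations; an empty list means none found."""
    problems = []
    if len(tones) > 6:
        problems.append("more than 6 simple tones in the set")
    for tone in tones:
        # Loudness is judged relative to the predicted masked threshold.
        margin = tone.level_db - masked_threshold_db
        if margin >= 30:
            problems.append(f"{tone.name}: 30 dB or more above masked threshold")
        elif not 15 <= margin <= 25:
            problems.append(f"{tone.name}: outside the 15-25 dB range")
        if not 150 <= tone.pitch_hz <= 1000:
            problems.append(f"{tone.name}: pitch outside 150-1000 Hz")
    return problems
```

Because in-vehicle noise changes constantly, a check like this would have to be repeated for each noise condition of interest rather than run once.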

Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and experimental data.

Table 6-4. Advantages and Disadvantages Associated with the Use of Simple Tones

Advantages of Simple Tones	Disadvantages of Simple Tones
Can serve an alerting function. Can increase detectability of messages. Can produce faster reaction times. Useful in situations where the noise environment is too complex for adequate voice warnings.	Difficult to establish appropriate loudness levels, especially in the constantly changing in-vehicle environment. Can be confusing because: (1) Their meaning is not inherent. (2) There may be too many to remember. Number of tones that can be reliably discriminated is low. Can induce startle responses. Difficult to prioritize or determine perceived urgency.

Discussion: Simple tones are similar to arbitrary symbols and only become meaningful through learning. Their main function is to alert the driver to a situation or event. The event could be an impending collision, or it could be simply a display of additional information via text, voice messages, or even in-dash indicators. There are many instances for which a simple tone may be appropriate; however, it is important to limit their use to no more than six per display (reference 1).

Design Issues: One of the central problems associated with a simple auditory tone is loudness. In the vehicle, the noise level is constantly changing; the driver may be speeding down the interstate with the windows down, chatting with a passenger, listening to the radio, or sitting quietly at a stoplight. In each of these situations, the appropriate level for presenting auditory information varies. Warnings that are too loud can: (1) be shut off; (2) draw the driver's attention to the warning itself rather than to the situation it signals; (3) distract from the main task; or (4) startle the driver, causing an inappropriate response. However, warnings that are not loud enough are likely to be missed. Therefore, determining the appropriate auditory threshold is extremely important. See references 2 and 3 for guidelines regarding the range for predicting thresholds and constructing auditory warning systems.

Cross References: Determining the Appropriate Auditory Signal, p. 6-4

References:

1. Advanced Systems Technology Branch. (1993). Preliminary human factors design standards for airway facilities (ACD-350). Atlantic City International Airport, NJ: Federal Aviation Administration Technical Center.
2. Patterson, R. D. (1982). Guidelines for auditory warnings systems on civil aircraft (Civil Aviation Authority paper 82017). London: Civil Aviation Authority.
3. Zwicker, E., and Scharf, B. (1965). A model of loudness summation. Psychological Review, 72, 3-26.

DESIGN OF COMPLEX TONES

Introduction: Complex tones are auditory signals that present information through the use of a hierarchical nesting of pulses and bursts that combine to form the signal or sound. The parameters of complex tones can have an important effect on perceived urgency, annoyance, and appropriateness.

Design Guidelines

The amplitude envelope of the initial pulse should include a 20 millisecond (ms) onset to reduce startle effects.
The pulse should be composed of multiple frequency components, such as formants or harmonics, to mitigate masking due to background noise.
The temporal pattern of auditory signals should be as distinct as possible; otherwise, confusion is likely, even if the spectral content is substantially different.
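
As an illustration of the first two guidelines, the Python sketch below generates the samples of a single pulse built from several frequency components with a 20 ms onset ramp. It is a sketch under stated assumptions, not a recommended signal: the function name synthesize_pulse, the linear shape of the ramp, the 400 Hz fundamental, the choice of the first three harmonics, and the 8 kHz sample rate are all illustrative choices that do not come from the guideline. No offset ramp is applied.

```python
import math


def synthesize_pulse(duration_ms=200, onset_ms=20, fundamental_hz=400,
                     harmonics=(1, 2, 3), sample_rate=8000):
    """Return a list of samples in [-1, 1] for one multi-component pulse."""
    n_samples = int(sample_rate * duration_ms / 1000)
    n_onset = int(sample_rate * onset_ms / 1000)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Sum several frequency components so that background noise is less
        # likely to mask the whole pulse; dividing keeps the sum in [-1, 1].
        value = sum(math.sin(2 * math.pi * fundamental_hz * h * t)
                    for h in harmonics) / len(harmonics)
        # Assumed linear ramp over the onset period, full amplitude afterwards.
        gain = min(1.0, i / n_onset) if n_onset > 0 else 1.0
        samples.append(gain * value)
    return samples


pulse = synthesize_pulse()
print(len(pulse))  # 1600 samples: 200 ms at 8000 samples per second
```

The samples could be written to an audio file or device for listening tests; that step is left out to keep the sketch free of platform-specific dependencies.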

Figure 6-1. Temporal Parameters of Auditory Signals - Pulse, Burst, and Sound Parameters Defined Graphically

Temporal parameters of the pulse, burst, and sound:

Duration: The time from the beginning to the end of a pulse or burst.
Inter-pulse interval: The time between the end of one pulse and the beginning of the next.
Inter-burst interval: The time between the end of one burst and the beginning of the next.
Speed: The time from the beginning of a pulse or burst to the start of the next.
Density: Pulse or burst duration divided by pulse speed.
Onset time: The time from the start of the pulse or burst until it reaches maximum output.
Offset time: The time in which the pulse or burst falls from maximum output to zero.
Duty cycle: Number of pulses per second.
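
The derived parameters follow directly from the duration and the interval between pulses. The Python sketch below works through the arithmetic for a train of identical pulses; the function name pulse_timing and the example values (a 200 ms pulse followed by a 50 ms gap) are illustrative assumptions, and the last value simply applies the list's definition of pulses per second.

```python
def pulse_timing(duration_ms: float, inter_pulse_interval_ms: float) -> dict:
    """Derive speed, density, and pulses per second for identical pulses."""
    # Speed: time from the start of one pulse to the start of the next.
    speed_ms = duration_ms + inter_pulse_interval_ms
    # Density: pulse duration divided by pulse speed.
    density = duration_ms / speed_ms
    # Number of pulses per second, as defined in the list above.
    pulses_per_second = 1000 / speed_ms
    return {
        "speed_ms": speed_ms,
        "density": density,
        "pulses_per_second": pulses_per_second,
    }


print(pulse_timing(200, 50))
# {'speed_ms': 250, 'density': 0.8, 'pulses_per_second': 4.0}
```

The same arithmetic applies at the burst level by substituting burst duration and the inter-burst interval.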

Discussion: Pulses combine to form bursts and bursts combine to form the overall auditory signal. Pulses, bursts, and the sound all have temporal sound parameters that affect confusion, urgency, and annoyance. This hierarchy of sound parameters is defined by the timescale, where the timescale of the pulse is from 100 to 300 ms and the timescale of the burst is 500 to 2,000 ms. The complete warning signal ranges from 2,500 ms to tens of seconds.

The harmonic content that defines the timbre or formant also has a powerful effect on the perception of the alert (reference 1). A signal composed of a harmonic frequency series is perceived as substantially less urgent than one composed of a random or partially random frequency series (reference 1). A formant is a resonant peak in the frequency spectrum of a voice, composed of several frequency regions of relatively great intensity, and is a critical acoustic feature of most speech sounds; it determines the characteristic quality of vowel sounds produced by humans. Formants differ from more common combinations of frequency components, such as harmonic series: rather than being equally spaced across the frequency spectrum, formants are distributed in an apparently random pattern. Because formants reflect anatomical properties of the vocal tract and are fundamental characteristics of natural speech sounds, they seem more likely to influence the emotional content of a sound than artificial frequency combinations, such as octaves. Research has shown that formants affect the perception of sound characteristics related to urgency and annoyance (reference 2).

Design Issues: The parameters that affect perception at the level of the sound may not have the same effect at the level of the burst. For example, pulse onset has a different effect than the onset of the entire signal. A slow onset at the level of the pulse increases urgency, whereas longer onsets at the burst or signal level decrease urgency. This result suggests that empirical findings regarding the effect of temporal parameters on pulse perception may not generalize to burst perception.

Cross References: The Auditory Presentation of In-Vehicle Information, p. 6-1; Perceived Urgency of Auditory Signals, p. 6-16; Perceived Annoyance of Auditory Signals, p. 6-22

References:

1. Edworthy, J., and Adams, A. (1996). Warning design: A research perspective. Bristol, PA: Taylor and Francis.
2. Stanford, L. M., McIntyre, J. W. R., Nelson, T. M., and Hogan, J. T. (1988). Affective responses to commercial and experimental auditory alarm signals for anesthesia delivery and physiological monitoring equipment. International Journal of Clinical Monitoring and Computing, 5, 111-118.

DESIGN OF EARCONS

Introduction: Earcons refer to auditory signals that present information through abstract musical tones that can be used in structured combinations to create auditory messages (reference 1). Earcons are also sometimes referred to in the literature as complex tones. Earcons have five parameters that can be modified to create different messages: rhythm, pitch, timbre, register, and dynamics.

Rhythm: Whole note, dotted half note, half note, quarter note, dotted eighth note, eighth note, sixteenth note.
Pitch: Eight octaves of 12 pitches each.
Timbre: Sinusoidal, sawtooth, triangular, rectangular.
Register: Low, medium, high.
Dynamics: Soft, medium, loud, soft to loud, loud to soft.

A motive, the building block of an earcon, is defined as "a rhythmicized sequence of pitches. Rhythm and pitch are the fixed parameters of a motive, while timbre, register, and dynamics are the variable parameters of motives" (reference 2).

Design Guidelines

Use synthesized musical timbres that are subjectively easy to tell apart (e.g., organ and brass).
Do not use pitch alone to distinguish between tones unless there are very significant differences. Some suggested ranges for pitch are max: 5 kHz and min: 125 Hz-150 Hz.
Use tones that are three or more octaves apart.
Make the rhythm as different as possible. Putting a different number of notes in each rhythm is effective.
Some suggested ranges for intensity are max: 20 dB above threshold and min: 10 dB above threshold.
When playing combinations of multiple earcons, a gap of 0.1 second should be between them so that the user can tell where one finishes and another starts.
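
The 0.1 second gap in the last guideline can be applied when laying out a sequence of earcons on a timeline. The Python sketch below is a minimal illustration; the function name schedule_earcons and the example durations are assumptions, and the rounding to milliseconds is only there to keep the printed values tidy.

```python
def schedule_earcons(durations_s, gap_s=0.1):
    """Return (start, end) times in seconds for earcons played in sequence."""
    schedule = []
    start = 0.0
    for duration in durations_s:
        end = start + duration
        schedule.append((round(start, 3), round(end, 3)))
        # Leave a silent gap so the listener can tell the earcons apart.
        start = end + gap_s
    return schedule


print(schedule_earcons([0.5, 0.75]))  # [(0.0, 0.5), (0.6, 1.35)]
```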

Table 6-5. Three Methods for Constructing Earcons

Method	Description	Example
Combining	The process of combining to create an earcon means linking different motives together in a chain-like sequence. Let A and B be earcons that represent different messages. A and B can be combined by linking A and B to form a third earcon AB.	A = urgent B = e-mail AB = urgent e-mail
Transforming	The process of transformation cosmetically alters a motive by changing its timbre, register, and/or tempo. However, it is important not to alter the motive beyond recognition. The semantic implications are a change of state in an object, not a change of object. Earcon A may be transformed into earcon B by modification in the construction of A.	A = system up A’ or B = system down
Inheriting	The process of inheriting is one in which a single earcon is heard in an increasingly complex chain. Imagine a tree of earcons with a family motive at the root. The next level adds pitch to the rhythm of the family motive. At the next level, a recognizable timbre is added.	A = the family motive (e.g., in-vehicle messaging); A + pitch = AB (e.g., message received); AB + timbre = ABC (e.g., message forwarded)
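
The three construction methods in Table 6-5 can be sketched as operations on a small data structure. The Python below is an illustration under several assumptions, not an implementation from the cited references: the Motive class, its default parameter values, the representation of an earcon as a list of motives, and the functions combine, transform, and inherit are all hypothetical. For simplicity, transform changes only the variable parameters named in the definition of a motive (timbre, register, and dynamics); tempo is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Motive:
    """A rhythmicized sequence of pitches plus its variable parameters."""
    rhythm: tuple                # note lengths, e.g. ("quarter", "half")
    pitches: tuple = ()          # one pitch per note; empty for rhythm only
    timbre: str = "sinusoidal"   # assumed default
    register: str = "medium"     # assumed default
    dynamics: str = "medium"     # assumed default


def combine(a: list, b: list) -> list:
    """Combining: chain the motives of earcon A and earcon B to form AB."""
    return a + b


def transform(motive: Motive, **changes) -> Motive:
    """Transforming: alter variable parameters; rhythm and pitch stay fixed."""
    allowed = {"timbre", "register", "dynamics"}
    if not set(changes) <= allowed:
        raise ValueError("only timbre, register, and dynamics may change")
    return replace(motive, **changes)


def inherit(parent: Motive, pitches=None, timbre=None) -> Motive:
    """Inheriting: add pitch, then timbre, to a family motive step by step."""
    child = parent
    if pitches is not None:
        child = replace(child, pitches=tuple(pitches))
    if timbre is not None:
        child = replace(child, timbre=timbre)
    return child


# Hypothetical family tree following the example column of Table 6-5.
family = Motive(rhythm=("quarter", "quarter", "half"))   # in-vehicle messaging
received = inherit(family, pitches=("C4", "E4", "G4"))   # message received
forwarded = inherit(received, timbre="sawtooth")         # message forwarded
```

Every member of the family keeps the root rhythm, which is what lets a listener recognize related messages as related.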

Discussion: Earcons are said to be a powerful and flexible means for creating auditory messages (see references 2, 3, 4, and 5). Reference 3 argues that the advantages associated with the use of earcons are clear: (1) they are easily constructed on any workstation or personal computer; (2) the sounds do not have to correspond to the objects they represent, so objects that either make no sound or that make an unpleasant sound can be represented; and (3) studies have shown that they are preferred over other types of auditory communication (see reference 6). The main disadvantage, however, is that, like simple tones, earcons must be learned. Their meaning is not inherent in the signal.

Design Issues: Reference 4 presents extensive, very specific guidelines for developing earcons, examining such issues as the psychoacoustical characteristics of sound, the formal arrangement of sounds into earcons, and the meaning and interpretation of earcons. Reference 5 presents slightly more general information. However, the concepts discussed require some knowledge of the parameters associated with sound. References 4 and 5 argue that experts, such as professional composers, should be included on any design team that is attempting to construct earcons. "The science of sound is a highly technical, diverse, and complicated discipline. Only an expert in this field understands the existence, importance, implications, and consequences of and the means of dealing with, the many perceptual problems and intricacies of sound" (reference 2).

For the purposes of this guideline document, earcons are discussed as a means of augmenting the visual presentation of in-vehicle messages and are not meant to be used as the only way to present in-vehicle messages.

Cross References:

Determining the Appropriate Auditory Signal, p. 6-4; Perceived Urgency of Auditory Signals, p. 6-16

References:

1. Brewster, S. A., Wright, P. C., and Edwards, A. D. (1993). An evaluation of earcons for use in auditory human-computer interfaces. INTERCHI '93, 222-227.
2. Blattner, M. M., Sumikawa, D. A., and Greenberg, R. M. (1989). Earcons and icons: Their structure and common design principles. Human-Computer Interaction, 4, 11-44.
3. Blattner, M. M. (1993). Sound in the multimedia interface (LLNL TR W-7405-Eng-48). Livermore, CA: Lawrence Livermore National Laboratory.
4. Sumikawa, D. A. (1985). Guidelines for the integration of audio cues into computer user interfaces (UCRL 53656). Livermore, CA: Lawrence Livermore National Laboratory.
5. Sumikawa, D. A., Blattner, M. M., Joy, K. I., and Greenberg, R. M. (1986). Guidelines for the syntactic design of audio cues in computer interfaces (UCRL-92925-REV.1). Livermore, CA: Lawrence Livermore National Laboratory.
6. Jones, S. D., and Furner, S. M. (1989). The construction of audio icons and information cues for human computer dialogues. Contemporary Ergonomics 1989. Proceedings of the Ergonomics Society's 1989 Annual Conference, 436-441.

DESIGN OF AUDITORY ICONS

Introduction: Auditory icons are familiar environmental sounds that intuitively convey information about the object or action that they represent (reference 1). They are sometimes also referred to in the literature as naturalistic sounds or earcons. The three types of auditory icon are iconic, metaphorical, and symbolic.

Iconic auditory icons sound like the object or action they represent (e.g., the sound of a crash to indicate a collision warning).
Metaphorical auditory icons sound like some element of the object or action they represent (e.g., the sound of children to indicate a school crossing).
Symbolic auditory icons rely on social convention for meaning (e.g., the sound of a siren to indicate an ambulance approaching).

The figure below (from reference 2) demonstrates some of the performance improvements that might be obtained when using an auditory icon for a warning component.

Design Guidelines

By definition, auditory icons must be identifiable as having relevance or conveying some inherent meaning (see above for examples of iconic, metaphorical, and symbolic auditory icons).
Auditory icons should be detectable 10 to 20 dB above the masked threshold.
No more than six auditory icons should be used in an auditory icon set.
Auditory icons should strive to attract the attention of the driver without generating a startle reaction. Special attention should be paid to the perceived urgency associated with different candidate auditory icons.

Figure 6-2. Brake Reaction Times for Different Warning Sounds (from Reference 2)

Discussion: The goal of auditory icon design is to map the attributes of a computer event to some everyday sound-producing event (see references 1 and 3). This makes auditory icons extremely easy for users to learn and remember, as their meaning is inherent. Perhaps this is why they have been examined for use in collision warning applications. One could argue that drivers' responses would be based on experiences in which they have heard these sounds occur naturally, and thus their responses will be faster. This has, in fact, been shown to be the case. Reference 2 describes a study in which drivers were required to carry out a tracking task while at the same time attending to a road scene interspersed with imminent collisions. They were asked to respond to each collision warning they were given and determine the appropriate braking response. Four collision warnings were tested (a simple tone; a speech warning "ahead"; the sound of a car horn; and the sound of skidding tires). Results of the study showed that braking reaction times were faster for the auditory icons than for the more traditional warning sounds. Another study, described in reference 4, found similar results. Braking reaction times for collision warnings using auditory icons were shown to be significantly shorter than for conventional collision warnings (tones).

Design Issues: While the experiments described above show an improved braking reaction time associated with the use of auditory icons in collision warning applications, the icons are not necessarily the best choice for presenting this type of information. In addition to producing faster braking reaction times, they also produced a higher number of inappropriate reactions due to startle effects (e.g., slamming on brakes for a low-level warning; see reference 2). This type of reaction could actually negate any benefits of having a collision warning system and potentially put the driver's safety at risk. Ensuring that the appropriate level of urgency is projected to the driver is a very important design issue. References 5 and 6 suggest that factors such as the frequency, amplitude, envelope shape, and melodic structure of a warning can all affect perceived urgency. Thus, altering certain sound parameters may allow a designer to reduce the startling effect of these types of warnings.

For the purposes of this guideline document, auditory icons are discussed as a means for augmenting the visual presentation of in-vehicle messages and are not meant to be used as a sole means for presenting in-vehicle messages.

Cross References:

Determining the Appropriate Auditory Signal, p. 6-4; Perceived Urgency of Auditory Signals, p. 6-16

References:

1. Gaver, W. W. (1986). Auditory icons: Using sound in computer interfaces. Human-Computer Interaction, 2(2), 167-177.
2. Graham, R., Hirst, S. J., and Carter, C. (1995). Auditory icons for collision avoidance warnings. Proceedings of the ITS America 1995 Annual Meeting, 1057-1063.
3. Gaver, W. W. (1989). The sonic finder: An interface that uses auditory icons. Human-Computer Interaction, 4(1), 67-94.
4. Belz, S. M., Robinson, G. S., and Casali, J. G. (1998). Auditory icons as impending collision system warning signals in commercial motor vehicles. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, 1127-1131.
5. Edworthy, J., Loxley, S., and Dennis, I. (1991). Improving auditory warning design: Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205-231.
6. Hellier, E. J., Edworthy, J., and Dennis, I. (1993). Improving auditory warning design: Quantifying and predicting the effects of different warning parameters on perceived urgency. Human Factors, 35(4), 693-706.

DESIGN OF SPEECH MESSAGES

Introduction: Speech messages refer to auditory signals that present information through voice messages that add information beyond pure sound. For the purposes of this guideline document, speech is discussed as a means for augmenting the visual presentation of in-vehicle messages and is not meant to be used as the only means of presenting in-vehicle messages.

Design Guidelines

If speech must be used in a time-critical application (e.g., a warning), the message should be kept to a single word or a short phrase with the fewest number of syllables possible.
Messages that are not urgent or for which a response may be delayed can be a maximum of seven units of information in the fewest number of words possible. If the information cannot be presented in a short sentence, the most important information should be presented at the beginning and/or the end of the message.
Navigation instructions should be limited to three or four information units (i.e., "Accident ahead, merge right" or "Turn right in ½ mile").
Do not try to make the voice sound too human. A machine should have a machine voice to cue its identity when it speaks.
Provide a means for repeating speech messages.
Provide a redundant visual presentation of the information being presented aurally.

Table 6-6. Determining the Number of Information Units

Message Type	Number of Information Units	Example Message
Urgent message (e.g., collision warning)	1 unit	Brake
Navigation instructions	3-4 units	Road Construction Ahead at Jaspertown
Non-urgent message (e.g., motorist services)	7 units	Gas Station Ahead Exit #46 Turn Right
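
The limits in Table 6-6 can be expressed as a simple lookup. The following Python sketch is illustrative only: the names `MAX_UNITS` and `within_unit_limit` are hypothetical, and the information units of each message are assumed to have been counted by hand. The guideline defines units as key nouns and adjectives, which this code does not attempt to detect, and the word-by-word split of the example messages is an assumption.

```python
# Maximum number of information units per message type, taken from Table 6-6.
# The dictionary keys and the function name are hypothetical.
MAX_UNITS = {
    "urgent": 1,       # e.g., collision warning
    "navigation": 4,   # upper end of the 3-4 unit range
    "non_urgent": 7,   # e.g., motorist services
}


def within_unit_limit(message_type: str, units: list) -> bool:
    """Return True if a hand-counted list of information units fits the limit."""
    return len(units) <= MAX_UNITS[message_type]


# Hand-counted units for the example messages (the split is an assumption).
brake = ["Brake"]
construction = ["Road", "Construction", "Ahead", "Jaspertown"]
services = ["Gas", "Station", "Ahead", "Exit", "#46", "Turn", "Right"]

print(within_unit_limit("urgent", brake))
print(within_unit_limit("navigation", construction))
print(within_unit_limit("non_urgent", services))
# A navigation-length message is too long to serve as an urgent warning.
print(within_unit_limit("urgent", construction))
```

A check of this kind could flag candidate messages for review, but it does not replace a human judgment of which words carry information.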

Table 6-7. Examples of Auditory Messages

Suggested	Not Suggested
"Oil change needed by July 1, 2003"	"Vehicle maintenance log shows that vehicle oil change is due and should be completed by July 1, 2003."
"Turn right in ½ mile"	"At the next stoplight, turn right onto Shark Lane in ½ mile"

Discussion: Speech displays are an effective means for communicating information to the driver. In addition to warning, they can be used to provide responses to user queries and feedback from control inputs. Warnings, however, have received the majority of attention in speech display research. They are effective in that they not only alert the driver to an emergency situation, but they also provide additional information about the nature of the problem (reference 1). However, the added length of the message can increase the driver's response time. Therefore, an important tradeoff exists between comprehension and clarity (i.e., message length) and driver response times. The guidelines given on the previous page should aid designers in making this tradeoff.

Design Issues: When presenting messages that do not require immediate action, reference 2 suggests several options for helping the driver use the information: (1) present the information in the order of importance or relevance to the driver; (2) present the most important information at either the beginning or the end of the message because it is easiest to recall; (3) highlight the most important parts of the message; (4) provide a means for repeating the message, which is especially helpful for older drivers; and (5) provide a redundant visual presentation of the information, which is also helpful for older drivers.

One important design decision is whether to include an alerting tone before presenting a voice message. References 3, 4, and 5 found that voice warnings preceded by an alerting tone did not produce faster response times than the voice warning by itself. In fact, in one study, an alerting tone actually increased response time (see reference 3). Reference 6 supports the notion that synthesized speech is distinct from human speech and can perform an alerting function in addition to conveying the pertinent information to the driver. This is another reason not to make speech warnings sound too human: a machine-like voice will better cue the driver to its identity.

Another important consideration when determining whether to use speech displays is driver acceptance. Existing research indicates that speech displays should be used sparingly because the auditory channel can quickly become cluttered or overloaded with stimuli (references 7, 8, and 9). Speech displays are inherently intrusive and have a tendency to annoy the user if they are presented too frequently. In fact, speech displays used in certain aircraft applications have even been disabled so that the pilots would not have to listen to the chatter of redundant or irrelevant messages. Because of the potential problems of acceptance, speech displays should only be used when the visual modality is overloaded, and they should always be accompanied by a visual representation so that the information can be referred to again at a later time (reference 9).

Cross References:

Determining the Appropriate Auditory Signal, p. 6-4

References:

Wogalter, M. S., and Young, S. L. (1991). Behavioural compliance to voice and print warnings. Ergonomics, 34(1), 78-89.
Ross, T., Midtland, K., Fuchs, M., Pauzie, A., Engert, A., Duncan, B., Vaughan, G., Vernet, M., Peters, H., Burnett, G., and May, A. (1996). HARDIE design guidelines handbook: Human factors guidelines for information presentation by ATT systems (DRIVE II Project V2008).
Simpson, C. A., and Williams, D. H. (1980). Response time effects of alerting tone and semantic context for synthesized voice cockpit warnings. Human Factors, 22(3), 319-330.
Hakkinen, M. T., and Williges, B. H. (1984). Synthesized warning messages: Effects of an alerting cue in single- and multiple-function voice synthesis systems. Human Factors, 26(2), 185-195.
Bucher, N. M., Karl, R. L., Voorhees, J. W., and Werner, E. (1984). Alerting prefixes for speech warning messages. IEEE Proceedings of the National Aerospace and Electronics Conference, Volume 2 (pp. 924-931).
Cowley, C. K., and Jones, D. M. (1992). Synthesized or digitized? A guide to the use of computer speech. Applied Ergonomics, 23(3), 172-176.
Wickens, C. D. (1992). Engineering psychology and human performance (2nd edition). New York: Harper-Collins.
Stokes, A., Wickens, C., and Kyte, K. (1990). Display technology: Human factors concepts. Warrendale, PA: Society of Automotive Engineers.
Wierwille, W. W. (1993). Visual and manual demands of in-car controls and displays. In B. Peacock and W. Karwowski (Eds.), Automotive ergonomics (pp. 229-320). London: Taylor and Francis.

PERCEIVED URGENCY OF AUDITORY SIGNALS

Introduction: Perceived urgency of auditory signals refers to the subjective impression of urgency that a signal gives to the person hearing it. The goal is to match the urgency of the auditory signal to the urgency of the situation to which it pertains. This is called "urgency mapping." Reference 1 has shown that the perceived urgency of an auditory signal can be directly manipulated by changing certain temporal or melodic parameters, such as speed, rhythm, number of units, speed change, fundamental frequency, pitch range, pitch contour, and musical structure.

Temporal Parameters	Melodic Parameters
Speed (slow = 1.5 pulses/sec; fast = 6 pulses/sec)	Fundamental frequency (low = 200 Hz; high = 800 Hz)
Rhythm (regular = all pulses equally spaced; irregular = pulses not equally spaced)	Pitch range (small = 3 semitones; large = 9 semitones)
Number of units (1 = one 4-pulse burst; 4 = four 4-pulse bursts)	Pitch contour (down/up; random)
Speed change (slowing down; speeding up)	Musical structure (resolved = from natural scales; atonal = random sequence of pulses)

Design Guidelines

To increase the perceived urgency:

Use faster auditory signals.
Use regular rhythms.
Use a greater number of units (4).
Use auditory signals that speed up.
Use high fundamental frequencies.
Use a large pitch range.
Use a random pitch contour.
Use an atonal musical structure.

To decrease the perceived urgency:

Use slower auditory signals.
Use irregular rhythms.
Use fewer units (1).
Use auditory signals that slow down.
Use low fundamental frequencies.
Use a small pitch range.
Use a down or up pitch contour.
Use a resolved musical structure.
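
The two lists above can be captured as a pair of parameter sets. The sketch below is only one possible representation: the class name `SignalProfile`, the constants, and `profile_for` are hypothetical, and the values are the two endpoint settings listed in the guideline. Intermediate urgency levels are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalProfile:
    """One setting for each temporal and melodic parameter in the guideline."""
    pulses_per_second: float      # speed
    rhythm: str                   # "regular" or "irregular"
    units: int                    # number of repeated bursts
    speed_change: str             # "speeding up" or "slowing down"
    fundamental_hz: int           # fundamental frequency
    pitch_range_semitones: int
    pitch_contour: str            # "random" or "down/up"
    musical_structure: str        # "atonal" or "resolved"


# Endpoint settings taken from the lists above (names are hypothetical).
MORE_URGENT = SignalProfile(6.0, "regular", 4, "speeding up",
                            800, 9, "random", "atonal")
LESS_URGENT = SignalProfile(1.5, "irregular", 1, "slowing down",
                            200, 3, "down/up", "resolved")


def profile_for(high_priority: bool) -> SignalProfile:
    """Pick the endpoint profile that matches the priority of the message."""
    return MORE_URGENT if high_priority else LESS_URGENT


print(profile_for(True))
print(profile_for(False))
```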

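Figure 6-3. Example of Using Stevens' Power Law for Producing Urgency Exponents

(see references 2 and 3 for more detailed explanations and examples)

S = kO^m     (8)

where S is the subjective (perceived) value, O is the objective (physical) value of the sound parameter, k is a constant, and m is the exponent.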

Discussion: The perceived urgency of auditory signals has been researched to some extent in the past 10 years. The results have shown that varying certain acoustical parameters has a strong and consistent effect on a person's subjective impression of the urgency of the warning. Reference 1 provides designers with a database concerning the subjective ratings and rankings of the perceived urgency of many of the temporal and melodic parameters. Designers can use this information to produce warnings with the appropriate levels of perceived urgency. This is extremely important, as new research has discovered that increases in the perceived urgency of a warning correlate with faster reaction times (see references 4 and 5). Therefore, if auditory signals are designed with urgency mapping in mind, more effective warnings can be developed.

Design Issues: References 2 and 3 show how, using Stevens' power law, certain quantifiable sound parameters such as speed, number of repetitions, and frequency can be scaled and compared directly. In reference 3, urgency exponents for speed, number of repetitions, and frequency were calculated to be 1.35, 0.5, and 0.38, respectively. Because speed has the highest urgency exponent, perceived urgency changes more rapidly with a change in speed than with a comparable change in the other two parameters. Reference 6 states that these results imply that a small change in the speed of a warning increases its urgency considerably, whereas a much larger change in the number of repetitions would be required to produce the same change. The ability to quantify this subjective assessment allows designers of IVIS to develop a set of auditory signals that would sound different but, through a manipulation of certain parameters, would have the same urgency.
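
To illustrate how the exponents can be compared, the sketch below inverts the power law S = kO^m. If perceived urgency is to change by a ratio r, the physical parameter must change by a ratio of r^(1/m); the constant k cancels out. The function name and dictionary are hypothetical, and the calculation assumes that the power law holds over the whole range of interest, which the source does not state.

```python
# Urgency exponents reported in reference 3.
URGENCY_EXPONENTS = {
    "speed": 1.35,
    "repetitions": 0.5,
    "frequency": 0.38,
}


def required_parameter_ratio(urgency_ratio: float, exponent: float) -> float:
    """Ratio by which a sound parameter must change to scale urgency.

    Derived from S = k * O**m:  S2 / S1 = (O2 / O1)**m,
    so O2 / O1 = (S2 / S1)**(1 / m).
    """
    return urgency_ratio ** (1.0 / exponent)


# How much must each parameter change to double the perceived urgency?
for name, m in URGENCY_EXPONENTS.items():
    ratio = required_parameter_ratio(2.0, m)
    print(f"{name}: multiply by {ratio:.2f}")
```

Under these assumptions, doubling urgency requires a fourfold increase in the number of repetitions but a much smaller proportional increase in speed, which is consistent with the interpretation attributed to reference 6.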

Reference 7 suggests that urgency mapping tests be carried out on sets of auditory signals being used as alarms, especially if they are abstract alarm sounds. The first step in this process is to have a group of people with a good working knowledge of the environments in which the alarms will be used rate the situational urgency of the referents on a scale from 1 to 5. The second step is to have a different group of people rate the psychoacoustical urgency of the auditory warnings on a scale from 1 to 5, without any knowledge of the referents. The next step is to correlate the two measures. If there is a significant positive correlation, then the designer may decide to make few or no changes to the alarm. However, if there is no correlation or a negative correlation, then the designer should modify the alarm. The list on the previous page may help a designer choose which parameters to alter and how to alter them to make an alarm more or less urgent.
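
The correlation step of this procedure can be sketched as follows. The source does not say which correlation statistic to use; a Pearson coefficient is assumed here for simplicity, and a rank-based measure may suit 1-to-5 ratings better. The ratings are made-up illustration data, not results from any study, and the test of statistical significance is omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical mean ratings (1-5) for five alarms.
# situational: urgency of each referent, rated by domain experts.
# acoustic: urgency of each sound, rated by listeners blind to the referent.
situational = [5, 4, 3, 2, 1]
acoustic = [4, 5, 3, 1, 2]

r = pearson_r(situational, acoustic)
print(f"urgency mapping correlation: {r:.2f}")
```

For these made-up ratings the coefficient is 0.8, which would suggest a reasonable mapping; a value near zero or below would point to alarms that need redesign.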

Cross References: Design of Earcons, p. 6-10; Design of Auditory Icons, p. 6-12

References:

Edworthy, J., Loxley, S., Geelhoed, E., and Dennis, I. (1989). The perceived urgency of auditory warnings. Proceedings of the Institute of Acoustics, 11(5), 73-80.
Hellier, E. and Edworthy, J. (1989). Quantifying the perceived urgency of auditory warnings. Canadian Acoustics, 17(4), 3-11.
Hellier, E., Edworthy, J. and Dennis, I. (1993). Improving auditory warning design: Quantifying and predicting the effects of different warning parameters on perceived urgency. Human Factors, 35(4), 693-706.
Haas, E. C. and Casali, J. G. (1995). Perceived urgency and response time to multi-tone and frequency modulated warning signals in broadband noise. Ergonomics, 38(11), 2313-2326.
Burt, J. L., Bartolome, D. S., Burdette, D. W., and Comstock, J. R. (1995). A psychophysiological evaluation of the perceived urgency of auditory warning signals. Ergonomics, 38(11), 2327-2340.
Edworthy, J. (1994). The design and implementation of non-verbal auditory warnings. Applied Ergonomics, 25(4), 202-210.
Edworthy, J., and Stanton, N. (1995). A user-centered approach to the design and evaluation of auditory warning signals: 1. Methodology. Ergonomics, 38(11), 2262-2280.

GENERAL DESIGN GUIDELINES FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS

Introduction: Automatic Speech Recognition (ASR) devices recognize human speech and, in an in-vehicle context, treat speech commands as inputs to an IVIS device. Currently, ASR is viewed as an enabling technology in intelligent transportation system (ITS) development. Many of the state-of-the-art advances associated with ITS involve complex in-vehicle devices that include a host of functions including: navigation, e-mail, motorist services, internet access, cellular phone capabilities, "infotainment", and fax. Such broad functionality is associated with greater perceptual, information processing, and psychomotor demands on the driver, and presents a crucial challenge to IVIS developers. ASR is viewed as a means to allow the driver to interact with the IVIS device, while maintaining his/her eyes on the road and hands on the wheel. Moreover, recent advances in ASR techniques (speaker independent devices, increased vocabulary size, reduced processing time, noise-filtering and word-matching algorithms) suggest that it may be an attractive alternative to traditional approaches to the driver-vehicle interface (DVI).

Design Guidelines

For IVIS applications, ASR devices should be used to aid complex tasks that involve high visual, cognitive, or manual requirements.
Vocabulary sets for ASR devices should: reflect natural language conventions as much as possible, avoid similar-sounding words or phrases, and be small enough so that drivers can recall command words rapidly and with few or no errors.
The microphone for an ASR device should be located on the forward portion of the vehicle headliner, right in front of the driver (reference 3).
Drivers should be provided with immediate feedback (e.g., error correction, input confirmation) of the recognition results or the system's response to the speech input. Changes in the visual display itself provide a good form of feedback, but require driver head or eye movements to verify. Although any feedback will improve the driver's performance with the system, size limits on IVIS displays in the in-vehicle environment, as well as concerns about visual overload, suggest that auditory feedback should be used.

Table 6-8. Issues to Consider When Designing ASR Systems

Task-Related Issues	Environment-Related Issues	Operator-Related Issues
Single versus dual task. Workload. Head movement requirements. Driving situation (e.g., effects of stress). Requirements for feedback. Vocabulary requirements.	External noise (e.g., traffic, road noise). Internal noise (e.g., entertainment system, conversation). Vibration. Acceleration/deceleration G-forces.	Age. Articulation. Regional accents. Level of training. Gender.

Discussion: Reference 1 discusses a variety of issues and research results associated with speech controls, and some of the guidelines above have been adapted from the design principles presented in reference 1 and, to a lesser extent, reference 2.

Reference 3 investigated ASR performance using a recorded, multispeaker database and seven candidate microphone positions. Evaluation criteria included signal-to-noise ratio (SNR) and recognition rate. A range of locations (e.g., center dashboard, ceiling near rearview mirror, visor on headliner in front of the driver, over driver head, and on the steering wheel) were investigated. Although all microphone positions were roughly near the driver, great variability was reported (between 0 percent and 10 percent) in error rates across the seven positions. The location on the forward portion of the vehicle headliner, right in front of the driver, gave the best combined results for both SNR and recognition rates. However, this issue has not been extensively studied and the optimum microphone location may vary across different in-vehicle applications.

Reference 4 investigated six options for providing feedback with an ASR device. The options included auditory or visual feedback, delay prior to feedback, and no feedback. In the study, subjects entered fields of data into an ASR device. These data consisted of alphanumeric characters, words, or numbers plus words, and were of varying length. With no feedback, only 70 percent of the entered fields were error-free; the average percentage of correctly entered fields across the feedback conditions (any feedback) was 97 percent. The authors noted some tradeoffs between visual and auditory feedback, with visual word feedback being optimal when a large visual display is available to the user. However, in situations with small displays or when visual overload of the user is a concern (such as in the in-vehicle environment), auditory feedback is recommended.

Design Issues: As noted in reference 1, key issues in the design and implementation of ASR systems include:

Recognition accuracy: Lower accuracies will reduce system performance and user acceptance.
Background noise: Ambient noise (traffic, radio, speech displays) can interfere with ASR system performance.
Speech variability: Human speech varies considerably with respect to volume, frequency, pitch, and tone under different conditions, in addition to accents and regional variations. Speech variability can contribute to reduced recognition of speech.
Task selection: Selection of tasks for which speech should be used must reflect task characteristics and a clear understanding of the tradeoffs associated with using speech controls vs. manual controls.

In addition, use of ASR does not ensure safe operation of an in-vehicle device. Even simple speech requires cognitive operations by the user (e.g., recalling phone numbers). In general, ASR is best used as a redundant source of input by the driver (i.e., an alternate manual means should be provided to the driver as well).

Cross References:

Chapter 10: Sensory Modality Design Tool

References:

Simpson, C. A., McCauley, M. E., Roland, E. F., Ruth, J. C., and Williges, B. H. (1987). Speech controls and displays. In G. Salvendy (Ed.), Handbook of human factors (pp. 549-574). New York: J. Wiley & Sons.
McMillan, G. R., Eggleston, R. G., and Anderson, T. R. (1997). Nonconventional controls. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 729-771). New York: J. Wiley & Sons.
Smolders, J., Claes, T., Sablon, G., and Van Compernolle, D. (1994). On the importance of the microphone position for speech recognition in the car. Proceedings of the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1 (pp. 429-432).
Schurick, J. M., Williges, B. H., and Maynard, J. F. (1985). User feedback requirements with automatic speech recognition. Ergonomics, 28(11), 1543-1555.

TIMING OF AUDITORY NAVIGATION

Introduction: The timing of auditory navigation information refers to the time or distance at which the in-vehicle navigation system should present an auditory instruction to the driver before an approaching navigation maneuver (e.g., a required turn).

Design Guidelines

For maneuvers defined as leaving the current route (i.e., turning onto a side road), the timing of the auditory guidance instruction can be based on the equations provided below.
It may be advisable to implement the equation for "preferred maximum distance." An instruction given slightly too early is preferable to one given too late.
When the distance between two subsequent maneuvers is less than the minimum preferred distance for that speed, the instructions should be "stacked" (given together in a single message).

Figure 6-4. Equations for Determining the Appropriate Timing of an Instruction

Discussion: In reference 1, subjects were asked to give a subjective rating of the timeliness of auditory navigation instructions (1 = much too early to 6 = much too late). From the subjects' ratings, regression lines were plotted. Three separate equations were developed for calculating the distance at which navigation information should be given regarding an approaching turn onto a side road, while traveling at different speeds.

Reference 2 conducted a similar study aimed at determining the last possible moment at which a subject would feel comfortable hearing an auditory navigational instruction. The results of this study indicated that, at a speed of 65 kph (40 mph), the recommended distance for giving navigational instructions before a turn is 137 meters (450 feet). However, it is necessary to make adjustments for other speeds (15 feet for each mile per hour, or about 2.8 meters for each kilometer per hour); age of driver (up to 36 meters (119 feet)); the direction of turn, left or right (left turns require more warning distance); and gender of the driver. The results of this study are similar to those found in reference 1 but were determined to be more difficult to apply to the general driver population.
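
The speed adjustment from reference 2 can be sketched as a short calculation. The code assumes that the 15 feet per mile per hour adjustment applies linearly in both directions around the 40 mph baseline, which is an interpretation of the text above rather than a formula given in the source. The adjustments for driver age, turn direction, and gender are omitted because the source does not give values that could be applied directly. All names are hypothetical.

```python
BASELINE_SPEED_MPH = 40.0      # baseline speed from reference 2
BASELINE_DISTANCE_FT = 450.0   # recommended distance at the baseline speed
ADJUSTMENT_FT_PER_MPH = 15.0   # speed adjustment from reference 2
METERS_PER_FOOT = 0.3048


def instruction_distance_ft(speed_mph: float) -> float:
    """Distance before a turn at which to give the instruction, in feet.

    Assumes a linear speed adjustment around the 40 mph baseline.
    """
    return BASELINE_DISTANCE_FT + ADJUSTMENT_FT_PER_MPH * (speed_mph - BASELINE_SPEED_MPH)


def feet_to_meters(feet: float) -> float:
    return feet * METERS_PER_FOOT


for mph in (25, 40, 60):
    ft = instruction_distance_ft(mph)
    print(f"{mph} mph: {ft:.0f} ft ({feet_to_meters(ft):.0f} m)")
```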

Design Issues: The applicability of these guidelines to visual guidance messages is uncertain. Since visual information (with no accompanying auditory alert) is likely to be perceived later than auditory messages, the distances recommended above may have to be increased somewhat to account for this delay. Turning off the current route is only one type of maneuver. Many other types (e.g., turning at a T-intersection or exiting a freeway) should be studied separately to determine which factors will affect them. The results of these studies could then be combined with the above guidelines to determine the appropriate timings for any possible type or combination of maneuvers.

In reference 3, it was recommended that if two maneuvers are less than 10 seconds apart, the two instructions should be given together, before the first maneuver. This is referred to as "stacking" the messages. Reference 1 gave a similar recommendation, stating that when the distance between two subsequent maneuvers is less than the minimum preferred distance for that speed, the instructions should be stacked.
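
The 10-second rule from reference 3 can be checked with a simple travel-time estimate. The sketch below assumes the vehicle holds a constant speed between the two maneuvers, which is a simplification introduced here; the function and constant names are hypothetical.

```python
STACKING_THRESHOLD_S = 10.0  # from reference 3


def should_stack(gap_m: float, speed_kph: float) -> bool:
    """Return True if two maneuvers are close enough to share one message.

    gap_m is the distance between the two maneuvers in meters.
    Travel time is estimated assuming constant speed (a simplification).
    """
    if speed_kph <= 0:
        raise ValueError("speed must be positive")
    speed_m_per_s = speed_kph / 3.6
    travel_time_s = gap_m / speed_m_per_s
    return travel_time_s < STACKING_THRESHOLD_S


# At 65 kph the vehicle covers roughly 18 meters per second.
print(should_stack(150.0, 65.0))  # about 8 s apart
print(should_stack(300.0, 65.0))  # about 17 s apart
```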

References:

Ross, T., Vaughan, G., and Nicolle, C. (1997). Design guidelines for route guidance systems: Development process and an empirical example for timing of guidance instructions. In Y. I. Noy (Ed.), Ergonomics and safety of intelligent driver interfaces (pp. 139-152). Mahwah, NJ: Lawrence Erlbaum Associates.
Green, P., and George, K. (1995). When should auditory guidance systems tell drivers to turn? Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting, 1072-1076.
Verwey, W. B., Alm, H., Groeger, J. A., Janssen, W. H., Kuiken, M. J., Schraagen, J. M., Schumann, J., van Winsum, W., and Wontorra, H. (1993). GIDS functions. In J. A. Michon (Ed.), Generic intelligent driver support: A comprehensive report on GIDS (pp. 113-144). London: Taylor and Francis.

PERCEIVED ANNOYANCE OF AUDITORY SIGNALS

Introduction: Perceived annoyance of auditory signals refers to the subjective annoyance associated with particular signal characteristics. Although many sound parameters that increase urgency also increase annoyance, careful design can create highly urgent sounds that are not overly annoying. The goal is to minimize the annoyance associated with a warning, balanced by the need to match the urgency of the signal to the urgency of the situation. This is called "annoyance tradeoff" and should be considered in signal design.

Design Guidelines

For signals to be perceived as appropriate, highly urgent sounds should be used for highly critical situations.
For signals to be perceived as appropriate, low annoyance sounds should be used for benign situations.
Sound characteristics of pulse duration, burst density, sound type, and speed all increase perceived urgency more than perceived annoyance.

Table 6-9. Sound Characteristics that Increase Urgency, While Having a Modest Effect on Annoyance

CHARACTERISTIC	EFFECT ON URGENCY	EFFECT ON ANNOYANCE
Duration of sound (Pulse)	Longer >> more urgent	Longer > more annoying
Burst density	High >> more urgent	High > more annoying
Speed	Faster >> more urgent	Faster > more annoying


Figure 6-5. Appropriateness Depends on Perceived Annoyance for Benign Situations (e.g., e-mail notification), Whereas Appropriateness Depends on Perceived Urgency for Highly Critical Situations (e.g., collision avoidance)

Discussion: Substantial research has shown that sound parameters can affect perceived urgency (reference 1). The urgency mapping principle states that the urgency of the sound should match the urgency of its referent. Like urgency, annoyance is systematically affected by sound parameters (references 2 and 3). Recently, research has shown that perceived annoyance of a sound is not completely dependent on the parameters that affect urgency (reference 4). Some sound parameters, such as those listed in the table in the guideline, can increase urgency substantially, while increasing annoyance relatively little. Using these parameters, it is possible to design a highly urgent sound that is less annoying than other sounds with the same perceived urgency. In general, design of highly urgent signals involves a tradeoff between urgency and annoyance, but some sound parameters can help minimize the annoyance of highly urgent sounds.

Recent results also show that the importance of annoyance is greater when designing sounds for benign alerts (reference 4). Perceived annoyance is a strong predictor of perceived appropriateness for auditory signals for benign events, whereas perceived urgency is a strong predictor of perceived appropriateness for auditory signals for critical events. The figure in the guideline shows that this relationship is quite robust, with perceived annoyance accounting for 67 percent of the variance of perceived appropriateness for e-mail alerts and only 9 percent of the variance for a collision avoidance warning. Conversely, perceived urgency accounts for almost 90 percent of the variance of perceived appropriateness of collision avoidance warnings. This relationship shows that designing to minimize annoyance can be as critical as designing to map urgency.

Design Issues: Sound perception and perceived urgency and annoyance are somewhat dependent on the context and intended message of the signal (reference 5). This makes it critically important to evaluate the sounds generated using these guidelines in the driving context. Even using imagined driving scenarios in a laboratory situation affected the perceived urgency and annoyance of sounds (reference 4). In addition, urgency mapping and the annoyance tradeoff are only two considerations in creating useful auditory alerts. A critical consideration for a situation that contains multiple alerts is the ability of drivers to discriminate and recognize multiple auditory signals, as described earlier in the section Design of Complex Tones.

Cross References: The Auditory Presentation of In-Vehicle Information, p. 6-1; Design of Complex Tones, p. 6-8; Perceived Urgency of Auditory Signals, p. 6-16

References:

Edworthy, J., Loxley, S., and Dennis, I. (1991). Improving auditory warning design - Relationship between warning sound parameters and perceived urgency. Human Factors, 33(2), 205-231.
Laird, D. A., and Coye, K. (1929). Psychological measurements of annoyance related to pitch and loudness. Journal of the Acoustical Society of America, 1, 156-163.
Berglund, B., and Preis, A. (1997). Is perceived annoyance more subject-dependent than perceived loudness? Acoustica, 83(2), 313-319.
Marshall, D., Lee, J. D., and Austria, A. (2001). Annoyance and urgency of auditory alerts for in-vehicle information systems. Proceedings of the 45th Annual Meeting of the Human Factors and Ergonomics Society, 2, 1627-1631.
Edworthy, J., and Adams, A. (1996). Warning design: A research prospective. Bristol, PA: Taylor and Francis.


Page Owner: Office of Research, Development, and Technology, Office of Safety, RDT

Topics: research, safety, operations
Keywords: research, safety, human factors, driver information, design guidelines, icon design, visual symbols, icon interpretation, icon legibility, auditory messages, general vs. specific icons, icon recognition, icon evaluation
TRT Terms: Automobiles–Instruments–Display systems, Automobiles–Electronic equipment, Electronics in navigation, Graphical user interfaces (Computer systems)–Design, Icons (Computer graphics)–Design, Highway communications, Traffic signs and signals, Information display systems, Driver information systems
Scheduled Update: Archive - No Update needed

This page last modified on 03/08/2016