U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
Federal Highway Administration Research and Technology
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-95-197
Date: December 1996
Development of Human Factors Guidelines for Advanced Traveler Information Systems and Commercial Vehicle Operations: Comparable Systems Analysis
Guidelines Used in the Design of the TravTek Auditory Interface
From the outset of the TravTek project, the design of the synthesized voice interface was given top priority. The TravTek team felt that the auditory mode, if implemented effectively, had great potential as a means of imparting complex information to the driver while allowing the driver's eyes to remain on the road. Voice messages also draw the driver's attention to the fact that new information is available, so the driver need not glance at the visual display frequently to check for updates. However, unlike voice in some ATIS systems, TravTek's voice was designed to supplement the visual display, and the visual display could also be used as a stand-alone system if desired.
The TravTek driver interface design team strived to make the application of voice to an ATIS system a desirable and useful feature to the driver (Means et al., 1992). The guiding principles that were applied to the design of the auditory interface to achieve this goal included:
The use of voice in the automotive industry often meets with resistance because of the belief that "people do not like talking cars." This belief apparently arose from negative reactions to vehicles that use voice to warn drivers of open doors or unbuckled seatbelts. An examination of the use of voice for such purposes reveals violations of some of the TravTek principles mentioned above.
Drivers may not be receptive to the use of voice for a system warning unless the condition is urgent (e.g., collision warning). In the case of an open door, a non-verbal auditory signal or a telltale on the instrument panel is probably sufficient to alert the driver to the problem. The use of voice in this instance may be perceived by the driver as "nagging," as it is more analogous to an interfering passenger than it is to a machine warning. Drivers may have various reasons for wanting to suppress a voice system at times, and these wishes must be accommodated by giving the driver control over volume as well as activation of voice functions.
As anthropomorphism is inevitable in a talking car, the TravTek design team chose a conservative approach to the sort of "personality" that may be attributed to the ATIS system. Anthropomorphism can be lessened by designing voice systems that are more machine-like than human-like in their expression. ATIS systems with excessively long voice messages, or messages that exceed strict bounds of usefulness, may be accused of "chattering" or "nagging" (Means et al., 1992). "Auditory clutter" is the term used by Stokes, Wickens, and Kite (1990) to describe the overuse of the auditory channel, resulting in potential distraction from the driving task. To minimize auditory clutter, voice feedback was avoided for correct maneuvers, driving speed, and system status (uses that are advocated by Davis, 1989). Initial road tests of the TravTek driver interface convinced the TravTek design team to further reduce the length and number of voice messages.
Computer-generated speech for ATIS systems may be either synthesized or digitized. Digitized speech has the significant advantages of intelligibility and naturalness, but the recording and storage of large amounts of text impose prohibitive limitations. In TravTek, synthesized voice made it possible to use a wide variety of message text while achieving an acceptable level of intelligibility.
Although the TravTek voice sounds male, this should not be interpreted as a deliberate design decision. The speech synthesis product was selected on the basis of hardware requirements for durability in an automotive application. There was little choice regarding voice characteristics, and it was necessary to settle for the voice available in a product that satisfied these constraints. Although the synthesizer allowed programmer control over voice attributes such as rate of speech, pitch, and voice gain, it did not provide a choice between a male and a female voice. It did, however, offer a choice of three voices, differentiated as "a large person," "a medium-sized person," and "a small person." The male-sounding TravTek voice is the "medium-sized person" (the other two voices are mostly unintelligible).
Synthesized speech is decidedly less intelligible than digitized speech. This reduced intelligibility has been shown to stem from the absence of many of the prosodic features found in natural human speech (Allen, 1976). The prosodic element of speech is what gives natural speech its rhythm. A state-of-the-art voice synthesizer applies some reasoning to the insertion of pauses and variations in pitch and stress; however, because the system cannot interpret the input text, the prosodic results are severely limited. Many commercially available voice synthesizers provide a means of inserting prosodic markers in text, giving the programmer some control over the intonational pattern of a synthesized utterance (Means et al., 1992).
Similarly, a speech synthesizer typically contains a large dictionary of stored pronunciations for known words, as well as a program that generates a pronunciation for an unknown word from its spelling. In English, spelling is not a good predictor of pronunciation, and no known algorithm produces consistently accurate pronunciations of unknown English words. For the TravTek voice messages, the prosody was carefully selected by a linguistic specialist to enhance intelligibility. As part of this process, the specialist listened to the synthesizer's pronunciation of every word spoken by the system, including more than 12,000 street names in Orlando, Florida, and stored corrected pronunciations as needed. This effort greatly improved intelligibility, and the TravTek design team considers such preprocessing essential for public acceptance of synthesized voice.
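The lookup order implied above (stored corrections first, then the synthesizer's dictionary, then a spelling-based guess) can be sketched as follows. This is a minimal illustration under stated assumptions: the function and table names are hypothetical, pronunciations are plain respelling strings rather than any real synthesizer's phoneme notation, and the letter-to-sound rule is a stand-in supplied by the caller.

```python
from typing import Callable, Dict


def make_pronouncer(
    corrections: Dict[str, str],
    builtin: Dict[str, str],
    letter_to_sound: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a word -> pronunciation function with a fixed lookup order.

    All three sources are hypothetical stand-ins: `corrections` holds
    pronunciations fixed during manual review, `builtin` models the
    synthesizer's stored dictionary, and `letter_to_sound` models its
    spelling-based rules for unknown words.
    """

    def pronounce(word: str) -> str:
        key = word.lower()
        if key in corrections:       # reviewed and corrected by a specialist
            return corrections[key]
        if key in builtin:           # synthesizer's stored pronunciation
            return builtin[key]
        return letter_to_sound(key)  # guess from spelling (may be wrong)

    return pronounce


# Illustrative usage with made-up respellings.
pronounce = make_pronouncer(
    corrections={"colonial": "kuh-LOH-nee-ul"},
    builtin={"drive": "DRYV", "colonial": "KOL-uh-nyul"},
    letter_to_sound=lambda w: w.upper(),
)
print(pronounce("Colonial"), pronounce("Drive"), pronounce("Mills"))
```

Because corrections are consulted first, a reviewed entry overrides the synthesizer's own dictionary without modifying it.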
Other strategies are also effective in increasing the intelligibility of synthesized speech. In TravTek route guidance and traffic messages, street names were found to be the least intelligible part of the utterance. To aid comprehension of street names, it is useful to speak the street name suffix (e.g., "Colonial Drive" as opposed to simply "Colonial"). Alerting prefaces are thought to be effective in drawing the listener's attention to an impending voice message. Various experimenters have found reduced response times for prefaced messages, despite the added length that the preface gives the message (Bucher, Karl, Voorhees, and Werner, 1984; Simpson and Williams, 1980).
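A message builder that applies both strategies might look like the hypothetical sketch below. The report does not give TravTek's actual preface or message templates, so `PREFACE` is an illustrative placeholder (a preface could equally be a short tone), and the function names are invented for this example.

```python
PREFACE = "Attention."  # hypothetical placeholder; not TravTek's actual preface


def full_street_name(name: str, suffix: str) -> str:
    """Include the suffix ("Colonial Drive", not just "Colonial")."""
    return f"{name} {suffix}" if suffix else name


def guidance_message(maneuver: str, name: str, suffix: str,
                     preface: bool = True) -> str:
    """Compose a spoken maneuver instruction, optionally prefaced."""
    body = f"{maneuver} onto {full_street_name(name, suffix)}."
    return f"{PREFACE} {body}" if preface else body


print(guidance_message("Turn left", "Colonial", "Drive"))
```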
The mechanical-sounding character of synthesized speech may have some advantage over digitized voice in automotive applications. Because the voice does not sound human, it is easily and immediately distinguishable from other voices in the automobile environment. In this way, the perceptual contrast makes the voice somewhat self-alerting: it is obvious that the car is speaking. The machine-like voice also reduces the tendency toward anthropomorphism, as mentioned above.
Usefulness of Information
Useful TravTek information was defined as that which enabled the driver to optimize performance of the driving task. Route guidance instructions are useful if they enable the driver to follow a route safely and without error. Traffic information is useful if it enables a driver to avoid congestion, minimize traffic time, and drive safely in the vicinity of unavoidable traffic congestion. There are many open questions as to the appropriate information content of guidance and traffic messages in an ATIS system.
People typically include the names of turn streets in route guidance instructions. Are street names a useful piece of information in a maneuver instruction? The TravTek design team believed that they were, given the difficulty in timing turn messages accurately enough to prevent erroneous turns in areas with closely spaced cross streets. Street names also aid in driver orientation in unfamiliar territory.
At an intersection where two streets cross at right angles, the instruction describing the maneuver is easily formulated: "turn right" or "turn left." Complex intersections entail maneuvers that are more difficult to describe clearly. While this is another justification for the use of street names, it is important to note that street signs are not always easily visible. For this reason, voice guidance, graphical representation of the maneuver intersection, and error recovery strategies for driver mistakes are all important elements of route guidance. Combined into a coherent system, they work effectively to keep the driver on the planned route.
Another aspect of route guidance is that it is useful for drivers to know how far they are from their next maneuver. Individual drivers, however, differ in their ability to reason about distance. Davis (1989) and Streeter et al. (1985) have discussed the potential ambiguity of measures of distance. For instance, "in two blocks" may be ambiguous when the next cross street does not intersect the driven street on both sides; does it delimit a block? Upon hearing an instruction such as "turn left at the third light," will the driver count the light at the intersection he is passing through when the message is spoken? The conclusion is that distance expressed unambiguously (e.g., in fractions of a mile) is more likely to help drivers who can gauge it than it is to confuse drivers who cannot do so.
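As one way to express distance unambiguously, the sketch below rounds a distance to the nearest quarter mile and phrases it for speech. The quarter-mile granularity, the wording, and the function name are assumptions made for illustration; the report says only that distance should be given in unambiguous terms such as fractions of a mile.

```python
# Hypothetical phrasings, indexed by the number of leftover quarter miles.
UNDER_ONE_MILE = {1: "a quarter mile", 2: "half a mile", 3: "three quarters of a mile"}
FRACTION_WORDS = {1: "a quarter", 2: "a half", 3: "three quarters"}


def describe_distance(miles: float) -> str:
    """Phrase a non-negative distance in miles, rounded to quarter miles."""
    quarters = max(1, round(miles * 4))  # nearest quarter; never "zero miles"
    whole, rem = divmod(quarters, 4)     # whole miles and leftover quarters
    if whole == 0:
        return UNDER_ONE_MILE[rem]
    if rem == 0:
        return "1 mile" if whole == 1 else f"{whole} miles"
    return f"{whole} and {FRACTION_WORDS[rem]} miles"


for d in (0.3, 0.6, 1.0, 1.5):
    print(describe_distance(d))
```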
The WHERE AM I? function was designed primarily to provide assistance when the driver is navigating along a self-planned route. It helps the driver to identify the intended turn street and to orient himself or herself when uncertain of his or her current location and heading. WHERE AM I? can be a useful supplement to route guidance as well, identifying turn streets at intersections where street signs are missing or when visibility is obscured. It also provides an indication of navigation system accuracy without reference to the map display.
An example of a WHERE AM I? message is "Approaching Lee Road. Headed North on Orlando Avenue." At first glance, it may seem counter-intuitive to answer the question "Where am I?" by first stating what you are near, then where you are. The message was worded this way to impart the presumably more urgent piece of information first. Cross-street information is considered more urgent, as it may identify the location of the intended maneuver. The WHERE AM I? function is self-interrupting; i.e., it aborts its current message and restarts itself when the button is pressed during WHERE AM I? output. Repeated quick button presses therefore produce a series of cross-street notifications with the location information clipped. This enables the driver to use WHERE AM I? to hear the names of cross streets in quick succession while proceeding down the road. The terseness of the WHERE AM I? message further serves this purpose (Means et al., 1992).
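The message ordering and self-interrupting behavior described above can be modeled as in the following sketch. It is a simplification under stated assumptions: output is simulated one phrase per `tick()` rather than with real synthesizer timing, so clipping happens at phrase boundaries instead of mid-word, and the class, method, and field names (and the cross street "Elm Street") are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    cross_street: str  # next cross street ahead of the vehicle
    street: str        # street currently being driven
    heading: str       # e.g. "North"


def where_am_i_message(pos: Position) -> List[str]:
    """Cross street first (the more urgent item), then street and heading."""
    return [f"Approaching {pos.cross_street}.",
            f"Headed {pos.heading} on {pos.street}."]


class WhereAmI:
    """Self-interrupting output: a press during speech restarts the message."""

    def __init__(self) -> None:
        self.pending: List[str] = []  # phrases not yet spoken
        self.spoken: List[str] = []   # phrases actually spoken, in order

    def press(self, pos: Position) -> None:
        # Abort any unspoken remainder and start over with fresh data.
        self.pending = where_am_i_message(pos)

    def tick(self) -> None:
        # Speak one phrase per tick (a stand-in for synthesizer timing).
        if self.pending:
            self.spoken.append(self.pending.pop(0))


w = WhereAmI()
w.press(Position("Lee Road", "Orlando Avenue", "North"))
w.tick()  # cross street is spoken
w.press(Position("Elm Street", "Orlando Avenue", "North"))  # hypothetical street
w.tick()
w.tick()
print(w.spoken)
```

The second press discards the unspoken "Headed North..." phrase from the first message, which mirrors the clipped location information described in the text.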
Research Issues in Traffic Advisories
It is not well understood which pieces of information about a traffic problem are necessary and useful to drivers. Specifically, it is not clear whether knowing about lane closures or the cause of a congestion problem will lead drivers to modify their behavior. Perhaps it is beneficial to withhold details of a spectacular incident such as a fire, as reporting them could encourage gawkers to travel to the scene. On the other hand, such details also alert drivers to the possibility of emergency vehicles in the area.
There are a variety of ways to express the severity of a congestion problem. Terms such as "heavy traffic," "sluggish traffic," "a 1-minute delay," "bumper to bumper," "stop and go," "slow and go," and "merging delays" are used in traffic reports broadcast by radio stations. Radio traffic reports also occasionally provide an estimate of the length of a congestion queue. More research is needed, however, to determine how drivers use an estimate of a backup queue, whether it can be reliably estimated, and how best to describe it to drivers (in miles, in number of traffic signals, or from street X to street Y). The word "congestion" itself may be ambiguous. Do drivers interpret it consistently as slow traffic, or can congestion also refer to a heavy volume of traffic moving at the posted speed?
The location of a traffic problem can also be expressed in various ways. Can the relevance of the problem be assessed more easily by the driver if its location is described relative to the vehicle or in absolute terms?
Onboard computer-generated traffic advisories can provide information on demand that is filtered for relevance to a given vehicle location/route. Some issues that arise in relevance filtering include the criteria applied to determine relevance, the upper limit on the amount of information that should constitute an on-demand traffic report, and the possibility of giving drivers the ability to tailor traffic reports to their own needs and interests (Means et al., 1992).
The TravTek Approach to Traffic Reporting
TravTek traffic advisories report lane closures and the cause of an incident when known. Congestion is characterized as "heavy" or "moderate," depending on the degree to which travel time on the affected road departs from free-flow travel time (Dudeck and Huchingson, 1986). The description of a traffic problem's location differs according to whether the vehicle is on the planned route and whether the problem is an incident or congestion unrelated to an incident. The location of an incident (e.g., an accident, disabled vehicle, malfunctioning signal, or construction) can be pinpointed, whereas congestion is so volatile that it cannot be delimited precisely and reliably. When the vehicle is on a planned route, only problems ahead of the vehicle on the route are reported. If an on-route problem is an incident, its location is described as "[distance] ahead on [street name]," where distance is expressed in miles. A non-incident congestion problem on the route is specified as "ahead on [street X] between [street Y] and [street Z]." The visual display indicates incidents and congestion with icons placed on the local area map, thus clarifying the location of problems reported by the voice traffic report.
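The on-route reporting rules above can be sketched as follows. The sketch covers only the on-route case, since the off-route wording is not given here. Assumptions, labeled in the code: the numeric heavy/moderate cutoffs (the text says only that the label depends on how far travel time departs from free-flow time), representing positions as miles along the planned route, and all names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoffs on (current travel time / free-flow travel time).
HEAVY_RATIO = 2.0
MODERATE_RATIO = 1.5


def congestion_level(travel_min: float, free_flow_min: float) -> Optional[str]:
    ratio = travel_min / free_flow_min
    if ratio >= HEAVY_RATIO:
        return "heavy"
    if ratio >= MODERATE_RATIO:
        return "moderate"
    return None  # below both cutoffs: not described as congestion


@dataclass
class Problem:
    street: str
    route_mile: float      # assumed: position measured along the planned route
    incident: bool         # True for an incident, False for plain congestion
    from_street: str = ""  # congestion extent ("street Y")
    to_street: str = ""    # congestion extent ("street Z")


def on_route_advisories(problems: List[Problem], vehicle_mile: float) -> List[str]:
    """Location phrases for problems ahead of the vehicle on its route."""
    phrases = []
    for p in problems:
        ahead = round(p.route_mile - vehicle_mile, 1)
        if ahead <= 0:
            continue  # behind the vehicle: not reported
        if p.incident:
            unit = "mile" if ahead == 1 else "miles"
            phrases.append(f"{ahead:g} {unit} ahead on {p.street}")
        else:
            phrases.append(
                f"ahead on {p.street} between {p.from_street} and {p.to_street}")
    return phrases


problems = [
    Problem("Street A", 5.5, True),
    Problem("Street X", 8.0, False, "Street Y", "Street Z"),
    Problem("Street B", 1.0, True),  # already passed
]
print(on_route_advisories(problems, vehicle_mile=3.0))
```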
The collection, dissemination, and in-vehicle use of traffic data is a process whose design is subject to many interdependencies. The TravTek solution to on-vehicle presentation of traffic data was largely driven by outside constraints imposed by the organization of the TravTek Traffic Management Center (TMC). The design of the TMC itself is constrained by the local availability of information sources. In future ATIS systems, it would be preferable to base the information content of traffic advisories on solid research on the usefulness of the information, constrained by the general feasibility of data collection in most large urban areas.
Driver Control of Voice Features
It is essential that drivers be allowed to select the functions for which they receive voice output, control the volume of the voice, and suppress all voice messages. In the TravTek system, the radio is muted for the duration of each voice message. Using the radio volume control during voice output adjusts the volume of voice messages; to adjust radio volume, the driver uses the same control during radio output. This allows different volume levels for radio broadcasts and for TravTek voice functions (Means et al., 1992).
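The muting and shared-control behavior can be modeled as a small state machine, sketched below. The 0 to 10 volume scale, default levels, and all names are assumptions for illustration; only the routing rule (the control adjusts whichever source is currently playing) comes from the text.

```python
MAX_VOLUME = 10  # hypothetical volume scale


def clamp(level: int) -> int:
    return max(0, min(MAX_VOLUME, level))


class AudioController:
    """Radio muting and shared volume-control routing (illustrative model)."""

    def __init__(self, radio_volume: int = 5, voice_volume: int = 5) -> None:
        self.radio_volume = radio_volume
        self.voice_volume = voice_volume
        self.voice_active = False

    def begin_voice(self) -> None:
        self.voice_active = True   # radio is muted while a message plays

    def end_voice(self) -> None:
        self.voice_active = False  # radio output resumes

    @property
    def radio_audible(self) -> bool:
        return not self.voice_active

    def turn_volume(self, delta: int) -> None:
        # One physical control; what it adjusts depends on what is playing.
        if self.voice_active:
            self.voice_volume = clamp(self.voice_volume + delta)
        else:
            self.radio_volume = clamp(self.radio_volume + delta)


audio = AudioController()
audio.begin_voice()
audio.turn_volume(-2)  # lowers voice volume only
audio.end_voice()
print(audio.voice_volume, audio.radio_volume)
```

Keeping two stored levels behind one control is what lets radio and voice settle at different volumes without adding a second knob.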
In initial testing, voice messages for TravTek functions were found to be generally welcome when the driver was not listening to the radio or conversing with a passenger. Because the voice synthesizer suppresses radio output and tends to interrupt conversations, drivers may occasionally want to turn off some or all voice functions. Separate controls for voice guidance, traffic reports, and the WHERE AM I? function allow the driver to reduce the amount of voice output selectively, rather than relying on a single voice on/off control.
While the driver's attention is occupied by the driving task and competing thought processes, he or she may not immediately attend to a voice message that is issued automatically. Because the auditory display is inherently ephemeral, the REPEAT VOICE function provides a necessary mechanism for recapturing information that was not comprehended the first time.
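Per-function voice controls, a master suppression switch, and a REPEAT VOICE buffer can be combined as in the hypothetical sketch below. The function identifiers and class are invented, and honoring REPEAT VOICE while all voice is muted is an assumption, since the interaction between the two is not described here.

```python
from typing import Dict, List, Optional

# Hypothetical identifiers for the three separately controllable functions.
FUNCTIONS = ("guidance", "traffic", "where_am_i")


class VoiceManager:
    def __init__(self) -> None:
        self.enabled: Dict[str, bool] = {f: True for f in FUNCTIONS}
        self.all_voice_on = True               # master switch: suppress all voice
        self.last_message: Optional[str] = None
        self.spoken: List[str] = []            # stand-in for synthesizer output

    def set_enabled(self, function: str, on: bool) -> None:
        if function not in self.enabled:
            raise ValueError(f"unknown voice function: {function}")
        self.enabled[function] = on

    def speak(self, function: str, text: str) -> bool:
        """Speak `text` if voice is on for `function`; report whether it was."""
        if not (self.all_voice_on and self.enabled[function]):
            return False
        self.last_message = text
        self.spoken.append(text)
        return True

    def repeat_voice(self) -> bool:
        """REPEAT VOICE: replay the most recently spoken message, if any."""
        if self.last_message is None:
            return False
        # Assumption: an explicit request is honored even if voice is muted.
        self.spoken.append(self.last_message)
        return True


vm = VoiceManager()
vm.set_enabled("traffic", False)
vm.speak("traffic", "Heavy traffic ahead.")  # suppressed by the traffic toggle
vm.speak("guidance", "Turn left onto Colonial Drive.")
vm.repeat_voice()
print(vm.spoken)
```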
Non-Verbal Auditory Signals
Considerable research has been done in the use of non–verbal auditory warnings in aircraft cockpits (see Patterson, 1982, for a comprehensive discussion). Some of the knowledge that has accrued from aircraft research may pertain to passenger vehicles (e.g., appropriate volumes and temporal characteristics for auditory tones). However, principles guiding the use of auditory systems in aircraft must not be applied indiscriminately to passenger vehicles. It is important to bear in mind the essential differences between highly trained cockpit personnel and automobile drivers who vary greatly in age, driving ability, physical condition, etc.
When ATIS systems become commonplace enough to be available to untrained drivers, the meaning of auditory signals must be easily learned and retained, with minimal potential for confusion. In the TravTek system, three nonverbal auditory signals were used. Two were tied directly to driver actions: a feedback signal for touchscreen key presses, and an error tone for inappropriate steering wheel button presses (for instance, pressing ROUTE GUIDE when no destination has been entered).
The third signal must be taught. When the VOICE GUIDE function is turned off, a glance-at-the-screen tone is sounded to prompt the driver to look at the visual display when new information is presented: when the next maneuver is first depicted, or when the driver must be informed that the car has left its route. The glance-at-the-screen signal was selected to be soft and unobtrusive, with the goals of not startling drivers and not disrupting conversations.
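The conditions under which each of the three signals sounds can be summarized in a small decision sketch. Only the ROUTE GUIDE error case and the two glance-tone triggers come from the text; other inappropriate button presses would need their own rules, and the event and function names are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Tone(Enum):
    KEY_FEEDBACK = auto()  # touchscreen key press acknowledged
    ERROR = auto()         # inappropriate steering wheel button press
    GLANCE = auto()        # soft "glance at the screen" prompt


# Hypothetical event names for the two glance-tone triggers in the text.
GLANCE_EVENTS = {"next_maneuver_shown", "left_route"}


def touch_key_tone() -> Tone:
    return Tone.KEY_FEEDBACK


def steering_button_tone(button: str, destination_entered: bool) -> Optional[Tone]:
    # Example from the text: ROUTE GUIDE pressed with no destination entered.
    if button == "ROUTE GUIDE" and not destination_entered:
        return Tone.ERROR
    return None


def guidance_event_tone(event: str, voice_guide_on: bool) -> Optional[Tone]:
    # The glance tone sounds only when VOICE GUIDE is turned off.
    if voice_guide_on or event not in GLANCE_EVENTS:
        return None
    return Tone.GLANCE


print(guidance_event_tone("left_route", voice_guide_on=False))
```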