U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

Report
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-RD-03-065
Date: September 2004

In-Vehicle Display Icons and Other Information Elements: Volume I

PDF Version (8.33 MB)


CHAPTER 7: EVALUATING IN-VEHICLE ICONS

OVERVIEW OF GENERAL PROCEDURES FOR EVALUATING IN-VEHICLE ICONS

Introduction: Evaluating icons refers to the general process of determining whether an icon, or an integrated set of icons, meets specific criteria in areas such as legibility, recognition, interpretation, and driver preferences. Developing useful and effective icons requires evaluation. A rigorous and iterative evaluation phase in icon design increases the likelihood that the implementation of the icon in the in-vehicle environment will improve driving and system performance and not negatively impact driver safety.

Design Guidelines

Flowchart. Overview of Procedures for Evaluating In-Vehicle Icons.

Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and expert data.

Figure 7-1. Overview of Procedures for Evaluating In-Vehicle Icons

Discussion: General procedures for evaluating icons have been presented in a number of data sources, including references 1, 2, 3, and 4. Importantly, the procedures outlined in this chapter reflect an integrated approach to icon evaluation. That is, each evaluation procedure (Production Test, Appropriateness Ranking Test, Comprehension/Recognition Test, and Matching Test) addresses different research objectives and represents a key step in an overall process of developing legible, recognizable, and interpretable in-vehicle icons.

Evaluations of individual icons may not require going through the entire evaluation process. For example, if the icon development team has 3-5 strong candidate icons in-hand for a particular message, then the Production Test and the Appropriateness Ranking Test may not be needed. Similarly, if high levels of comprehension are obtained in the Comprehension/Recognition Test, then the Matching Test may not be needed.

Reference 4 provides guidelines for the development and evaluation of hazard and safety symbols. Many of the guidelines contained in reference 4 for the graphic design of symbols are general, and the recommendations provided for specific symbols are not for in-vehicle applications. However, some of the symbol evaluation procedures suggested in reference 4 are similar to those contained here. Some key differences between the guidelines presented here and those presented in reference 4 are: (1) the Production Test is not called out as a formal procedure in reference 4; evaluations begin by selecting or gathering "existing symbol alternatives"; (2) an Appropriateness Ranking procedure is also not specified; instead, a comprehension estimation procedure (in which subjects estimate the percentage of the driving population that would understand a candidate icon) is used to eliminate poor candidates; (3) the criterion for acceptance following an open-ended Comprehension Test is 85 percent (versus the ISO criterion of 66 percent), with a maximum of 5 percent critical confusions; and (4) a Matching Test is not discussed.
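As a rough illustration of the two acceptance criteria compared above, each can be expressed as a simple numeric check. The function names below are ours, not taken from either standard, and the example figures are hypothetical:

```python
def meets_ansi_criterion(pct_correct, pct_critical_confusion):
    """ANSI Z535.3-style acceptance: at least 85 percent correct
    comprehension and no more than 5 percent critical confusions."""
    return pct_correct >= 85.0 and pct_critical_confusion <= 5.0

def meets_iso_criterion(pct_correct):
    """ISO-style acceptance: at least 66 percent correct comprehension."""
    return pct_correct >= 66.0

# Hypothetical icon understood by 88% of subjects, 3% critical confusions
print(meets_ansi_criterion(88.0, 3.0))  # True
print(meets_iso_criterion(60.0))        # False
```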

While a subset of these evaluation procedures (as well as the alternate procedures described on pages 7-4 through 7-13) may be used, the IVIS developer should be aware of the limitations inherent in such an approach. That is, key issues associated with the effectiveness of a given icon may not be addressed without a complete, integrated approach to in-vehicle icon evaluation.

Design Issues: All evaluations of in-vehicle icons should be performed using test subjects who are representative of the driving public. Key demographic variables include age and gender. Age effects, in particular, can be expected for icon evaluations. Thus, individual evaluations should use a mix of younger and older test subjects. Also, representative non-English speakers should be included in testing.

Cross References: General Development Process for In-Vehicle Icons, p. 2-2; Production Test, p. 7-4; Appropriateness Ranking Test, p. 7-6; Comprehension/Recognition Test, p. 7-8; Matching Test, p. 7-10; Additional Symbol Evaluation Approaches, p. 7-12

References:

  1. Zwaga, H., and Easterby, R. S. (1984). Developing effective symbols for public information. In R. Easterby and H. Zwaga (Eds.), Information design: The design and evaluation of signs and printed material (pp. 277-297). New York: J. Wiley & Sons.

  2. International Organization for Standardization (ISO)/DIS 9186. (1988). Procedures for the development and testing of public information symbols. Geneva, Switzerland: ISO.

  3. Collins, B. L. (1982). The development and evaluation of effective symbol signs. Washington, DC: U.S. Department of Commerce (NBS Building Science Series, 141).

  4. American National Standards Institute. (1998). American national standard criteria for safety symbols, ANSI Z535.3. Washington, DC: NEMA.

PRODUCTION TEST

Introduction: The production test refers to an icon evaluation approach in which a broad range of candidate symbols for a concept or referent (i.e., in-vehicle message) is generated. It is used when no symbols for a given message exist (reference 1). In this test, subjects are asked to draw symbols that they think represent a particular message. The output of the production test is a number of graphic or symbolic representations of a message that are considered effective and comprehensible by individual subjects. The production test will not result in a final icon selection. It is used to generate candidate symbols/icons only.

Design Guidelines
Flowchart and description of the production test.
Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and expert data.

Figure 7-2. Production Test

Discussion: Empirical testing of candidate icons requires a variety of candidate symbols to present to subjects. The production test has been identified by reference 1 as a key step in icon development and an important means for generating a wide range of images for subsequent testing.

Reference 2 used a production test to generate ideas for symbols for common in-vehicle systems (coolant, fuel, air, oil, transmission, hydraulic, and brake), and specific conditions associated with the system (fluid level, temperature, pressure, and filter). This process generated a wide variety of candidate symbols. Importantly, variation across the symbols reflected characteristics of the subjects themselves. Some were serious, well thought out, and detailed; others were humorous and less thoughtful. Many reflected the type of work performed by the subjects, such as the mechanistic and function-oriented drawings made by subjects who were engineers.

Design Issues: The production test is not, by itself, a sufficient means to validate icons or symbols. The overall goal of the production test is to create a number of different candidate symbols as input for the more systematic evaluation approaches such as the comprehension/recognition test. In addition, the production test relies on the participants' ability to conceptualize the referent and generate an icon that includes the attributes of the referent needed for a comprehensible icon. With complex or novel concepts, this may not be an efficient and effective process for icon development.

An alternative is to use knowledge elicitation and concept mapping techniques to identify the elements of a comprehensible icon-that is, to conduct structured focus groups or one-on-one interviews with designers to elicit ideas about candidate icons. The objective of concept mapping, as applied to icon development, is to identify attributes of the referent that uniquely specify it and are commonly associated with it. Several structured processes exist to support this activity (see references 3, 4, and 5). In general, these processes begin by identifying concepts associated with a particular activity or system (e.g., in-vehicle routing and navigation). Once general concepts are identified, participants are queried to define distinguishing attributes and relationships. An example question might be: "How would you describe this item?" Once attributes and their relationships have been defined, a series of queries is used to refine them. Example questions might be: "Are there characteristics of this item that are not included in the list?" or "What are the most relevant characteristics in identifying this item?" Attributes identified by several users can be combined to define the features required to enhance icon comprehension. This process can be performed by manually sorting and combining the participants' responses, or by using sophisticated statistical techniques, such as factor and cluster analysis (see references 6 and 7). All evaluations of in-vehicle icons should be performed using test subjects who are representative of the driving public. Key demographic variables include age and gender. Age effects, in particular, can be expected for icon evaluations. Thus, individual evaluations should use a mix of younger and older test subjects.
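The manual sort-and-combine step described above can be approximated with simple frequency counting: attributes mentioned by several participants are retained, and rarely mentioned ones are dropped. The sketch below is a minimal illustration of that idea (the function name, data format, and example attributes are hypothetical; the cluster and factor analyses cited in references 6 and 7 are beyond its scope):

```python
from collections import Counter

def combine_attributes(participant_lists, min_mentions=2):
    """Combine attribute lists elicited from several participants and keep
    attributes mentioned by at least `min_mentions` participants.
    `participant_lists` holds one list of attribute strings per person."""
    counts = Counter()
    for attrs in participant_lists:
        # Count each attribute at most once per participant.
        counts.update({a.strip().lower() for a in attrs})
    return [a for a, n in counts.most_common() if n >= min_mentions]

# Hypothetical attributes elicited for an "upcoming right turn" referent
elicited = [
    ["arrow", "bend to the right", "road outline"],
    ["arrow", "right bend", "Road outline"],
    ["arrow", "intersection"],
]
print(combine_attributes(elicited))  # ['arrow', 'road outline']
```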

Cross References: Overview of General Procedures for Evaluating In-Vehicle Icons, p. 7-2; Appropriateness Ranking Test, p. 7-6;
Comprehension/Recognition Test, p. 7-8; Matching Test, p. 7-10; Additional Symbol Evaluation Approaches, p. 7-12

References:

  1. Zwaga, H., and Easterby, R. S. (1984). Developing effective symbols for public information. In R. Easterby and H. Zwaga (Eds.), Information design: The design and evaluation of signs and printed material (pp. 277-297). New York: J. Wiley & Sons.

  2. Green, P. (1981). Displays for automotive instrument panels: Production and rating of symbols. HSRI Research Review, July-August, 1-12.

  3. Hart, A. (1986). Knowledge acquisition for expert systems. New York: McGraw-Hill.

  4. Joiner, C. (1998). Concept mapping in marketing: A research tool for uncovering consumers' knowledge structure associations. Advances in Consumer Research, Vol. XXV, 311-317.

  5. McGraw, K., and Harbison, K. (1997). User-centered requirements: The scenario-based engineering process. Mahwah, NJ: Lawrence Erlbaum.

  6. Shaw, M. (1981). Recent advances in personal construct technology. New York: Academic Press.

  7. Wilson, J. R., and Corlett, E. N. (1990). Evaluation of human work. New York: Taylor and Francis.

APPROPRIATENESS RANKING TEST

Introduction: The purpose of the appropriateness ranking test is to screen the candidate symbols generated during the production test and select the best for further testing. Essentially, subjects are asked to rank order a set of candidate symbols for a message with respect to their relative appropriateness. Once these ranking data have been gathered, the three candidate symbols with the highest ranking are typically selected for further testing.

Design Guidelines
  Flowchart for Figure 7-3.

  • For each message being considered, produce a set of cards.
  • Each card should show one of the candidate icons developed during the production test or identified using other means.

  • For each subject, randomize the set of cards containing the candidate icons prior to testing.
  • If multiple messages are being tested (more than one set of cards), the order in which subjects rank order each set of cards should be counterbalanced across subjects.

  • Ask representative subjects to sort the cards within each card set according to the degree to which each candidate icon is perceived to be an appropriate one for the message under consideration (i.e., rank order the cards within the set according to their appropriateness).
  • The cards can be sorted by placing the card with the most appropriate icon first in the set and the least appropriate icon last.
  • Multiple card sets can be used to test a group of subjects at the same time.

  • Using an approach such as Torgerson's Categorical Scaling Procedure (reference 3), calculate scale values for each of the candidate icons.
  • Tutorial 1 provides a step-by-step description of the calculations required to derive scale values from rank orders.
  • This provides not only a general rank order, but also some indication of how much the candidate symbols differ along an interval scale.

  • Using the interval scale values, select approximately three candidates for each message for further study.
  • Reference 1 provides the following criterion for selecting these candidates: "When differences between scale values are minimal, symbol variants with different image content are chosen, ignoring graphic detail." Thus, the "top three" candidates are not necessarily the ones chosen for further study.
  • Other selection issues include the relative differences between the scale values and the uniqueness across the candidates. Thus, in a close field of candidates (i.e., smaller differences in scale values), those candidates with acceptable appropriateness ranks and some heterogeneity, in terms of icon/symbol features, might be selected.
  • The appropriateness ranking test will not result in a final icon selection. It is used to provide a preliminary evaluation of candidate symbols/icons only.

Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and expert data.

Figure 7-3. Appropriateness Ranking Test
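The randomization and counterbalancing steps in Figure 7-3 can be sketched as follows. A cyclic Latin square is one common counterbalancing scheme; the function names and data format here are ours, offered only as an illustration:

```python
import random

def latin_square(n):
    """Cyclic n x n Latin square: row = subject, column = test position,
    entry = index of the card set presented at that position. Every set
    appears once per row and once per column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]

def prepare_session(card_sets, subject_index, rng=None):
    """Order the card sets for one subject via the Latin square
    (counterbalancing set order across subjects), then independently
    randomize the card order within each set for that subject."""
    rng = rng or random.Random()
    set_order = latin_square(len(card_sets))[subject_index % len(card_sets)]
    session = []
    for set_idx in set_order:
        cards = list(card_sets[set_idx])
        rng.shuffle(cards)  # fresh within-set randomization per subject
        session.append((set_idx, cards))
    return session

# Hypothetical example: three messages, each with two candidate-icon cards
sets = [["icon A1", "icon A2"], ["icon B1", "icon B2"], ["icon C1", "icon C2"]]
print(prepare_session(sets, subject_index=1, rng=random.Random(0)))
```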

Discussion: A preliminary screening of candidate icons for a message is necessary to make further testing of the icons feasible and cost-effective (see also reference 1). The appropriateness ranking test provides an objective, quick, low-cost approach to the task of reducing what can be a large number of candidate icons to a more manageable number of high-potential candidates.

The appropriateness ranking test has been successfully used in previous symbol development efforts. For example, in reference 2, six messages were tested using the procedure described above, with between eight and 35 candidate symbols being rank ordered for each message. In this study, the results of the appropriateness ranking test allowed the researchers to reduce, in a systematic manner, the number of these candidates to about three per message.

The advantage of converting rank order data to an interval scale of perceived appropriateness is that rank orders alone do not indicate the relative differences among judged stimuli. Thus, mean ranks suggest that differences in perceived appropriateness between, for example, stimuli 1 and 2 are the same as differences between stimuli 2 and 3. The Categorical Scaling Procedure provides the interval data necessary to make informed decisions regarding the true relative appropriateness of candidate icons.
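As a simplified illustration of deriving interval scale values from rank orders, the sketch below applies a basic z-transform to cumulative rank proportions. Tutorial 1 gives the full Categorical Scaling Procedure; this is a stripped-down approximation of that idea, not a substitute for it, and the data format is ours:

```python
from statistics import NormalDist

def scale_values(rankings):
    """Convert subjects' rank orders into interval scale values.
    `rankings` holds one rank order per subject; entry j of each order is
    the rank (1 = most appropriate) given to icon j. For each icon, the
    cumulative proportion ranked at or below each boundary is z-transformed
    and averaged; proportions of 0 or 1 are clamped to keep z finite."""
    n_subj, n_icons = len(rankings), len(rankings[0])
    phi_inv = NormalDist().inv_cdf
    values = []
    for j in range(n_icons):
        zs = []
        for k in range(1, n_icons):          # category boundaries 1..n-1
            p = sum(r[j] <= k for r in rankings) / n_subj
            p = min(max(p, 0.5 / n_subj), 1 - 0.5 / n_subj)  # clamp 0/1
            zs.append(phi_inv(p))
        values.append(sum(zs) / len(zs))     # mean z = scale value
    return values

# Three subjects rank three icons; icon 0 is consistently ranked best
vals = scale_values([[1, 2, 3], [1, 3, 2], [1, 2, 3]])
assert vals[0] > vals[1] > vals[2]
```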

Design Issues: When developing candidate icons for the test, consider that driver perception and performance will vary as a function of the medium used to present the test stimuli. That is, display parameters such as resolution, production of color, and luminance can affect responses. In general, the color, brightness, resolution, and size of test icons should be as close as possible to how the icons will be viewed by drivers in the in-vehicle environment.

As noted in reference 4, the appropriateness ranking test should not be used to make final selections of icons. Considerable experimental data suggest that more detailed, concrete icons are consistently judged to be more appropriate and given higher ranks than more abstract icons. However, highly detailed icons may lead to increased response times, are more easily confused with other icons, and are not always consistent with the need to provide simple visual information through in-vehicle displays. Thus, the appropriateness ranking test helps to identify candidate icons based on image content only, and cannot address more complex issues such as the comprehensibility of icons in an operational environment.

Often, subjects will be unable to distinguish the appropriateness of a given icon from another. Subjects should be instructed that the same ranking (or a tie) can be given to more than one candidate icon. All evaluations of in-vehicle icons should be performed using test subjects who are representative of the driving public. Key demographic variables include age and gender. Age effects, in particular, can be expected for icon evaluations. Thus, individual evaluations should use a mix of younger and older test subjects.

Cross References: Overview of General Procedures for Evaluating In-Vehicle Icons, p. 7-2; Production Test, p. 7-4; Comprehension/Recognition Test, p. 7-8; Matching Test, p. 7-10; Additional Symbol Evaluation Approaches, p. 7-12; Tutorials: Analysis of Rank Order Data, p. 9-1

References:

  1. Zwaga, H., and Easterby, R. S. (1984). Developing effective symbols for public information. In R. Easterby & H. Zwaga (Eds.), Information design: The design and evaluation of signs and printed material (pp. 277-297). New York: J. Wiley & Sons.
  2. Easterby, R. S., and Zwaga, H. (1976). Evaluation of public information symbols, ISO tests: 1975 series (Report AP 60). Birmingham, UK: Applied Psychology Department, University of Aston at Birmingham.
  3. Torgerson, W. S. (1965). Theory and methods of scaling. New York: J. Wiley & Sons.
  4. Hakiel, S. R. (1991). Evaluating icons for human-computer interfaces (Report No. HF 144). Hursley Park, Winchester, UK: IBM UK Laboratories Ltd.

COMPREHENSION/RECOGNITION TEST

Introduction: The comprehension/recognition test refers to an evaluation technique that provides a means to determine which of a number of candidate icons/symbols for a concept are best understood by a sample of subjects who represent the user population. During this test, an icon/symbol is presented to subjects, the context of the icon/symbol is specified (i.e., where they might expect to see the icon, according to reference 1), and subjects are asked to name the object, location, or activity associated with the icon/symbol.

Design Guidelines
  Flowchart for Figure 7-4.

  • Place candidate symbols on separate sheets of paper, slides, or computer screens, depending on the presentation method planned for the study.
  • Randomize presentation order across subjects.
  • Separate different candidates for the same message into distinct test sets.
  • Prepare and provide to subjects an example sheet with a common icon (like a fuel pump to indicate a fuel gauge) and its meaning written beneath the graphic.

  • Indicate the context in which the icon will be used, either verbally or co-located with the icon.
  • Subjects are to write down the action, condition, activity, location, etc., associated with the icon (e.g., "What do you think this icon means?").

  • Present test subjects with candidate icons/symbols and ask them to write down the action, condition, activity, location, etc., that they believe is represented by the icon/symbol.

  • Have a panel of judges independently categorize responses along a scale according to well-defined criteria that identify the likelihood that an individual response indicates correct comprehension of the icon. That is, the perceived meaning should be compared to the intended meaning. For example, use an 8-category rating scheme: 1 = response matches intended meaning exactly; 2 = response captures key informational elements of intended meaning, but misses one or more minor elements; 3 = response captures some aspects of intended meaning, but misses one or more key elements; 4 = response does not match intended meaning but captures one or more key informational elements; 5 = response does not match intended meaning, but is somewhat relevant; 6 = response is in no way relevant to the intended meaning; 7 = subject indicates no understanding of the icon; 8 = no answer.
  • For each candidate, convert the total number of responses in each category into percentages.

  • Decisions regarding minimum percent correct rates for individual icons should reflect the designer's needs, as well as the consequences associated with selecting a cutoff that is too high or too low. Reference 2 notes that ISO requirements for an acceptable symbol have, in the past, been a (minimum) 67 percent correct comprehension level (i.e., combined categories 1 and 2 from above).
Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and expert data.

Figure 7-4. Comprehension/Recognition Test

Discussion: Reference 1 has been developed as a standard procedure for developing and testing public information symbols. This standard provides a highly detailed set of instructions for testing symbols. Some computational procedures in reference 1, however, are unnecessarily complex and the guidelines presented on the preceding page represent a summary of the procedures listed in references 1 and 3.
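The category-tallying step in the guidelines above can be sketched as a simple computation: tally the judges' consensus category for each subject's response, convert the tallies to percentages, and apply the comprehension criterion (categories 1 and 2 counted as correct). The function name and example ratings below are ours:

```python
from collections import Counter

def comprehension_summary(category_ratings, pass_pct=67.0):
    """Summarize 8-category judge ratings for one icon.
    `category_ratings` holds the consensus category (1-8) assigned to each
    subject's response; categories 1 and 2 count as correct comprehension,
    per the guideline text."""
    n = len(category_ratings)
    counts = Counter(category_ratings)
    pct = {cat: 100.0 * counts.get(cat, 0) / n for cat in range(1, 9)}
    pct_correct = pct[1] + pct[2]
    return pct, pct_correct, pct_correct >= pass_pct

# Hypothetical ratings for one icon, ten subjects
pct, correct, passed = comprehension_summary([1, 1, 2, 1, 3, 2, 1, 7, 1, 2])
print(f"{correct:.0f}% correct -> {'accept' if passed else 'reject'}")
```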

Design Issues: When developing candidate icons for the test, consider how driver perception and performance will vary as a function of the medium used to present the test stimuli. That is, display parameters such as resolution, production of color, and luminance can affect responses. In general, the color, brightness, resolution, and size of test icons should be as close as possible to how the icons will be viewed by drivers in the in-vehicle environment.

All evaluations of in-vehicle icons should be performed using test subjects who are representative of the driving public. Key demographic variables include age and gender. Previous studies have indicated that significant differences exist between younger people and older people in their ability to comprehend symbols (references 4 and 5). Therefore, subjects should be representative of the user population (e.g., half between the ages of 18 and 40 and the other half over 55 years).

In real-world driving, icons are presented in the context of certain in-vehicle capabilities and driving circumstances. As such, evaluations of in-vehicle icons should include a description of the context in which icons will be presented and used. However, icon evaluations should avoid providing either too little or too much context to experimental subjects. If too little context is provided, unrealistically low comprehension scores may result from subjects' being unable to connect a visual icon with its many possible meanings. Too much context may yield unrealistically high comprehension scores because the subjects have been cued for a certain response by the specificity of the context. Both extremes should be avoided. In chapter 9, a tutorial entitled "Providing Subjects with Context During Icon Evaluations" provides both procedures and examples associated with providing appropriate context to experimental subjects.

Candidate icons should be tested individually, as the focus is on testing absolute comprehension/recognition for individual icons. The goal is not to test confusability across icons (as it is in the matching test).

Cross References: Overview of General Procedures for Evaluating In-Vehicle Icons, p. 7-2; Production Test, p. 7-4; Appropriateness Ranking Test, p. 7-6; Matching Test, p. 7-10; Additional Symbol Evaluation Approaches, p. 7-12; Tutorials: Providing Subjects with Context During Icon Evaluations, p. 9-11

References:

  1. ISO/DIS 9186. (1988). Procedures for the development and testing of public information symbols. Geneva, Switzerland: ISO.
  2. Wolff, J. S., and Wogalter, M. S. (1998). Comprehension of pictorial symbols: Effect of context and test method. Human Factors, 40(2), 173-186.
  3. Zwaga, H., and Easterby, R. S. (1984). Developing effective symbols for public information. In R. Easterby and H. Zwaga (Eds.), Information design: The design and evaluation of signs and printed material (pp. 277-297). New York: J. Wiley   Sons.
  4. Dewar, R. E., Kline, D. W., and Swanson, A. H. (1994). Age differences in comprehension of traffic sign symbols. Transportation Research Record 1456, 1-10.
  5. Saunby, C. S., Farber, E. I., and DeMello, J. (1988). Driver understanding and recognition of automotive ISO symbols. SAE Technical Paper Series (No. 880056). Warrendale, PA: Society of Automotive Engineers.

MATCHING TEST

Introduction: After the best or most appropriate design for a symbol has been determined, it is important to examine how well that symbol will work within a set and whether the many symbols within the set can be discriminated from one another without confusion. To do this, a Matching Test is performed. Subjects are shown a sheet with all of the symbols from a set on it, arranged in a matrix, and told the context under which they would use these symbols. Next, subjects are given a referent name and asked to indicate on the matrix which one of the symbols stands for that particular referent. The outcome of the matching test is two measures of symbol effectiveness: the number of correct choices of a particular symbol, and the degree of confusion among symbols.

Design Guidelines
Flowchart for Figure 7-5.

  • Based on the results of a comprehension/recognition test (or comparable test), select or create a symbol for each message within a message set.
  • Place the complete set of symbols, for a function or group of related functions, in a matrix on a sheet of paper, slide, or computer screen. Between four and eight symbols should be included in each matrix.
  • The medium selected for information presentation (i.e., paper, slide, or computer screen) should be consistent with the expected in-vehicle conditions with respect to parameters such as resolution, color, luminance, contrast, and size.
  • Repeat for additional functions or groups of icons.
  • Randomize presentation order of symbol sets across subjects.

  • Tell subjects the context in which the symbol set will be used, being careful that such information does not create any bias.
  • The subjects' task will be to select the icon that best represents a particular driver message.

  • Give subjects the meaning (or driver message) associated with one of the icons in the symbol set and ask them to indicate which of the icons in the matrix represents that meaning.
  • Repeat the test procedure with additional symbol sets.
  • To maintain independent choices, individual subjects can be tested on only one icon per symbol set. Thus, while individual tests require little time, many subjects are required (reference 1 used up to 400 subjects per symbol).

  • Calculate the percentage of correct responses for each symbol/message combination.
  • For each message, convert the total number of responses for each icon into percentages.

  • Use the percentage correct for each icon, as well as the confusions across icons, to develop ideas for new icons or for the redesign of easily confusable icons.

 
Bar graph. This bar graph indicates that design guidelines were based equally on expert judgment and expert data.

Figure 7-5. Matching Test

Discussion: The Matching Test measures the specific association between the content of an icon and an in-vehicle message when the icon is presented to a subject at the same time as other icons within an icon set (see also reference 1). Subjects are only tested on one icon per symbol set to avoid non-independence of their choices. If multiple icons within a symbol set were tested, individual choices would be dependent on previous choices, thus confounding the results.

Data from the Matching Test can be represented in two ways. First, indicate the number of correct choices for a particular symbol by calculating the percentage of correct responses for each symbol/message combination. Second, construct a table with icons in the set as columns and messages as rows. Cell entries can show overall percentages associated with the subjects' responses. Thus, incorrect as well as correct responses are depicted in the table (also called a confusion matrix).
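Both representations can be derived from the same tally. The sketch below builds the confusion matrix just described (messages as rows, icons as columns, cell entries as percentages); the diagonal gives percent correct. The function name and example data are hypothetical:

```python
def confusion_matrix(responses, messages, icons):
    """Build a matching-test confusion matrix. `responses` is a list of
    (message_presented, icon_chosen) pairs, one per subject (each subject
    is tested on only one message per symbol set). Cells hold the
    percentage of subjects who chose that icon for that message."""
    counts = {m: {i: 0 for i in icons} for m in messages}
    totals = {m: 0 for m in messages}
    for msg, icon in responses:
        counts[msg][icon] += 1
        totals[msg] += 1
    return {m: {i: 100.0 * counts[m][i] / totals[m] for i in icons}
            for m in messages}

# Hypothetical data: four subjects, two messages, three-icon symbol set
icons = ["fuel", "oil", "coolant"]
responses = [("low fuel", "fuel"), ("low fuel", "fuel"),
             ("low fuel", "oil"), ("low oil", "oil")]
cm = confusion_matrix(responses, ["low fuel", "low oil"], icons)
# Diagonal cells give percent correct; off-diagonal cells show confusions
```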

If the scores from the Matching Test are acceptable to the design team, the testing may be complete. However, if some scores are too low, additional icon development and evaluation may be needed.

Design Issues: When developing candidate icons for the test, consider that driver perception and performance will vary as a function of the medium used to present the test stimuli. That is, display parameters such as resolution, production of color, and luminance can affect responses. In general, the color, brightness, resolution, and size of test icons should be as close as possible to how the icons will be viewed by drivers in the in-vehicle environment.

Importantly, the Matching Test does not measure any absolute trait of individual symbols, nor measure absolute comprehension or recognition associated with a candidate icon. All measures relate to subjects' ability to match an icon with a message within the context of other related icons. In this regard, icon developers often wish to develop families or groups of icons that share some common purpose or meaning (e.g., collision avoidance). Groups or families of icons typically share some common design element such as color, border, size, or graphic style. The Matching Test may provide an ideal method for evaluating subjects' ability to discriminate between related and nonrelated icons.

In real-world driving, icons are presented in the context of certain in-vehicle capabilities and driving circumstances. As such, evaluations of in-vehicle icons should include a description of the context in which they will be presented and used. However, icon evaluations should avoid providing either too little or too much context to experimental subjects. If too little context is provided, unrealistically low comprehension scores may result from subjects' being unable to connect a visual icon with the many possible icon meanings. If too much context is provided, unrealistically high comprehension scores may result because the subjects have been cued for a certain response by the specificity of the context. Both extremes should be avoided. In chapter 9, a tutorial provides both procedures and examples associated with providing appropriate context to experimental subjects.

Cross References:

Overview of General Procedures for Evaluating In-Vehicle Icons, p. 7-2; Production Test, p. 7-4; Appropriateness Ranking Test, p. 7-6; Comprehension/Recognition Test, p. 7-8; Additional Symbol Evaluation Approaches, p. 7-12; Tutorials: Providing Subjects with Context During Icon Evaluations, p. 9-11

References:

1. Zwaga, H., and Easterby, R. S. (1984). Developing effective symbols for public information. In R. Easterby and H. Zwaga (Eds.), Information design: The design and evaluation of signs and printed material (pp. 277-297). New York: J. Wiley & Sons.

ADDITIONAL SYMBOL EVALUATION APPROACHES

Introduction: In addition to the evaluation approaches described in the previous guidelines in this chapter, a number of additional approaches have been suggested and successfully used to evaluate the effectiveness of icons.

Design Guidelines
  • The evaluation technique selected to assess the effectiveness of candidate icons should be consistent with the goals and constraints of a particular system development effort. The table below summarizes a range of alternative evaluation approaches, gives the advantages and disadvantages associated with each, and provides references for designers to go to for more detailed information.
Bar graph. This bar graph indicates that design guidelines were based primarily on experimental data.

Table 7-1. Summary of Additional Symbol Evaluation Approaches

Evaluation Technique: Rating task
Description: Subjects are asked to determine the degree to which a symbol suggests or communicates its designated name.
Advantage/Disadvantage: Reference 1 found that the results of the rating task were well correlated with the results of the reaction time task (see below); however, ratings are easier to obtain and more statistically efficient.
References: 1

Evaluation Technique: Reaction time task (speed of comprehension)
Description: Subjects are given a referent, then shown a slide of one of the symbols and asked to indicate whether the referent and the symbol are the “same.” The time taken to respond is recorded as reaction time.
Advantage/Disadvantage: Reaction time in a discrimination task is influenced by individual differences and decreases markedly with learning (reference 1).
References: 1, 2

Evaluation Technique: Identification time task
Description: Subjects are shown slides of traffic signs (in both text and symbol format) and are asked to identify verbally the message being presented. Their response time is recorded as identification time.
Advantage/Disadvantage: A relatively inexpensive and easy means of obtaining information about the adequacy of symbols.
References: 3, 4

Evaluation Technique: Semantic differential method
Description: Subjects rate symbols on 12 different adjective pairs, such as weak-strong and strange-familiar. A factor analysis is then performed to evaluate the results. The factors include evaluative, potency, activity, and understandability.
Advantage/Disadvantage: For all factors except understandability, it is difficult to determine the design issues associated with the factor.
References: 5, 6

Evaluation Technique: Modified semantic differential method
Description: This method uses adjective pairs that are more specific and relevant to designers, such as balanced-unbalanced and confusing-clear.
Advantage/Disadvantage: Reference 5 compared this method with traditional but logistically expensive measures, such as reaction time and glance legibility, and found this technique to provide a simple, inexpensive, and valid measure of comprehension.
References: 7

Discussion: In many icon and in-vehicle display development efforts, there is insufficient time or budget to conduct the sequential, interdependent series of evaluations described in the preceding design guidelines. This design guideline therefore identifies additional evaluation techniques that have been used and described in the human factors and icon development literature.

Design Issues: The selection of an appropriate evaluation approach should reflect specific empirical objectives, as well as the driver messages, expected driving context, and design constraints associated with individual in-vehicle icons or symbols. For example, if human response times to a particular icon are important for its effectiveness, then evaluation should include dependent measures that will capture this information. In this particular case, the optimal icons will be those that satisfy icon comprehension and discrimination with the shortest response time. The evaluation of other icons or sets of icons may focus more on different types of measures and thus will require a completely different type of evaluation process.
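The selection logic described above, in which the optimal icons are those that satisfy comprehension and discrimination criteria with the shortest response time, can be sketched as a simple screen-then-rank procedure. The data, icon names, and the 0.85 comprehension criterion below are illustrative assumptions, not values taken from this report.

```python
# Hypothetical per-icon evaluation results: comprehension as the
# proportion of correct responses, and mean reaction time in ms.
results = {
    "icon_a": {"comprehension": 0.92, "rt_ms": 480},
    "icon_b": {"comprehension": 0.88, "rt_ms": 430},
    "icon_c": {"comprehension": 0.71, "rt_ms": 390},  # fast but poorly understood
}

COMPREHENSION_CRITERION = 0.85  # illustrative threshold, an assumption

# Keep only icons that satisfy the comprehension criterion, then rank
# the survivors by mean reaction time (fastest first).
acceptable = {k: v for k, v in results.items()
              if v["comprehension"] >= COMPREHENSION_CRITERION}
ranked = sorted(acceptable, key=lambda k: acceptable[k]["rt_ms"])
print(ranked)  # icon_c is screened out despite its fast reaction time
```

The design choice here is that comprehension acts as a pass/fail gate while response time is used only to order the icons that pass; an evaluation emphasizing different measures would substitute its own criteria and ordering.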

All evaluations of in-vehicle icons should be performed using test subjects who are representative of the driving public. Key demographic variables include age and gender. Age effects, in particular, can be expected for icon evaluations. Thus, individual evaluations should use a mix of younger and older test subjects.

Cross References:

Overview of General Procedures for Evaluating In-Vehicle Icons, p. 7-2; Production Test, p. 7-4; Appropriateness Ranking Test, p. 7-6; Comprehension/Recognition Test, p. 7-8; Matching Test, p. 7-10

References:

  1. Green, P., and Pew, R. W. (1978). Evaluating pictographic symbols: An automotive application. Human Factors, 20(1), 103-114.

  2. Ells, J. G., and Dewar, R. E. (1979). Rapid comprehension of verbal and symbolic traffic sign messages. Human Factors, 21(2), 161-168.

  3. Dewar, R. E., and Ells, J. G. (1974). Comparison of three methods for evaluating traffic signs. Transportation Research Record, 503, 38-47.

  4. Dewar, R. E., Ells, J. G., and Mundy, G. (1976). Reaction time as an index of traffic sign perception. Human Factors, 18(4), 381-392.

  5. Dewar, R. E., and Ells, J. G. (1977). The semantic differential as an index of traffic sign perception and comprehension. Human Factors, 19(2), 183-189.

  6. Caron, J. P., Jamieson, D. G., and Dewar, R. E. (1980). Evaluating pictographs using semantic differential and classification techniques. Ergonomics, 23(2), 137-146.

  7. Vora, P., Helander, M., Swede, H., and Wilson, J. (1991). Developing guidelines for symbol design: A comparison of evaluation methodologies. Interface '91, 6-11.

