Federal Highway Administration
|This report is an archived publication and may contain dated technical, contact, and link information|
|Publication Number: FHWA-HRT-13-018 Date: April 2013|
This chapter describes the overall design of the experiment, the participants, the materials, and the procedures used in the study. It covers both instrument measurements and perceptual judgments.
The stimuli for the present study consisted of 120 test samples. Of these samples, 24 consisted of a standard of diffuse reflectance with a reflective index of 98 percent paired with one of a variety of color and neutral density filters. Measurements of these samples established a reference condition for color measurement. The other 96 samples consisted of 4 different types of white retroreflective sign material covered with color and neutral density filters selected to match the chromaticity and luminance factor values obtained for the diffuse samples. The four different sheeting materials tested were ASTM types III, VIII, IX, and proposed type XI.(11) Of the 13 colors specified for use in traffic control signs, the 6 colors tested were: red, green, blue, yellow, orange, and white. There were four variants for each of the six colors, with each variant approximating one of the four corners of the color area that defines that color in CFR Title 23.(3)
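The sample counts in the paragraph above can be checked by enumerating the design. The following is a minimal sketch, not the researchers' own bookkeeping; the variant numbers 1 to 4 and the label strings are placeholders introduced here for illustration.

```python
from itertools import product

# Labels follow the text; variant numbers 1-4 are hypothetical stand-ins
# for the four corners of each color area.
COLORS = ["red", "green", "blue", "yellow", "orange", "white"]
VARIANTS = [1, 2, 3, 4]
SHEETINGS = ["III", "VIII", "IX", "XI (proposed)"]
DIFFUSE = "diffuse standard"


def build_samples():
    """Return one (substrate, color, variant) tuple per test sample."""
    samples = []
    for substrate in [DIFFUSE] + SHEETINGS:
        for color, variant in product(COLORS, VARIANTS):
            samples.append((substrate, color, variant))
    return samples


samples = build_samples()
n_diffuse = sum(1 for s in samples if s[0] == DIFFUSE)
n_retro = len(samples) - n_diffuse
print(len(samples), n_diffuse, n_retro)  # 120 24 96
```

Each of the five substrates carries the same 24 color and variant combinations, which is what allows the retroreflective samples to be compared against the diffuse reference condition.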
The present experiment investigated daylight chromaticity and luminance properties of sample sign materials by means of measurements taken with spectroradiometric instruments both in the laboratory and in the field. The experiment compared these instrument measurements with the perceptual judgments of color and brightness given by human observers in the field.
Measurements were taken using two different spectroradiometers and a spectrocolorimeter. Laboratory measurements were made with a PR-715 SpectraScan® spectroradiometer and a Hunter LabScan® XE spectrocolorimeter. Field measurements were made with a PR-650 SpectraScan® spectroradiometer.
The purpose of using two different laboratory measurement procedures was to compare the values obtained using a spectroradiometer and a bench-top spectrocolorimeter. The values obtained by the PR-715 spectroradiometer were then compared to the field measurements made using the PR-650 spectroradiometer. This chain of instrument measurements was intended to provide a basis for relating the field observation conditions to the standard laboratory conditions under which quality assurance measurements are made. It was also important to gain insight into the differences between the laboratory measurements used to determine compliance with standards and the perceptual judgments of color, apparent saturation, and brightness.
Laboratory measurements of the 120 color samples and the standard diffuse white reflector were taken with the PR-715 at the National Institute of Standards and Technology on an optical bench using procedures specified by ASTM E1349.(12) Measurements were made for each sample using a 0.5-degree-high by 1.5-degree-wide rectangular aperture. A 0-degree/45-degree geometry was employed, where the illuminating source was at 0 degrees (normal incidence) and the measurement instrument was at 45 degrees, relative to the surface of the sample being tested. The PR-715 was placed approximately 7 ft (2 m) from the sample such that an area of 2.92 by 0.689 inches (74.1 by 17.5 mm) was measured. The illuminating source was a 1,000-W FEL CC8 tungsten-halogen lamp operated at approximately 3,200 K and placed 7 ft (2 m) from the sample. The illuminant was corrected by calculation to match illuminant D65. A single measurement of the diffuse reflector and each color sample was made. The spectroradiometric values obtained were used to calculate tristimulus values (X, Y, Z) for each sample, which were converted to chromaticity coordinates (x, y) and a luminance factor (Y). The samples were also measured in the laboratory at the FHWA research site with a Hunter LabScan® XE. The LabScan® XE uses annular illumination at 45 degrees with measurements at 0 degrees. Eight measurements were made with the LabScan® XE and averaged. The sample was removed and replaced on the measurement port before each measurement. This procedure was intended to account for the smaller area sampled by the LabScan® XE compared to the PR-715.
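Two of the calculations in the paragraph above can be sketched briefly. The first converts tristimulus values to chromaticity coordinates using the standard CIE relations x = X/(X+Y+Z) and y = Y/(X+Y+Z). The second estimates the measured area from the aperture size, distance, and viewing angle. The second function assumes that the long side of the aperture lies in the plane of the 45-degree tilt, which is an inference from the reported dimensions rather than something stated in the text. Function names are illustrative.

```python
import math


def chromaticity(X, Y, Z):
    """CIE chromaticity coordinates (x, y) from tristimulus values."""
    total = X + Y + Z
    if total <= 0:
        raise ValueError("tristimulus sum must be positive")
    return X / total, Y / total


def measured_footprint_mm(distance_mm, width_deg, height_deg, view_angle_deg):
    """Approximate footprint of a rectangular aperture on a tilted sample.

    Assumption: the aperture width lies along the tilt direction, so the
    width is stretched by 1 / cos(view angle); the height is unaffected.
    """
    width = 2 * distance_mm * math.tan(math.radians(width_deg) / 2)
    height = 2 * distance_mm * math.tan(math.radians(height_deg) / 2)
    return width / math.cos(math.radians(view_angle_deg)), height


# Nominal D65 white point (2-degree observer), used only as a familiar check.
x, y = chromaticity(95.047, 100.0, 108.883)
w_mm, h_mm = measured_footprint_mm(2000.0, 1.5, 0.5, 45.0)
print(round(x, 4), round(y, 4))
print(round(w_mm, 1), round(h_mm, 1))
```

Under these assumptions, the footprint comes out close to the 74.1 by 17.5 mm area reported for the PR-715 at 2 m.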
Field physical measurements were initially made at the FHWA research site during the color appearance trials with human observers in fall of 2007. These measurements were taken at a distance of 32.5 ft (9.9 m) on the same horizontal plane as the stimuli. The instrument’s line of sight was approximately 5 degrees from the observers’ line of sight. This technique was used to approximate the participants’ viewing angle (0 degrees) without interfering with their observations. The measurements were made using a 1-degree spot aperture under diffuse illumination produced by the daytime sky. Because of the fast pace of stimulus presentation, the researchers were not able to measure every sample presentation. Instead, a photometric and colorimetric measurement was taken on every 10th trial of the color appearance judgments. Since the stimuli were presented to the participants in different random orders for each session, this sampling procedure resulted in an uneven number of measurements for the 120 color samples. Chromaticity (x, y), luminance (Y), and CCT determinations were made for each sample. In addition to the physical measurements made in conjunction with the color appearance judgments, physical measurements of the standard diffuse white reflector with no color filters were made before and after each block of 60 color appearance trials. These additional measurements were used to track lighting changes due to varying sky conditions (passing clouds, high overcast, etc.) within experimental sessions and across sessions conducted on different days.
Since the field luminance measurements taken with the PR-650 were sampled on only every 10th perceptual trial, the sample sizes were small and unequal. To obtain a more statistically reliable sample, the field chromaticity and luminance measurements were repeated outdoors 1 year later, in fall of 2008, with the same instrument (PR-650). Each of the 120 color samples was measured seven times at a 0-degree angle of regard. This 0-degree angle for the corroborating measurements represented true normal orientation but was not substantially different from the approximately 5-degree angle used in fall of 2007.
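A short simulation illustrates why measuring on every 10th trial gives small, unequal sample sizes. This is only a sketch with randomly generated presentation orders, not the study's actual orders or data. It assumes 3 groups, 4 sessions per group, and 2 passes through the 120 samples per session, as described elsewhere in this chapter, and it assumes the instrument sampled trials 10, 20, ..., 120 of each pass.

```python
import random
from collections import Counter

N_SAMPLES = 120
N_PASSES = 3 * 4 * 2  # groups x sessions x passes per session
EVERY = 10            # one instrument reading per 10 trials


def simulate_counts(seed=0):
    """Count simulated instrument readings per sample (illustration only)."""
    rng = random.Random(seed)
    counts = Counter({sample: 0 for sample in range(N_SAMPLES)})
    for _ in range(N_PASSES):
        order = list(range(N_SAMPLES))
        rng.shuffle(order)  # a fresh random presentation order per pass
        for trial_number, sample in enumerate(order, start=1):
            if trial_number % EVERY == 0:
                counts[sample] += 1
    return counts


counts = simulate_counts()
total = sum(counts.values())
print(total, min(counts.values()), max(counts.values()))
```

Under these assumptions there are 288 readings spread over 120 samples, an average of 2.4 per sample. Because 288 is not a multiple of 120, the per-sample counts cannot all be equal whatever the presentation orders.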
The CIE 1931 x, y chromaticity coordinates for both the laboratory and field physical measurements were converted into the CIELAB color space for comparison with UADs from the human observers (see figure 3 and figure 4).(4) The CIELAB color space is based on an opponent color concept, which is similar to the opponent color processes of human vision.(10) The CIELAB color space coordinates were rotated 90 degrees clockwise to more closely correspond with the layout of the perceptual UADs. Such a rotation resulted in a reversal of the positive and negative vertical axis. Since the emphasis in the present experiment was on perceptual measurements, the customary UAD layout was employed as the reference.
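The conversion described above can be sketched with the standard CIE formulas. This is a minimal illustration, not the report's own computation. It assumes a D65 reference white for the 2-degree observer, a luminance factor Y expressed in percent, and a rotation applied to each (a*, b*) point as a plain 90-degree clockwise turn about the origin. All function names are hypothetical.

```python
D65_WHITE = (95.047, 100.0, 108.883)  # assumed reference white (X, Y, Z)


def xyY_to_XYZ(x, y, Y):
    """Tristimulus values from chromaticity (x, y) and luminance factor Y."""
    if y <= 0:
        raise ValueError("y must be positive")
    return x * Y / y, Y, (1.0 - x - y) * Y / y


def _f(t):
    """CIELAB companding function with its linear segment near zero."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def XYZ_to_Lab(X, Y, Z, white=D65_WHITE):
    """CIELAB (L*, a*, b*) relative to the given reference white."""
    fx, fy, fz = _f(X / white[0]), _f(Y / white[1]), _f(Z / white[2])
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def rotate_cw_90(a, b):
    """Rotate the point (a, b) 90 degrees clockwise about the origin."""
    return b, -a


L, a, b = XYZ_to_Lab(*D65_WHITE)
print(round(L, 3), round(a, 3), round(b, 3))  # 100.0 0.0 0.0
```

A point on the positive a* axis moves to the negative vertical axis after this rotation, and a point on the positive b* axis moves to the positive horizontal axis.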
|Figure 3. Graph. CIE 1931 x, y chromaticity diagram.|
|Figure 4. Illustration. CIELAB color space.|
The perceptual experiment examined the responses of participants to colored retroreflective sign materials under daytime lighting conditions using two methodologies. A rating scale technique was used to determine the hue, apparent saturation, and brightness responses for individual color samples. A ranking technique was used to determine relative brightness responses for a subset of color samples.
Seventeen people were recruited from the Washington, DC, metropolitan area. All participants were adults (at least 18 years of age), were licensed drivers, had visual acuity (corrected or uncorrected) of at least 20/40 in their best eye, and had normal color vision. Table 1 shows the median and mean age by gender and age category. There were nine men (six younger, three older) and eight women (five younger, three older). Participants were assigned to one of three groups, which produced two groups of six and one group of five. Each group was assigned a day of the week (Tuesday, Wednesday, or Thursday) for coming to the research facility each week for the duration of the experiment. The experiment and its procedures were approved by an institutional review board. The participants received an honorarium for their services.
Table 1. Participant characteristics by age and gender category.
|Category||Number||Mean (Years)||Median (Years)||Age Range|
|Younger (18–64 years)||11||22||23||19–25|
|Older (65+ years)||6||71||68||66–83|
The research was conducted outdoors on the grounds of the FHWA Turner-Fairbank Highway Research Center (TFHRC) in McLean, VA. The experiment was conducted during September, October, and early November 2007 in daylight, with the middle of each 4-h session occurring when the sun was at the meridian. The sessions were conducted only on clear, partly cloudy, or high overcast days in relatively bright sunlight conditions. Testing was conducted in an area that was away from distractions such as vehicular and pedestrian traffic. Figure 5 depicts the testing environment and relative location of the equipment used in the experiment.
|Figure 5. Illustration. Plan view of the experimental setup (not to scale).|
Participants observed the stimuli and then marked their judgments on preprinted response sheets mounted on clipboards. Judgment trials were timed by prerecorded verbal commands. An audio file played through loudspeakers cued the participants when to respond to the individual color samples during the rating task. The audio cue maintained a pace of 10 s of observation time per stimulus.
Additional equipment included tripods and a laptop computer. Tripods were used to hold the color samples for the individual rating task, the subset of color samples for the ranking task, and the portable PR-650. The laptop computer was used to record data from the PR-650.
The generally accepted daylight legibility distance for a standard 30-by-30-inch (76-by-76-cm) STOP sign with 10-inch (25-cm)-high lettering is 400 ft (122 m).(13) The visual stimulus samples of color and sheeting combinations were 7.5 inches (19.1 cm) square, quarter scale relative to a standard STOP sign. Therefore, participants viewed the samples from a quarter-scaled distance of 100 ft (30.5 m). This setup is not to imply that color recognition of traffic signs is limited to the legibility distance but, rather, reflects the distance at which a driver may acquire information regarding a particular sign. The color and shape of a traffic sign should be unambiguous at the legibility distance to ensure that all informational aspects of a sign reinforce each other.
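The scaling argument above rests on the fact that reducing both size and distance by the same factor preserves the visual angle. A minimal sketch of that check, using only the dimensions given in the text (the function name is illustrative):

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle subtended by an object; size and distance in same units."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


# Full-size STOP sign: 30 inches wide at 400 ft (converted to inches).
full_scale = visual_angle_deg(30.0, 400.0 * 12.0)
# Test sample: 7.5 inches wide at 100 ft.
quarter_scale = visual_angle_deg(7.5, 100.0 * 12.0)

print(round(full_scale, 3), round(quarter_scale, 3))
```

Both cases subtend roughly 0.36 degrees, so a sample viewed at 100 ft occupies about the same visual angle as a full-size sign at its legibility distance.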
The color samples were mounted at a height of 5 ft (1.5 m), the minimum mounting height of a rural STOP sign.(13) The samples were tilted slightly forward to minimize specular reflections, as is typically done in highway traffic sign installations. Figure 6 shows a standard red STOP sign with mounting characteristics similar to those replicated in the experiment. Note the green foliage background exhibited in this view. The colored samples in the experiment were viewed on a background of natural grass, bushes, and trees with a fence in the far distance.
|Figure 6. Photo. Standard retroreflective STOP sign in a typical application.|
The experiment was conducted outdoors at the edge of a roadway on the TFHRC grounds. The roadway was blocked by traffic barriers during testing. Participants sat in chairs in a single row under two shade tents and observed the color samples from a different seating position for each half-day block of trials. The seating positions were assigned randomly at the beginning of each testing session. Individual color samples were mounted on a tripod located at the side of the roadway, 100 ft (30.5 m) from the row of seated participants. A third shade tent, located to the side of the tripod holding the sample, was used to cover the table containing the racks of color samples. One experimenter stood near this third tent to change the color samples on the tripod between trials. The second tripod holding the PR-650 was located slightly to the left of the participants’ line of sight, 32.5 ft (9.9 m) from the color sample tripod. The audio loudspeaker system was located on a small table between the color sample tripod and the row of participant chairs. A second experimenter sat in a chair behind the participants to monitor the laptop data collection from the PR-650. Figure 7 shows the outdoor experiment setup.
|Figure 7. Photo. Outdoor experiment setup on the grounds of TFHRC.|
Each group participated in four 4-h sessions over a period of four or five weeks. The first session was somewhat longer to allow time for paperwork and training. Examples of the training materials used to acquaint the participants with the required responses are in appendix A.
Before data collection began, participants read and signed an informed consent form and were administered visual acuity and color vision tests. Participants’ visual acuity was assessed with a standard wall-mounted Snellen chart. Color vision was assessed with an Ishihara color deficiency test book. Participants were required to have at least 20/40 vision, uncorrected or corrected, in at least one eye and to have normal color vision.
Following the vision testing and initial administrative paperwork, participants were taken to the outdoor testing facility to begin training. The primary training took place at the beginning of the first day for each participant group and was scheduled to take 30–45 min. Participants were given detailed instructions regarding the four tasks that they were to perform during the study: hue scaling, apparent saturation scaling, brightness scaling, and brightness ranking. The formal verbal instructions were read aloud, and additional training consisted of an explanation of the concepts of hue, apparent saturation, and brightness along with diagrams and charts. A practice exercise was administered, and ample time was allotted for questions throughout the training. The formal verbal instructions as well as the supplemental training materials are in appendix A.
The participants judged each color sample by means of a modified method of hue and apparent saturation scaling.(6) In this method, each participant rated the color of the sample stimulus in terms of the percentages of four unique hues (red, yellow, green, and blue), such that the percentages totaled 100 percent. Participants could assign 100 percent of the hue to one color, or they could divide it between a pair of the basic hues. Participants could not pair red with green or yellow with blue; all other pairs were allowed. Each participant also rated the apparent saturation (colorfulness) of each sample stimulus on a separate 0–100 percent scale. See appendix A for instructions given on the hue and apparent saturation scaling method.
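The response rules above can be expressed as a simple validity check. The following is a hypothetical sketch of such a check and is not part of the study's procedure. The function name, the dictionary layout of a response, and the problem messages are all assumptions made for illustration.

```python
UNIQUE_HUES = ("red", "yellow", "green", "blue")
# Opponent pairs that participants were not allowed to combine.
FORBIDDEN_PAIRS = (frozenset({"red", "green"}), frozenset({"yellow", "blue"}))


def check_hue_response(hue_percent, saturation):
    """Return a list of rule violations (empty if the response is valid).

    hue_percent maps hue names to percentages; saturation is 0-100.
    """
    problems = []
    unknown = set(hue_percent) - set(UNIQUE_HUES)
    if unknown:
        problems.append(f"unknown hue names: {sorted(unknown)}")
    if any(value < 0 for value in hue_percent.values()):
        problems.append("hue percentages must not be negative")
    used = frozenset(h for h, value in hue_percent.items() if value > 0)
    if len(used) > 2:
        problems.append("at most two hues may be used")
    if used in FORBIDDEN_PAIRS:
        problems.append("opponent hues may not be paired")
    if sum(hue_percent.values()) != 100:
        problems.append("hue percentages must total 100")
    if not 0 <= saturation <= 100:
        problems.append("saturation must be between 0 and 100")
    return problems


print(check_hue_response({"red": 70, "yellow": 30}, 80))  # []
print(check_hue_response({"red": 50, "green": 50}, 80))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue on a response sheet at once.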
An additional brightness scale, expressed as a percentage ranging from 0 to 100 percent, was added to the aforementioned method. Verbal instructions were given for this perceptual brightness scale (see appendix A). Thus, each presentation of a color stimulus sample received three or four perceptual rating scores: one or two for the percentage of the four basic hues, one for apparent saturation (colorfulness), and one for brightness. These measurements constituted the fundamental color appearance judgments given by the participants to the 120 color stimulus samples. Each day, participants viewed the set of 120 color samples twice, once in the morning and once in the afternoon, in a different random order. Participants viewed the color samples in blocks of 60 stimuli, with a 5-min break between each block.
The participants were also asked to perform a separate perceptual brightness ranking task. Two colors, red and yellow, were used for this task. Five samples with the same color chromaticity point (red or yellow) were presented simultaneously on a tripod. The five samples represented the same color filter with the four different retroreflective sheetings and the diffuse white reflector material. The participants ranked the five samples from dimmest to brightest. These additional perceptual brightness ranking judgments were made following each block of 60 individual trials for the hue, apparent saturation, and brightness scaling tasks. Participants saw the red and yellow sample sets separately twice each day (in different left-to-right positions) for a total of eight times during the course of the study. Table 2 shows the typical daily schedule for participants in the study.
Table 2. Typical daily experimental schedule.
|Hue, Saturation, and Brightness Rating—Block 1||30|
|Brightness Ranking—Set 1||5|
|Hue, Saturation, and Brightness Rating—Block 2||30|
|Brightness Ranking—Set 2||5|
|Hue, Saturation, and Brightness Rating—Block 3||30|
|Brightness Ranking—Set 3||5|
|Hue, Saturation, and Brightness Rating—Block 4||30|
|Brightness Ranking—Set 4||5|
Participants returned to TFHRC three more times over the course of several weeks. Before testing began each day, the experimenters briefly reviewed the scaling and ranking procedures. Following the review, participants repeated the four data collection blocks (two morning, two afternoon) that were conducted on day 1, except the stimuli were presented in a different random order for each block. This procedure continued for days 2–4. After data collection was completed on day 4, participants were debriefed, offered an opportunity to ask questions, and paid for their participation.
The primary task in the perceptual portion of the experiment involved 8 repetitions of hue, apparent saturation, and brightness scaling judgments for 120 color samples, a total of 960 scaling judgments made by each participant. Since there were 17 participants, the entire experiment yielded 16,320 scaling judgments for the primary task. The secondary task in the experiment yielded 272 brightness rankings of one color area coordinate for the yellow and red color samples (136 rankings per color). Each ranking involved 5 brightness levels, making for 1,360 total ranking scores. These brightness ranking determinations supplemented the brightness rating measurements by employing a different method to determine similar color properties.
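The totals in the paragraph above follow directly from the design. A brief arithmetic sketch, with variable names chosen here for illustration:

```python
participants = 17
color_samples = 120
days = 4
passes_per_day = 2  # morning and afternoon

# Primary task: scaling judgments.
repetitions = days * passes_per_day                        # 8
scalings_per_participant = repetitions * color_samples     # 960
total_scalings = scalings_per_participant * participants   # 16,320

# Secondary task: brightness rankings of red and yellow sample sets.
ranking_colors = 2
rankings_per_color = participants * days * passes_per_day  # 136
total_rankings = rankings_per_color * ranking_colors       # 272
samples_per_ranking = 5
total_ranking_scores = total_rankings * samples_per_ranking  # 1,360

print(scalings_per_participant, total_scalings)
print(rankings_per_color, total_rankings, total_ranking_scores)
```

Here a scaling judgment counts as one presentation of one sample to one participant, even though each presentation produced separate hue, apparent saturation, and brightness ratings.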