United States Department of Transportation - Federal Highway Administration
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-RD-98-133
Date: October 1998

Accident Models for Two-Lane Rural Roads: Segment and Intersections

3. Data Collection

Limitations on Data Quality

As noted, numerous checks were performed on the data. These included repeated reviews of plans and photologs, comparisons of values of multiple variables for consistency (for example, radius of curvature versus degree of curve), and use of computer programs to flag unusually large values of variables and to confirm that ordering was preserved (the beginning milepost comes earlier than the end milepost for each curve). However, the accuracy of the data was limited by a number of inherent factors discussed below.
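The consistency and ordering checks described above can be illustrated with a minimal sketch. It is not the Project Team's program. It assumes the arc definition of degree of curve (a 100-foot arc, so R = 18000 / (pi * D)), which the report does not state, and the record keys `beg_mp`, `end_mp`, `degree`, and `radius_ft` as well as the 5% tolerance are hypothetical.

```python
import math

def radius_from_degree(degree_of_curve):
    """Radius in feet under the arc definition (assumed): R = 18000 / (pi * D)."""
    return 18000.0 / (math.pi * degree_of_curve)

def check_curve(curve, rel_tol=0.05):
    """Return a list of problems found in one curve record (a dict).

    The key names and the tolerance are illustrative assumptions.
    """
    problems = []
    # Ordering check: the curve must begin before it ends.
    if curve["beg_mp"] >= curve["end_mp"]:
        problems.append("beginning milepost not before ending milepost")
    # Consistency check: recorded radius versus radius implied by degree of curve.
    expected = radius_from_degree(curve["degree"])
    if abs(curve["radius_ft"] - expected) > rel_tol * expected:
        problems.append("radius inconsistent with degree of curve")
    return problems
```

A record that passes both checks returns an empty list; a flagged record would still need the kind of manual review of plans described in the text.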

Accident Data

Accident data were obtained from HSIS files.

Segment accidents were required to be "non-intersection" accidents, i.e., accidents that did not occur at intersections and were not intersection related. Intersection accidents were accidents at intersections in the database and all intersection-related accidents occurring within ± 250 feet of an intersection in the database. In the Minnesota data, a variable called "INTERSE" was used in the segment database to exclude accidents with the values "intersection" or "intersection-related" and in the intersection databases to include accidents with precisely these values. In Washington a variable called "LOC_TYPE" was used in the segment database to eliminate all accidents coded as: at intersection and related, intersection related but not at intersection, at intersection but not related, driveway within intersection. Likewise, "LOC_TYPE" was used to retain precisely these accidents when they were within 250 feet of the intersection under study. Accidents occurring on the minor road at an intersection approach were typically coded to the major road at the intersection.
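The selection rule for the Minnesota "INTERSE" variable might be sketched as follows. The string labels stand in for whatever codes the HSIS file actually uses, the `milepost` key is hypothetical, and applying the 250-foot window to every retained accident is a simplifying assumption.

```python
FEET_PER_MILE = 5280.0

# Illustrative labels only; the actual HSIS coding of INTERSE may differ.
INTERSECTION_CODES = {"intersection", "intersection-related"}

def is_segment_accident(accident):
    """Keep an accident in the segment database only if it is non-intersection."""
    return accident["INTERSE"] not in INTERSECTION_CODES

def is_intersection_accident(accident, intersection_mp, window_ft=250.0):
    """Keep an accident for an intersection if it carries an intersection
    code and lies within the window around the intersection milepost."""
    if accident["INTERSE"] not in INTERSECTION_CODES:
        return False
    offset_ft = abs(accident["milepost"] - intersection_mp) * FEET_PER_MILE
    return offset_ft <= window_ft
```

The same pattern would apply to the Washington "LOC_TYPE" variable, with its four location categories in place of the two Minnesota values.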

Severities were also recorded for each accident, while accident types (run-off-road, etc.) were recorded for Minnesota. In the case of Washington, accident types were not recorded since the accident file has elaborate subcategories that differ significantly from those of Minnesota. An exception was made in the case of run-off-road accidents. A Washington State variable called "V1EVENT2" in the HSIS file was used to estimate whether an accident was of run-off-road type: If the accident was a single vehicle accident in which the vehicle struck an appurtenance or other object, overturned, ran into a ditch or river or over an embankment (these are categories in the file), it was taken to be a run-off-road accident.
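As a rough illustration of the run-off-road rule for Washington, the sketch below tests for a single-vehicle accident whose "V1EVENT2" value falls in one of the listed categories. The category labels and the `num_vehicles` key are hypothetical placeholders; the HSIS file uses its own codes.

```python
# Placeholder labels for the categories named in the text; not actual HSIS codes.
RUN_OFF_ROAD_EVENTS = {
    "struck appurtenance",
    "struck other object",
    "overturned",
    "ditch",
    "river",
    "embankment",
}

def is_run_off_road(accident):
    """Single-vehicle accident with a qualifying V1EVENT2 category."""
    return (
        accident["num_vehicles"] == 1
        and accident["V1EVENT2"] in RUN_OFF_ROAD_EVENTS
    )
```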

Underreporting of accidents was a matter of some concern. In both States during the time periods under consideration, accidents involving either injuries or property damage of $500 or more had to be reported. In Minnesota the reporting threshold rose to $1,000 as of August 1, 1994. The amount of any underreporting is a matter of speculation (one source in Minnesota thought there might be one minor unreported accident for each reported one because accident-prone drivers wish to avoid both penalties for intoxication and insurance premium increases).

The reliability of the reported accident characteristics depends on the acumen of the reporting officer or official and witnesses as well as on the comparability of variables between the two States.

Traffic Data

The HSIS traffic variables in Table 1, ADT and com_avg, derive from Minnesota and Washington traffic count data.

ADT data for the Minnesota segments appear to have been reliably estimated on a timely basis. Two multi-year data sets, 1985-1987 and 1988-1989, and four annual data sets, 1990, 1991, 1992, and 1993, were available for this study. The traffic data in these sets seem to have been based on measurements and calculations, e.g., interpolation and/or extrapolation both along roads and in time. The HSIS Guidebook dated October 1993 notes that traffic data on major roads are collected on a two-year cycle, and on minor rural roads on a four-year cycle, and that growth factors are applied for the years in which measurements are not made.

According to MNDOT, manual counts, including detailed classification of vehicle types, are done at about a thousand sites around the State. In a manual count a person stands at the roadside and counts and classifies every vehicle that passes over a 16-hour period (from 6 AM to 10 PM on a weekday). One hundred of the sites, the major ones, are counted every 2 years, and another 900 every 6 years. Every 2 years estimates are produced of ADT and commercial ADT throughout the State. Count locations do not exist on every segment; where a segment has none, its values are averaged from those of adjacent segments along relatively homogeneous roads. A count might be done once in, say, 6 miles in some places.

The vehicle types that are summarized under the variable com_avg in Table 1 are heavy vehicles, defined as those with two or more axles and six or more tires. On roads with low traffic, about 25% of the heavy vehicle traffic consists of five-axle semis, usually with 18 wheels; on roads with high traffic about 75% is five-axle semis. A twin trailer (cab + tractor + trailer + another trailer) with perhaps five or six axles, along with most three-axled trucks without tractors, would be counted as a heavy vehicle but not a semi. The variable com_avg is thought not to be as accurate as ADT.

Minnesota intersection traffic data are somewhat less reliable than segment traffic data. The intersection files from Minnesota give traffic counts for both the major and minor roads, along with the year in which these data were acquired. Not only are the years quite variable from intersection to intersection, ranging from 1976 to 1992, but very few of the counts appear to have been updated between the 1985-1989 time period files and the 1990-1993 time period files. In some cases traffic counts had been made only once in the years from 1987 to 1993, and the annual files just repeated the value of an earlier year. In other cases no traffic counts had been made since 1986 or earlier.

In view of this unreliability, efforts were made to determine a growth rate factor that could be used to update traffic counts to the time periods of interest. MNDOT personnel reported that population growth rates did not relate in a simple fashion to traffic flow (so traffic counts on an intersection could not be updated from one year to the next by a population growth multiplier). Sometimes traffic counts will be higher when new development and construction is going on and then will ease off when the buildings and houses are occupied. A program was written to extract a growth rate by least squares from traffic data for segments near the intersection and thereafter use the year of intersection traffic count to extrapolate to an ADT for the years 1987 and mid-1991. The Minnesota intersection traffic variables used in the modeling and validation below, ADT1 and ADT2, were derived from int1 and int2 by means of this program.
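The report does not give the functional form used by this program. A minimal sketch of one possibility, assuming compound annual growth fitted by ordinary least squares on the logarithm of segment ADT, is shown below; the function names are hypothetical.

```python
import math

def fit_growth_rate(years, adts):
    """Annual growth rate from a least-squares line through ln(ADT) versus year.

    The compound-growth form is an assumption, not the report's stated method.
    """
    n = len(years)
    logs = [math.log(a) for a in adts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return math.exp(slope) - 1.0

def extrapolate_adt(count, count_year, target_year, growth_rate):
    """Carry an intersection count from its count year to a target year."""
    return count * (1.0 + growth_rate) ** (target_year - count_year)
```

For example, a rate fitted to nearby segment ADTs could carry a 1980 intersection count forward with `extrapolate_adt(count, 1980, 1987, rate)`, or with a target of 1991.5 for mid-1991.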

Washington State traffic data became available at a relatively late stage of this study but only for segments and for some intersections along segments. The traffic data were based on upstream traffic counts, but in some cases the count stations were rather far upstream, 10 or more miles. The Project Team considered averaging a downstream count and an upstream count when the upstream count was at a significant distance, but decided against it in order to maintain conformity with HSIS files. The chief concern with these data, apart from the distance of count stations, is that routes, alternate routes, and each half of certain divided highways have similar labels and considerable programming is required to ensure that a count lies on a route of interest rather than a related one. According to the HSIS Washington Guidebook, a small number of the count stations are permanent and a large number of others are used for 72-hour counts every second or third year. The counts for com_avg are considered to be less reliable than the overall counts, in part because they are based on fewer stations. Washington State Department of Transportation personnel observed that the truck counts are done on weekdays, that com_avg is based on this figure, and that it might be better to take the weekday figure and add 10% to 20% to get the overall weekly value. It was also noted that the percentage of truck traffic on a road can vary from 4% to 17% at different times of year, chiefly because of seasonal variation in the non-truck traffic.

Alignment Data

Horizontal and vertical alignment data came from construction plans in the case of Minnesota and from HSIS horizontal and vertical curve files in the case of Washington.

The Minnesota plans varied in age from a few years prior to 1985 to approximately 1920. Special effort was made to determine that these plans showed the latest alignment or realignment and that no realignment was done during the time periods under study. Nonetheless it is possible that some roads were realigned and that plans were never conveyed to the Minnesota Plan Office. The Plan Office plans are primarily Federal aid projects, and State and County aid projects sometimes do not get recorded at the State Plan Office. In addition to location problems (discussed below), problems sometimes arose because of illegibility of markings on the plan and inconsistencies between alternative measures (e.g., radius versus degree of curve, or beginning and end of curve versus length of curve) written on the plan. These were typically resolved by a judgment as to which number was most plausible. A few horizontal curves had spiral transitions at beginnings and/or ends. These were not recorded but a judgment was made as to a beginning and endpoint for a single idealized horizontal curve. A very small fraction, 2% or less, of vertical curves were represented in the plans as angle points, where the grade changes without a transition, typically a small change. Our initial understanding was that no such transitions occurred on Minnesota major roads and these points were edited so that a transition curve of 50 feet was introduced. Later, visiting Minnesota engineers reported that angle points do occasionally occur on main roads.

The Washington State alignment data were represented by a Horizontal Curve file and a Vertical Curve file. Many segments and intersections were eliminated from the sample because of anomalies in the values in these files, but the ones that remained also had minor anomalies. Because of rounding errors in the original Washington data (not enough significant digits kept), some curves appeared to overlap, and editing had to be done to restore plausible beginning and ending points for curves. In addition, in some cases there were small differences between the ending grade of one vertical curve and the beginning grade of the next. When the intervening stretch was treated as a straightaway during the modeling, its grade was taken to be the average of the two neighboring grades. A few angle points occurred for both horizontal and vertical curves, with small grade changes or small angle changes. Curve lengths were adjusted to 50 feet for these exceptional cases.
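Two of these steps lend themselves to a short sketch: detecting apparent overlaps between consecutive curves, and assigning a grade to the straightaway between two vertical curves. The function names and the tuple layout are hypothetical, and the sketch only flags overlaps; how the endpoints were then edited is not specified in the text.

```python
def find_overlaps(curves):
    """Return index pairs of consecutive curves that appear to overlap.

    `curves` is a list of (beg_mp, end_mp) tuples sorted by beginning milepost.
    """
    return [
        (i, i + 1)
        for i in range(len(curves) - 1)
        if curves[i + 1][0] < curves[i][1]
    ]

def straightaway_grade(end_grade_prev, beg_grade_next):
    """Grade of the intervening straightaway: average of the neighboring grades."""
    return (end_grade_prev + beg_grade_next) / 2.0
```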

Location Uncertainties

Minnesota data compilation was hampered by the fact that HSIS files, Minnesota photologs, and Minnesota construction plans use three different ways of measuring distance: true mileposts, nominal mileposts, and control stations. HSIS variables begmp and endmp and true_beg and true_end refer respectively to nominal beginning and ending mileposts and true beginning and ending distances of segments. Both the Minnesota photologs and the Minnesota accident data are keyed to nominal mileposts rather than true distances, and the primary usage of true_beg and true_end is to calculate segment length. The milepost of an accident in the accident files is nominal rather than true distance, and the tenths of a mile shown on Minnesota photologs are nominal mileposts not true distance. This was confirmed by MNDOT personnel and by comparison of photologs with the Minnesota List-Trumile-File for Trunk Highways. This latter book, a print-out of a file (our copy was dated September 1, 1988) obtained in Minnesota, had a listing of all State highways along with reference posts (i.e., nominal mileposts), true distances, and control stations, most of the entries effective as of 1977 (but with some updates as recent as 1983).

Control stations, used in the construction plans, are local numbers, in hundreds of feet, and may be equated to nominal mileposts by use of the just mentioned file. Many plans contain station adjustments (places where a gap in the stations occurs) and converting back and forth between the various units is an art. This conversion is especially difficult for intersections. The intersection reference point, the nominal milepost of the intersection center, is sometimes not adequately tied to construction plans or to features on the photologs: station numbers of nearby landmarks are occasionally either wrong or absent, and interpolation adds a further source of error. Plans, sometimes of ancient vintage, do not show an intersection or expected landmark, or else are ambiguous (two or more intersections or landmarks shown in the plan are plausible candidates for the sought after one). This is particularly true of three-legged intersections since these are the least well-marked, least documented, and least significant data class.
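A simplified sketch of the station-to-milepost conversion is given below. It assumes linear interpolation between two tie points taken from the file, each a (station, nominal milepost) pair, with no station adjustment between them; this is exactly the idealization that the difficulties described above tend to violate. The function names are hypothetical.

```python
def station_to_feet(station):
    """Stations are in hundreds of feet, so station 125.5 is 12,550 feet."""
    return station * 100.0

def milepost_from_station(station, tie_before, tie_after):
    """Interpolate a nominal milepost between two (station, milepost) tie points.

    Assumes no station adjustment (gap) lies between the two tie points.
    """
    (s0, mp0), (s1, mp1) = tie_before, tie_after
    if not s0 <= station <= s1:
        raise ValueError("station lies outside the tie points")
    fraction = (station - s0) / (s1 - s0)
    return mp0 + fraction * (mp1 - mp0)
```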

Linking a particular intersection to its photolog and to a particular site on a plan involves a comparison among four different numbers: the reference point for the intersection, the distance recorded on the photolog, the true distance recorded by the State, and the station number in the construction plans. Sometimes discrepancies occur among these numbers: the intersection may be at a slightly different point than expected in the photolog, or it may be several hundred feet away from its expected location in the plan. When the plan does not show an intersection in the near vicinity of the expected spot, an identifiable landmark must be found to verify locations and in some cases this is quite difficult.

For Washington State data, distances are measured in ARMs (accumulated route miles). The ARM is a true milepost, used in all of the HSIS files: roadway, traffic, accident, and alignment. Only the videotapes are in nominal mileposts, but a logbook permits unambiguous translation back and forth. Discrepancies were rare, perhaps because Washington Department of Transportation personnel had already resolved them. The only issue of concern was rounding errors, noted above.

A final caveat with respect to location concerns the accident data. MNDOT indicated that the accident data reviewers attempt to locate a nearby physical feature mentioned in the police report. They then determine the reference point for that feature and add an adjustment, typically a few hundred feet, to get to the accident site. The reviewers aim to get within 50 feet of the true accident site. They also assign a reliability code to their estimate.

Time Uncertainties

HSIS traffic and roadway data, the Minnesota construction plan data, and the photolog data are all supposed to apply to the time intervals under consideration. Rural areas might be expected to change more gradually than urban and suburban areas. However, some variables such as traffic data are based on averages of discrete observations that may not be representative. Others, including Minnesota intersection traffic data discussed above, may be out of date. Photolog years in Minnesota vary from 1987 to 1990 and in Washington from 1993 to 1995; changes in the number of driveways, speed limits, channelization, etc., may have occurred before or after the photolog was obtained.

For validation of the Minnesota model, 1990-1993 data were used. Since construction plans and photologs for the new time period were unavailable, some variables could not be re-measured, so it was assumed that these were generally unchanged.

Miscellaneous Limitations

Data acquired from the photologs were subject to various limitations. Minnesota photologs in reels and CD-ROMs offered a larger visual field than the videotapes acquired from Washington State. On the other hand, the latter were accompanied by audio that indicated signage and roadside features and gave the numbers on sometimes otherwise unreadable speed limit signs. The Washington voice-over also provided intersecting street and route names and was accompanied by a written log. In both cases some effort was required to verify that minor roads had stop signs, to determine channelization, and to assess whether a driveway had been seen along the road. Driveways, for example, can sometimes be mistaken for footpaths. In addition, for Washington State the photologs were used to estimate angle of intersection between major and minor roads, and limited visibility along minor roads made this difficult.

Roadside Hazard Rating was determined from the photologs. Different observers would not always agree on the value of this subjective variable (values of two, and sometimes three, independent observers were averaged, and photologs were re-inspected in some cases). The hazard rating sometimes varied substantially along a segment. With regard to intersections, it was more difficult to arrive at values in the vicinity of Washington State intersections since the roadsides at these intersections tended to be less rural than their Minnesota counterparts (small town streets rather than country roads), and the proper rating to assign to a roadside business or residence was not always evident.

Weather data collected by the Midwest Climate Center, as already noted, were limited by the fact that they were not sufficiently local.

The treatment of intersections along a segment was not quite consistent between Minnesota and Washington. In Minnesota very few segments began or ended at an intersection, and for the few that did (thought to be less than 5%) no attempt was made to remove, say, 250 feet from the segment and shorten it by omitting the intersection vicinity. In Washington most of the segments began and/or ended with an intersection, and all such segments were shortened by removal of 250 feet at each end where an intersection was encountered. On the other hand, no internal intersections were removed from the segments in either State. In Washington 95% of the segments contained no internal intersections, but in Minnesota more than half of the segments contained at least one intersection. This means that in Minnesota accidents along segments are more likely to include accidents that happened near intersections (although they would not be intersection-related or at an intersection).
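The Washington shortening rule amounts to simple arithmetic, sketched below under the assumption that segment ends are given as mileposts and that flags indicate whether each end falls at an intersection. The function and parameter names are hypothetical.

```python
FEET_PER_MILE = 5280.0

def trimmed_length_miles(beg_mp, end_mp, intersection_at_beg,
                         intersection_at_end, buffer_ft=250.0):
    """Segment length after removing the buffer at each end that meets an intersection."""
    length = end_mp - beg_mp
    ends_trimmed = int(intersection_at_beg) + int(intersection_at_end)
    return length - ends_trimmed * buffer_ft / FEET_PER_MILE
```

Under this sketch a 1-mile segment with an intersection at both ends loses 500 feet, leaving about 0.905 mile; a Minnesota segment, for which no trimming was done, corresponds to both flags being false.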

It should also be noted that some desirable variables were omitted from the study altogether, e.g., superelevations, alignments on minor roads, actual speeds, and sight distances. To some extent the latter are represented in, or can be reconstructed from, horizontal and vertical alignment as well as Roadside Hazard Rating, but a direct unambiguous measurement is lacking. Also excluded, of course, are detailed information about drivers and vehicles on the road; accident circumstances such as time of day, week, and year; and weather at the time and place of an accident. To some extent demographic conditions such as ages of drivers and law enforcement practices are incorporated in the STATE variable (see below).

