Potential Use of Archived Intelligent Transportation Systems Data for Government Reporting
CHAPTER 3: KEY ITS DATA ELEMENTS FOR GOVERNMENT REPORTING SYSTEMS
3.3 USE OF ITS TRAFFIC VOLUME DATA IN GOVERNMENT REPORTING SYSTEMS
3.3.1 Key Findings
Figure 3.1 shows a theoretical process for how traffic data from ITS and traditional sources are typically processed by state DOTs and how they are used in government reporting systems. The definition of ITS-generated traffic data follows a sequence of events characterized by several standards:
- The National ITS Architecture has defined the basic nature of traffic data to be used in ITS ...
- Which states and localities use in developing their own regional architectures ...
- On which the deployment of field devices to collect traffic data is based. These devices will use the Data Collection and Monitoring (DCM) standard (part of the National Transportation Communications for ITS Protocol [NTCIP]) to define machine-detectable traffic data ...
- That are used in the TMDD message sets (a standard also) ...
- That in theory will be archived for future use by government reporting systems and other uses. (The archived data standard is currently under development.)
One of the most significant ITS data types for use in government reporting systems is traffic volume data. The benefits of accessing ITS-generated traffic data for secondary uses (i.e., uses beyond real-time control strategies) have been touted in many recent forums. For government reporting systems, the deployment of ITS roadway detectors means that continuously collected traffic volumes can now be obtained where only short-counts were previously available. The impact is essentially the same as would be obtained by greatly increasing the number of locations where automatic traffic recorders (ATRs) exist. For traffic volumes, this greatly reduces sample bias in estimates of annual average daily traffic (AADT) and provides more data on which to base traffic adjustment factors (e.g., sample adjustments and K- and D-factors).
However, several shortcomings exist in the application of ITS-generated data to government reporting systems:
- Vehicle classification of the traffic stream (estimating the number of vehicles by category) is not currently a consideration in the TMDD, yet the ability to distinguish at least the major truck types is critical to many government-reporting systems.4 This is particularly puzzling in light of the fact that many types of sensors deployed by TMCs can distinguish length-based vehicle classes (e.g., video image processing systems). Further, vehicle lengths are used in algorithms to convert loop occupancy to traffic density, and default values are often a compromise. This may be an artifact of the early development of the TMDD when most sensors in use were loop-based and not used for classification. It may also be that vehicle classification is not an important aspect of TMC operations or at least in the currently defined message sets. However, given the fact that the equipment is improving and is capable of collecting the data, it seems reasonable that it should be included in the TMDD. This is punctuated by the fact that vehicle classification data in urban areas are a scarce commodity.

Figure 3.1. Data Flow for Traffic Data in Support of Government Reporting Systems
- ITS is deployed mainly on the higher functional classes because this is where most of the congested travel in urban areas occurs. This means that the benefits of increased temporal and spatial traffic count coverage will be confined to the major highways: freeways and major signalized arterials. Traffic measurement on the lower classes will continue to be required for at least the foreseeable future.
- To date, ITS owners have been less concerned with data quality than traditional data collection personnel. This is due to the nature of current-generation ITS responses, which require only gross estimates of system condition. Also, the intensive coverage of ITS sensors means that losing data from a few is not important - the system is redundant with sensors - so that down equipment is often ignored for long periods. Further, periodic calibration is not a priority for ITS operators beyond initial set-up. This contrasts with the traditional case where sensors are scarce and personnel work hard to ensure they are operating. The net result is that ITS-generated traffic data are of unknown quality and field sensors are often not calibrated or maintained to the same standards as traditionally operated sensors. This difference is the source of much skepticism on the part of traditional traffic data collectors over ITS's ability to supply traffic data to government reporting systems.
- A companion issue to the collection of traffic data is the lack of supporting data about the collection activity (sometimes referred to as "metadata"). Documentation of the equipment functioning and calibration is usually nonexistent.
- Management and processing of ITS traffic data to produce the summary statistics (e.g., AADT) needed for government reporting systems is problematic. The first issue is determining whether ITS owners or government reporting system owners do the processing. The second issue is how to deal with ITS-generated data that are missing or suspicious for short periods of time - are these data to be ignored or adjusted (imputed/edited)? In the traditional realm, where data collection equipment is closely monitored, missing/suspicious data for short periods of time are not a large issue. However, ITS data are often subject to communication disruptions that last only for short periods of time. For example, several 5-minute intervals of data throughout a day might be subject to communication failures. If these periods are to be ignored, then the remaining data are wasted. Methods for detecting suspicious data and adjusting for missing values would be helpful in taking full advantage of ITS data.
- Finally, the location-referencing problem causes much confusion. ITS detector locations are not typically keyed to match locations of traditional data collection devices. The result is often duplication of traffic counts at the same location.
3.3.2 Case Study of Using ITS Data to Supplement Traditionally Collected Traffic Data: Detroit, Michigan
In order to address many of the issues associated with using ITS-generated traffic volume data discussed above, ITS data from the Detroit area were obtained. These data were supplied by the Michigan Intelligent Transportation Systems Center. Known as the "MITS Center," it is the hub of ITS technology applications at the Michigan Department of Transportation. It is a traffic management center where staff oversees a traffic monitoring system composed of:
- 180 Instrumented Freeway Miles
- 156 Closed Circuit TV Cameras
- 59 Dynamic Message Signs
- 61 Ramp Meters
- 2260 Inductive Loops
- 11 Highway Advisory Radios
Recently, Michigan DOT personnel responsible for submitting HPMS data to FHWA have started using volume data from MITS as the source of AADT values. The process requires that MITS detectors be matched to HPMS segments.
Starting with the 2000 HPMS submittal, MDOT used MITS data as the source for AADTs using the following process 5:
- Step 1: Aggregate data from one-minute/by lane to one-hour directional. The MITS staff created an application to aggregate the data to 1-hour increments by direction for both volume counts and "minute periods down". The "minute periods down" is a self-editing tool that identifies whether signals are being detected by the field equipment. If a detector is identified as non-responsive for a majority of the minute then this field is flagged. The aggregation application provides the total number of minutes all lanes were down in an hour period.
- Step 2: Import the count data and do a gross edit check based on minutes down. The file received from MITS is imported into a database and all daily records with a 24 hour "minutes down" total of 500 or more are automatically deleted from the database. All other records are individually reviewed by staff. Decisions to delete additional daily records are based on the distribution of minutes down periods. Minimal missing periods per hour may be ignored depending on certain variables such as impact on the hour or 24 hour totals.
- Step 3: For the 2000 year only we used 5 weekday volumes per month. An application was created to use the data for each month. The application created a 24 hour average from the 5 weekday counts; averaged the monthly estimates for each month data was available; multiplied the average by the appropriate seasonal/DOW adjustment factor. This was done for each site. The final value was the adjusted AADT for the site.
- Step 4: Apply the derived AADT to the appropriate road segment. Some of the sites were in areas where this value would be the AADT. Other sites were in the middle of an interchange. These locations required the addition of ramp volumes to make the AADT complete for the assigned segments.
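The gross edit check in Step 2 and the AADT calculation in Step 3 can be illustrated with a minimal sketch. This is not MDOT's application: the function names, the record layout, and the use of a single adjustment factor per site are assumptions made for illustration, and the staff review of individual records described in Step 2 is not modeled. The source does not state whether the seasonal/DOW factor is applied month by month or to the overall average; the sketch applies it once to the overall average.

```python
from statistics import mean

# Daily "minutes down" total at or above which a record is deleted (Step 2).
MAX_MINUTES_DOWN = 500


def gross_edit(daily_records):
    """Keep only daily records whose 24-hour "minutes down" total is below 500.

    Each record is assumed (hypothetically) to be a dict with a
    "minutes_down" key.
    """
    return [r for r in daily_records if r["minutes_down"] < MAX_MINUTES_DOWN]


def adjusted_aadt(monthly_weekday_counts, adjustment_factor):
    """Estimate an adjusted AADT for one site.

    monthly_weekday_counts: dict mapping month -> list of 24-hour weekday
        volumes (five per month in the 2000 process).
    adjustment_factor: seasonal/day-of-week factor (assumed to be a single
        value for the site).
    """
    # 24-hour average for each month that has data.
    monthly_averages = [
        mean(counts) for counts in monthly_weekday_counts.values() if counts
    ]
    if not monthly_averages:
        return None
    # Average the monthly estimates, then apply the adjustment factor.
    return mean(monthly_averages) * adjustment_factor
```

For a site located within an interchange, Step 4 would additionally require adding ramp volumes before the value is assigned to the HPMS segment.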
The MITS data were also used in this study to demonstrate a potential method for dealing with missing data. The method explored imputes data from a history file at each location. The history file contains hourly growth factors for each location computed for each day of the week; these are the average percent growth in traffic from the previous hour for the day of week in question. So, for example, at a particular site, the average growth in volumes on Mondays from 7:00 to 8:00 AM might be 23 percent. This method is therefore referred to as the "Historical Growth Rate" method of imputation. Another method of imputation from the literature uses data from nearby detectors (either from adjacent lanes or upstream/downstream locations) to fill in missing data.6 The scales of these methods are clearly different: the current method imputes data that have already been summarized to the hourly level, while the method of Hu et al. (2001) imputes data at the lowest level of aggregation. For this reason, no attempt was made to compare the methods. Further, it is likely that they are complementary - rather than competing - procedures.
A series of processing procedures were developed to accomplish this task.7 These are discussed below.
- Step 1: Aggregate data from one-minute/by lane to five-minute/by lane. The MITS data are archived at one-minute intervals, which is a very detailed level. (Most ITS traffic archives do not go below 5-minute summaries.) The one-minute volumes were aggregated to five-minute periods by simple summing. When not all one-minute intervals were present, the sum was factored by the ratio of five to the number of one-minute intervals present. However, at least two one-minute intervals had to be present; otherwise, the values were set to missing. This straight factoring assumes that traffic volumes do not vary much within each five-minute period (a reasonable assumption). From a practical standpoint with the Detroit data, the majority of data were complete for each five-minute period:
| No. 1-min Intervals in Each 5-min Period | No. of 5-min Periods | Percent |
| 1 | 54,020 | 0.20 |
| 2 | 115,674 | 0.42 |
| 3 | 485,759 | 1.75 |
| 4 | 3,457,539 | 12.48 |
| 5 | 23,586,804 | 85.15 |
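The straight factoring described in Step 1 can be sketched as follows. The function names and the use of `None` to mark a missing interval are assumptions made for illustration; only the factoring rule and the minimum counts come from the text.

```python
def aggregate_with_factoring(values, expected, minimum):
    """Sum the intervals present and scale the sum up to the full period.

    values: volumes for each sub-interval, with None marking a missing one.
    expected: number of sub-intervals in a complete period.
    minimum: fewest sub-intervals that must be present; below this the
        aggregated value is set to missing (None).
    """
    present = [v for v in values if v is not None]
    if len(present) < minimum:
        return None
    # Factor the partial sum by (expected / number present).
    return sum(present) * expected / len(present)


def five_minute_volume(one_minute_volumes):
    """One-minute to five-minute roll-up: at least two intervals required."""
    return aggregate_with_factoring(one_minute_volumes, expected=5, minimum=2)


def hourly_volume(five_minute_volumes):
    """Five-minute to one-hour roll-up: at least eight periods required."""
    return aggregate_with_factoring(five_minute_volumes, expected=12, minimum=8)
```

For example, a period with one-minute volumes of 10, 12, and 11 and two missing intervals gives 33 x 5/3 = 55 vehicles. The `hourly_volume` helper applies the same rule with the thresholds given in Step 4.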
- Step 2: Perform Quality Control (QC). QC procedures developed for FHWA's Mobility Monitoring Program were applied to the data. Records that failed the quality control were set to missing. The QC procedures for volume applied were:
- Maximum volume threshold (greater than 250 vehicles per lane for five minutes)
- Sequential volume test (if the same volume is reported for five or more consecutive time periods, assume that the detector is malfunctioning)
A total of 99 percent of records passed these rudimentary checks.
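The two volume checks can be sketched as a single pass over a detector's five-minute series. This is an illustration only, not the Mobility Monitoring Program code; the function name and the `None` convention for missing records are assumptions. The text does not say whether runs of zero volume (e.g., overnight) are exempt from the sequential test; this sketch does not exempt them.

```python
MAX_FIVE_MIN_VOLUME = 250  # vehicles per lane per five minutes
MIN_REPEAT_RUN = 5         # identical consecutive values that trigger a failure


def apply_volume_qc(volumes):
    """Return a copy of a five-minute volume series with failing records
    set to None (missing).

    volumes: per-lane five-minute volumes in time order; None marks a
        record that is already missing.
    """
    result = list(volumes)

    # Maximum volume threshold.
    for i, v in enumerate(volumes):
        if v is not None and v > MAX_FIVE_MIN_VOLUME:
            result[i] = None

    # Sequential volume test: find runs of identical reported values.
    n = len(volumes)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and volumes[j + 1] is not None and volumes[j + 1] == volumes[i]:
            j += 1
        if volumes[i] is not None and (j - i + 1) >= MIN_REPEAT_RUN:
            for k in range(i, j + 1):
                result[k] = None
        i = j + 1

    return result
```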
- Step 3: Develop Lane Utilization Factors for Aggregating Across All Lanes In a Direction. It was hypothesized that lane utilization (i.e., the distribution of traffic in each lane) was related to the number of lanes and the level of congestion. A series of analyses were made using the V/C ratio as the indicator of congestion (Figures 3.2 to 3.4). The results indicate that under low-congestion levels, traffic is unevenly distributed across all lanes. As congestion builds, the traffic distribution evens out. These factors were applied when not all lanes in a direction at a location were present.

Figure 3.2. Lane Utilization Factors Based on 2000 Detroit ITS Data: Two-Lane Freeways
Figure 3.3. Lane Utilization Factors Based on 2000 Detroit ITS Data: Three-Lane Freeways
Figure 3.4. Lane Utilization Factors Based on 2000 Detroit ITS Data: Four-Lane Freeways
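One plausible way to apply lane utilization factors when a lane is missing is to divide the volume observed in the reporting lanes by the share of traffic those lanes are expected to carry. The text does not spell out the exact calculation, so the sketch below is an assumed interpretation; the function name is hypothetical, and the lane shares in the example are placeholder values, not values read from Figures 3.2 to 3.4. In practice the shares would be chosen for the number of lanes and the prevailing V/C ratio.

```python
def directional_volume(lane_volumes, lane_shares):
    """Estimate the total directional volume from the lanes that reported.

    lane_volumes: dict mapping lane id -> volume, with None for a missing lane.
    lane_shares: dict mapping lane id -> assumed fraction of directional
        traffic carried by that lane (fractions sum to 1.0).
    """
    present = {lane: v for lane, v in lane_volumes.items() if v is not None}
    if not present:
        return None
    if len(present) == len(lane_volumes):
        # All lanes reported; no factoring needed.
        return sum(present.values())
    share_present = sum(lane_shares[lane] for lane in present)
    if share_present <= 0:
        return None
    # Scale the observed volume up by the share the observed lanes represent.
    return sum(present.values()) / share_present


# Placeholder shares for a three-lane section (illustrative only).
example_shares = {1: 0.4, 2: 0.3, 3: 0.3}
estimate = directional_volume({1: 400, 2: None, 3: 300}, example_shares)
```

With these placeholder shares, the two reporting lanes carry an assumed 70 percent of traffic, so the 700 observed vehicles scale to roughly 1,000.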
- Step 4: Aggregate five-minute/by direction to one-hour by direction. As with the one-minute to 5-minute aggregation, a simple summing was performed. When not all 12 five-minute periods were present, the results were straight factored as before, with the caveat that at least eight five-minute periods had to be present; otherwise, hourly volumes were set to missing.
- Step 5: Impute Volumes for Missing Hours. The method chosen to impute missing hours was to develop average hourly traffic growth rates for each location by day of week. The growth rates are computed as the percent growth in traffic from the previous hour for a specific location. In order to apply the procedure, sufficient data must already exist at a location; otherwise, imputation is not done. These site-specific, day-of-week growth rates were computed using:
- Only hours with complete five-minute data (i.e., 12 five-minute periods)
- Hourly volumes for each location and day of week that were within +/- 1.5 standard deviations of the average hourly volumes.
The method was tested by "jackknifing" imputed values against actual values. The results are given in Table 3.2. The overall average errors (i.e., "signed" error) are very small. The absolute errors are also within reason, six to seven percent for most hours.
- Step 6: Develop Temporal Distributions. As a further check of veracity, day-of-week temporal (24-hour) distributions of traffic were developed for selected locations (Figures 3.5 to 3.11). The patterns exhibit the expected differences between weekday (morning and afternoon peaks) and weekend (single mid-day peak).
Table 3.2. Results of Imputation Experiment
| Hour | Average Signed Error | Average Absolute Error |
| 0 | | |
| 1 | 3.3% | 11.3% |
| 2 | 2.5% | 11.2% |
| 3 | 2.3% | 11.1% |
| 4 | 1.7% | 11.2% |
| 5 | 1.5% | 9.6% |
| 6 | 0.1% | 7.4% |
| 7 | 0.6% | 7.1% |
| 8 | 0.4% | 6.7% |
| 9 | -0.3% | 7.6% |
| 10 | -0.1% | 6.7% |
| 11 | 0.2% | 6.1% |
| 12 | 0.1% | 6.1% |
| 13 | 0.4% | 5.5% |
| 14 | -0.2% | 6.2% |
| 15 | 1.5% | 6.5% |
| 16 | 0.6% | 6.8% |
| 17 | 0.8% | 6.5% |
| 18 | 0.0% | 7.1% |
| 19 | 0.0% | 7.1% |
| 20 | 0.3% | 7.0% |
| 21 | -0.1% | 7.7% |
| 22 | 1.4% | 9.6% |
| 23 | 0.0% | 10.1% |
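The Historical Growth Rate imputation and the error measures reported in Table 3.2 can be sketched as follows. Several details are assumptions made for illustration: the function names, the averaging of per-day growth rates (rather than a ratio of average volumes), and the leave-one-day-out form of the jackknife, which the text does not describe in detail. The filter on volumes within +/- 1.5 standard deviations is omitted for brevity. Hour 0 is given no growth rate here, which is consistent with the blank Hour 0 row in the table.

```python
from statistics import mean


def hourly_growth_rates(days):
    """Average growth from the previous hour, for one site and day of week.

    days: list of 24-element hourly volume lists (None for a missing hour).
    Returns a 24-element list of growth rates as fractions; entry 0 is None.
    """
    rates = [None] * 24
    for h in range(1, 24):
        samples = [
            d[h] / d[h - 1] - 1.0
            for d in days
            if d[h] is not None and d[h - 1] is not None and d[h - 1] > 0
        ]
        if samples:
            rates[h] = mean(samples)
    return rates


def impute_hour(day, h, rates):
    """Impute hour h as the previous hour's volume times (1 + growth rate)."""
    if h == 0 or rates[h] is None or day[h - 1] is None:
        return None
    return day[h - 1] * (1.0 + rates[h])


def jackknife_errors(days, h):
    """Leave each day out in turn, impute hour h from the other days' rates,
    and compare with the actual value.

    Returns (average signed error, average absolute error) as fractions.
    """
    errors = []
    for i, day in enumerate(days):
        if day[h] is None or day[h] == 0:
            continue
        rates = hourly_growth_rates(days[:i] + days[i + 1:])
        estimate = impute_hour(day, h, rates)
        if estimate is not None:
            errors.append((estimate - day[h]) / day[h])
    if not errors:
        return None, None
    return mean(errors), mean([abs(e) for e in errors])
```

Signed errors can cancel across days, which is why the signed column in Table 3.2 is much smaller than the absolute column.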
Figure 3.5. Temporal Distributions Based on 2000 Detroit ITS Data: I-696A, MP10.93
Figure 3.6. Temporal Distributions Based on 2000 Detroit ITS Data: I-696B, MP22.847
Figure 3.7. Temporal Distributions Based on 2000 Detroit ITS Data: I-75, MP80.127
Figure 3.8. Temporal Distributions Based on 2000 Detroit ITS Data: I-94A, MP202.004
Figure 3.9. Temporal Distributions Based on 2000 Detroit ITS Data: I-96A, MP160.95
Figure 3.10. Temporal Distributions Based on 2000 Detroit ITS Data: I-96C, MP177.511
Figure 3.11. Temporal Distributions Based on 2000 Detroit ITS Data: M-39, MP7.01
4 Classifying vehicles in the traffic stream is different from identifying the category of individual vehicles identified in crashes. In the former case, classification is performed automatically; in the latter case, by visual inspection.
5 This process was documented and provided by Mike Walimaki of MDOT.
6 Hu, Pat et al., Proof of Concept of ITS as an Alternate Data Source: A Demonstration Project of Florida and New York Data, prepared for FHWA, September 30, 2001, http://www-cta.ornl.gov/Publications/Proof_of_Concept.pdf
7 Note: The procedures outlined below were not tested for accuracy. Clearly, more work in this area is needed to test the assumptions used. However, in the absence of formalized tests and default values, these procedures are thought to produce reasonable results. Further, the steps developed indicate the process that future processing procedures should use.