
Quality Control Procedures for Archived Operations Traffic Data: Synthesis of Practice and Recommendations

Review of Current Practices

This section summarizes a review of the literature as well as a review of current ADMS practices. A literature search was conducted to identify relevant papers and reports on validity criteria. A survey of current practices was conducted of nine existing data archives.

Literature Review

The literature review identified validity criteria that could be classified into the following general categories:

  • Univariate and multivariate range checks - These criteria typically correspond to the minimum, maximum, or range of expected values for a single variable or combination of variables. These criteria are easy to automate in a database environment.
  • Spatial and temporal consistency - These criteria evaluate the consistency of traffic data as compared to nearby locations (either across lanes, or upstream and downstream monitoring locations) or previous time periods. Automating these criteria in a database may require more effort but could likely yield more extensive reviews of data quality.
  • Detailed diagnostics - These criteria require detailed diagnostic data from traffic detectors and typically cannot be performed with the original source data that are collected and archived by traffic operations centers. Nevertheless, these criteria may be useful in diagnosing the cause(s) of poor data quality at specific detector locations.

Univariate and multivariate range checks

Nihan et al. (1990)6 and Jacobson et al. (1990)7 proposed validity criteria for single loop detectors as follows:

  1. Maximum 20-second lane volume of 17 vehicles (3,060 vphpl) (see Figure 1);
  2. Minimum and maximum 20-second volume-to-occupancy (v/o) ratios are outside of defined threshold ranges for four occupancy ranges (see Figure 1):
    1. 0.1 to 7.9 percent occupancy: min v/o ratio=0.327, max v/o ratio=1.372;
    2. 8.0 to 25.9 percent occupancy: min v/o ratio=0.209, max v/o ratio=1.098;
    3. 26.0 to 35.9 percent occupancy: min v/o ratio=0.085, max v/o ratio=0.663; and,
    4. >36.0 percent occupancy: min v/o ratio=0.037, max v/o ratio=0.400.
  3. Minimum 20-second lane volume of 1 when 20-second occupancy is between 0 and 0.1 percent; and
  4. Maximum 5-minute lane occupancy of 90 percent.

Figure 1. Illustration of Valid Data Ranges in Nihan et al. (1990) and Jacobson et al. (1990)
This chart shows the valid value ranges for lane occupancy (expressed in percent, on the x-axis) and traffic volume (expressed in number of vehicles per 20 seconds, on the y-axis). The chart indicates that traffic volumes are invalid when greater than 17 vehicles. The chart also indicates that low traffic volumes are invalid when lane occupancy is high, and that low occupancy values are invalid when traffic volumes are high.
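
To illustrate how such range checks can be automated, the following Python sketch applies the Nihan/Jacobson 20-second criteria listed above (function and variable names are illustrative; criterion 4, the 5-minute occupancy check, is omitted because it operates at a different aggregation level):

```python
def nihan_jacobson_valid(volume: int, occupancy: float) -> bool:
    """Check a 20-second lane record against the Nihan/Jacobson criteria.

    volume: vehicles counted in the 20-second interval for one lane
    occupancy: lane occupancy for the interval, in percent
    """
    # Criterion 1: maximum 20-second lane volume of 17 vehicles (3,060 vphpl).
    if volume > 17:
        return False

    # Criterion 3: a small positive occupancy (0 to 0.1 percent) implies
    # that at least one vehicle must have been counted.
    if 0 < occupancy <= 0.1 and volume < 1:
        return False

    # Criterion 2: the volume-to-occupancy (v/o) ratio must fall within
    # the thresholds defined for each of the four occupancy ranges.
    vo_bands = [
        (0.1, 7.9, 0.327, 1.372),
        (8.0, 25.9, 0.209, 1.098),
        (26.0, 35.9, 0.085, 0.663),
        (36.0, float("inf"), 0.037, 0.400),
    ]
    for occ_lo, occ_hi, vo_min, vo_max in vo_bands:
        if occ_lo <= occupancy <= occ_hi:
            if not vo_min <= volume / occupancy <= vo_max:
                return False
    return True
```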

In software documentation for a CD-ROM based data archive (CDR), Ishimaru (1998)8 describes acceptable volume-occupancy ranges for 20-second data from single loop detectors (Figure 2):

  1. If OCC = 1%, then VOLUME=2 must occur;
  2. If 1%<OCC≤5%, then 0≤VOLUME≤7;
  3. If 5%≤OCC≤10%, then 2≤VOLUME≤11;
  4. If 10%≤OCC≤16%, then 1≤VOLUME≤17; and,
  5. If OCC>16%, then 0≤VOLUME≤17.

The CDR software described by Ishimaru manipulates 5-minute summary data and permits analysts to specify data acceptance thresholds for the maximum permitted number of 20-second records considered invalid by the above criteria.

Figure 2. Illustration of Valid Data Ranges in Ishimaru (1998)
This chart shows the valid value ranges for lane occupancy (expressed in percent, on the x-axis) and traffic volume (expressed in number of vehicles per 20 seconds, on the y-axis). This chart differs from Figure 1, in that there are fewer invalid data ranges.

Turochy and Smith (2000)9 used five criteria to identify invalid detector data from Hampton Roads, Virginia. Data were considered invalid if:

  1. Occupancy > 95 percent;
  2. Vehicle volumes > 17 vehicles per lane per 20-second interval (3,060 vphpl);
  3. Positive traffic count with zero speed;
  4. Vehicle volume is greater than calculated threshold for a reported occupancy of zero (due to occupancy rounding or truncation);
  5. Minimum and maximum threshold for average effective vehicle length (AEVL), with values of 9 and 60 feet (2.7 and 18 meters) used as minimum and maximum thresholds for vehicle length. AEVL is calculated as the following: AEVL = (10 × speed [km/hr] × occupancy [%]) / equivalent hourly flow rate [vehicles per hour per lane, vphpl].
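
A minimal sketch of the AEVL computation in criterion 5 follows; the function name and the handling of zero flow are illustrative assumptions, not part of the cited procedure:

```python
def average_effective_vehicle_length(speed_kmh, occupancy_pct, flow_vphpl):
    """AEVL in meters: (10 x speed x occupancy) / hourly flow rate."""
    if flow_vphpl <= 0:
        return None  # assumed: the test is not applicable without positive flow
    return 10.0 * speed_kmh * occupancy_pct / flow_vphpl

# Records are invalid when AEVL falls outside 2.7-18 m (9-60 ft):
aevl = average_effective_vehicle_length(speed_kmh=100, occupancy_pct=10, flow_vphpl=1800)
is_valid = aevl is not None and 2.7 <= aevl <= 18.0
```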

Hu et al. (2001)10 and Schmoyer et al. (2001)11 also used five criteria to identify invalid detector data from Long Island, New York and Orlando, Florida. Data were considered invalid if:

  1. Five-minute traffic counts > 300 vehicles (or > 30 vehicles per 30-second period);
  2. Traffic count < zero;
  3. Number of consecutive identical non-zero traffic count values exceeds maximum threshold (variable based on Poisson probability);
  4. Number of consecutive zero count traffic values exceeds maximum threshold (variable depending upon hour of day, ranges from 5 to 15 minute duration);
  5. Change in traffic count values exceeds maximum threshold (variable based on Poisson probability), comparing each 30-second count to the 5-minute period before and after.
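
Criteria 3 and 4 are run-length tests. A simple run-length scan is sketched below; the Poisson-derived thresholds of Hu et al. are replaced here by a fixed max_run parameter for illustration:

```python
def flag_long_runs(counts, max_run, nonzero_only=True):
    """Return indices of counts that sit inside a run of identical values
    longer than max_run consecutive intervals."""
    flagged = []
    run_start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[i] != counts[run_start]:
            value, run_len = counts[run_start], i - run_start
            if run_len > max_run and (value != 0 or not nonzero_only):
                flagged.extend(range(run_start, i))
            run_start = i
    return flagged

# A run of four identical non-zero counts exceeds a max_run of 3:
print(flag_long_runs([12, 12, 12, 12, 0, 9], max_run=3))  # [0, 1, 2, 3]
```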

Chen et al. (2003)12 describe a daily statistic algorithm that is used to identify erroneous data from single loop detectors in the California PeMS data archive. A daily score is computed for each unique loop detector and compared to a specified threshold that can be customized for different systems:

  1. Number of 30-second samples that have OCC=0 (maximum threshold of 1,200 per day);
  2. Number of 30-second samples that have OCC>0 and VOLUME=0 (maximum threshold of 50 per day);
  3. Number of 30-second samples that have OCC>35% (maximum threshold of 200 per day); and,
  4. Entropy of 30-second occupancy samples (minimum threshold of 4), defined as H = -Σ p(x) log p(x), where p(x) gives the probability (relative frequency) of each occupancy value x.13
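
The entropy statistic can be computed directly from the daily distribution of occupancy values, as in the sketch below (the logarithm base is not stated in the source; the natural log is assumed here):

```python
import math
from collections import Counter

def occupancy_entropy(samples):
    """H = -sum(p(x) * log p(x)) over the distinct occupancy values observed.

    A low entropy means occupancy barely changes over the day, which is the
    signature of a stuck or malfunctioning detector.
    """
    if not samples:
        return 0.0
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())
```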

Turner et al. (2004)14 used the following quality control checks to identify invalid data from nearly 30 cities in the Mobility Monitoring Program:

  1. Controller error codes: remove numeric error code values (typically "-1" or "255");
  2. No vehicles present: replace zero speed values with null/missing speed values when VOLUME=OCC=SPEED=0;
  3. Check for consistency of elapsed time between data polls;
  4. Check for and remove duplicate records (location identifier, date and time stamp are identical);
  5. Check for and remove date, time, and location identifier values that are not in the valid domain range;
  6. Maximum volume (varies based on time interval, invalid if VOLUME>3000 vphpl as max rate);
  7. Maximum occupancy (invalid if OCC>95% for 20-30 second periods, OCC>80% for 1-5 minute periods);
  8. Minimum speed (invalid if SPEED<5 mph);
  9. Maximum speed (invalid if SPEED>100 mph for 20-30 second periods, SPEED>80 mph for 1-5 minute periods);
  10. Multivariate consistency (invalid if SPEED=0 and VOLUME>0 [and OCC>0]);
  11. Multivariate consistency (invalid if VOLUME=0 and SPEED>0);
  12. Multivariate consistency (invalid if OCC>0 and SPEED=0 and VOLUME=0);
  13. Truncated occupancy values of zero (invalid if OCC=0 and VOLUME>(2.932×SPEED×ELAPSED_TIME)/600);
  14. Maximum estimated density (invalid if density > 220, where density = VOLUME×(3600/ELAPSED_TIME)/SPEED);
  15. Consecutive identical volume, occupancy and speed values (invalid if more than 8 consecutive volume AND occupancy AND speed values are identical, including zero values).
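
Three of the less obvious checks (rules 6, 13, and 14) are sketched below for a single record; the argument names and the returned rule numbers are illustrative:

```python
def mmp_rule_failures(volume, occupancy_pct, speed_mph, elapsed_time_sec):
    """Return the numbers of the Turner et al. rules (6, 13, 14) that fail."""
    failed = []
    # Rule 6: maximum flow rate of 3,000 vphpl, scaled to the interval length.
    if volume * (3600.0 / elapsed_time_sec) > 3000:
        failed.append(6)
    # Rule 13: occupancy truncated to zero with more volume than is possible.
    if occupancy_pct == 0 and volume > (2.932 * speed_mph * elapsed_time_sec) / 600:
        failed.append(13)
    # Rule 14: maximum estimated density of 220.
    if speed_mph > 0:
        density = volume * (3600.0 / elapsed_time_sec) / speed_mph
        if density > 220:
            failed.append(14)
    return failed
```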

Weijermars and Van Berkum (2006)15 presented criteria that could be used to check traffic counts from loop detectors on signalized arterial streets. The criteria are based on maximum expected traffic flow, minimum expected traffic flow, and the principle of flow conservation.

Spatial consistency

Chen and May (1987)16 recommended comparing the average detector on-time against the average on-time for all lanes at that location/station. The recommended time sampling interval was five minutes and the allowable deviation for a particular lane from the station average was ±15 percent.

Cleghorn et al. (1991)17 presented several validity criteria for detector data on freeways:

  1. An upper boundary line for volume and occupancy as indicated by the following equation: Volume = (occupancy × maximum expected speed × time interval) / (minimum expected vehicle length + detector length).
  2. Maximum difference in traffic counts between paired loop detectors should be two vehicles or less.
  3. The maximum acceptable deviation from historical averages (varies based on number of standard deviations selected as acceptable).

Cleghorn et al. also indicated that several unusual combinations of volume, occupancy, and speed values may be possible due to the manner in which data are collected in small time intervals:

  1. Records with small positive occupancy and zero volume could be caused by a vehicle signal that "bridges" across two different time polling intervals;
  2. Records with a zero occupancy and small positive volume could be caused if the occupancy values are reported as integer and the computer system truncates or rounds decimal places;
  3. Records with valid volume and occupancy values but no speed values could be caused if the computer controller is unable to sort out vehicle arrivals between the two loops.

Dailey (1993)18 suggests and demonstrates three different methods to identify detector errors:

  1. The correlation between volume and occupancy;
  2. The correlation between data from adjoining lanes; and,
  3. The correlation between time series from detectors along the path of traffic flow.

Nihan (1997)19 recommends the use of storage rates between downstream and upstream detectors to identify data errors. The storage rate summed over several time intervals is calculated by taking the difference in vehicle volumes between upstream and downstream locations. A steadily increasing or decreasing storage rate indicates that one location is consistently over- or under-counting vehicles. The maximum storage rate for consecutive detectors cannot exceed the number of vehicles at jam density.
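
A sketch of the storage-rate calculation follows; the jam-density capacity against which the running total is compared would be computed from segment length, lane count, and an assumed jam density (illustrative values below):

```python
def cumulative_storage(upstream_counts, downstream_counts):
    """Running sum of (upstream - downstream) counts between two detectors.

    A steadily growing or shrinking total suggests systematic over- or
    under-counting at one of the locations.
    """
    totals, running = [], 0
    for up, down in zip(upstream_counts, downstream_counts):
        running += up - down
        totals.append(running)
    return totals

# Assumed check: the stored total can never exceed the vehicles that fit
# between the detectors at jam density (0.5 mi segment, 3 lanes, 220 veh/ln-mi).
jam_capacity = 0.5 * 3 * 220
suspect = any(abs(t) > jam_capacity for t in cumulative_storage([30, 32], [28, 29]))
```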

Wall and Dailey (2003)20 describe a similar process of comparing traffic counts from reference stations to a target station to identify poorly calibrated detectors. The process relies on accurately identifying valid reference stations, and then determining the proper time lag by which to compare upstream and downstream locations. The authors also suggest the use of a correction factor based on the difference in traffic counts.

Kwon et al. (2004)21 recommended automatic and semiautomatic statistical methods to detect spatial configuration errors in traffic detectors. The methods are based on the strong correlation between measurements made by spatially close detectors. The proposed automatic method uses a scoring scheme that computes the probability of detector mislabeling, whereas the semiautomatic method uses a multidimensional scaling (MDS) map of detectors, which visually displays the similarity between detector measurements.

Vanajakshi and Rilett (2004)22 also recommended the principle of vehicle conservation be used to compare the vehicle counts from consecutive detector locations. Their paper suggested two criteria:

  1. The difference between the cumulative volume at a downstream and an upstream detector should equal the number of vehicles between the downstream and upstream detectors; and,
  2. The maximum difference between the downstream and upstream detector cumulative flows cannot exceed the maximum number of vehicles that can be accommodated on the length of road between these two detectors at jam density.

Achillides and Bullock (2004)23 describe a process used to evaluate the performance of microloop and radar-based traffic detectors. Most of their evaluation process relied on comparing the consistency of volume, lane occupancy, and speed measurements between closely spaced microloops and radar detectors. The authors also estimated AEVL as defined by Turochy and Smith (2000) using volume, occupancy, and speed data. AEVL values of 9 and 60 feet were used as minimum and maximum permissible ranges for vehicle length.

Detailed diagnostic data

There are several validity criteria in the literature that rely on detailed sensor output not typically available in archived traffic data. These criteria would be limited to specific locations where detailed sensor output can be obtained from the field controllers. For example, Coifman (1999)24 recommended an off-line test that compares the matched individual vehicle on-time between the upstream and downstream loops in a loop detector pair, and suggested a possible on-line implementation as well. In later work, Coifman and Dhoorjaty (2004)25 suggested eight off-line tests that rely on individual vehicle data. Five of the eight tests can be applied at single loop detectors, whereas all eight tests can be applied at loop detector pairs. The eight tests are as follows:

  1. Individual vehicle velocity versus moving median velocity
  2. Headway versus on-time
  3. Feasible range of vehicle lengths
  4. Feasible range of headway and on-time
  5. Length differences and ratios at dual-loop detectors
  6. Cumulative distribution of vehicle lengths
  7. Loss of loop in dual-loop detector
  8. Counting number of consecutive congested samples

Ametha et al. (2001)26 suggested measuring individual vehicle lengths using detector on-time from both loop detectors in a detector pair, and then comparing the average measured effective vehicle length to an acceptable range for average vehicle length.

Miscellaneous

In a dissertation, Buchheit (2002)27 presents the following taxonomy for data errors and includes examples for weigh-in-motion data:

  1. Systematic errors
    1. Calibration
    2. Threshold
    3. Missing data
    4. Extra data
  2. Individual record errors
    1. Missing records
    2. Garbling errors
    3. Duplications
    4. Combinations

Aside from this taxonomy, no specific validity criteria were presented in this dissertation. Other errors that were not discussed in this dissertation but can exist include:

  • Latency of data transmission;
  • Communication line breaks/distortion; and,
  • Lack/loss of synchronization with a system clock.

Review of Current Practice

This section summarizes the use of validity criteria at nine data archives (Table 1) as of July 2006. Table 2 summarizes the validity criteria used in these archives' quality control processes, and the next several pages contain descriptions for each archive. The findings from the review of current practice are as follows.

  • The validity criteria are similar among the nine different data archives. Six of the criteria in common are simple range checks (minimum and maximum values) for the most common freeway detector data collected (volume, occupancy, and speed). Other criteria that are common across many archives are those that relate to logical multivariate consistency. The similarities should not be too surprising considering the open sharing of validity criteria among data archive developers.
  • The validity criteria are less sophisticated and complex than those described in the literature. These differences may be due to the need for easily programmable criteria that require limited computer processing time. Another factor may be that the criteria in the literature were developed for "special studies," whereas the criteria in the nine data archives are used on a production basis.
  • Nearly all of the validity criteria are programmed on a simple pass/fail basis. For example, if a data record fails a criterion, the data archive administrator typically does not manually or visually review the failed data in the context of other good data. This is most likely due to the large amount of data being processed and loaded into these data archives on a daily basis. Manual or visual review may be performed by users of the archived data when special studies are conducted using selected times and locations from the data archive.
  • Most of the validity criteria do not have a specified order or sequence. The exceptions to this occur in two situations: 1) prescreening procedures are used to replace numeric error codes (e.g., 0, -1, 255) with null or missing values before validity criteria are applied; and 2) validity criteria are applied to disaggregate data first, then to aggregate data second.
  • It appears that all validity criteria are applied even if previous criteria indicate invalid data. The documentation reviewed for this report did not indicate that validity checking stops once a data value fails a particular criterion. Instead, the data archives appear to apply all validity criteria regardless of previous criterion failures. One viewpoint suggests that further review is not necessary once a data value has already failed a validity criterion (a technique to save computer processing time). However, the consistent application of all validity criteria may be important for identifying multiple data quality issues that are caused by separate and distinct problems.
Table 1. Listing of Data Archives Surveyed
Data Archive | Maintaining Agency | Sponsoring Agency and Data Source(s)
ADMS Virginia | University of Virginia | Virginia DOT
California PeMS (Performance Measurement System) | University of California-Berkeley/Berkeley Transportation Systems | Caltrans
CATT (Center for Advanced Transportation Technology) Lab | University of Maryland | Maryland State Highway Administration
Central Florida Data Warehouse | University of Central Florida | Florida DOT
FHWA Mobility Monitoring Program | Texas Transportation Institute and Cambridge Systematics, Inc. | Federal Highway Administration
Kentucky ADMS | University of Kentucky | Kentucky Transportation Cabinet
Phoenix RADS (Regional Archived Data Server) | Maricopa County DOT | Maricopa County DOT
PORTAL (Portland Oregon Regional Transportation Archive Listing) | Portland State University | Oregon DOT
WisTransPortal V-SPOC (Volume-Speed-Occupancy Suite) | University of Wisconsin | Wisconsin DOT
Table 2. Summary of Validity Criteria in Nine Data Archives
(Each criterion below is listed with the archives that apply it and, where applicable, the threshold value used.)

Time interval of data: ADMS VA, 1-minute; CA PeMS, 30-sec & 5-minute (note 10); MD CATT, 5-minute; Central FL, 30-sec & 5-minute; FHWA MMP, varies; KY ADMS, 15-minute; Phoenix RADS, 20-second; PORTAL (OR), 20-second; V-SPOC (WI), 5-minute.
Controller error codes (e.g., -1, 255, etc.): MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
Check consistency of elapsed time and poll cycles: MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
Check for duplicate records (location ID, date, time identical): MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
Check to ensure that location ID, date, time are in valid domain: MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
If VOL>0 & SPD=0 then invalid: ADMS VA; MD CATT; Central FL (note 11); FHWA MMP; KY ADMS; Phoenix RADS.
If VOL=0 & SPD>0 then invalid: ADMS VA (note 7); MD CATT; Central FL (note 11); FHWA MMP; KY ADMS; Phoenix RADS.
If VOL=SPD=0 & OCC>0 then invalid: ADMS VA (note 8); MD CATT; Central FL (note 11); FHWA MMP; KY ADMS; Phoenix RADS.
If VOL=OCC=SPD=0, then set SPD=missing/null (no vehicles present): MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
If OCC=0 and VOL>VOLmax (note 1): MD CATT; FHWA MMP; KY ADMS; Phoenix RADS.
Minimum VOL: ADMS VA, 0; CA PeMS, 0; Phoenix RADS, 0; V-SPOC, 0.
Maximum VOL: ADMS VA, 3,120 vphpl (52@1-min); CA PeMS, 3,120 vphpl (260@5-min); MD CATT, 3,000 vphpl (250@5-min); Central FL, 3,000 vphpl (note 11) (25@30-sec); FHWA MMP, 3,000 vphpl (time varies); KY ADMS, 3,000 vphpl (750@15-min); Phoenix RADS, 2,800 vphpl (23@20-sec, 700@15-min); PORTAL, 3,060 vphpl (note 6) (17@20-sec); V-SPOC, 3,100 vphpl (258@5-min).
Minimum OCC: ADMS VA, 0%; CA PeMS, 0%; Phoenix RADS, 0%; V-SPOC, 0%.
Maximum OCC: ADMS VA, 100% (note 9); CA PeMS, 100%; MD CATT, 80%; Central FL, 100% (note 11); FHWA MMP, 95% (note 13) and 80% (note 14); KY ADMS, 80%; Phoenix RADS, 100%@20-sec and 50%@15-min; PORTAL, 95% (note 6); V-SPOC, 100%.
Minimum SPD: ADMS VA, 0 mph; CA PeMS, 0 mph; MD CATT, 5 mph; FHWA MMP, 5 mph; KY ADMS, 5 mph; Phoenix RADS, 0 mph; PORTAL, 5 mph (note 6); V-SPOC, 0.
Maximum SPD: ADMS VA, 120 mph; CA PeMS, 100 mph; MD CATT, 80 mph; Central FL, 100 mph (note 11); FHWA MMP, 100 mph (note 13) and 80 mph (note 14); KY ADMS, 80 mph; Phoenix RADS, 140 mph@20-sec and 85 mph@15-min; PORTAL, 100 mph (note 6).
Minimum AEVL (note 2): ADMS VA, 2.2 m (9 ft); Central FL, 10 ft (note 12).
Maximum AEVL (note 2): ADMS VA, 18 m (59 ft); Central FL, 60 ft (note 12).
Maximum vehicle density (note 3): MD CATT, 220; FHWA MMP, 220.
Maximum consecutive identical VOL&OCC&SPD values (including VOL=OCC=SPD=0): ADMS VA, 8 (varies for zero values); MD CATT, 8; FHWA MMP, 8; KY ADMS, 8; Phoenix RADS, 8.
If OCC=0 or OCC=1, then FLOW<580 vphpl: V-SPOC.
If 1<OCC≤15, then 1<FLOW<1400: V-SPOC.
If 15<OCC<25, then 180<FLOW<2000: V-SPOC.
If OCC>25, then FLOW>500: V-SPOC.
If OCC>25, then 0<SPD<80: V-SPOC.
Minimum required daily data samples (note 4): CA PeMS, >60%.
Daily data samples where OCC > X% (default X=70%) (note 4): CA PeMS, >20%.
Daily data samples where VOL > X in 30-sec (default X=20veh) (note 4): CA PeMS, >20%.
Daily data samples with OCC=0 (note 4): CA PeMS, >59%.
Daily data samples with VOL=0 (note 4): CA PeMS, >95%.
Daily data samples with VOL=0 and OCC > X (default X=0%) (note 4): CA PeMS, >2%.
Daily data samples with consec. identical OCC values (note 4): CA PeMS, >50%.
Minimum entropy value (note 5): Central FL, <1.
Monthly detector samples where OCCmax=0 (note 6): PORTAL.
Monthly detector samples where OCCavg<4% (note 6): PORTAL.

Abbreviations:
VOL=volume, OCC= occupancy, SPD=speed, AEVL=average effective vehicle length, FLOW= hourly vehicle flow rate, in vehicle per hour per lane (vphpl)

Notes:
1 VOLmax = (2.932×SPEED×ELAPSED_TIME)/600, based on the maximum possible volume when the occupancy value is truncated to 0.
2 AEVL = 10 × SPD (km/hr) × OCC (%) / hourly equivalent flow rate (vphpl).
3 Density = VOL × (3600/ELAPSED_TIME) / SPEED.
4 These validity criteria identify a detector as invalid for the entire day.

5 Entropy is defined as H = -Σ p(x) log p(x), where p(x) gives the probability (relative frequency) of each value x.

6 These validity criteria are used to generate monthly detector quality reports.
7 If VOL=0 and SPD>0 and not equal to the previous speed value reported, then speed is invalid. If the speed value equals the previous speed value, then the value is kept but marked as "DoNotUseForCalculations."
8 If VOL=0 and OCC ≠ 0, then mark invalid. If VOL>[(OCC×3)+4], then the value is kept but marked "SuspectRecord".
9 Criteria applied only to mainline freeway stations.
10 Criteria are applied only to data measured from 5 am to 10 pm.
11 Criteria applied to data at 30-second time intervals.
12 Criteria applied to data at 5-minute time intervals.
13 Criteria applied to 20-second and 30-second data.
14 Criteria applied to 1-minute, 5-minute, and 15-minute data.

ADMS Virginia

Prior to 2006, the University of Virginia's Smart Travel Laboratory used validity criteria in ADMS Virginia that were based mostly on work described in Turochy and Smith (2000). Empirical data analyses conducted since then have resulted in refined criteria as described below and in separate documentation.28

Table 3. Real-Time Screening Tests Recommended For Northern Virginia (NOVA) Smart Traffic Center (STC) Detector Data
Test Name Reason and details
A note on error code
  • If any of volume, speed, or occupancy is 255, the record is suspect and is failed. It should be noted that this error code is well above the acceptable upper-limit values for all three traffic variables, and would automatically be failed by the individual thresholds.
  • This is not a test. Only a note.
  • Reference: Turner et al. (2004); and discussion with the NOVA STC staff
A note on VOS=000
  • Volume, Occupancy, Speed (VOS) = 0,0,0 is legitimately possible data, so we recommend passing such records when evaluated in real-time. Since such a record is never failed in real-time, this is not a test.
  • This is not a test. Only a note.
  • Reference: Discussion with the NOVA STC staff
No input data available
  • When studying data for long periods of time, only two cases of detector data have been found:
    • The detector data are present in the input data stream. In this case, values for all three traffic quantities (volume, occupancy, speed) are present as numbers. In this case, the record shall be subjected further to the screening tests described below.
    • The detector data are completely absent from the input data stream. This is interpreted and inserted in the database archive as V=NULL and O=NULL and S=NULL. There are two possible reasons for missing detector data: there is no data for that particular time interval or that particular detector is missing from the data stream. In this case, if V=NULL and O=NULL and S=NULL, the record shall be failed directly and shall not be subjected to any more screening tests.
Test 1

Volume Thresholds

Position 0
  • If the volume > 52, then the record shall be failed. A 1-minute detector volume of 52 is equivalent to 52*60=3120 vehicles per hour per lane, which is suspiciously high in almost all the cases.
  • If the volume is negative (i.e. < 0), the record shall be failed.
  • Reference: Turner et al. (2004) and Turochy and Smith (2000).
Test 2

Occupancy Thresholds

Position 1
  • If the lane is not reversible high-occupancy vehicle (RHOV) and not an entrance or exit ramp, and occupancy > 100, the record shall be failed.
  • If the lane is on an entrance or exit ramp and occupancy >100, the record shall be failed.
  • If the lane is RHOV and the occupancy is between 100 and 127 (i.e., > 100 and <= 127), the record shall be failed. If occupancy > 228, the record shall be failed. If the occupancy is between 128 and 228 (>= 128 and <= 228), then a value of 128 shall be subtracted from the occupancy and the record shall be passed; this situation shall be marked with a "RhovOccChanged" code and the record shall not be failed. This operation is performed to account for an artifact observed in the data stream over a long time.
  • If the occupancy is negative (i.e. < 0), the record shall be failed (this lower threshold is irrespective of the lane type).
  • Reference: Turner et al. (2004) and discussion with the NOVA STC staff
Test 3

Speed Thresholds

Position 2
  • If the speed > upper threshold (120 mph), the record shall be failed.
  • If the speed is negative (i.e. < 0), the record shall be failed.
  • Reference: Turner et al. (2004) and Turochy and Smith (2000).
  • Note: Based on Turner et al.'s reference, speeds between 0-5 mph were initially tested as a failure condition. A number of records were found with reasonable volume patterns in this speed range. Further, some accidents with corresponding changes in occupancy were also found in this speed range. For this reason, this test is eliminated.
Test 4

(VS relation)
Positive speed with zero volume
(V=0, S>0)
OR
Positive volume with zero/no speed
(V>0, and S=0)

Position 3
  • If the volume is zero and the speed is positive, then:
    • If the speed is equal to the speed from the most recent record with positive volume and positive speed, and is also equal to the speeds of all records between this one and that record, then this record shall also be passed. In this case, the record shall be marked with a distinct code of "DoNotUseForCalculations", but shall not be failed. This applies irrespective of whether the record with the most recent positive volume and positive speed was itself failed for other reasons (for example, by the VO test).
    • If the speed is not equal to the previous record, this record shall be failed.
  • This is an artifact of many NOVA controllers. If V=0, the controller outputs the S from the previous record. Although this case is construed as S=0, the field data shall not be modified.
  • If volume is positive and speed is zero, the record shall be passed, irrespective of the detector type (single loop, double loop, RTMS, acoustic).
  • Reference: Turner et al. (2004) and discussion with the NOVA STC staff
Test 5

(VO relation)

Position 4
  • If the Volume = 0, and Occupancy is not 0, then the record shall be failed.
  • If V > O*3 + 4, record shall be marked with a "SuspectRecord" code, but shall not be failed.
  • Reference: Adapted from Jacobson et al. (1990) and empirical data analyses.
  • Note: A test (If the Volume > 5, and Occupancy = 0, then the record shall be failed) was originally designed, and then tested. A number of detector records were found having reasonable volume patterns, but with occupancy=0. So, this test has been eliminated.
Test 6

(VOS relation)
AEVL (Average Effective Vehicle Length)

Position 5
  • If volume, occupancy and speed are all positive numbers, then consider
  • AEVL = 10 * S * O / V where

    AEVL = Average Effective Vehicle Length (meters)
    S = speed (km/h)
    O = occupancy (%)
    V = hourly equivalent volume (vehicles/lane/hour)

    The record shall be marked with a "SuspectRecord" code if AEVL < 2.2 or AEVL > 18 and shall not be failed. If 2.2 <= AEVL <= 18, the record shall be passed.
  • If any of volume, occupancy, speed is zero, then this test is not applicable and the situation shall be marked by "AEVLNotApplicable" code and shall not be failed.
  • Reference: Turochy and Smith ( 2000)
Not a test

Data estimation

Position 6

There is no temporal aggregation for NOVA detector data. The detector data are not estimated before being inserted in the archive. This is reserved for traffic data from other regions where there is temporal aggregation for detector data. All NOVA detector data records shall be marked with a "DataNotInflated" code in this position and shall not be failed.
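
The following Python sketch combines Tests 1, 2, and 5 from Table 3 for a single 1-minute detector record; it is an illustration of the logic described above, not the production ADMS Virginia code:

```python
def screen_nova_record(volume, occupancy, lane_is_rhov=False):
    """Apply Tests 1, 2, and 5 to one record.

    Returns (passed, occupancy, codes); codes collects the explanatory
    markers ("RhovOccChanged", "SuspectRecord") described in Table 3.
    """
    codes = []

    # Test 1: volume thresholds (52 vehicles/minute = 3,120 vphpl).
    if volume < 0 or volume > 52:
        return False, occupancy, codes

    # Test 2: occupancy thresholds, including the RHOV 128-offset artifact.
    if occupancy < 0:
        return False, occupancy, codes
    if lane_is_rhov:
        if 100 < occupancy <= 127 or occupancy > 228:
            return False, occupancy, codes
        if 128 <= occupancy <= 228:
            occupancy -= 128                  # undo the controller's offset
            codes.append("RhovOccChanged")
    elif occupancy > 100:
        return False, occupancy, codes

    # Test 5: volume-occupancy relation.
    if volume == 0 and occupancy != 0:
        return False, occupancy, codes
    if volume > occupancy * 3 + 4:
        codes.append("SuspectRecord")         # kept, but flagged

    return True, occupancy, codes
```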

Table 4. Batch Screening Tests Recommended For Northern VA Detector Data
Test Name Reason and details
VOS=000
Or
VS=0?
Or
VOS=NULL

Position 7

VOS=000 could be legitimate traffic data, or could be due to errors in the detection system (such as power, communication, hardware or software failure). The potential error can be identified only with the context of the surrounding time periods. The following are the various cases for treating VOS=000 data:

  • Ramps may often have low flow conditions and should never be failed under this test.
  • It has been noted that more than 2 hours can pass without a single vehicle on a detector. When this happens, the controller locks up and needs to be reset by maintenance personnel. This scenario occurs mainly during nights, i.e., very low flow conditions. Between 10:00 PM and 5:00 AM, the general purpose Norm and HOV Norm lane detectors shall be tested thus: if records for 120 or more consecutive minutes have VOS=000, then the records beyond the first 120 minutes shall be failed. The records for the first 120 minutes shall be passed as reasonable.
  • Between 5:00 AM and 10:00 PM, for general purpose Norm and HOV lane detectors, if records for 10 or more consecutive minutes display a pattern of VOS=000 or VOS=00<constant>, then all of them shall be failed. For this test, all the patterns shall be treated on par; i.e., if 3 records are VOS=00<constant> and 7 exhibit VOS=000, then the records for all 10 minutes shall be failed.
  • For RHOV and other restricted lanes, VOS=000 is the expected pattern during the periods when vehicles are not legally allowed to use those lanes and those records shall be passed. Between 5:00 AM and 10:00 PM, AND during the periods when vehicles are legally allowed to use these lanes, if records for 10 or more consecutive minutes display the pattern of VOS=000 or VOS=00<constant>, then all of them shall be failed.

Reference: Adapted from Chen et al. (2003), discussion with the NOVA STC staff, and empirical data analyses.

Stuck detector

Position 8
  • If records for 8 or more consecutive minutes have the same values for VOS (other than 000 or 00<constant>), all of them shall be failed.
  • Reference: Turner et al. (2004), discussion with the NOVA STC staff, and empirical data analyses.
Speed threshold

Position 9
  • Between 5:00 AM and 10:00 PM, if 5 or more consecutive records have speeds > higher threshold (say, 85 mph), and non-zero volumes, all the records shall be failed.
  • Reference: Turner et al. (2004), discussion with the NOVA STC staff, and empirical data analyses.

Note: In this table, "Position X" refers to the position of the explanatory metadata code in a character string that is stored to document quality control results.

Some detectors are known not to give data, or to give unreasonable data, for legitimate reasons (e.g., under construction or milled out). This status information is stored separately and can be combined with the results of the screening tests to completely assess the quality of the data from a particular detector or the overall data quality for a larger geographical area. All detectors are considered in the screening above, irrespective of their "active" or "not active" status.

Spatial aggregation from detectors to station. Station data are obtained by aggregating the detector data for each time stamp. Only stations for which all constituent detectors are "active" are considered in the aggregation. If a station has at least one "not active" detector, then no data are generated for that station. The detector volumes are added to get the station volume, the station speed is the volume-weighted average of the detector speeds, and the station occupancy is the simple average of the detector occupancies. The station volume is extrapolated if only a few of the constituent detectors return data. If the VOS values for all the constituent detectors are suspect or missing, then the station data record for that time stamp shall be left as NULL. Only the reasonable, non-failed detector data records are considered in calculating the station data.

If all the constituent detectors of a station have VOS=000, and they pass the screening tests, then the station record shall be stored as VO=00 and S=NULL.

If all the constituent detectors of a station have V=0 and S>0, and they pass the screening tests, then the station record shall be stored as V=0 and S=NULL.
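
A sketch of this aggregation logic, covering the volume-weighted speed and the two special cases above, might look as follows (the record layout is an assumption):

```python
def aggregate_station(records):
    """Combine screened detector records (volume, occupancy, speed) into
    one station record; returns None when every detector is suspect/missing."""
    if not records:
        return None  # station record left as NULL

    volume = sum(v for v, _, _ in records)
    occupancy = sum(o for _, o, _ in records) / len(records)  # simple average

    if volume > 0:
        speed = sum(v * s for v, _, s in records) / volume    # volume-weighted
    else:
        speed = None  # VOS=000 and V=0,S>0 cases store S=NULL

    return volume, occupancy, speed
```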

Real-time screening tests recommended for Northern Virginia station data. The station records are obtained by linear combination of the reasonable detector records. The station records are therefore not prone to the same threshold errors as the raw detector data, and need not be subjected to those tests. However, the AEVL value is not a linear combination of volume, occupancy, or speed, and thus is not guaranteed to pass at the station level even if it passed at the detector level. The AEVL test shall therefore be performed again at the station level.

Table 5. Batch Screening Tests Recommended For Northern VA Station Data
Test Name Reason and details
Test 1

No input data available

Position 0
If V=NULL and O=NULL and S=NULL, the record shall be failed.
Test 2

(VOS relation)
AEVL (Average Effective Vehicle Length)

Position 1
  • If volume, occupancy and speed are all positive numbers, then consider
  • AEVL = 10 * u * h / q where

    AEVL = Average Effective Vehicle Length (meters)
    u = speed (km/h)
    h = occupancy (%)
    q = hourly equivalent volume (vehicles/lane/hour)

  • The record shall be marked with a "SuspectRecord" code if AEVL < 2.2 or AEVL > 18 and shall not be failed. If 2.2 <= AEVL <= 18, the record shall be passed.
  • If any of volume, occupancy, speed is zero, then this test is not applicable and the situation shall be marked by "AEVLNotApplicable" code and shall not be failed.
  • Reference: Turochy and Smith (2000)
Not a test

Data estimation

Position 2
  • If not all of the data from the constituent detectors pass the detector screening tests, the station shall be marked with a code of "DataInflated" and shall not be failed.
  • If all of the constituent detectors return data that pass the detector screening tests, the station shall be marked with a code of "DataNotInflated" and shall not be failed.

Batch screening tests recommended for NOVA station data. Detector records may be failed in batch mode even though they were passed in real-time. Therefore, the station data shall be calculated again after the batch screening of detector data, the station-level Data Estimation code shall be recomputed, and the AEVL test at the station level shall be performed again.

Table 6. Codes Used To Explain Screening Tests
Code | Value
Reasonable/feasible data (test passed) | 1
Unreasonable/unfeasible data (test failed) | 0
SuspectRecord | 8
DoNotUseForCalculations | 9
AEVLNotApplicable | 2
DataInflated | 2
DataNotInflated | 3
RhovOccChanged | 2

California PeMS (Performance Measurement System)

The California PeMS statewide data warehouse includes data from nine different Caltrans districts and is the most extensive data archive surveyed for this report. The PeMS quality control procedures are unique, in that the diagnostic tests include both microscopic and macroscopic data validity rules. The microscopic validity rules are similar to those used in other data archives:

  1. Vehicle flow: valid if 0 ≤ flow ≤ 260 vehicles/5-minutes
  2. Occupancy: valid if 0 ≤ occupancy ≤ 100%
  3. Speed: valid if 0 ≤ speed ≤ 100 mph

The macroscopic rules used in PeMS are focused on identifying detectors that are likely to have been malfunctioning for most of the day. These diagnostic tests are unique because they focus on identifying malfunctioning detectors rather than identifying erroneous data samples. The underlying principle for this approach is that each detector is fundamentally either working or broken. Each night PeMS examines the individual raw data samples over a configurable hour range throughout the day (the default hour range is 5:00am through 10:00pm). Table 7 lists the statistics gathered for mainline (ML) and ramp (RM, flow data only) detectors.

Table 7. PeMS Macroscopic Diagnostic Statistics
Statistic | Detector Types | Description
diag_samples | ML, RM | Total number of samples received.
high_occ | ML | Total number of samples with occupancy greater than a user-defined threshold (default=70%).
high_flow | RM | Total number of samples with flow greater than a user-defined threshold (default=20veh/30-sec).
zero_occ | ML | Total number of samples with occupancy equal to 0.
zero_flow | RM | Total number of samples with flow equal to 0.
flow_occ_mismatch | ML | Total number of samples with occupancy greater than a user-defined threshold (default=0%) and zero flow.
repeat_occ | ML | The number of contiguous five-minute points with the same occupancy. This is the only test that operates on processed five-minute data. Imputed points are excluded from this test.

The sample counts are then compared against user-defined thresholds to determine if the detector is functioning correctly. Table 8 summarizes the thresholds and their default values, which are based on heavy traffic in urban areas. These default threshold values have been modified when PeMS has been used to warehouse rural traffic data.

Table 8. PeMS Default Threshold Values for Macroscopic Diagnostic Tests
Threshold | Detector Types | Default Setting | Description
sample_pct | ML, RM | 60 | Minimum amount of data needed for data quality tests.
high_occ_pct | ML | 20 | Percentage of samples with occupancy higher than sig_occ for the high_occ test.
high_flow_pct | RM | 20 | Percentage of samples with flow higher than sig_flow for the high_flow test.
zero_occ_pct | ML | 59 | Percentage of samples with zero occupancy for the zero_occ test.
zero_flow_pct | RM | 95 | Percentage of samples with zero flow for the zero_flow test.
flow_occ_pct | ML | 2 | Percentage of samples with flow/occ mismatch for the flow_occ_mismatch test.
repeat_occ | ML | 50 | Percentage of five-minute points for the repeat_occ test.

Thresholds expressed as a percentage in Table 8 are normalized using the following equation.

Normalized Threshold = (% threshold / 100) × max(samples received)

The thresholds are normalized so that if the data feed fails for an hour during the day, the PeMS diagnostics will not penalize all of the other detectors. After normalizing the thresholds, PeMS applies the tests in Table 9 in the order listed. If a detector's statistics meet the test condition, PeMS marks the detector as bad and records the suspected error condition. If all tests succeed, the detector is marked as good.
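
The normalization and the ordered test sequence for mainline detectors could be sketched as follows (dictionary keys mirror the names in Tables 7 through 9; the treatment of repeat_occ, which PeMS evaluates on five-minute points, is simplified here):

```python
def normalized(threshold_pct, max_samples_received):
    """Scale a percentage threshold by the best sample count seen that day,
    so a system-wide feed outage does not penalize every detector."""
    return (threshold_pct / 100.0) * max_samples_received

def diagnose_ml_detector(stats, thr, max_samples_received):
    """Apply the mainline (ML) tests of Table 9 in order; return the
    suspected error, or None if the detector is marked good."""
    n = max_samples_received
    if stats["diag_samples"] == 0:
        return "Communication down"
    if stats["diag_samples"] < normalized(thr["sample_pct"], n):
        return "Insufficient data"
    if stats["high_occ"] > normalized(thr["high_occ_pct"], n):
        return "High values"
    if stats["zero_occ"] > normalized(thr["zero_occ_pct"], n):
        return "Card off"
    if stats["flow_occ_mismatch"] > normalized(thr["flow_occ_pct"], n):
        return "Intermittent"
    if stats["repeat_occ"] > normalized(thr["repeat_occ"], n):
        return "Constant"
    return None
```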

Table 9. PeMS Macroscopic Diagnostic Tests
Test | Detector Types | Suspected Error
diag_samples = 0 | ML, RM | Communication down
diag_samples < sample_pct | ML, RM | Insufficient data
high_occ > high_occ_pct | ML | High values
high_flow > high_flow_pct | RM | High values
zero_occ > zero_occ_pct | ML | Card off
zero_flow > zero_flow_pct | RM | Card off
flow_occ_mismatch > flow_occ_pct | ML | Intermittent
repeat_occ > repeat_occ (threshold) | ML | Constant

CATT (Center for Advanced Transportation Technology) Lab

The CATT Lab is in the process of developing a comprehensive transportation data archive for the Washington DC metropolitan area and the state of Maryland. This expanded archive will include enhancements to data validation tools and criteria.

In the current data archive that includes detector data from CHART, the quality control process has two stages:

  1. With each data packet, the CHART detectors also report status with the following possible messages:
    1. Operating Correctly
    2. Communications Failure
    3. Unknown Problem
    4. Offline
    5. Hardware Failure
  2. For those detectors that are "Operating Correctly," the subsequent data are compared to the 15 validity criteria used in the FHWA's Mobility Monitoring Program (see Turner et al. (2004) in the literature review).

Central Florida Data Warehouse

The University of Central Florida developed validity criteria to be used at several different temporal levels in the Central Florida Data Warehouse:

30-second criteria
  1. Data invalid if zero flow and non-zero speed
  2. Data invalid if non-zero flow and zero speed
  3. Data invalid if zero flow and speed, non-zero occupancy
  4. Data invalid if speeds greater than 100 mph
  5. Data invalid if occupancies greater than 100%
  6. Data invalid if flows greater than 25 vehicles (3,000 vphpl)
5-minute criteria
  1. Data invalid if average vehicle length is less than 10 feet or greater than 60 feet
24-hour criteria
  1. Data invalid if entropy of the variables is less than 1, where entropy is defined as H = -Σ p(x) log p(x) and p(x) gives the probability (relative frequency) of each value x.

FHWA Mobility Monitoring Program

Turner et al. (2004)14 documented the 15 validity criteria used for FHWA's Mobility Monitoring Program, in which the Texas Transportation Institute and Cambridge Systematics currently compile and analyze archived traffic data from freeway detectors for congestion reporting purposes. The Program has gathered archived data from more than 30 cities. These validity criteria were presented in the literature review.

When performing more detailed data analyses in specific cities, TTI researchers also manually review various summary charts for data consistency (in addition to the automated criteria described earlier). These summary charts typically include:

  • Daily total traffic counts by lane/location;
  • Average 15-minute or 60-minute traffic count or speed profiles by time of day and lane/location (sometimes on the same chart);
  • Speed-flow relationship for all data points, by lane/location.

Manual or visual reviews of archived data are sometimes considered impractical when processing archived data for multiple cities in a "production" environment. Manual or visual reviews of average trends by location are more feasible when doing special studies for a limited number of cities and shorter time periods. These manual or visual reviews may identify calibration or other data quality issues that may not be detected by automated validity criteria.

Kentucky ADMS

The University of Kentucky is developing the Kentucky ADMS as a statewide data warehouse similar to PeMS and ADMS Virginia. This data warehouse is under development, but is planning to include freeway detector data from the ARTIMIS center in Cincinnati/Northern Kentucky and the TRIMARC center in Louisville/Southern Indiana. Preliminary design documents for the Kentucky ADMS indicate that their validity criteria will be adapted from the criteria used in the Mobility Monitoring Program (see page 3), with the exception that the maximum estimated density rule will not be used.

Phoenix RADS (Regional Archived Data Server)

The Maricopa County DOT is developing the RADS to be a regional data clearinghouse for the Phoenix metropolitan area. This regional data archive is under development, but is planning to use validity criteria that have been adapted from the criteria used in the Mobility Monitoring Program (see page 3), with the following two additions:

  • Volume counts invalid if greater than a configurable threshold; and,
  • Occupancy values invalid if greater than a configurable threshold.

PORTAL (Portland Oregon Regional Transportation Archive Listing)

The PORTAL system has adapted several of the validity criteria used in the Mobility Monitoring Program; however, the system developers have created conditions and thresholds for the purposes of identifying the worst malfunctioning detectors. The PORTAL system uses the criteria (consisting of conditions and thresholds) shown in Table 10 for 20-second original source data. Data quality reports are currently sent to the Oregon DOT on a monthly basis. These reports indicate the detectors which have failed the "High" threshold over the past month.

Table 10. Detector Configuration Tests in PORTAL Data Archive
Condition | Low Threshold | Medium Threshold | High Threshold
Count > 17 | 0.2% of samples | 0.3% of samples | 1% of samples
Occupancy > 95% | 0.25% of samples | 0.5% of samples | 5% of samples
Speed > 100 mph | 0.5% of samples | 1% of samples | 5% of samples
Speed < 5 mph | 5% of samples | 10% of samples | 30% of samples
Low max occupancy | Max occ < 25 | Max occ < 20 | Max occ = 0
Low average occupancy peaks | Avg occ < 6% | Avg occ < 5% | Avg occ < 4%

WisTransPortal

The Wisconsin Traffic Operations and Safety (TOPS) Laboratory WisTransPortal Data Hub contains a complete archive of 5-minute traffic detector data from the Wisconsin DOT (WisDOT) Advanced Traffic Management System (ATMS). The WisTransPortal, located at the University of Wisconsin-Madison, is being developed in coordination with WisDOT Bureau of Highway Operations to support a variety of traffic operations and safety data clearinghouse and analysis applications.

The V-SPOC (Volume, Speed, and Occupancy) Application Suite was developed as a component of the WisTransPortal to provide online tools to retrieve and analyze traffic detector data from the ATMS data archive. The V-SPOC application provides access to traffic data from nearly 4,000 freeway detectors in the Milwaukee, Madison, and Wausau metropolitan regions.

The V-SPOC application provides a "Quality Assurance Report" which includes a "pre-screen test" that is used to identify invalid data based on range checking and a "data quality" report which flags missing or repeating values. The pre-screen test applies the following validity criteria to identify invalid 5-minute data:

  1. Invalid if volume, occupancy or speed is negative.
  2. Invalid if volume > 3100 or occupancy > 100
  3. Invalid if (occupancy = 0 or occupancy = 1) and volume > 580
  4. Invalid if 1 < occupancy ≤ 15 and (volume = 0 or volume > 1,400)
  5. Invalid if 15 < occupancy < 25 and (volume < 180 or volume > 2000)
  6. Invalid if occupancy > 25 and volume < 500
  7. Invalid if occupancy > 25 and speed > 80 mph
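
These seven rules translate directly into code; a sketch for one 5-minute record (volume in vph, occupancy in percent, speed in mph) follows:

```python
def vspoc_prescreen_valid(volume, occupancy, speed):
    """Return True if a 5-minute record passes the V-SPOC pre-screen test."""
    if volume < 0 or occupancy < 0 or speed < 0:
        return False                                    # rule 1
    if volume > 3100 or occupancy > 100:
        return False                                    # rule 2
    if occupancy in (0, 1) and volume > 580:
        return False                                    # rule 3
    if 1 < occupancy <= 15 and (volume == 0 or volume > 1400):
        return False                                    # rule 4
    if 15 < occupancy < 25 and (volume < 180 or volume > 2000):
        return False                                    # rule 5
    if occupancy > 25 and volume < 500:
        return False                                    # rule 6
    if occupancy > 25 and speed > 80:
        return False                                    # rule 7
    return True
```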

The data quality report includes the following columns:

  • Detector ID
  • Time Interval
  • Average Volume (VPH)
  • Average Speed (MPH)
  • Average Occupancy (%)
  • Maximum Volume (VPH)
  • Maximum Speed (MPH)
  • Maximum Occupancy (%)
  • Percentage of null values for Volume
  • Percentage of null values for Speed
  • Percentage of null values for Occupancy
  • Percentage of zero values for Volume
  • Percentage of zero values for Speed
  • Percentage of zero values for Occupancy
  • Percentage of repeating values for Volume
  • Percentage of repeating values for Speed
  • Percentage of repeating values for Occupancy
  • Percentage of simultaneous null values for Volume, Speed, Occupancy
  • Percentage of simultaneous zero values for Volume, Speed, Occupancy
  • Percentage of simultaneous repeating values for Volume, Speed, Occupancy

For the "percentage of repeating values" criteria, the reports are based on 5-minute detector values that have been repeating for at least four consecutive periods (20 minutes). In contrast, the "percentage of null values" and "percentage of zero values" are based on a simple count of all occurrences. The Quality Assurance Report is not fixed to any particular set of detectors or time interval. Rather, it is applied to a given user selection of detectors and time intervals and reports on the individual detectors and time intervals as well as aggregate results taken over the set of detectors and time intervals.

In 2006, TOPS Lab began sending an automatic daily email to the State Traffic Operations Center and others that summarizes the status of the detector data archive from the previous day. This daily status report categorizes data quality into one of three categories, the sum of which always equals 100 percent:

  1. percent data missing (null);
  2. percent data repeating; and,
  3. percent data available.

The daily email is part of a longer-term development to enhance V-SPOC with automated reporting tools that will help identify and rank data quality issues in the detector database. In addition to basic range checking, it is expected that the new tools will make use of detector status and other ATMS logging information that is stored in the WisTransPortal but is not currently available through V-SPOC.


6 Nihan, N., L.N. Jacobson, J.D. Bender, and G. Davis. Detector Data Validity, Publication WA-RD 208.1, Washington State Department of Transportation, March 1990.
7 Jacobson, L.N., N.L. Nihan, and J.D. Bender. Detecting Erroneous Loop Detector Data in a Freeway Traffic Management System. In Transportation Research Record 1287, Transportation Research Board, 1990, pp. 151-166.
8 Ishimaru, J. M. CDR User's Guide, Washington State Transportation Center (TRAC), University of Washington, Seattle, WA, March 1998.
9 Turochy, R.E. and B.L. Smith. A New Procedure for Detector Data Screening in Traffic Management Systems. In Transportation Research Record 1727, Transportation Research Board, 2000, pp. 127-131.
10 Hu, P., R. Goeltz, and R. Schmoyer. Proof of Concept of ITS as an Alternative Data Resource: A Demonstration Project of Florida and New York Data, Publication ORNL/TM-2001.247, Federal Highway Administration, September 2001.
11 Schmoyer, R., P.S. Hu, and R.T. Goeltz. Statistical Data Filtering and Aggregation to Hour Totals of Intelligent Transportation System 30-s and 5-min Vehicle Counts. In Transportation Research Record 1769, Transportation Research Board, 2001, pp. 79-86.
12 Chen, C., Kwon, J., Rice, J., Skabardonis, A., and Varaiya, P. Detecting Errors and Imputing Missing Data for Single-Loop Surveillance Systems. In Transportation Research Record 1855, Transportation Research Board, 2003, pp. 160-167.
13 A low entropy value indicates that data values are not changing much over time (in PeMS, over the period of 24 hours). Originally implemented in PeMS, the entropy criterion has since been replaced with a "consecutive identical values" criterion. The consecutive identical values criterion was easier for traffic practitioners to understand, and it provided comparable results to the entropy criterion.
14 Turner, S., R. Margiotta, and T. Lomax. Monitoring Urban Freeways in 2003: Current Conditions and Trends from Archived Operations Data. Publication FHWA-HOP-05-018, Federal Highway Administration, December 2004. Available at http://tti.tamu.edu/documents/FHWA-HOP-05-018.pdf. Accessed July 21, 2006.
15 Weijermars, W.A.M. and E.C. van Berkum. Detection of Invalid Loop Detector Data in Urban Areas. Presented at the 85th Annual Meeting of the Transportation Research Board, Washington, D.C., 2006.
16 Chen, L. and A.D. May. Traffic Detector Errors and Diagnostics. In Transportation Research Record 1132, Transportation Research Board, 1987, pp. 82-93.
17 Cleghorn, D., F. Hall, and D. Garbuion. Improved Data Screening Techniques for Freeway Traffic Management Systems, In Transportation Research Record 1320, Transportation Research Board, 1991, pp. 17-23.
18 Dailey, D.J. Improved Error Detection for Inductive Loop Sensors. Publication WA-RD 300.1, Washington State Department of Transportation, May 1993.
19 Nihan, N. Aid to Determining Freeway Metering Rates and Detecting Loop Errors. In Journal of Transportation Engineering, November/December 1997, pp. 454-458.
20 Wall, Z.R. and D.J. Dailey. Algorithm for Detecting and Correcting Errors in Archived Traffic Data. In Transportation Research Record 1855, Transportation Research Board, 2003, pp. 183-190.
21 Kwon, J., C. Chen, and P. Varaiya. Statistical Methods for Detecting Spatial Configuration Errors in Traffic Surveillance Sensors. In Transportation Research Record 1870, Transportation Research Board, 2004, pp. 124-132.
22 Vanajakshi, L. and L.R. Rilett. Loop Detector Data Diagnostics Based on Conservation-of-Vehicles Principle. In Transportation Research Record 1870, Transportation Research Board, 2004, pp. 162-169.
23 Achillides, C.D. and D.M. Bullock. Performance Metrics for Freeway Sensors, Final Report. Publication FHWA/IN/JTRP-2004/37, Federal Highway Administration, December 2004.
24 Coifman, B. Using Dual Loop Speed Traps to Identify Detector Errors. In Transportation Research Record 1683, Transportation Research Board, 1999, pp. 47-58.
25 Coifman, B. and S. Dhoorjaty. Event Data-Based Traffic Detector Validation Tests. In Journal of Transportation Engineering, May/June 2004, pp.313-321.
26 Ametha, J., S. Turner, and S. Darbha. Formulation of a New Methodology to Identify Erroneous Paired Loop Detectors. IEEE Intelligent Transportation Systems Proceedings, 2001, pp. 591-596.
27 Buchheit, R.B. Vacuum: Automated Procedures for Assessing and Cleansing Civil Infrastructure Data. Department of Civil and Environmental Engineering, Carnegie Mellon University, Pittsburgh, PA, May 2002.
28 Validity criteria and other documentation for ADMS Virginia are available at http://cts.virginia.edu/nv_staflo.htm for Northern VA data and http://cts.virginia.edu/hr_stflw.htm for Hampton Roads data.

