U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-HRT-13-026 Date: March 2014 |
Model calibration is an important step in the traffic simulation modeling process. The goal of this report is to provide practitioners with a statistical model calibration approach that provides a stronger link to field data and helps increase confidence in the model calibration results while remaining consistent with guidance provided in Traffic Analysis Tools.(9)
The objectives of this chapter are as follows:
This chapter also provides guidance on the model calibration methodology to perform the following:
Model Calibration Definition
Calibration is the process of systematically adjusting model parameters so that the model is able to reproduce the observed traffic conditions. The process continues until the error between the performance measures taken from the field data and the performance measures calculated in the simulation is less than a predetermined margin of error. Once it is determined that the model reproduces observed conditions, model calibration can focus on specific performance measures such as volume, speed, travel time, and bottleneck locations. It is important to note that in the model calibration process, there is a tradeoff between the required precision and the resources available to collect data and conduct modeling.
Calibration Challenges
The goal of calibration is to make the model represent locally observed traffic conditions. However, because traffic may vary greatly from day to day, it is not possible for one model to accurately represent all possible traffic conditions. Most simulation software is developed, and its default model parameter values estimated, using a limited amount of data.
Driver behavior differs by region, and it may differ significantly between typical and non-typical days. For example, poor visibility, severe weather, incidents, presence of trucks, and pavement conditions all affect driver behavior. As a result, it is not recommended to use a model developed with data from one region to represent future traffic conditions in another region. Investment decisions made using a model that has not been calibrated to local field conditions will be flawed.
Simulation software tries to mimic driver behavior using the limited data to which the analysts have access. Given that fact, if the model is not properly calibrated, any flaws in the model will be magnified.
Calculations and procedures presented in this chapter assume the following conditions:
Guidance on the overall model calibration process is presented in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The overall model calibration process can be divided into the following four main steps:
Performance Measure and Critical Location Identification
During the model scoping process and the development of the data collection plan, the field measurements are determined, including speed, volume, queuing, and other congestion observations at different locations in the network. Since traffic conditions fluctuate daily, it is important to obtain field data from multiple days. The data from multiple days serve as a basis from which field variations are used to determine the tolerable error in the simulated results. For simulation of a typical day, it is preferred that the data exclude incident days as well as Mondays, Fridays, and weekend days.
During model development, it is recommended that the spatial and temporal model limits extend beyond where and when the congestion in the field occurs. The statistical calibration should not necessarily include every link in the model but should focus on the critical design elements within the primary study area in the model. This report focuses on capacity and operational interchange modifications on the interstate system. Therefore, the critical elements include the mainline freeway, ramp roadways, and the crossing arterials. Priority should be given to the higher-volume roadway elements.
The number of locations selected for comparing the performance measures in field data against model outputs needs to be balanced against the quality and location of the available data, the desired level of statistical confidence, and the availability of resources. The model outputs and reporting for these statistical tests should be similar to the performance measures that will be used later on in the analysis.
The number of data days to be used should be based on an analysis of what data are available and cost effective to collect. In an urban area where freeway sensor data are archived and readily available, more days of data can be used. In areas without surveillance, where manual or temporary data collection devices must be used, collecting additional days of data is more resource intensive.
The use of speed data can be more of a challenge. If the speed profile is constructed from freeway sensor data, then the reliability and available days of data will be similar to the volume discussion. If the speed data are based on probe vehicle information, then the reliability will be reduced because the probe vehicle data will only be a sample of the actual traffic stream.
Determination of Strategy for Calibration
The strategy for calibration refers to the steps and model parameters that the modeler chooses to modify in order to achieve calibration. Different simulation software packages have their own parameters and recommended practices for adjusting the models so that performance matches the field data.
In order for the modeler to be cost effective, there should be a strategy for approaching the model adjustments. Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software highlights the following three-step strategy for calibration:(1)
These strategies are still applicable, and this report focuses on refined statistical methods.
The purpose of conducting a statistical test on stochastic traffic simulation models is to ensure that the performance measure means across different simulation model runs do not differ significantly from the means of the data collected in the field. This statistical check requires two primary items. First, there must be a sufficient amount of field data from different days so that an acceptable margin of error in the data can be determined. Second, there must be an error-free traffic simulation model that has been built to reflect the conditions in the field (i.e., the model reflects adequate spatial and temporal boundary conditions). With these two components, the following general statistical procedure can be conducted:
Analyze Field Data
Field data must be collected for multiple days. These data are initially used to establish model inputs and are then used in statistical tests. The first set of analyses is used to understand the variability in the data. The variability is addressed by calculating the margin of error, E, among different representative days, as shown in figure 14.
Figure 14. Equation. Margin of error: E = Zcritical × σ / √n.
Where:
E = Margin of error.
Zcritical = Critical Z statistic (for a 95 percent confidence interval, Z = 1.96).
σ = Standard deviation.
n = Sample size (number of observations).
The tolerance error percentage is calculated by dividing E by the mean of the field data, as shown in figure 15.
Figure 15. Equation. Tolerance error percentage: e = (E / x̄field) × 100.
Where:
e = Tolerance error percentage.
x̄field = Mean of the field data.
Table 9 demonstrates how these two calculations are performed.
Table 9. Variability analysis of field data.
Day Number | Hourly Volume |
1 | 2,980 |
2 | 2,682 |
3 | 3,063 |
4 | 2,594 |
5 | 3,193 |
6 | 2,675 |
7 | 3,230 |
8 | 2,562 |
9 | 3,034 |
Based on the field data in table 9, the calculations are as follows: x̄field = 2,890, σ = 262.4, and n = 9, giving E = 1.96 × 262.4 / √9 = 171 and e = 171 / 2,890 = 5.9 ≈ 6 percent.
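The margin of error and tolerance error calculations in figures 14 and 15 can be sketched in a few lines of Python. This is an illustrative helper, not part of any FHWA tool; note that the report rounds the resulting 5.9 percent up to 6 percent.

```python
import statistics
from math import sqrt

def tolerance_error(samples, z_critical=1.96):
    """Return (margin of error E, tolerance error percentage e) per figures 14-15."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)                    # sample standard deviation
    e_margin = z_critical * sigma / sqrt(len(samples))   # figure 14
    e_pct = 100 * e_margin / mean                        # figure 15
    return e_margin, e_pct

# Nine days of field hourly volumes from table 9
field_volumes = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]
E, e = tolerance_error(field_volumes)
```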
Statistical Test 1—Determine the Minimum Required Number of Model Runs
Statistical test 1 is used to determine the minimum required number of model runs based on an error rate calculated using the procedure described in the previous subsection. For a given target level of tolerance (tolerable error determined using variability in field observations) and a given confidence level (usually a confidence level of 95 percent is selected), a minimum number of model runs is required using different random number seeds. The minimum required number of model runs is computed using the equation in figure 16.
Figure 16. Equation. Minimum number of model runs: n = (Z × σ / (e × x̄model))².
Where:
n = Minimum number of model runs required.
Z = Critical Z statistic (for a 95-percent confidence interval, Z = 1.96).
σ = Standard deviation.
e = Tolerance error percentage.
x̄model = Mean of the given MOE (measure of effectiveness) calculated from the performed model runs.
The minimum number of model runs should be calculated using two different performance measures, typically volume and speed, and for multiple locations. The highest resulting number of model runs should be used as the minimum required number of model runs. The example calculations show one performance measure (volume) for one location to demonstrate the statistical calculations and process. These same techniques should be performed at multiple locations and for different measures.
An example of how to calculate the minimum required number of model runs is shown in table 10. In this example, five model runs were conducted, and the results were used to compute the minimum number of runs required. The tolerance error, e, for this example was 6 percent, as derived from the field data and calculations in table 9.
Table 10. Sample statistical calculation to determine the minimum number of runs.
Model Run Number | Hourly Volume |
1 | 3,591 |
2 | 3,000 |
3 | 2,655 |
4 | 3,680 |
5 | 2,720 |
Based on the model data in table 10, the calculations are as follows: x̄model = 3,129, σ = 481, and e = 0.06 (6 percent).
Figure 17. Equation. Number of model runs example: n = (1.96 × 481 / (0.06 × 3,129))² = 25.2, rounded up to 26.
The results of statistical test 1 indicate that the minimum number of model runs should be 26. As a result, 21 more model runs should be conducted before proceeding to the next step of comparing field data to model output.
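The minimum-runs computation in figure 16 can be sketched as follows. This is illustrative code under the same assumptions as the example; the five-run sample from table 10 reproduces the result of 26 runs.

```python
import math
import statistics

def min_required_runs(model_samples, tolerance_pct, z_critical=1.96):
    """Minimum number of runs per figure 16, rounded up to the next integer."""
    mean = statistics.mean(model_samples)
    sigma = statistics.stdev(model_samples)
    e = tolerance_pct / 100   # tolerance expressed as a fraction of the mean
    return math.ceil((z_critical * sigma / (e * mean)) ** 2)

# Initial five model runs from table 10; 6 percent tolerance from the field data
initial_runs = [3591, 3000, 2655, 3680, 2720]
n_min = min_required_runs(initial_runs, tolerance_pct=6.0)
```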
Statistical Test 2—Compare Field Data to Model Output
The next statistical step is to compare the two populations (i.e., field data volume mean versus model output volume mean) by testing the following hypothesis:
Figure 18. Equation. Compare field data to model output: Zcalculated = (x̄field − x̄model) / √(σfield²/nfield + σmodel²/nmodel), testing H0: μfield = μmodel.
Where:
x̄field = Average (mean) of the field observations.
x̄model = Average (mean) of output from different model runs.
σfield = Standard deviation from field observations.
σmodel = Standard deviation from the model runs.
nfield = Sample size of the field observations.
nmodel = Number of model runs with different random number seeds.
Zcalculated = Z-test of the field data and modeled data.
If Zcalculated ≥ Zcritical (1.96 for a 95 percent confidence level) or Zcalculated ≤ −Zcritical, reject H0.
The hypothesis test is a two-tailed Z-test based on a normal distribution. Figure 19 is a graph of the normal distribution. The area between Zcritical of ±1.96 is the do-not-reject range.
Figure 19. Graph. Normal distribution curve.
Using the same example as statistical test 1, the field data and 26 model runs are summarized in table 11 and table 12. Once all 26 model runs are conducted, the computation of the tolerance error is repeated to ensure that the margin of error is less than or equal to the desired tolerance error. The new tolerance error is 3.9 percent, which is lower than the desired 6.0 percent. Therefore, conducting 26 model runs was sufficient to satisfy the 95 percent confidence level and the 6.0 percent tolerance error.
Table 11. Sample calculation of field data.
Day Number | Hourly Volume |
1 | 2,980 |
2 | 2,682 |
3 | 3,063 |
4 | 2,594 |
5 | 3,193 |
6 | 2,675 |
7 | 3,230 |
8 | 2,562 |
9 | 3,034 |
Where:
x̄field = 2,890.
σ = 262.4.
n = 9.
Zcritical = 1.96.
E = 172.
Table 12. Sample calculation of model statistics.
Run Number | Modeled Data |
1 | 3,591 |
2 | 3,000 |
3 | 2,655 |
4 | 3,680 |
5 | 2,720 |
6 | 2,976 |
7 | 3,270 |
8 | 3,027 |
9 | 2,657 |
10 | 2,956 |
11 | 3,450 |
12 | 3,267 |
13 | 2,870 |
14 | 2,680 |
15 | 3,240 |
16 | 3,575 |
17 | 3,050 |
18 | 2,840 |
19 | 3,450 |
20 | 3,120 |
21 | 2,680 |
22 | 2,980 |
23 | 3,355 |
24 | 3,090 |
25 | 2,675 |
26 | 3,070 |
Where:
x̄model = 3,074.
σ = 312.0.
n = 26.
Zcritical = 1.96.
E = 120.
Based on a comparison of the data in table 11 and table 12, Zcalculated can be determined, as shown in figure 20.
Figure 20. Equation. Sample calculation of Zcalculated: Zcalculated = (2,890 − 3,074) / √(262.4²/9 + 312.0²/26) = −1.72.
Because Zcalculated (−1.72) falls between −Zcritical and +Zcritical (±1.96), H0 should not be rejected.
In this example, comparison of field data to the model output leads to not rejecting the null hypothesis. This means that there is insufficient evidence to conclude that the model output is significantly different from the field data. If the hypothesis test had resulted in rejection, additional model trials using different calibration parameters would be required, followed by a repeat of the hypothesis test. A complete iterative hypothesis test case study is presented in the next section.
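The two-tailed Z-test in figures 18 through 20 can be sketched as follows. This is illustrative code assuming independent field and model samples; it reproduces Zcalculated ≈ −1.72 for the data in tables 11 and 12.

```python
import statistics
from math import sqrt

def z_test(field, model):
    """Zcalculated per figure 18: compare field and model sample means."""
    num = statistics.mean(field) - statistics.mean(model)
    den = sqrt(statistics.variance(field) / len(field) +
               statistics.variance(model) / len(model))
    return num / den

field_volumes = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]  # table 11
model_volumes = [3591, 3000, 2655, 3680, 2720, 2976, 3270, 3027, 2657,
                 2956, 3450, 3267, 2870, 2680, 3240, 3575, 3050, 2840,
                 3450, 3120, 2680, 2980, 3355, 3090, 2675, 3070]        # table 12
z = z_test(field_volumes, model_volumes)
reject_h0 = abs(z) >= 1.96   # False: -1.96 < -1.72 < 1.96, so H0 is not rejected
```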
The purpose of the case study was to test interchange improvements.
Data Collection
The field data were collected using freeway detection sensors. For the purposes of this exercise, nine different weekdays of data were selected to conduct the statistical tests. Of these 9 days of data, 1 day was selected as the typical day to build the base model. The data selected were free of any major incidents or crashes.
Build Base Model
The base model was developed according to the scope development process that identified spatial and temporal boundary limits. The model was constructed according to the procedures in the seven-step modeling process specified in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The model was checked for errors and was found to be free of any errors.
After the base model was completed, an initial number of model runs with different random number seeds was conducted to come up with an initial set of output MOEs. These MOEs were used to determine the minimum number of model runs.
Select Locations and Time Periods for Statistical Tests
The selection of locations and time periods for a statistical test should be based on the primary study area within the model limits and the peak periods. Within the study area, the key features that are being analyzed should be selected such as the mainline freeway and ramps. It is neither practical nor necessary to perform these rigorous statistical checks on every component of the model. For example, it is possible to select an unimportant location and have that selection drive the number of model runs too high. The goal is to ensure that there is a statistical confidence in the desired results from the models.
Analyze Model Output
The first step was a field data analysis to determine the corresponding sampling error, E. Nine days were selected for data collection, and the corresponding data were obtained. Table 13 and table 14 present the corresponding error computations.
Table 13. Error calculation of field data hourly volumes from 7:45 to 8:45 a.m.
Day | Field Volume Data | |
Mainline | Ramp | |
1 | 2,980 | 1,030 |
2 | 2,682 | 1,266 |
3 | 3,063 | 975 |
4 | 2,594 | 1,239 |
5 | 3,193 | 920 |
6 | 2,675 | 1,319 |
7 | 3,230 | 890 |
8 | 2,562 | 1,271 |
9 | 3,034 | 1,026 |
The calculations for mainline and ramp, respectively, are as follows: for the mainline volume, x̄ = 2,890, σ = 262.4, E = 171, and e = 6.0 percent; for the ramp volume, x̄ = 1,104, σ = 168.2, E = 110, and e = 10.0 percent.
Table 14. Calculation of field speed data error and variation from 7:45 to 8:45 a.m.
Day | Mainline Speed (mi/h) |
1 | 25.5 |
2 | 35.2 |
3 | 35.1 |
4 | 35.1 |
5 | 30.7 |
6 | 32.0 |
7 | 27.7 |
8 | 34.7 |
9 | 34.2 |
The calculations for mainline speeds are as follows: x̄ = 32.2 mi/h, σ = 3.6, E = 2.4, and e = 7.3 percent.
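As a cross-check, the error computations for the three case-study measures in tables 13 and 14 can be sketched as follows. This is illustrative code; small rounding differences from the report's 6.0-, 10.0-, and 7.3-percent figures are expected.

```python
import statistics
from math import sqrt

def tolerance_pct(samples, z_critical=1.96):
    """Tolerance error percentage: 100 * E / mean, with E per figure 14."""
    sigma = statistics.stdev(samples)
    margin = z_critical * sigma / sqrt(len(samples))
    return 100 * margin / statistics.mean(samples)

mainline_vol = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]  # table 13
ramp_vol = [1030, 1266, 975, 1239, 920, 1319, 890, 1271, 1026]         # table 13
mainline_spd = [25.5, 35.2, 35.1, 35.1, 30.7, 32.0, 27.7, 34.7, 34.2]  # table 14

tolerances = {name: tolerance_pct(data) for name, data in
              [("mainline volume", mainline_vol),
               ("ramp volume", ramp_vol),
               ("mainline speed", mainline_spd)]}
```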
After the error-free base model was developed, it was run five times with different random number seeds. The initial output was used to determine if there was an adequate number of runs.
Statistical Test 1: Determination of the Minimum Required Number of Runs
The first statistical test was used to determine whether the minimum required number of model runs with different random number seeds had been satisfied. In order to conduct this test, an initial set of five runs was conducted on the error-free base model. The formula for determining the minimum number of runs discussed in the previous section was used, assuming a 95 percent confidence interval and a tolerance level determined from the variability in the field data.
Table 15 is a summary of the data and calculations based on hourly volumes for the two locations (i.e., mainline and ramp) and speeds for the mainline location that were identified previously. Based on the analysis, it was determined that 16 model runs would be required.
Table 15. Summary table for minimum required number of model runs.
Model Run Number | Modeled Volume Data (Vehicles/h) | Modeled Speed Data (mi/h) | |
Mainline | Ramp | Mainline | |
1 | 2,980 | 1,051 | 29.1 |
2 | 3,333 | 923 | 19.6 |
3 | 2,782 | 1,265 | 23.8 |
4 | 3,273 | 875 | 23.0 |
5 | 3,583 | 935 | 25.3 |
The calculations for mainline volume, ramp volume, and mainline speed, respectively, are as follows: for mainline volume, x̄ = 3,190, σ = 313, and n = 11; for ramp volume, x̄ = 1,010, σ = 157, and n = 10; for mainline speed, x̄ = 24.2, σ = 3.5, and n = (1.96 × 3.5 / (0.073 × 24.2))² = 15.1, rounded up to 16. The mainline speed location governs.
Compare Field Data to Model Results—Trial 1
The results of statistical test 1 (see table 15) indicate that the minimum number of model runs required for both locations and performance measures is 16. Eleven more model runs should be conducted before proceeding to statistical test 2 to compare field data to model output. The 16 model runs are summarized in table 16. Once all 16 model runs were conducted, the computation of the tolerance error was repeated for each of the three data sets to ensure that the margin of error was less than or equal to the desired tolerance error. The new tolerance errors for the mainline volume, the ramp volume, and the mainline speed were 4.1 percent (desired was 6.0 percent), 6.8 percent (desired was 10.0 percent), and 7.2 percent (desired was 7.3 percent), respectively. All three tolerance errors were less than the desired values. Therefore, 16 model runs were sufficient to satisfy the 95 percent confidence level and the desired tolerance errors.
Table 16. Trial 1 summary table for minimum required number of model runs.
Model Run Number | Modeled Volume Data (vehicles/h) | Modeled Speed Data (mi/h) | |
Mainline | Ramp | Mainline | |
1 | 2,980 | 1,051 | 29.1 |
2 | 3,333 | 923 | 19.6 |
3 | 2,782 | 1,265 | 23.8 |
4 | 3,273 | 875 | 23.0 |
5 | 3,583 | 935 | 25.3 |
6 | 3,122 | 1,023 | 28.6 |
7 | 3,465 | 902 | 22.8 |
8 | 2,879 | 1,155 | 20.8 |
9 | 2,865 | 879 | 26.5 |
10 | 3,045 | 978 | 18.0 |
11 | 2,765 | 925 | 23.8 |
12 | 3,346 | 1,235 | 28.7 |
13 | 2,870 | 931 | 26.7 |
14 | 2,989 | 1,010 | 23.8 |
15 | 3,455 | 1,312 | 18.9 |
16 | 3,198 | 1,102 | 22.3 |
The calculations for mainline volume, ramp volume, and mainline speed, respectively, are as follows: for mainline volume, x̄ = 3,122, σ = 263.3, E = 129, and e = 4.1 percent; for ramp volume, x̄ = 1,031, σ = 142.7, E = 70, and e = 6.8 percent; for mainline speed, x̄ = 23.9, σ = 3.5, E = 1.7, and e = 7.2 percent.
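The re-check of the tolerance errors after all 16 runs can be sketched the same way. This is illustrative code; the results land near the report's 4.1, 6.8, and 7.2 percent, with minor rounding differences, and each falls below its desired tolerance.

```python
import statistics
from math import sqrt

def tolerance_pct(samples, z_critical=1.96):
    """Tolerance error percentage: 100 * E / mean."""
    margin = z_critical * statistics.stdev(samples) / sqrt(len(samples))
    return 100 * margin / statistics.mean(samples)

# Sixteen model runs from table 16, paired with the desired tolerances
checks = [
    ([2980, 3333, 2782, 3273, 3583, 3122, 3465, 2879,
      2865, 3045, 2765, 3346, 2870, 2989, 3455, 3198], 6.0),   # mainline volume
    ([1051, 923, 1265, 875, 935, 1023, 902, 1155,
      879, 978, 925, 1235, 931, 1010, 1312, 1102], 10.0),      # ramp volume
    ([29.1, 19.6, 23.8, 23.0, 25.3, 28.6, 22.8, 20.8,
      26.5, 18.0, 23.8, 28.7, 26.7, 23.8, 18.9, 22.3], 7.3),   # mainline speed
]
results = [(tolerance_pct(data), desired) for data, desired in checks]
all_within = all(e <= desired for e, desired in results)
```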
Statistical Test 2: Hypothesis Test
The hypothesis test should be conducted on multiple locations. In order to illustrate the procedures, two locations were analyzed in the case study. Hypothesis testing is typically an iterative process involving trial and error. The following section illustrates that the initial base model results did not satisfy the hypothesis test. For the second trial, the calibration parameters were modified, and the results satisfied the second hypothesis test.
The previously selected mainline location was analyzed using the hypothesis test for hourly traffic volumes for the peak hour from 7:45 to 8:45 a.m. As shown in table 17, the null hypothesis was rejected for model trial 1.
A second model iteration was conducted by adjusting the calibration parameters in the model. The random number seeds from model trial 1 were reused. In model trial 2, the null hypothesis was not rejected, and the results of this analysis are shown in table 18.
Table 17. Hypothesis test trial 1.
Description | Field Data | Model Results | Statistics | |||||
Mean | σf | nf | Mean | σm | nm | Zcalculated | Null Hypothesis | |
Mainline volume | 2,890 | 262.4 | 9 | 3,122 | 263.3 | 16 | -2.12 | Rejected |
Ramp volume | 1,104 | 168.2 | 9 | 1,031 | 142.7 | 16 | 1.10 | Cannot reject |
Mainline speed | 32.2 | 3.6 | 9 | 23.9 | 3.5 | 16 | 5.59 | Rejected |
σf = Standard deviation of field data. nf = Number of days field data were collected. σm = Standard deviation of model runs. nm = Number of model runs. |
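The three hypothesis tests in table 17 can be sketched directly from the summary statistics. This is illustrative code; it reproduces the Zcalculated values of −2.12, 1.10, and 5.59 and the corresponding reject/cannot-reject decisions.

```python
from math import sqrt

def z_and_decision(mean_f, sd_f, n_f, mean_m, sd_m, n_m, z_critical=1.96):
    """Two-tailed Z-test per figure 18; the boolean is True when H0 is rejected."""
    z = (mean_f - mean_m) / sqrt(sd_f**2 / n_f + sd_m**2 / n_m)
    return z, abs(z) >= z_critical

# Summary statistics from table 17 (trial 1)
tests = {
    "mainline volume": z_and_decision(2890, 262.4, 9, 3122, 263.3, 16),
    "ramp volume":     z_and_decision(1104, 168.2, 9, 1031, 142.7, 16),
    "mainline speed":  z_and_decision(32.2, 3.6, 9, 23.9, 3.5, 16),
}
```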
Compare Field Data to Model Results—Trial 2
Since the first hypothesis test resulted in rejection for the mainline volume and speed results, a second trial was conducted. In this trial, calibration parameters were adjusted to more closely match the field results.
The calibration parameters that were adjusted include the following:
After these parameters were adjusted in a number of locations, the model runs were repeated, and the hypothesis test was conducted again. Table 18 is a summary of the second trial.
Table 18. Hypothesis test mainline location trial 2.
Description | Field Data | Model Results | Statistics | |||||
Mean | σf | nf | Mean | σm | nm | Zcalculated | Null Hypothesis | |
Mainline volume | 2,890 | 262.4 | 9 | 3,088 | 222.8 | 16 | -1.91 | Cannot reject |
Ramp volume | 1,104 | 168.2 | 9 | 1,200 | 121.2 | 16 | -1.51 | Cannot reject |
Mainline speed | 32.2 | 3.6 | 9 | 29.2 | 4.5 | 16 | 1.82 | Cannot reject |
σf = Standard deviation of field data. nf = Number of days field data were collected. σm = Standard deviation of model runs. nm = Number of model runs. |
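The trial-2 results in table 18 can be verified the same way. This is illustrative code using the summary statistics from the table; all three Zcalculated values fall inside ±1.96, so none of the null hypotheses is rejected.

```python
from math import sqrt

def z_calc(mean_f, sd_f, n_f, mean_m, sd_m, n_m):
    """Zcalculated per figure 18, from summary statistics."""
    return (mean_f - mean_m) / sqrt(sd_f**2 / n_f + sd_m**2 / n_m)

# Summary statistics from table 18 (trial 2)
z_values = [
    z_calc(2890, 262.4, 9, 3088, 222.8, 16),  # mainline volume
    z_calc(1104, 168.2, 9, 1200, 121.2, 16),  # ramp volume
    z_calc(32.2, 3.6, 9, 29.2, 4.5, 16),      # mainline speed
]
calibrated = all(abs(z) < 1.96 for z in z_values)  # True: trial 2 passes the test
```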