U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

 
REPORT
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number:  FHWA-HRT-13-026    Date:  March 2014

 

Guidance on The Level of Effort Required to Conduct Traffic Analysis Using Microsimulation

CHAPTER 6. MODEL CALIBRATION

OVERVIEW

Model calibration is an important step in the traffic simulation modeling process. The goal of this report is to provide practitioners with a statistical model calibration approach that creates a stronger link to field data and increases confidence in the model calibration results while remaining consistent with the guidance provided in Traffic Analysis Tools.(9)

The objectives of this chapter are as follows:

This chapter also provides guidance on the model calibration methodology to perform the following:

Model Calibration Definition

Calibration is the process of systematically adjusting model parameters so that the model reproduces observed traffic conditions. The process continues until the error between the performance measures taken from the field data and the performance measures calculated in the simulation is less than a predetermined margin of error. Once it is determined that the model reproduces observed conditions, model calibration can focus on specific performance measures such as volume, speed, travel time, and bottlenecks. It is important to note that the model calibration process involves a tradeoff between the required precision and the resources available to collect data and conduct modeling.

Calibration Challenges

The goal of calibration is to make the model represent locally observed traffic conditions. However, since traffic may vary greatly from day to day, it is not possible for one model to accurately represent all possible traffic conditions. Most simulation software is developed using a limited amount of data, and the model parameter values are estimated from those limited data.

Driver behavior differs by region, and it may differ significantly between typical and non-typical days. For example, poor visibility, severe weather, incidents, presence of trucks, and pavement conditions all impact driver behavior. As a result, it is not recommended to use a model developed with data from one region to represent future traffic conditions in another region. Investment decisions made using a model that has not been calibrated to local field conditions are likely to be flawed.

Simulation software attempts to mimic driver behavior using the limited data to which the analysts have access. Consequently, if the model is not properly calibrated, any flaws in the model will be magnified.

Calculations and procedures presented in this chapter assume the following conditions:

CALIBRATION PROCESS

Guidance on the overall model calibration process is presented in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The overall model calibration process can be divided into the following four main steps:

  1. Identify the performance measures and critical locations for the models to be calibrated against.

  2. Determine the strategy for calibration, consistent with Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1)

  3. Determine the statistical methodology to be used to compare modeled results to the field data. Identify tolerance (measure of precision) and confidence levels (measure of accuracy or variability among samples).

  4. Conduct model calibration runs following the strategy and conduct statistical checks. When statistical analysis falls within acceptable ranges, then the model is calibrated.

Performance Measure and Critical Location Identification

During the model scoping process and the development of the data collection plan, the field measurements are determined, including speed, volume, queuing, and other congestion observations at different locations in the network. Since traffic conditions fluctuate daily, it is important to obtain field data from multiple days. The data from multiple days serve as a basis from which field variations are used to determine the tolerance of error in the simulated results. For simulation of a typical day, it is preferred that data exclude incident days as well as Mondays, Fridays, and weekend days.

During model development, it is recommended that the spatial and temporal model limits extend beyond where and when the congestion in the field occurs. The statistical calibration should not necessarily include every link in the model but should focus on the critical design elements within the primary study area in the model. This report focuses on capacity and operational interchange modifications on the interstate system. Therefore, the critical elements include the mainline freeway, ramp roadways, and the crossing arterials. Priority should be given to the higher-volume roadway elements.

The number of locations selected for comparing the performance measures in field data against model outputs needs to be balanced against the quality and location of the available data, the desired level of statistical confidence, and the availability of resources. The model outputs and reporting for these statistical tests should be similar to the performance measures that will be used later on in the analysis.

The number of days of data to be used should be based on an analysis of what data are available and cost effective to collect. In an urban area where freeway sensor data are archived and readily available, more days of data can be used. In areas with no surveillance, where manual or temporary data collection devices are used, collecting multiple days of data will be more resource intensive.

The use of speed data can be more of a challenge. If the speed profile is constructed from freeway sensor data, then the reliability and available days of data will be similar to the volume discussion. If the speed data are based on probe vehicle information, then the reliability will be reduced because the probe vehicle data will only be a sample of the actual traffic stream.

Determination of Strategy for Calibration

The strategy for calibration is related to the steps and model parameters that the modeler chooses to modify in order to achieve calibration. Different simulation software packages have their own parameters and recommended practices for adjusting the models so that performance matches the field data.

In order for the modeler to be cost effective, there should be a strategy for approaching the model adjustments. Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software highlights the following three-step strategy for calibration:(1)

  1. Calibrate to capacity.

  2. Calibrate route choice.

  3. Calibrate system performance.

These strategies are still applicable, and this report focuses on refined statistical methods.

STATISTICAL METHODOLOGY

The purpose of conducting a statistical test on stochastic traffic simulation models is to ensure that the performance measure means across different simulation model runs do not differ significantly from the means of the data in the field. This statistical check requires two primary items. First, there must be a sufficient amount of field data from different days so that an acceptable margin of error in the data can be determined. Second, there must be an error-free traffic simulation model that has been built to reflect the conditions in the field (i.e., the model reflects adequate spatial and temporal boundary conditions). With these two components, the following general statistical procedure can be conducted:

  1. Select locations for statistical tests.

  2. Analyze field data.

    • Calculate sampling error (margin of error) of field data from multiple days of data.

    • Calculate tolerance (margin of error in percent of the mean). This percentage value will be used to determine the minimum number of model runs to perform.

  3. Analyze traffic model output.

    • Run traffic models 5–10 times with different random number seeds and conduct statistical tests on model results.

    • Conduct statistical test 1 to determine the minimum required number of model runs.

  4. Compare field data to traffic model results.

    • Conduct statistical test 2 to compare field data to traffic model results using a Z-test.

    • If the statistical relationship between modeled output and field data does not indicate a significant difference, then proceed with measures of effectiveness (MOE) comparisons, including volumes, speeds, travel times, and bottlenecks.

    • If the statistical relationships between field data and modeled output indicate a significant difference, adjust calibration parameters, rerun models, and repeat statistical test 2.

Analyze Field Data

Field data must be collected for multiple days. These data are initially used to establish model inputs and are then used in statistical tests. The first set of analyses is used to understand the variability in the data. The variability is addressed by calculating the margin of error, E, among different representative days, as shown in figure 14.

E = Zcritical × (σ / √n)
 
Figure 14. Equation. Margin of error.

 

Where:

E = Margin of error.
Zcritical = Critical Z statistic (for a 95 percent confidence interval, Z = 1.96).
σ = Standard deviation.
n = Sample size (number of observations).

The tolerance error percentage is calculated by dividing E by the mean of the field data, as shown in figure 15.

e = E / X̄field
 
Figure 15. Equation. Tolerance error percentage.

 

Where:

e = Tolerance error percentage.
X̄field = Mean of the field data.

Table 9 demonstrates how these two calculations are performed.

Table 9. Variability analysis of field data.

Day Number Hourly Volume
1 2,980
2 2,682
3 3,063
4 2,594
5 3,193
6 2,675
7 3,230
8 2,562
9 3,034

 

Based on the field data in table 9, the calculations are as follows:

X̄field = 2,890.
σ = 262.4.
n = 9.
E = 1.96 × (262.4 / √9) ≈ 172.
e = 172 / 2,890 ≈ 6.0 percent.
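As an illustration, the figure 14 and figure 15 calculations can be sketched in Python (a minimal sketch; variable names are illustrative, and the sample standard deviation uses n − 1 in the denominator):

```python
import math

# Hourly volumes observed on 9 representative days (table 9).
volumes = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]

n = len(volumes)
mean = sum(volumes) / n
# Sample standard deviation (n - 1 in the denominator).
sigma = math.sqrt(sum((v - mean) ** 2 for v in volumes) / (n - 1))

z_critical = 1.96                      # 95 percent confidence level
E = z_critical * sigma / math.sqrt(n)  # margin of error (figure 14)
e = E / mean                           # tolerance error (figure 15)

print(f"mean = {mean:.0f}, sigma = {sigma:.1f}, E = {E:.0f}, e = {e:.1%}")
```

The resulting tolerance error of roughly 6 percent is the value carried into statistical test 1.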

Statistical Test 1—Determine the Minimum Required Number of Model Runs

Statistical test 1 is used to determine the minimum required number of model runs based on an error rate calculated using the procedure described in the previous subsection. For a given target level of tolerance (tolerable error determined using variability in field observations) and a given confidence level (usually a confidence level of 95 percent is selected), a minimum number of model runs is required using different random number seeds. The minimum required number of model runs is computed using the equation in figure 16.

n = Z² × σ² / (e × X̄model)²
 
Figure 16. Equation. Minimum number of model runs.

 

Where:

n = Minimum number of model runs required.
Z = Critical Z statistic (for a 95 percent confidence interval, Z = 1.96).
σ = Standard deviation.
e = Tolerance error percentage.
X̄model = Mean calculated on the basis of the performed model runs for the given MOE.

The minimum number of model runs should be calculated using two different performance measures, typically volume and speed, and for multiple locations. The highest resulting number of model runs should be used as the minimum required number of model runs. The example calculations show one performance measure (volume) for one location to demonstrate the statistical calculations and process. These same techniques should be performed at multiple locations and for different measures.

An example of how to calculate the minimum required number of model runs is shown in table 10. In this example, five model runs were conducted, and the results were used to perform the minimum number of runs required. The tolerance error, e, for this example was 6 percent as derived from the field data and calculation in table 9.

Table 10. Sample statistical calculation to determine the minimum number of runs.

Model Run Number Hourly Volume
1 3,591
2 3,000
3 2,655
4 3,680
5 2,720

 

Based on the model data in table 10, the calculations are as follows:

n = (1.96)² × (481.1)² / (0.06 × 3,129)² = 25.2, which rounds up to 26 model runs
 
Figure 17. Equation. Number of model runs example.

 

The results of statistical test 1 indicate that the minimum number of model runs should be 26. As a result, 21 more model runs should be conducted before proceeding to the next step of comparing field data to model output.
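A minimal Python sketch of the figure 16 calculation, using the table 10 volumes and the 6 percent tolerance error derived from table 9 (rounding up to a whole run):

```python
import math

# Hourly volumes from the five initial model runs (table 10).
model_volumes = [3591, 3000, 2655, 3680, 2720]

m = len(model_volumes)
mean = sum(model_volumes) / m  # about 3,129
sigma = math.sqrt(sum((v - mean) ** 2 for v in model_volumes) / (m - 1))

z = 1.96   # critical Z statistic for a 95 percent confidence level
e = 0.06   # tolerance error derived from the field data (table 9)

# Minimum number of runs (figure 16), rounded up to a whole run.
n_min = math.ceil((z * sigma) ** 2 / (e * mean) ** 2)
print(n_min)  # 26
```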

Statistical Test 2—Compare Field Data to Model Output

The next statistical step is to compare the two populations (i.e., field data volume mean versus model output volume mean) by testing the following hypothesis:

H0: X̄field = X̄model

H1: X̄field ≠ X̄model

Zcalculated = (X̄field − X̄model) / √(σfield²/nfield + σmodel²/nmodel)
 
Figure 18. Equation. Compare field data to model output.

 

Where:

X̄field = Average (mean) of the field observations.
X̄model = Average (mean) of output from different model runs.
σfield = Standard deviation from field observations.
σmodel = Standard deviation from the model runs.
nfield = Sample size of the field observations.
nmodel = Number of model runs with different random number seeds.
Zcalculated = Z-test of the field data and modeled data.

If Zcalculated ≥ Zcritical (1.96 for a 95 percent confidence level) or Zcalculated ≤ −Zcritical, reject H0.

The hypothesis test is a two-tailed Z-test based on a normal distribution. Figure 19 is a graph of the normal distribution. The area between Zcritical of ±1.96 is the do-not-reject range.

This graph shows a normal distribution curve. The curve is in the shape of a bell centered over the zero deviation line. Two small blue lines extend to meet the curve at the ±1.96 points along the x-axis, thus defining the 95 percent confidence level. The area is broken into three areas: one below the -1.96 line (reject H subscript 0), one between the -1.96 and +1.96 lines (do not reject H subscript 0, which represents the 95 percent confidence level), and one above the +1.96 line (reject H subscript 0).
 
Figure 19. Graph. Normal distribution curve.

 

Using the same example as statistical test 1, the field data and 26 model runs are summarized in table 11 and table 12. Once all 26 model runs are conducted, the computation of the tolerance error is repeated to ensure that the margin of error is less than or equal to the desired tolerance error. The new tolerance error is 3.9 percent, which is lower than the desired 6.0 percent. Therefore, conducting 26 model runs was sufficient to satisfy the 95 percent confidence level and the 6.0 percent tolerance error.

Table 11. Sample calculation of field data.

Day Number Hourly Volume
1 2,980
2 2,682
3 3,063
4 2,594
5 3,193
6 2,675
7 3,230
8 2,562
9 3,034

 

Where:

X̄field = 2,890.
σ = 262.4.
n = 9.
Zcritical = 1.96.
E = 172.

Table 12. Sample calculation of model statistics.

Run Number Modeled Data
1 3,591
2 3,000
3 2,655
4 3,680
5 2,720
6 2,976
7 3,270
8 3,027
9 2,657
10 2,956
11 3,450
12 3,267
13 2,870
14 2,680
15 3,240
16 3,575
17 3,050
18 2,840
19 3,450
20 3,120
21 2,680
22 2,980
23 3,355
24 3,090
25 2,675
26 3,070

 

Where:

X̄model = 3,074.
σ = 312.0.
n = 26.
Zcritical = 1.96.
E = 120.
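The tolerance recheck on the full set of runs can be sketched in Python using the table 12 statistics:

```python
import math

# Statistics from the full set of 26 model runs (table 12).
mean, sigma, n = 3074, 312.0, 26

E = 1.96 * sigma / math.sqrt(n)  # margin of error
e = E / mean                     # recomputed tolerance error
print(f"E = {E:.0f}, e = {e:.1%}")  # about 120 and 3.9%, below the 6.0% target
```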

Based on a comparison of the data in table 11 and table 12, Zcalculated can be determined, as shown in figure 20.

Zcalculated = (2,890 − 3,074) / √(262.4²/9 + 312.0²/26) = −1.72
 
Figure 20. Equation. Sample calculation of Zcalculated.

 

Because Zcalculated (−1.72) is neither greater than or equal to Zcritical (1.96) nor less than or equal to −Zcritical (−1.96), H0 should not be rejected.

In this example, comparison of field data to the model output leads to not rejecting the null hypothesis. This means that there is insufficient evidence to conclude that the model output is significantly different than the field data. If the conclusion of the hypothesis was reject, then additional model trials using different calibration parameters would be required, along with a repeat of the hypothesis test. A complete iterative hypothesis test case study is presented in the next section.
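The figure 18 hypothesis test can be sketched in Python as follows (the function name and argument order are illustrative):

```python
import math

def z_test(mean_f, sigma_f, n_f, mean_m, sigma_m, n_m):
    """Two-tailed Z-test comparing field and model means (figure 18)."""
    z = (mean_f - mean_m) / math.sqrt(sigma_f ** 2 / n_f + sigma_m ** 2 / n_m)
    reject = abs(z) >= 1.96  # 95 percent confidence level
    return z, reject

# Field data (table 11) vs. the 26 model runs (table 12).
z, reject = z_test(2890, 262.4, 9, 3074, 312.0, 26)
print(f"Zcalculated = {z:.2f}, reject H0: {reject}")  # -1.72, False
```

A "do not reject" result here means the model output is statistically indistinguishable from the field data at the chosen confidence level, not that the two are proven equal.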

STATISTICAL METHOD EXAMPLE CASE STUDY

The purpose of the case study was to test interchange improvements.

Data Collection

The field data were collected using freeway detection sensors. For the purposes of this exercise, nine different weekdays of data were selected to conduct the statistical tests. Of these 9 days of data, 1 day was selected as the typical day to build the base model. The data selected were free of any major incidents or crashes.

Build Base Model

The base model was developed according to the scope development process that identified spatial and temporal boundary limits. The model was constructed according to the procedures in the seven-step modeling process specified in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The model was checked for errors and was found to be free of any errors.

After the base model was completed, an initial number of model runs with different random number seeds was conducted to come up with an initial set of output MOEs. These MOEs were used to determine the minimum number of model runs.

Select Locations and Time Periods for Statistical Tests

The selection of locations and time periods for a statistical test should be based on the primary study area within the model limits and the peak periods. Within the study area, the key features that are being analyzed should be selected such as the mainline freeway and ramps. It is neither practical nor necessary to perform these rigorous statistical checks on every component of the model. For example, it is possible to select an unimportant location and have that selection drive the number of model runs too high. The goal is to ensure that there is a statistical confidence in the desired results from the models.

Analyze Field Data

The first step was a field data analysis to determine the corresponding sampling error, E. Nine days were selected for data collection, and the corresponding data were obtained. Table 13 and table 14 present the corresponding error computations.

Table 13. Error calculation of field data hourly volumes from 7:45 to 8:45 a.m.

Day Field Volume Data
Mainline Ramp
1 2,980 1,030
2 2,682 1,266
3 3,063 975
4 2,594 1,239
5 3,193 920
6 2,675 1,319
7 3,230 890
8 2,562 1,271
9 3,034 1,026

 

The calculations for mainline and ramp volumes, respectively, are as follows:

Mainline: X̄field = 2,890, σ = 262.4, n = 9, E = 1.96 × (262.4 / √9) ≈ 172, e ≈ 6.0 percent.
Ramp: X̄field = 1,104, σ = 168.2, n = 9, E = 1.96 × (168.2 / √9) ≈ 110, e ≈ 10.0 percent.

Table 14. Calculation of field speed data error and variation from 7:45 to 8:45 a.m.

Day Mainline Speed (mi/h)
1 25.5
2 35.2
3 35.1
4 35.1
5 30.7
6 32.0
7 27.7
8 34.7
9 34.2

 

The calculations for mainline speeds are as follows:

X̄field = 32.2 mi/h, σ = 3.6, n = 9, E = 1.96 × (3.6 / √9) ≈ 2.4, e ≈ 7.3 percent.

Analyze Model Output

After the error-free base model was developed, it was run five times with different random number seeds. The initial output was used to determine if there was an adequate number of runs.

Statistical Test 1: Determination of the Minimum Required Number of Runs

The first statistical test was used to determine whether the minimum number of model runs with different random number seeds had been satisfied. In order to conduct this test, an initial set of five runs was conducted on the error-free base model. The formula for determining the minimum number of runs discussed in the previous section was applied assuming a 95 percent confidence interval and a tolerance level determined from the variability in the field data.

Table 15 is a summary of the data and calculations based on hourly volumes for the two locations (i.e., mainline and ramp) and speeds for the mainline location that were identified previously. Based on the analysis, it was determined that 16 model runs would be required.

Table 15. Summary table for minimum required number of model runs.

Model Run Number Modeled Volume Data (Vehicles/h) Modeled Speed Data (mi/h)
Mainline Ramp Mainline
1 2,980 1,051 29.1
2 3,333 923 19.6
3 2,782 1,265 23.8
4 3,273 875 23.0
5 3,583 935 25.3

 

The calculations for mainline volume, ramp volume, and mainline speed, respectively, are shown as follows:

Compare Field Data to Model Results—Trial 1

The results of statistical test 1 (see table 15) indicate that the minimum number of model runs required for both locations and performance measures is 16. Eleven more model runs should be conducted before proceeding to statistical test 2 to compare field data to model output. The 16 model runs are summarized in table 16. Once all 16 model runs were conducted, the computation of the tolerance error was repeated for each of the three measures to ensure that the margin of error was less than or equal to the desired tolerance error. The new tolerance errors for the mainline volume, the ramp volume, and the mainline speed were 4.1 percent (desired was 6.0 percent), 6.8 percent (desired was 10.0 percent), and 7.2 percent (desired was 7.3 percent), respectively. All three tolerance errors were less than the desired values. Therefore, 16 model runs were sufficient to satisfy the 95 percent confidence level and the desired tolerance errors.

Table 16. Trial 1 summary table for minimum required number of model runs.

Model Run Number Modeled Volume Data (vehicles/h) Modeled Speed Data (mi/h)
Mainline Ramp Mainline
1 2,980 1,051 29.1
2 3,333 923 19.6
3 2,782 1,265 23.8
4 3,273 875 23.0
5 3,583 935 25.3
6 3,122 1,023 28.6
7 3,465 902 22.8
8 2,879 1,155 20.8
9 2,865 879 26.5
10 3,045 978 18.0
11 2,765 925 23.8
12 3,346 1,235 28.7
13 2,870 931 26.7
14 2,989 1,010 23.8
15 3,455 1,312 18.9
16 3,198 1,102 22.3

 

The calculations for mainline volume, ramp volume, and mainline speed, respectively, are as follows:

Mainline volume: X̄model = 3,122, σ = 263.3, n = 16, E = 1.96 × (263.3 / √16) ≈ 129, e ≈ 4.1 percent.
Ramp volume: X̄model = 1,031, σ = 142.7, n = 16, E = 1.96 × (142.7 / √16) ≈ 70, e ≈ 6.8 percent.
Mainline speed: X̄model = 23.9 mi/h, σ = 3.5, n = 16, E = 1.96 × (3.5 / √16) ≈ 1.7, e ≈ 7.2 percent.

Statistical Test 2: Hypothesis Test

The hypothesis test should be conducted on multiple locations. In order to illustrate the procedures, two locations were analyzed in the case study. Hypothesis testing is typically an iterative process involving trial and error. The following section illustrates that the initial base model results did not satisfy the hypothesis test. For the second trial, the calibration parameters were modified, and the results satisfied the second hypothesis test.

The previously selected mainline location was analyzed using the hypothesis test for hourly traffic volumes for the peak hour from 7:45 to 8:45 a.m. As shown in table 17, the null hypothesis was rejected for model trial 1.

A second model iteration was conducted by adjusting the calibration parameters in the model. The random number seeds from model trial 1 were reused. In model trial 2, the null hypothesis was not rejected, and the results of this analysis are shown in table 18.

Table 17. Hypothesis test trial 1.

Description Field Data Model Results Statistics
Mean σf nf Mean σm nm Zcalculated Null Hypothesis
Mainline volume 2,890 262.4 9 3,122 263.3 16 -2.12 Rejected
Ramp volume 1,104 168.2 9 1,031 142.7 16 1.10 Cannot reject
Mainline speed 32.2 3.6 9 23.9 3.5 16 5.59 Rejected
σf = Standard deviation of field data.
nf = Number of days field data were collected.
σm = Standard deviation of model runs.
nm = Number of model runs.

 

Compare Field Data to Model Results—Trial 2

Since the first hypothesis test resulted in a reject conclusion for the mainline volume and speed results, a second trial was conducted. In this trial, calibration parameters were adjusted in an attempt to more closely match the field results.

The calibration parameters that were adjusted include the following:

After these parameters were adjusted in a number of locations, the models were rerun with the 16 random number seeds from trial 1, and the hypothesis test was conducted again. Table 18 is a summary of the second trial.

Table 18. Hypothesis test mainline location trial 2.

Description Field Data Model Results Statistics
Mean σf nf Mean σm nm Zcalculated Null Hypothesis
Mainline volume 2,890 262.4 9 3,088 222.8 16 -1.91 Cannot reject
Ramp volume 1,104 168.2 9 1,200 121.2 16 -1.51 Cannot reject
Mainline speed 32.2 3.6 9 29.2 4.5 16 1.82 Cannot reject
σf = Standard deviation of field data.
nf = Number of days field data were collected.
σm = Standard deviation of model runs.
nm = Number of model runs.
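The overall pass/fail check across all three measures can be sketched in Python using the trial 2 statistics from table 18 (the dictionary structure and key names are illustrative):

```python
import math

Z_CRITICAL = 1.96  # 95 percent confidence level

def z_calculated(field, model):
    """Figure 18: Z statistic comparing one field mean to one model mean."""
    num = field["mean"] - model["mean"]
    den = math.sqrt(field["sigma"] ** 2 / field["n"]
                    + model["sigma"] ** 2 / model["n"])
    return num / den

# Trial 2 statistics (table 18): 9 field days vs. 16 model runs.
field = {
    "mainline volume": {"mean": 2890, "sigma": 262.4, "n": 9},
    "ramp volume":     {"mean": 1104, "sigma": 168.2, "n": 9},
    "mainline speed":  {"mean": 32.2, "sigma": 3.6,   "n": 9},
}
model = {
    "mainline volume": {"mean": 3088, "sigma": 222.8, "n": 16},
    "ramp volume":     {"mean": 1200, "sigma": 121.2, "n": 16},
    "mainline speed":  {"mean": 29.2, "sigma": 4.5,   "n": 16},
}

# The model is considered calibrated when no measure rejects H0.
calibrated = all(abs(z_calculated(field[k], model[k])) < Z_CRITICAL
                 for k in field)
print("calibrated:", calibrated)  # True
```

In practice, a failing measure would send the modeler back to adjust calibration parameters and rerun, repeating until every measure falls inside ±1.96.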

 

Turner-Fairbank Highway Research Center | 6300 Georgetown Pike | McLean, VA | 22101