U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-HRT-13-026 Date: March 2014 |
Model calibration is an important step in the traffic simulation modeling process. The goal of this report is to provide practitioners with a statistical model calibration approach that provides a stronger link to field data and helps increase confidence in the model calibration results while remaining consistent with guidance provided in Traffic Analysis Tools.(9)
The objectives of this chapter are as follows:
This chapter also provides guidance on the model calibration methodology to perform the following:
Model Calibration Definition
Calibration is the process of systematically adjusting model parameters so that the model is able to reproduce the observed traffic conditions. The process continues until the error between the performance measures taken from the field data and the performance measures calculated in the simulation is less than a predetermined margin of error. Once it is determined that the model reproduces observed conditions, model calibration can focus on specific performance measures such as volume, speed, travel time, and bottleneck locations. It is important to note that in the model calibration process, there is a tradeoff between the required precision and the resources available to collect data and conduct modeling.
Calibration Challenges
The goal of calibration is to make the model represent locally observed traffic conditions. However, because traffic may vary greatly from day to day, it is not possible for one model to accurately represent all possible traffic conditions. Most simulation software is developed, and its default model parameter values estimated, using a limited amount of data.
Driver behavior differs by region, and it may differ significantly between typical and non-typical days. For example, poor visibility, severe weather, incidents, presence of trucks, and pavement conditions all affect driver behavior. As a result, it is not recommended to use a model developed with data from one region to represent future traffic conditions in another region. Investment decisions made using a model that has not been calibrated to local field conditions will be flawed.
Simulation software tries to mimic driver behavior using the limited data to which the analysts have access. Given that fact, if the model is not properly calibrated, any flaws in the model will be magnified.
Calculations and procedures presented in this chapter assume the following conditions:
Guidance on the overall model calibration process is presented in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The overall model calibration process can be divided into the following four main steps:
Performance Measure and Critical Location Identification
During the model scoping process and the development of the data collection plan, the field measurements are determined, including speed, volume, queuing, and other congestion observations at different locations in the network. Since traffic conditions fluctuate daily, it is important to obtain field data from multiple days. The data from multiple days serve as a basis from which field variations are used to determine the tolerable error in the simulated results. For simulation of a typical day, it is preferred that the data exclude incident days as well as Mondays, Fridays, and weekend days.
During model development, it is recommended that the spatial and temporal model limits extend beyond where and when the congestion in the field occurs. The statistical calibration should not necessarily include every link in the model but should focus on the critical design elements within the primary study area in the model. This report focuses on capacity and operational interchange modifications on the interstate system. Therefore, the critical elements include the mainline freeway, ramp roadways, and the crossing arterials. Priority should be given to the higher-volume roadway elements.
The number of locations selected for comparing the performance measures in field data against model outputs needs to be balanced against the quality and location of the available data, the desired level of statistical confidence, and the availability of resources. The model outputs and reporting for these statistical tests should be similar to the performance measures that will be used later on in the analysis.
The number of data days to be used should be based on an analysis of what data are available and cost effective to collect. In an urban area where freeway sensor data are archived and readily available, more days of data can be used. In areas without surveillance, where manual or temporary data collection devices must be used, collecting additional days of data is more resource intensive.
The use of speed data can be more of a challenge. If the speed profile is constructed from freeway sensor data, then the reliability and available days of data will be similar to the volume discussion. If the speed data are based on probe vehicle information, then the reliability will be reduced because the probe vehicle data will only be a sample of the actual traffic stream.
Determination of Strategy for Calibration
The strategy for calibration refers to the steps and model parameters that the modeler chooses to modify in order to achieve calibration. Different simulation software packages have their own parameters and recommended practices for adjusting the models so that performance matches the field data.
In order for the modeler to be cost effective, there should be a strategy for approaching the model adjustments. Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software highlights the following three-step strategy for calibration:(1)
These strategies are still applicable, and this report focuses on refined statistical methods.
The purpose of conducting a statistical test on stochastic traffic simulation models is to ensure that the performance measure means across different simulation model runs do not differ significantly from the means of the data collected in the field. This statistical check requires two primary items. First, there must be a sufficient amount of field data from different days so that an acceptable margin of error in the data can be determined. Second, there must be an error-free traffic simulation model that has been built to reflect the conditions in the field (i.e., the model reflects adequate spatial and temporal boundary conditions). With these two components, the following general statistical procedure can be conducted:
Analyze Field Data
Field data must be collected for multiple days. These data are initially used to establish model inputs and are then used in statistical tests. The first set of analyses is used to understand the variability in the data. The variability is addressed by calculating the margin of error, E, among different representative days, as shown in figure 14.
Figure 14. Equation. Margin of error: E = Zcritical × σ / √n.
Where:
E = Margin of error.
Zcritical = Critical Z statistic (for a 95 percent confidence interval, Z = 1.96).
σ = Standard deviation.
n = Sample size (number of observations).
The tolerance error percentage is calculated by dividing E by the mean of the field data, as shown in figure 15.
Figure 15. Equation. Tolerance error percentage: e = (E / x̄field) × 100.
Where:
e = Tolerance error percentage.
x̄field = Mean of the field data.
Table 9 demonstrates how these two calculations are performed.
Table 9. Variability analysis of field data.
Day Number | Hourly Volume |
1 | 2,980 |
2 | 2,682 |
3 | 3,063 |
4 | 2,594 |
5 | 3,193 |
6 | 2,675 |
7 | 3,230 |
8 | 2,562 |
9 | 3,034 |
Based on the field data in table 9, the calculations are as follows: x̄field = 2,890, σ = 262.4, and n = 9, giving E = 1.96 × 262.4 / √9 = 171 and e = 171 / 2,890 = 5.9 ≈ 6 percent.
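The margin of error and tolerance error calculations in figures 14 and 15 can be sketched in a few lines of Python. This is an illustrative helper, not part of any FHWA tool; note that the report rounds the resulting 5.9 percent up to 6 percent.

```python
import statistics
from math import sqrt

def tolerance_error(samples, z_critical=1.96):
    """Return (margin of error E, tolerance error percentage e) per figures 14-15."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)                    # sample standard deviation
    e_margin = z_critical * sigma / sqrt(len(samples))   # figure 14
    e_pct = 100 * e_margin / mean                        # figure 15
    return e_margin, e_pct

# Nine days of field hourly volumes from table 9
field_volumes = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]
E, e = tolerance_error(field_volumes)
```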
Statistical Test 1—Determine the Minimum Required Number of Model Runs
Statistical test 1 is used to determine the minimum required number of model runs based on an error rate calculated using the procedure described in the previous subsection. For a given target level of tolerance (tolerable error determined using variability in field observations) and a given confidence level (usually a confidence level of 95 percent is selected), a minimum number of model runs is required using different random number seeds. The minimum required number of model runs is computed using the equation in figure 16.
Figure 16. Equation. Minimum number of model runs: n = (Z × σ / (e × x̄model))².
Where:
n = Minimum number of model runs required.
Z = Critical Z statistic (for a 95-percent confidence interval, Z = 1.96).
σ = Standard deviation.
e = Tolerance error percentage.
x̄model = Mean of the given MOE (measure of effectiveness) calculated from the performed model runs.
The minimum number of model runs should be calculated using two different performance measures, typically volume and speed, and for multiple locations. The highest resulting number of model runs should be used as the minimum required number of model runs. The example calculations show one performance measure (volume) for one location to demonstrate the statistical calculations and process. These same techniques should be performed at multiple locations and for different measures.
An example of how to calculate the minimum required number of model runs is shown in table 10. In this example, five model runs were conducted, and the results were used to compute the minimum number of runs required. The tolerance error, e, for this example was 6 percent, as derived from the field data and calculations in table 9.
Table 10. Sample statistical calculation to determine the minimum number of runs.
Model Run Number | Hourly Volume |
1 | 3,591 |
2 | 3,000 |
3 | 2,655 |
4 | 3,680 |
5 | 2,720 |
Based on the model data in table 10, the calculations are as follows: x̄model = 3,129, σ = 481, and e = 0.06 (6 percent).
Figure 17. Equation. Number of model runs example: n = (1.96 × 481 / (0.06 × 3,129))² = 25.2, rounded up to 26.
The results of statistical test 1 indicate that the minimum number of model runs should be 26. As a result, 21 more model runs should be conducted before proceeding to the next step of comparing field data to model output.
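The minimum-runs computation in figure 16 can be sketched as follows. This is illustrative code under the same assumptions as the example; the five-run sample from table 10 reproduces the result of 26 runs.

```python
import math
import statistics

def min_required_runs(model_samples, tolerance_pct, z_critical=1.96):
    """Minimum number of runs per figure 16, rounded up to the next integer."""
    mean = statistics.mean(model_samples)
    sigma = statistics.stdev(model_samples)
    e = tolerance_pct / 100   # tolerance expressed as a fraction of the mean
    return math.ceil((z_critical * sigma / (e * mean)) ** 2)

# Initial five model runs from table 10; 6 percent tolerance from the field data
initial_runs = [3591, 3000, 2655, 3680, 2720]
n_min = min_required_runs(initial_runs, tolerance_pct=6.0)
```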
Statistical Test 2—Compare Field Data to Model Output
The next statistical step is to compare the two populations (i.e., field data volume mean versus model output volume mean) by testing the following hypothesis:
Figure 18. Equation. Compare field data to model output: Zcalculated = (x̄field − x̄model) / √(σfield²/nfield + σmodel²/nmodel), testing H0: μfield = μmodel.
Where:
x̄field = Average (mean) of the field observations.
x̄model = Average (mean) of output from different model runs.
σfield = Standard deviation from field observations.
σmodel = Standard deviation from the model runs.
nfield = Sample size of the field observations.
nmodel = Number of model runs with different random number seeds.
Zcalculated = Z-test of the field data and modeled data.
If Zcalculated ≥ Zcritical (1.96 for a 95 percent confidence level) or Zcalculated ≤ −Zcritical, reject H0.
The hypothesis test is a two-tailed Z-test based on a normal distribution. Figure 19 is a graph of the normal distribution. The area between Zcritical of ±1.96 is the do-not-reject range.
Figure 19. Graph. Normal distribution curve.
Using the same example as statistical test 1, the field data and 26 model runs are summarized in table 11 and table 12. Once all 26 model runs are conducted, the computation of the tolerance error is repeated to ensure that the margin of error is less than or equal to the desired tolerance error. The new tolerance error is 3.9 percent, which is lower than the desired 6.0 percent. Therefore, conducting 26 model runs was sufficient to satisfy the 95 percent confidence level and the 6.0 percent tolerance error.
Table 11. Sample calculation of field data.
Day Number | Hourly Volume |
1 | 2,980 |
2 | 2,682 |
3 | 3,063 |
4 | 2,594 |
5 | 3,193 |
6 | 2,675 |
7 | 3,230 |
8 | 2,562 |
9 | 3,034 |
Where:
x̄field = 2,890.
σ = 262.4.
n = 9.
Zcritical = 1.96.
E = 172.
Table 12. Sample calculation of model statistics.
Run Number | Modeled Data |
1 | 3,591 |
2 | 3,000 |
3 | 2,655 |
4 | 3,680 |
5 | 2,720 |
6 | 2,976 |
7 | 3,270 |
8 | 3,027 |
9 | 2,657 |
10 | 2,956 |
11 | 3,450 |
12 | 3,267 |
13 | 2,870 |
14 | 2,680 |
15 | 3,240 |
16 | 3,575 |
17 | 3,050 |
18 | 2,840 |
19 | 3,450 |
20 | 3,120 |
21 | 2,680 |
22 | 2,980 |
23 | 3,355 |
24 | 3,090 |
25 | 2,675 |
26 | 3,070 |
Where:
x̄model = 3,074.
σ = 312.0.
n = 26.
Zcritical = 1.96.
E = 120.
Based on a comparison of the data in table 11 and table 12, Zcalculated can be determined, as shown in figure 20.
Figure 20. Equation. Sample calculation of Zcalculated: Zcalculated = (2,890 − 3,074) / √(262.4²/9 + 312.0²/26) = −1.72.
Because Zcalculated (−1.72) falls between −Zcritical and +Zcritical (±1.96), H0 should not be rejected.
In this example, comparison of field data to the model output leads to not rejecting the null hypothesis. This means that there is insufficient evidence to conclude that the model output is significantly different from the field data. If the hypothesis test had resulted in rejection, additional model trials using different calibration parameters would be required, followed by a repeat of the hypothesis test. A complete iterative hypothesis test case study is presented in the next section.
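The two-tailed Z-test in figures 18 through 20 can be sketched as follows. This is illustrative code assuming independent field and model samples; it reproduces Zcalculated ≈ −1.72 for the data in tables 11 and 12.

```python
import statistics
from math import sqrt

def z_test(field, model):
    """Zcalculated per figure 18: compare field and model sample means."""
    num = statistics.mean(field) - statistics.mean(model)
    den = sqrt(statistics.variance(field) / len(field) +
               statistics.variance(model) / len(model))
    return num / den

field_volumes = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]  # table 11
model_volumes = [3591, 3000, 2655, 3680, 2720, 2976, 3270, 3027, 2657,
                 2956, 3450, 3267, 2870, 2680, 3240, 3575, 3050, 2840,
                 3450, 3120, 2680, 2980, 3355, 3090, 2675, 3070]        # table 12
z = z_test(field_volumes, model_volumes)
reject_h0 = abs(z) >= 1.96   # False: -1.96 < -1.72 < 1.96, so H0 is not rejected
```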
The purpose of the case study was to test interchange improvements.
Data Collection
The field data were collected using freeway detection sensors. For the purposes of this exercise, nine different weekdays of data were selected to conduct the statistical tests. Of these 9 days of data, 1 day was selected as the typical day to build the base model. The data selected were free of any major incidents or crashes.
Build Base Model
The base model was developed according to the scope development process that identified spatial and temporal boundary limits. The model was constructed according to the procedures in the seven-step modeling process specified in Traffic Analysis Toolbox Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software.(1) The model was checked for errors and was found to be free of any errors.
After the base model was completed, an initial number of model runs with different random number seeds was conducted to come up with an initial set of output MOEs. These MOEs were used to determine the minimum number of model runs.
Select Locations and Time Periods for Statistical Tests
The selection of locations and time periods for a statistical test should be based on the primary study area within the model limits and the peak periods. Within the study area, the key features that are being analyzed should be selected such as the mainline freeway and ramps. It is neither practical nor necessary to perform these rigorous statistical checks on every component of the model. For example, it is possible to select an unimportant location and have that selection drive the number of model runs too high. The goal is to ensure that there is a statistical confidence in the desired results from the models.
Analyze Model Output
The first step was a field data analysis to determine the corresponding sampling error, E. Nine days were selected for data collection, and the corresponding data were obtained. Table 13 and table 14 present the corresponding error computations.
Table 13. Error calculation of field data hourly volumes from 7:45 to 8:45 a.m.
Day | Field Volume Data | |
Mainline | Ramp | |
1 | 2,980 | 1,030 |
2 | 2,682 | 1,266 |
3 | 3,063 | 975 |
4 | 2,594 | 1,239 |
5 | 3,193 | 920 |
6 | 2,675 | 1,319 |
7 | 3,230 | 890 |
8 | 2,562 | 1,271 |
9 | 3,034 | 1,026 |
The calculations for mainline and ramp, respectively, are as follows: for the mainline volume, x̄ = 2,890, σ = 262.4, E = 171, and e = 6.0 percent; for the ramp volume, x̄ = 1,104, σ = 168.2, E = 110, and e = 10.0 percent.
Table 14. Calculation of field speed data error and variation from 7:45 to 8:45 a.m.
Day | Mainline Speed (mi/h) |
1 | 25.5 |
2 | 35.2 |
3 | 35.1 |
4 | 35.1 |
5 | 30.7 |
6 | 32.0 |
7 | 27.7 |
8 | 34.7 |
9 | 34.2 |
The calculations for mainline speeds are as follows: x̄ = 32.2 mi/h, σ = 3.6, E = 2.4, and e = 7.3 percent.
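As a cross-check, the error computations for the three case-study measures in tables 13 and 14 can be sketched as follows. This is illustrative code; small rounding differences from the report's 6.0-, 10.0-, and 7.3-percent figures are expected.

```python
import statistics
from math import sqrt

def tolerance_pct(samples, z_critical=1.96):
    """Tolerance error percentage: 100 * E / mean, with E per figure 14."""
    sigma = statistics.stdev(samples)
    margin = z_critical * sigma / sqrt(len(samples))
    return 100 * margin / statistics.mean(samples)

mainline_vol = [2980, 2682, 3063, 2594, 3193, 2675, 3230, 2562, 3034]  # table 13
ramp_vol = [1030, 1266, 975, 1239, 920, 1319, 890, 1271, 1026]         # table 13
mainline_spd = [25.5, 35.2, 35.1, 35.1, 30.7, 32.0, 27.7, 34.7, 34.2]  # table 14

tolerances = {name: tolerance_pct(data) for name, data in
              [("mainline volume", mainline_vol),
               ("ramp volume", ramp_vol),
               ("mainline speed", mainline_spd)]}
```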
After the error-free base model was developed, it was run five times with different random number seeds. The initial output was used to determine if there was an adequate number of runs.
Statistical Test 1: Determination of the Minimum Required Number of Runs
The first statistical test was used to determine whether the minimum required number of model runs with different random number seeds had been satisfied. In order to conduct this test, an initial set of five runs was conducted on the error-free base model. The formula for determining the minimum number of runs discussed in the previous section was used, assuming a 95 percent confidence interval and a tolerance level determined from the variability in the field data.
Table 15 is a summary of the data and calculations based on hourly volumes for the two locations (i.e., mainline and ramp) and speeds for the mainline location that were identified previously. Based on the analysis, it was determined that 16 model runs would be required.
Table 15. Summary table for minimum required number of model runs.
Model Run Number | Modeled Volume Data (Vehicles/h) | Modeled Speed Data (mi/h) | |
Mainline | Ramp | Mainline | |
1 | 2,980 | 1,051 | 29.1 |
2 | 3,333 | 923 | 19.6 |
3 | 2,782 | 1,265 | 23.8 |
4 | 3,273 | 875 | 23.0 |
5 | 3,583 | 935 | 25.3 |
The calculations for mainline volume, ramp volume, and mainline speed, respectively, are as follows: for mainline volume, x̄ = 3,190, σ = 313, and n = 11; for ramp volume, x̄ = 1,010, σ = 157, and n = 10; for mainline speed, x̄ = 24.2, σ = 3.5, and n = (1.96 × 3.5 / (0.073 × 24.2))² = 15.1, rounded up to 16. The mainline speed location governs.
Compare Field Data to Model Results—Trial 1
The results of statistical test 1 (see table 15) indicate that the minimum number of model runs required for both locations and performance measures is 16. Eleven more model runs should be conducted before proceeding to statistical test 2 to compare field data to model output. The 16 model runs are summarized in table 16. Once all 16 model runs were conducted, the computation of the tolerance error was repeated for each of the three data sets to ensure that the margin of error was less than or equal to the desired tolerance error. The new tolerance errors for the mainline volume, the ramp volume, and the mainline speed were 4.1 percent (desired was 6.0 percent), 6.8 percent (desired was 10.0 percent), and 7.2 percent (desired was 7.3 percent), respectively. All three tolerance errors were less than the desired values. Therefore, 16 model runs were sufficient to satisfy the 95 percent confidence level and the desired tolerance errors.
Table 16. Trial 1 summary table for minimum required number of model runs.
Model Run Number | Modeled Volume Data (vehicles/h) | Modeled Speed Data (mi/h) | |
Mainline | Ramp | Mainline | |
1 | 2,980 | 1,051 | 29.1 |
2 | 3,333 | 923 | 19.6 |
3 | 2,782 | 1,265 | 23.8 |
4 | 3,273 | 875 | 23.0 |
5 | 3,583 | 935 | 25.3 |
6 | 3,122 | 1,023 | 28.6 |
7 | 3,465 | 902 | 22.8 |
8 | 2,879 | 1,155 | 20.8 |
9 | 2,865 | 879 | 26.5 |
10 | 3,045 | 978 | 18.0 |
11 | 2,765 | 925 | 23.8 |
12 | 3,346 | 1,235 | 28.7 |
13 | 2,870 | 931 | 26.7 |
14 | 2,989 | 1,010 | 23.8 |
15 | 3,455 | 1,312 | 18.9 |
16 | 3,198 | 1,102 | 22.3 |
The calculations for mainline volume, ramp volume, and mainline speed, respectively, are as follows: for mainline volume, x̄ = 3,122, σ = 263.3, E = 129, and e = 4.1 percent; for ramp volume, x̄ = 1,031, σ = 142.7, E = 70, and e = 6.8 percent; for mainline speed, x̄ = 23.9, σ = 3.5, E = 1.7, and e = 7.2 percent.
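The re-check of the tolerance errors after all 16 runs can be sketched the same way. This is illustrative code; the results land near the report's 4.1, 6.8, and 7.2 percent, with minor rounding differences, and each falls below its desired tolerance.

```python
import statistics
from math import sqrt

def tolerance_pct(samples, z_critical=1.96):
    """Tolerance error percentage: 100 * E / mean."""
    margin = z_critical * statistics.stdev(samples) / sqrt(len(samples))
    return 100 * margin / statistics.mean(samples)

# Sixteen model runs from table 16, paired with the desired tolerances
checks = [
    ([2980, 3333, 2782, 3273, 3583, 3122, 3465, 2879,
      2865, 3045, 2765, 3346, 2870, 2989, 3455, 3198], 6.0),   # mainline volume
    ([1051, 923, 1265, 875, 935, 1023, 902, 1155,
      879, 978, 925, 1235, 931, 1010, 1312, 1102], 10.0),      # ramp volume
    ([29.1, 19.6, 23.8, 23.0, 25.3, 28.6, 22.8, 20.8,
      26.5, 18.0, 23.8, 28.7, 26.7, 23.8, 18.9, 22.3], 7.3),   # mainline speed
]
results = [(tolerance_pct(data), desired) for data, desired in checks]
all_within = all(e <= desired for e, desired in results)
```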
Statistical Test 2: Hypothesis Test
The hypothesis test should be conducted on multiple locations. In order to illustrate the procedures, two locations were analyzed in the case study. Hypothesis testing is typically an iterative process involving trial and error. The following section illustrates that the initial base model results did not satisfy the hypothesis test. For the second trial, the calibration parameters were modified, and the results satisfied the second hypothesis test.
The previously selected mainline location was analyzed using the hypothesis test for hourly traffic volumes for the peak hour from 7:45 to 8:45 a.m. As shown in table 17, the null hypothesis was rejected for model trial 1.
A second model iteration was conducted by adjusting the calibration parameters in the model. The random number seeds from model trial 1 were reused. In model trial 2, the null hypothesis was not rejected, and the results of this analysis are shown in table 18.
Table 17. Hypothesis test trial 1.
Description | Field Data | Model Results | Statistics | |||||
Mean | σf | nf | Mean | σm | nm | Zcalculated | Null Hypothesis | |
Mainline volume | 2,890 | 262.4 | 9 | 3,122 | 263.3 | 16 | -2.12 | Rejected |
Ramp volume | 1,104 | 168.2 | 9 | 1,031 | 142.7 | 16 | 1.10 | Cannot reject |
Mainline speed | 32.2 | 3.6 | 9 | 23.9 | 3.5 | 16 | 5.59 | Rejected |
σf = Standard deviation of field data. nf = Number of days field data were collected. σm = Standard deviation of model runs. nm = Number of model runs. |
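The three hypothesis tests in table 17 can be sketched directly from the summary statistics. This is illustrative code; it reproduces the Zcalculated values of −2.12, 1.10, and 5.59 and the corresponding reject/cannot-reject decisions.

```python
from math import sqrt

def z_and_decision(mean_f, sd_f, n_f, mean_m, sd_m, n_m, z_critical=1.96):
    """Two-tailed Z-test per figure 18; the boolean is True when H0 is rejected."""
    z = (mean_f - mean_m) / sqrt(sd_f**2 / n_f + sd_m**2 / n_m)
    return z, abs(z) >= z_critical

# Summary statistics from table 17 (trial 1)
tests = {
    "mainline volume": z_and_decision(2890, 262.4, 9, 3122, 263.3, 16),
    "ramp volume":     z_and_decision(1104, 168.2, 9, 1031, 142.7, 16),
    "mainline speed":  z_and_decision(32.2, 3.6, 9, 23.9, 3.5, 16),
}
```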
Compare Field Data to Model Results—Trial 2
Since the first hypothesis test resulted in rejection for the mainline volume and speed results, a second trial was conducted. In this trial, calibration parameters were adjusted to more closely match the field results.
The calibration parameters that were adjusted include the following:
After these parameters were adjusted in a number of locations, the model runs were repeated, and the hypothesis test was conducted again. Table 18 is a summary of the second trial.
Table 18. Hypothesis test mainline location trial 2.
Description | Field Data | Model Results | Statistics | |||||
Mean | σf | nf | Mean | σm | nm | Zcalculated | Null Hypothesis | |
Mainline volume | 2,890 | 262.4 | 9 | 3,088 | 222.8 | 16 | -1.91 | Cannot reject |
Ramp volume | 1,104 | 168.2 | 9 | 1,200 | 121.2 | 16 | -1.51 | Cannot reject |
Mainline speed | 32.2 | 3.6 | 9 | 29.2 | 4.5 | 16 | 1.82 | Cannot reject |
σf = Standard deviation of field data. nf = Number of days field data were collected. σm = Standard deviation of model runs. nm = Number of model runs. |
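The trial-2 results in table 18 can be verified the same way. This is illustrative code using the summary statistics from the table; all three Zcalculated values fall inside ±1.96, so none of the null hypotheses is rejected.

```python
from math import sqrt

def z_calc(mean_f, sd_f, n_f, mean_m, sd_m, n_m):
    """Zcalculated per figure 18, from summary statistics."""
    return (mean_f - mean_m) / sqrt(sd_f**2 / n_f + sd_m**2 / n_m)

# Summary statistics from table 18 (trial 2)
z_values = [
    z_calc(2890, 262.4, 9, 3088, 222.8, 16),  # mainline volume
    z_calc(1104, 168.2, 9, 1200, 121.2, 16),  # ramp volume
    z_calc(32.2, 3.6, 9, 29.2, 4.5, 16),      # mainline speed
]
calibrated = all(abs(z) < 1.96 for z in z_values)  # True: trial 2 passes the test
```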