U.S. Department of Transportation/Federal Highway Administration
Office of Planning, Environment, & Realty (HEP)

A Freight Analysis and Planning Model

3. Task 2.1 - Model Updating

This chapter describes the model updating process. Model updating is a major task in regional transportation planning. Regional plans are typically generated every four years, and each update requires a new baseline year. The more data that must be collected via surveys or other special methods, the more costly and time-consuming the process becomes. As government at all levels becomes more fiscally constrained, less data are collected on a regular basis, and major data collection efforts such as travel surveys are conducted less often. The cumulative effect is less reliable transportation forecasting and planning. Any methods for using existing data sources are therefore worth pursuing.

The model updating process includes collecting data from all the data sources for a new target year, and then combining the data to generate the various flows. After surveying the availability of the various data sources, we set our target year as 2007, the most recent available for some of the key data sources.

3.1 Intraregional Flows

The two data sources for generating intraregional flows are the IMPLAN input/output data and the SCAG employment data. IMPLAN is commercially available and is updated annually. We purchased the 2007 IMPLAN data for the 5 county Los Angeles Region. IMPLAN provides county level inter-industry flows by 509 IMPLAN sectors. It also provides state and national foreign imports and exports. In our previous research, we evaluated the quality and reliability of alternative data sources, and concluded that the IMPLAN totals come closest to other corroborating data sources (e.g. total imports and exports as reported by the Department of Commerce). We therefore assume the IMPLAN county totals to be the "true totals", and adjust all other data sources to be consistent with IMPLAN. IMPLAN data cannot be used directly, because it includes some import and export transactions. These are factored out.

SCAG generates small area employment data from state employment and tax records. The finest version of the data is employment by establishment (located by latitude and longitude), with employment categorized by 3 digit NAICS code. These data are not available to the public; we were able to obtain the 2007 version by special request. We aggregate the employment data into TAZs, the spatial unit of analysis. ESRI's ArcGIS software package and a freeware ArcGIS extension called Hawth's Tools were used to aggregate point-level data by TAZ. In 2007 the SCAG region had 4190 TAZs (including virtual TAZs such as regional exit/entry TAZs).

Table 3-1 lists the main data sources for intraregional flows. No changes were made in the structure of the IMPLAN data. The SCAG employment data changed coding system from SIC to NAICS. The shape files for TAZs were also provided by SCAG.

Table 3-1: Main data sources for intraregional flows

Source | Code system | Year | Changes
SCAG | 3 digit NAICS | 2007 | Previous 2000 data in SIC codes

The SCAG employment data provides the basis for disaggregating the IMPLAN county level supply and demand to TAZ level supply and demand. We use employment by sector as the measure of each zone's production and consumption share of sector activity, taking into account the I/O inter-industry demand and supply coefficients. These calculations take place within the Argos planner.
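As a rough illustration of the employment-share allocation described above, the following Python sketch distributes a county level sector total across TAZs in proportion to each zone's employment in that sector. The function name, zone identifiers, and numbers are hypothetical. The actual Argos operators also apply the I/O supply and demand coefficients, which are omitted here.

```python
def allocate_to_taz(county_total, taz_employment):
    """Split a county-level sector total across TAZs by employment share.

    county_total   -- sector supply (or demand) for the county, in dollars
    taz_employment -- dict mapping TAZ id -> jobs in that sector
    """
    total_jobs = sum(taz_employment.values())
    if total_jobs == 0:
        # No employment in this sector anywhere: nothing to allocate.
        return {taz: 0.0 for taz in taz_employment}
    return {taz: county_total * jobs / total_jobs
            for taz, jobs in taz_employment.items()}


# Hypothetical example: $10 million of sector supply, three zones.
supply_by_taz = allocate_to_taz(
    10_000_000, {"TAZ_1": 50, "TAZ_2": 150, "TAZ_3": 300}
)
```

Because the shares sum to one, the TAZ values add back up to the county total, which keeps the disaggregated data consistent with the IMPLAN totals.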

It was noted in Chapter 1 that data sources utilize a variety of industry code systems, and there is no universal conversion system. A team of researchers at USC developed the "USC" coding system to be able to convert from one code system to another. The USC system has 47 codes, 29 commodity codes and 18 non-commodity codes. Bridge tables have been developed to convert SCTG, SIC, NAICS, SITC and HS codes to USC codes. As part of the NSF funded research, these bridge tables were used to create a web service that automatically converts data from one code system to another.[5]
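The idea of a bridge table can be sketched in a few lines of Python. The mapping below is purely illustrative: the USC code labels and the NAICS assignments are hypothetical placeholders, not the actual bridge tables, and the sketch assumes each source code maps to exactly one USC code.

```python
from collections import defaultdict

# Hypothetical bridge table: 3 digit NAICS code -> illustrative USC label.
# The real bridge tables are not reproduced here.
NAICS_TO_USC = {"311": "USC01", "312": "USC01", "325": "USC05"}


def convert_to_usc(values_by_code, bridge):
    """Aggregate values keyed by a source code system into USC codes.

    Returns the aggregated values and a list of source codes that have
    no entry in the bridge table, so they can be reviewed by hand.
    """
    aggregated = defaultdict(float)
    unmapped = []
    for code, value in values_by_code.items():
        if code in bridge:
            aggregated[bridge[code]] += value
        else:
            unmapped.append(code)
    return dict(aggregated), unmapped


usc_jobs, unmapped = convert_to_usc(
    {"311": 120, "312": 30, "325": 75, "999": 5}, NAICS_TO_USC
)
```

Returning the unmapped codes separately, rather than dropping them silently, makes it easier to check that totals are preserved after conversion.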

How detailed should the industry sector data be? Although many of our data sources have highly detailed sector data, the question is how much detail is needed to make reasonable estimates of truck flows? For example, commodities differ in their value to weight ratio, so value per weight unit must be taken into account. The more we aggregate commodity categories, the more the variation across commodity categories is lost. We are not aware of any research that considers the effects of different commodity categories, and it is beyond the tasks of this project to explore the question. In the first version of Argos, we were constrained by the CFS data, which provided the most disaggregate flow data in one digit SCTG codes. We retained this structure in the update, but at later steps converted all data to the USC code system to take advantage of the more detailed data available from IMPLAN and the SCAG employment data. These conversions are done within the Argos planner.

3.2 Interregional Flows

Referring back to Figure 1-1, it can be seen that the generation of interregional flows is far more complicated than the generation of intraregional flows. The primary data sources for interregional flows are IMPLAN and CFS. However, neither is structured to provide the flow data we need. The logic of constructing the interregional flows is as follows.

We divide the world into three regions, the metropolitan area (in this case the Los Angeles CMSA, referred to as LA), the rest of the US (US) and the rest of the world (W). Four commodity flows with an origin or destination in LA are possible, as illustrated in Figure 3-1. Transshipments are captured by combinations of the four flows; for example import cargo arriving at the LA ports for consumption in Iowa would be included in "W2LA" + "LA2US." We must identify these four flows in order to obtain a complete accounting of all import/export flows for the region. Because of the way commodity flows are reported in IMPLAN and CFS, we must also consider the internal (intraregional) flow, "LA2LA", in order to maintain sum totals consistent with the IMPLAN data. Note that "LA2LA" corresponds to the intraregional flows discussed above.

None of the flows in Figure 3-1 can be obtained directly from IMPLAN or CFS. The CFS data has "Total Outbound" (LA2LA + LA2US) and "Total Inbound" (LA2LA + US2LA) flows by mode and sector. IMPLAN has various import/export totals by sector that can be used to derive the five flows, but has no mode data. We therefore use IMPLAN to derive the five flows, CFS to assign flows to mode for US based flows, and WCUS and WISERTrade to assign modes to world based flows. The formulae for computing the five flows are given in Giuliano et al., 2008.

Figure 3-1. Conceptual Freight Flows

This figure is a graphical representation of how freight flows can be classified into five groups:

LA2LA = LA to LA
LA2US = LA to US
LA2W = LA to rest of world
US2LA = US to LA
W2LA = rest of world to LA

In order to calculate the five flows, we aggregate the IMPLAN 2007 information to the 9 SCTG commodity sectors. Note that only the modal shares are taken from CFS, as we use IMPLAN to generate LA2LA, LA2US, and US2LA. The rest of world flows are calculated from IMPLAN and WISERTrade. We estimate a total of 45 different flows: each of the 5 flows for each of 9 sectors. These flows are in annual dollars.

Once the flows have been estimated, we use CFS to proportionately allocate flows to air, water, rail and truck. Because CFS data is categorized as either "inbound" or "outbound," and because the LA2LA flow is embedded in both inbound and outbound data, we back out the LA2LA portion and assume it is all truck, use the CFS mode proportions for LA2US and US2LA (and WISERTrade where CFS data are not available), and use WISERTrade for the rest of world flow modes. The result of these computations is five 9x4 matrices of proportions corresponding to the five trade flows in Figure 3-1.
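One possible reading of the "back out" step is sketched below in Python for a single sector. The dollar values are hypothetical, and the function is only an interpretation of the description above, not the formulae from Giuliano et al., 2008: the LA2LA value is subtracted from the truck portion of a CFS inbound or outbound total, and the remainder is normalized to mode proportions.

```python
MODES = ("air", "water", "rail", "truck")


def back_out_internal(cfs_total_by_mode, la2la_value):
    """Remove the LA2LA flow (assumed to be all truck) from a CFS inbound
    or outbound total and return mode proportions for the remaining flow."""
    remaining = dict(cfs_total_by_mode)
    # Guard against a negative truck value if LA2LA exceeds the CFS truck total.
    remaining["truck"] = max(remaining["truck"] - la2la_value, 0.0)
    total = sum(remaining[m] for m in MODES)
    if total == 0:
        return {m: 0.0 for m in MODES}
    return {m: remaining[m] / total for m in MODES}


# Hypothetical CFS "Total Outbound" (LA2LA + LA2US) for one sector, in dollars.
outbound = {"air": 5.0, "water": 5.0, "rail": 10.0, "truck": 80.0}
la2us_shares = back_out_internal(outbound, la2la_value=60.0)
```

Repeating this for each of the 9 sectors would fill one 9x4 matrix of proportions; the LA2LA matrix itself would simply have a truck share of 1 in every row.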

Table 3-2 lists the data sources used for generating interregional flows. Some data sources were improved, making the process of compiling data for the Argos planner somewhat easier. As noted above, IMPLAN is used as the source of "true values," and the five flows are derived from IMPLAN. CFS is used for assigning most of the mode shares. Although drawn from a smaller sample in 2007, there is a little more data at the LACMSA level, allowing us to replace some of the state level data with CMSA level data. WISERTrade is used for export and import values in dollars for air and water. Because the WISERTrade data has been expanded, we no longer need WCUS or special sources for airport data. The Intermodal Transportation Management System (ITMS) data have not been updated (the 2002 edition offers GIS files of the previous data). This source is used only to allocate interregional truck flow shares to the freeway entry/exit points.

Table 3-2: Data sources for interregional flows

Source | Code System | Year | How Used | Changes
IMPLAN | IMPLAN | 2007 | Flow totals by sector | None
CFS | SCTG | 2007 | Mode shares | Smaller sample; adds air and water outbound freight dollar values for LACMSA
WISERTrade | HS, SITC | 2007 | Export or import values in tons or dollars for air and water modes | Adds flows in tons by HS code for ports
ITMS | None | 2002 | Flow shares in dollars and tons among different entry/exit points and rail yards | Never updated

3.3 Transportation System Data

Our transportation network data was provided by SCAG. We used the network (link) and node files in the TransCAD format from the 2007 SCAG Regional Travel model. These files were built for 4191 TAZs in six counties in the SCAG region. Our 2001 Argos flows were for 3191 TAZs in five counties in the SCAG region. In order to preserve consistency with the prior 2001 model applications, we retained the 3191 TAZ geography. We therefore adjusted the TransCAD network to 3191 TAZs by adjusting the zone connectors.

3.4 Argos Planner Updates

We conducted a full review of the Argos Planner to verify all computational elements and data sources of the workflow. In doing so we found one error in the operator that computed the intraregional demand due to an ambiguous interpretation of supply and demand coefficients of formulas (1) and (2) in (Giuliano et al., 2008). We corrected the error in the updated Argos planner.

One of the challenges of using secondary data sources to generate freight flows is the difference in units and codes across the data sources. In section 3.1 we discussed our conversions across the various industry sector codes. Here we provide a simplified overview of the Argos Planner and explain how the intraregional and interregional flows are generated in a form compatible with the TransCAD file structures. The complete workflows are given in Appendix A.

Figure 3-2 summarizes the generation of intraregional flows. The two main data sources are IMPLAN (dollars in IMPLAN sectors) and TAZ level employment (jobs in NAICS). We convert both to USC codes. Employment is used to allocate supply and demand by industry sector across the TAZs. In our previous work, dollar flows were allocated, and the end result was a set of productions and attractions in dollars. In this research, we have added the conversion from dollars to PCEs, as shown in Figure 3-2. The conversion accounts for sector specific dollars to tons to trucks relationships. Argos now generates productions and attractions in PCEs by USC code. Note that only the commodity codes are allocated, as these represent the physical flows to be modeled. The result of this process is Ps and As by sector (n=29) by TAZ (n=3191).
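The dollars to tons to trucks to PCEs chain can be illustrated with a minimal Python sketch. The value-per-ton and payload figures are hypothetical placeholders for the sector specific relationships, and the default of 2.25 PCEs per truck is borrowed from the fixed factor applied to the screenline counts in Section 3.5; whether Argos uses the same factor at this step is an assumption.

```python
def dollars_to_pce(dollars, dollars_per_ton, tons_per_truck, pce_per_truck=2.25):
    """Convert a dollar flow for one sector to passenger car equivalents.

    dollars_per_ton and tons_per_truck stand in for sector-specific
    value-to-weight and payload relationships (hypothetical values here).
    """
    tons = dollars / dollars_per_ton
    trucks = tons / tons_per_truck
    return trucks * pce_per_truck


# Hypothetical sector: $1.8 million of goods at $1,000 per ton, 18 tons per truck.
pce = dollars_to_pce(1_800_000, dollars_per_ton=1_000, tons_per_truck=18)
# 1,800 tons -> 100 trucks -> 225 PCEs
```

A high value-to-weight sector would yield far fewer trucks per dollar than a bulk commodity, which is why the conversion has to be applied sector by sector.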

Figure 3-2: Argos Planner: Intraregional Portion

This figure is a flowchart of the intraregional part of the Argos planner. The flowchart is described in the text. The outputs of the intraregional workflow are 2 matrices: productions by USC code and TAZ, and attractions by USC code and TAZ.

Figure 3-3 shows the interregional portion of Argos Planner. The four data sources are shown in the top row of boxes. As discussed above, each has data in different units. Here we convert to single digit SCTG codes, and, as described earlier, allocate import and export flows to modes. Air and water flows are allocated to the airports and ports, and rail flows are allocated to the major rail centers (based on volume shares). We have a total of 22 entry/exit nodes: 12 highway, 2 port, 5 airport, and 3 rail. Every export and import has an origin or destination inside the region. Thus for air, water, and rail there is an intraregional portion of the flow that we assume is by truck. Once all flows are allocated to modes, the resulting truck trips are converted to PCEs.

The distribution of imports/exports to/from points within the region has been added to Argos Planner. The distribution is based on the relative attraction/production of each TAZ (see Chapter 2 for details). Argos generates a 22 by 3191 OD matrix for each of the 29 USC codes. These matrices are the final output of the interregional portion of Argos.
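A minimal Python sketch of this proportional distribution, for the import direction and a single USC code, is given below. The node flows and TAZ attractions are hypothetical; the details of the actual procedure are those described in Chapter 2, and the real matrix has 22 rows and 3191 columns rather than the 2 by 2 example shown.

```python
def distribute_imports(node_flows, taz_attractions):
    """Build an entry-node by TAZ matrix for one commodity code.

    Each entry node's inbound flow is split across TAZs in proportion
    to each TAZ's share of total attractions.
    """
    total_attraction = sum(taz_attractions)
    if total_attraction == 0:
        return [[0.0 for _ in taz_attractions] for _ in node_flows]
    return [[flow * a / total_attraction for a in taz_attractions]
            for flow in node_flows]


# Hypothetical example: 2 entry nodes, 2 TAZs with attractions 1 and 3.
od = distribute_imports([100.0, 40.0], [1.0, 3.0])
# od == [[25.0, 75.0], [10.0, 30.0]]
```

Exports would be handled the same way, with TAZ productions in place of attractions.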

Figure 3-3: Argos Planner: Interregional Portion

This figure is a flowchart of the interregional part of the Argos planner. The flowchart is described in the text. The output of this portion of the workflow is an OD matrix in PCEs by USC code and by TAZ.

3.5 2007 Baseline Results

The outputs from Argos (intraregional Ps and As, interregional OD matrices) are the input for TransCAD. We processed the Argos output as described in Chapter 2. In order to evaluate our model results, we once again need actual ground count data. We worked closely with SCAG modeling staff in this part of our work. SCAG regional transportation modeling is conducted using TransCAD. SCAG provided their 2007 full OD matrix (passenger and freight), their 2007 network, and the 2007 screenline ground count data. As before, the screenline count is Average Annual Weekday Traffic (AAWT). In 2007 SCAG had 23 screenlines. Two are beyond the network we used, so we restrict our comparisons to the 21 that are within our network. Figure 3-4 shows the screenlines.

We had anticipated using the SCAG model results as another comparison for our results, as we had done in the first Argos study. This gave us a benchmark for what is considered acceptable model performance in professional practice. However, SCAG has not yet produced a 2007 baseline model that generates results they consider to be satisfactory. Our comparisons are therefore limited to the AAWT data.

Figure 3-4: 2007 Screenlines

This figure is a map of the region that identifies the screenlines of the 2007 SCAG regional travel model. The screenlines are identified by numbers and lines.

The TransCAD simulation is for a defined time period, in this case "AM peak." Therefore our Ps and As and interregional matrices must be adjusted from annual flows to day flows to AM peak flows. We simply divide annual flows by 365 days. This is likely an underestimate of weekday flows, since we do not expect weekend flows to be the same as weekday flows. On average there are about 255 "weekdays." If we were to use 255, we would clearly overestimate the average weekday flow, because some flows take place on weekends. We have no data on variation in truck flows by day of week, and hence have no basis for adjusting the data to generate a typical weekday flow. We therefore generate the average based on 365 days. As discussed in Chapter 2, we used a factor from prior literature to estimate the AM peak portion of flows.
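The time-period adjustment amounts to two multiplications, sketched here in Python. The annual flow and the AM peak share are hypothetical values chosen for easy arithmetic; the actual peak factor is the one taken from prior literature and discussed in Chapter 2.

```python
DAYS_PER_YEAR = 365  # divisor used in the text
WEEKDAYS_PER_YEAR = 255  # alternative divisor discussed in the text


def annual_to_am_peak(annual_pce, am_peak_share, days=DAYS_PER_YEAR):
    """Convert an annual PCE flow to a daily flow and an AM peak flow."""
    daily = annual_pce / days
    return daily, daily * am_peak_share


# Hypothetical flow of 365,000 PCEs per year and a 20% AM peak share.
daily, am_peak = annual_to_am_peak(365_000, am_peak_share=0.20)
# daily == 1000.0, am_peak == 200.0

# Using 255 weekdays instead gives an upper bound on the weekday flow.
daily_upper, _ = annual_to_am_peak(365_000, 0.20, days=WEEKDAYS_PER_YEAR)
```

The true typical weekday flow lies somewhere between the two daily values, depending on how much freight actually moves on weekends.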

TransCAD allows for different categories of flows. The identity of the flow is retained throughout the simulation, so that the final equilibrium assignment identifies total VMT, VHT, and link flows for each category. SCAG uses 8 categories of flows, 5 for passenger trips and 3 for truck trips. Traffic flows should be segmented if there are differences in impedance factors, traffic performance, or routes available. Different impedance factors for each USC code were developed by Pan (2003), and we used them to generate the intraregional truck OD matrix. It would be possible in future work to create these categories within TransCAD and use TransCAD to generate the OD matrix. We have no reason to believe that PCEs behave differently across industry sectors in traffic assignment, unless truck characteristics differ across industry sectors. We had no data to test this possibility. We therefore combined the intra- and interregional OD matrices and used only one truck category.

In order for us to use the SCAG OD matrix as the basis of our assignment, we must factor out the truck PCEs in the SCAG matrix. This is straightforward, given the flow categorizations. We simply eliminate all the "truck" categories from the OD matrix.

A final preparatory step is to adjust our PCE totals to regional totals. In this case, the only available total is the SCAG model estimated total truck PCEs. We used this total to adjust the Argos-generated data so that total PCEs were equivalent to the SCAG model estimate. Note that because the SCAG baseline model has not yet been fully validated, there is some uncertainty associated with this estimate.
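Assuming the adjustment is a uniform scaling of every cell (the text does not specify the method), it can be sketched in Python as follows. The matrix and control total are hypothetical.

```python
def scale_to_control_total(od_matrix, control_total):
    """Uniformly scale an OD matrix so its cells sum to control_total.

    Returns the scaled matrix and the scaling factor that was applied.
    """
    current_total = sum(sum(row) for row in od_matrix)
    if current_total == 0:
        raise ValueError("cannot scale a matrix whose total is zero")
    factor = control_total / current_total
    scaled = [[value * factor for value in row] for row in od_matrix]
    return scaled, factor


# Hypothetical 2x2 truck OD matrix (total 100 PCEs) scaled to 150 PCEs.
scaled, factor = scale_to_control_total([[10.0, 30.0], [20.0, 40.0]], 150.0)
# factor == 1.5
```

Uniform scaling preserves the spatial pattern of the estimated flows and changes only their overall level, so any error in the control total carries through proportionally to every screenline.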

Table 3-3 gives results by screenline. The 2007 AAWT is the actual daily count recorded at the screenline. The third column in Table 3-3 factors the AAWT to PCE, based on the fixed factor of 2.25. The difference between the Argos estimate and actual is given in the last two columns. It can be seen that results differ substantially by screenline; this is consistent with our 2001 results. The average difference is -4.3%, the weighted average difference is -16.8%, and the simple correlation between Argos and actual is 0.72.

Table 3-3: 2007 Baseline Results by Screenline

Screenline AAWT PCE Argos Difference %Difference
1 52,656 118,476 133,197 14,721 12.43%
2 102,649 230,959 206,481 -24,478 -10.60%
3 79,387 178,620 91,815 -86,805 -48.60%
4 83,796 188,541 103,394 -85,147 -45.16%
5 59,169 133,130 158,367 25,236 18.96%
6 82,019 184,542 151,687 -32,855 -17.80%
7 49,733 111,900 35,653 -76,247 -68.14%
8 53,893 121,259 120,683 -576 -0.48%
9 36,911 83,049 47,157 -35,892 -43.22%
10 19,956 44,901 36,544 -8,357 -18.61%
11 14,691 33,055 25,123 -7,932 -24.00%
12 20,114 45,257 42,924 -2,333 -5.15%
13 23,368 52,578 100,799 48,221 91.71%
14 12,891 29,004 38,317 9,313 32.11%
15 18,516 41,661 27,195 -14,466 -34.72%
16 143,077 321,923 149,670 -172,253 -53.51%
17 71,873 161,714 175,323 13,609 8.42%
18 26,163 58,867 134,742 75,875 128.89%
19 9,922 22,325 23,144 819 3.67%
20 16,255 36,574 95,069 58,496 159.94%
21 17,779 40,003 9,274 -30,729 -76.82%
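The PCE, Difference, and %Difference columns of Table 3-3 follow directly from the AAWT count, the fixed factor of 2.25, and the Argos estimate. A short Python sketch of the row calculation is shown below; the function name is illustrative, and the table rounds PCE values to whole numbers.

```python
PCE_FACTOR = 2.25  # fixed truck-to-PCE factor stated in the text


def screenline_row(aawt, argos_pce):
    """Return (actual PCE, difference, percent difference) for one screenline."""
    actual_pce = aawt * PCE_FACTOR
    difference = argos_pce - actual_pce
    pct_difference = 100.0 * difference / actual_pce
    return actual_pce, difference, pct_difference


# Screenline 1 from Table 3-3: AAWT 52,656 and Argos estimate 133,197.
actual, diff, pct = screenline_row(52_656, 133_197)
# actual == 118476.0, diff == 14721.0, pct is about 12.43
```

A negative difference means Argos underestimates the observed truck PCEs at that screenline.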

We estimated a simple regression of the Argos estimate on the actual data. Results are shown in Figure 3-5. Finally, in Table 3-4 we compare these results with those of the TransCAD application using the 2001 data. Results are rather mixed. The 2007 results have smaller average differences, but a larger weighted mean square error and lower goodness of fit. Differences in the results could be due to many factors, including the accuracy and reliability of the 2007 data, changes made within Argos planner, changes in the network, etc. Although the research team conducted many checks of the data and the workflow, we could not identify a specific reason for the difference in the robustness of the results.

For the purpose of demonstrating the feasibility of updating Argos and using the model system for sketch level scenario planning, results are adequate. However, results are not adequate for the type of detailed analysis required in long range planning. More work on the modeling system would be required to use the system in a conventional regional planning context. The model would have to be calibrated and adjusted to more closely fit the data. This type of effort is beyond the scope of this research. The Los Angeles Region is a particularly difficult context for testing new models due to its size and complexity. We note that SCAG planners and consultants have been working for several years to achieve acceptable baseline results for the 2007 model year.

Figure 3-5: Baseline Estimation Regression Results

This figure presents the plot of Argos estimated HDT (y axis) and actual HDT (x axis) in PCEs. It shows the regression line and estimated coefficients: estimated = 0.532(actual) + 34294.

Table 3-4: Comparison of 2001 and 2007 Baseline Results

Measure | 2001 data | 2007 data
Ave % difference | 70.5 | -4.3
Min % difference | 1.7 | -0.48
Max % difference | 288.8 | 159.9
Ave weighted % diff | 31.6 | -16.8
Weighted % mean sq error | 21.3 | 38.4
Regression R2 | 0.73 | 0.52
Updated: 10/6/2011