Southeast Michigan Council of Governments (SEMCOG): Peer Review

FHWA-HEP-18-034


December 2017

Notice
This document is disseminated under the sponsorship of the U.S. Department of Transportation in the interest of information exchange. The U.S. Government assumes no liability for the use of the information contained in this document.

The U.S. Government does not endorse products or manufacturers. Trademarks or manufacturers' names appear in this report only because they are considered essential to the objective of the document.

Quality Assurance Statement
The Federal Highway Administration (FHWA) provides high-quality information to serve Government, industry, and the public in a manner that promotes public understanding. Standards and policies are used to ensure and maximize the quality, objectivity, utility, and integrity of its information. The FHWA periodically reviews quality issues and adjusts its programs and processes to ensure continuous quality improvement.

1. Report No. FHWA-HEP-18-034	2. Government Accession No.	3. Recipient’s Catalog No.
4. Title and Subtitle Southeast Michigan Council of Governments (SEMCOG) Peer Review	5. Report Date December 2017
6. Performing Organization Code
7. Authors Jason Lemp, Ph.D.	8. Performing Organization Report No.
9. Performing Organization Name and Address Cambridge Systematics, Inc. 101 Station Landing, Suite 410 Medford, MA 02155	10. Work Unit No. (TRAIS) Contract or Grant No. FHWA-HEP-18-036
12. Sponsoring Agency Name and Address United States Department of Transportation Federal Highway Administration 1200 New Jersey Ave. SE Washington, DC 20590	13. Type of Report and Period Covered Final Report October 2017 to November 2017
14. Sponsoring Agency Code HEPP-30
15. Supplementary Notes The project was managed by Task Manager for Federal Highway Administration, Sarah Sun, who provided technical directions.
16. Abstract This report details the proceedings of a joint peer review of the Metropolitan Council (Met Council), Mid-America Regional Council (MARC), and Southeast Michigan Council of Governments (SEMCOG) use of the new wave of passively collected transportation data. The peer review was intended to help agency staff better understand the variety, value, and limitations of passively collected transportation data for model development, validation, and application and to identify how these data can be used for different planning and modeling purposes.
17. Key Words Peer review, MPO, Met Council, MARC, SEMCOG, Big Data, passive data, travel modeling, activity-based model	18. Distribution Statement No restrictions.
19. Security Classif. (of this report) Unclassified	20. Security Classif. (of this page) Unclassified	21. No. of Pages 50	22. Price N/A


SI* (Modern Metric) Conversion Factors
APPROXIMATE CONVERSIONS TO SI UNITS
Symbol	When You Know	Multiply By	To Find	Symbol
LENGTH
in	inches	25.4	millimeters	mm
ft	feet	0.305	meters	m
yd	yards	0.914	meters	m
mi	miles	1.61	kilometers	km
AREA
in²	square inches	645.2	square millimeters	mm²
ft²	square feet	0.093	square meters	m²
yd²	square yard	0.836	square meters	m²
ac	acres	0.405	hectares	ha
mi²	square miles	2.59	square kilometers	km²
VOLUME
fl oz	fluid ounces	29.57	milliliters	mL
gal	gallons	3.785	liters	L
ft³	cubic feet	0.028	cubic meters	m³
yd³	cubic yards	0.765	cubic meters	m³
NOTE: volumes greater than 1000 L shall be shown in m³
MASS
oz	ounces	28.35	grams	g
lb	pounds	0.454	kilograms	kg
T	short tons (2000 lb)	0.907	megagrams (or "metric ton")	Mg (or "t")
TEMPERATURE (exact degrees)
°F	Fahrenheit	5 (F-32)/9 or (F-32)/1.8	Celsius	°C
ILLUMINATION
fc	foot-candles	10.76	lux	lx
fl	foot-Lamberts	3.426	candela/m²	cd/m²
FORCE and PRESSURE or STRESS
lbf	poundforce	4.45	newtons	N
lbf/in²	poundforce per square inch	6.89	kilopascals	kPa
APPROXIMATE CONVERSIONS FROM SI UNITS
Symbol	When You Know	Multiply By	To Find	Symbol
LENGTH
mm	millimeters	0.039	inches	in
m	meters	3.28	feet	ft
m	meters	1.09	yards	yd
km	kilometers	0.621	miles	mi
AREA
mm²	square millimeters	0.0016	square inches	in²
m²	square meters	10.764	square feet	ft²
m²	square meters	1.195	square yards	yd²
ha	hectares	2.47	acres	ac
km²	square kilometers	0.386	square miles	mi²
VOLUME
mL	milliliters	0.034	fluid ounces	fl oz
L	liters	0.264	gallons	gal
m³	cubic meters	35.314	cubic feet	ft³
m³	cubic meters	1.307	cubic yards	yd³
MASS
g	grams	0.035	ounces	oz
kg	kilograms	2.202	pounds	lb
Mg (or "t")	megagrams (or "metric ton")	1.103	short tons (2000 lb)	T
TEMPERATURE (exact degrees)
°C	Celsius	1.8C+32	Fahrenheit	°F
ILLUMINATION
lx	lux	0.0929	foot-candles	fc
cd/m²	candela/m²	0.2919	foot-Lamberts	fl
FORCE and PRESSURE or STRESS
N	newtons	0.225	poundforce	lbf
kPa	kilopascals	0.145	poundforce per square inch	lbf/in²

1.0 Introduction
2.0 Peer Review Objectives
- 2.1 Agency Objectives
- 2.2 Discussion Topics
3.0 Overview of the Agency
4.0 Big Data Overview
5.0 Peer Review Discussion
6.0 Peer Review Recommendations
Appendix A List of Peer Review Panel Participants
Appendix B Peer Review Panel Meeting Agenda
Appendix C Peer Review Panel Member Biographies
Appendix D References

List of Figures

Figure 1. SEMCOG Geography
Figure 2. Charlotte Example Expansion Factor Curves
Figure 3. Time Use of Population: Observed Data vs. Regional ABMs vs. PDM

List of Tables

Table 1. Properties of Different Location-Based Data Sources
Table 2. Peer Review Panel Members
Table 3. Met Council, MARC, and SEMCOG Agency Staff
Table 4. TMIP Peer Review Support Staff
Table 5. October 31, 2017 Agenda
Table 6. November 1, 2017 Agenda

1.0 Introduction

1.1 Disclaimer

The views expressed in this document do not represent the opinions of FHWA and do not constitute an endorsement, recommendation or specification by FHWA. The document is based solely on the discussions that took place during the peer review sessions and supporting technical documentation provided by the participating agencies.

1.2 Acknowledgments

The FHWA would like to acknowledge the peer review members for volunteering their time to participate in this peer review. Panel members included:

Vince Bernardin—Resource Systems Group (RSG)
Chris Johnson—Portland Metro
Josie Kressner—Transport Foundry
Kimon Proussaloglou—Cambridge Systematics (CS)
Erik Sabina—Colorado Department of Transportation (CDOT)
Kermit Wies—Northwestern University

Additional biographical information on each peer review panel member is provided in Appendix C.

1.3 Report Purpose

This peer review was supported by the Travel Model Improvement Program (TMIP), sponsored by FHWA. TMIP sponsors peer reviews so that planning agencies can receive guidance from, and ask questions of, officials from other planning agencies across the nation. The peer review process is specifically aimed at providing feedback to agencies on travel modeling endeavors.

The main objective of the joint peer review was to engage participating agencies in discussions on the types of data sources available for travel modeling and planning activities and the ways in which the data can be used. Further, the peer review panel and participating agencies developed recommendations for the use and procurement of passively collected data into the future. The peer review brought together three agencies: Metropolitan Council (Met Council, Minneapolis / St. Paul, MN region), Mid-America Regional Council (MARC, Kansas City, MO region), and Southeast Michigan Council of Governments (SEMCOG, Detroit, MI region). While the peer review had three participating agencies, this report is specifically aimed at providing appropriate context to SEMCOG. Two additional reports were prepared for Met Council and MARC. The other reports provide context specific to the two other agencies, but significant overlap exists since each report documents the same peer review meeting.

In addition to the agency staff, the peer review convened a panel of experts, including planners from other agencies, consultants, and academics, to provide guidance and relate their experiences. In effect, each agency’s peer review panel consisted of these experts plus the participating staff from the other two agencies.

The peer review panel convened for two full days (October 31, 2017 to November 1, 2017). During that time, Met Council, MARC, and SEMCOG each presented background information on their region’s planning context and data needs. Panel members presented background information on various Big Data topics, and the participating agencies and panel members held in-depth discussions and prepared a series of formal recommendations.

1.4 Report Organization

The remainder of this report is organized into the following sections:

Peer Review Objectives—This section outlines the overall objectives of the peer review, including objectives of the participating agencies.
Overview of the Agency—This section highlights the responsibilities of SEMCOG as well as some key characteristics of the region.
Big Data Overview—This section provides background details of Big Data that was presented by peer review panel members, including definition of key terms, identification of various available data types, and specific applications of Big Data.
Peer Review Discussion—This section details the key discussions the peer review panel had with Met Council, MARC, and SEMCOG over the course of the peer review meeting.
Peer Review Recommendations—This section highlights the official recommendations made by the peer review panel.

Four appendices also are included:

Appendix A—List of Peer Review Panel Participants;
Appendix B—Peer Review Panel Meeting Agenda;
Appendix C—Peer Review Panel Member Biographies; and
Appendix D—References

2.0 Peer Review Objectives

The primary objective of the peer review was to assist agencies in gaining a better understanding of the types of data sources available for model development, validation, and application as well as for other planning purposes and how these different data sources can best be used to support agency decisions.

In addition to the conventional sources of data that have been used in the travel modeling process, a number of new data sources have emerged in recent years, some of which are referred to as “Big Data.” Conventional data sources include travel surveys (including the National Household Travel Survey [NHTS]), Census data (including the Census Transportation Planning Products [CTPP]), employment data from state sources and federal data such as the Longitudinal Employer-Household Dynamics (LEHD) data and the Quarterly Census of Employment and Wages (QCEW) data, and private data sources on employment (e.g., InfoGroup). More recently, travel and level of service data are available from GPS tracking devices, cell phone data, and other sources such as INRIX/HERE and the National Performance Management Research Data Set (NPMRDS). Each data source has its strengths and weaknesses (and, in some cases, costs), and agencies must decide how best to make use of the various data sources.

The peer review brought together staff from three Metropolitan Planning Organizations (MPOs), each with their own planning context and experience with data and models, to discuss how they have used data and what they have been exploring to improve the data they use. The agencies include the following:

Metropolitan Council, Minneapolis / St. Paul, MN;
Mid-America Regional Council, Kansas City, MO; and
Southeast Michigan Council of Governments, Detroit, MI.

2.1 Agency Objectives

In discussions prior to the peer review meeting, the three agencies presented their objectives for the peer review. These include the following:

Recognize differences in technical approaches and levels of sophistication among agencies.
Consider how policy boards / stakeholders reach planning decisions.
Consider contracting issues and the potential for pooling resources among agencies.
Consider the needs for model validation, including backcasting.
Note the need to solve logistical issues:
1. Compare different types of data products with an eye toward deciding which data products to use.
2. Recognize how information is communicated.
3. Recognize how resources are allocated for data acquisition compared to other agency planning and modeling needs.
4. Identify strengths and weaknesses common to different types of data, including biases.
Consider the “data velocity” including the definition of which data are “real time”.
Acknowledge lessons learned from previous model development efforts and peer reviews.
Define the technical specifics of how to collect data.
Note the funding needed and available for different types of data acquisition.
Consider tradeoffs in using resources for Big Data versus conventional data collection such as surveys.
Note that data should be collected for more than just modeling purposes.

2.2 Discussion Topics

The three agencies identified several topics of specific focus for the peer review discussions. These include topics directly related to the use of data in modeling as well as other broader modeling and policy related topics.

Data Related Topics

Use of Big Data in modeling, balanced with its cost (examples of partnering on data purchasing). Consider contracting issues and the potential for pooling resources among agencies.
Tradeoffs / substitutions between conventional data (i.e., surveys) and Big Data. Benefits and drawbacks of applying Big Data as a supplement to the traditional household survey (THS). Can the THS be completely replaced by Big Data analysis in the near future?
What drives agencies to purchase Big Data? Compare different types of data products.
Use of INRIX/HERE data in modeling, both for demand and supply questions including dynamic traffic assignment.
Travel time calibration and the data needed (Big Data and/or traditional data).
Data validity and limitations.
Should agencies share Big Data with their stakeholders, and if so, how?
Other uses of data beyond modeling.
Data visualization (including examples).
Survey data collection timing including frequency, the question of continuous data collection, and the use of the NHTS.
Use of data for modeling external travel (Big Data versus conventional external station surveys).
Network coding and verification.

3.0 Overview of the Agency

3.1 SEMCOG Responsibilities

The primary responsibilities of the MPO’s transportation modeling group include the following:

Conformity analysis and long-range planning;
Activities related to the Transportation Improvement Program (TIP) in the region;
Providing planning and technical support for the agency’s planning partners; and
Developing and maintaining the region’s transportation model.

3.2 Regional Characteristics and Travel Model

SEMCOG serves seven counties as shown in Figure 1. The seven-county region covers 4,600 square miles and contains roughly 4.7 million people and 2.5 million jobs.

The figure depicts the SEMCOG geography and shows the boundaries of the seven counties: St. Clair, Livingston, Washtenaw, Monroe, Oakland, Macomb, and Wayne. Lake Erie, Lake St. Clair, Lake Huron, and Ontario, Canada, lie along the eastern border of the SEMCOG region.
Figure 1. SEMCOG Geography
(Source: SEMCOG, presentation slides from peer review)

The regional travel demand model maintained by SEMCOG is a trip-based model, most recently updated in 2015. The zone system used by the model consists of approximately 2,800 traffic analysis zones (TAZs), and the highway network consists of about 35,000 roadway links. SEMCOG is currently completing a model update that includes several enhancements, including the use of disaggregate population synthesis data in some models. SEMCOG is considering a future transition to an activity-based model (ABM) and was interested in obtaining feedback from the panel and other agencies on different elements of that decision, including:

Motivation for the move to an ABM modeling approach;
Experience with ABM, including ABM capabilities and lessons learned; and
Cost vs. benefit.

3.3 SEMCOG Data

SEMCOG uses a variety of data in its planning activities and to inform the travel demand model, including the following traditional data products:

Master roadway network (derived from Michigan Geographic Framework 2016);
Fixed route transit networks;
Cutline vehicle classification counts;
Regional traffic count database;
Congestion and travel time database;
Sociodemographic data and forecasts;
1994 external station survey;
2010 Transit on-board survey; and
2015 Household travel survey.

SEMCOG is also currently conducting a commercial vehicle survey of the region, is planning a transit on-board survey for next year, and is considering when and how to conduct a new external station survey. Other SEMCOG data collection and planning areas include bicycle and pedestrian counts, airport modeling (including survey data), and border crossings.

In addition to traditional data sources, SEMCOG has been working with several new Big Data products. The AirSage dataset was purchased for calibrating travel patterns related to air travelers. Overall, the data were reasonable and produced credible results. SEMCOG also purchased StreetLight O-D pattern data, but found some large differences between the travel patterns in the StreetLight data and both the travel patterns from the household travel survey and the patterns that were expected for commercial vehicles. SEMCOG is also using HERE/INRIX congestion and travel time data for its congestion management process.

SEMCOG was particularly interested in better understanding best practices related to Big Data sources. This included identifying best practices for validating the data and for using the data to replace or complement traditional data sources. Moreover, SEMCOG was interested in identifying guidelines for the purchase of data, as it has had mixed experiences with past data purchases.

4.0 Big Data Overview

This section summarizes the presentations by the peer review panel experts relating to uses of data (specifically Big Data) in travel models and other planning activities. Four presentations were made, including the following:

“Data for Transportation Planning and Beyond”—Kimon Proussaloglou and Cemal Ayvalik
“It’s Scary How Much We Know—And How Much We Don’t”—Vince Bernardin
“Big (and Semi-Big) Data in the Colorado Statewide Model”—Erik Sabina
“Data-Driven Planning with a Passive Data Model”—Josie Kressner

4.1 Data for Transportation Planning and Beyond

The first expert presentation focused on defining what is meant by the term Big Data and how it differs from other data sources. The presenters summarized some of the Census data products (which are neither really “small” data nor Big Data) and examined the value of Big Data in the context of informing models of travel behavior.

Big Data typically are large in at least one of the following five qualities:

Volume or the size of the dataset;
Velocity at which the data are generated and/or processed;
Variety in the types of data that are linked together;
Valence which measures the degree of connectedness; and
Veracity which relates to the end product quality.

Unfortunately, Big Data typically lack socioeconomic information, and the raw data are not accessible to the analyst. Small data, on the other hand, are characterized by modest sample sizes, but provide detailed information at the individual or household level.

A number of data sources may be considered “in-between” data, including many of the Census data products. Several Census data products were summarized, including the following:

American Community Survey (ACS);
Quarterly Census of Employment and Wages (QCEW);
Longitudinal Employer-Household Dynamics (LEHD); and
Census Transportation Planning Products (CTPP).

In considering the future, new questions are going to be asked of transportation planners and modelers, including changes in traveler behavior and attitudes, the role of information, the emergence of the sharing economy, predicting changes dynamically, and risk and uncertainty quantification. Locational data represent a new source of data that can be used in a number of ways, including aggregations of the data to support activities like model validation, but also in disaggregate analyses to support the study of passenger and freight movements, augment traditional data sources, and to examine sampling and response bias issues.

To this end, the presentation summarized NCHRP 08-95: Cell Phone Location Data for Travel Behavior Analysis. In this June 2017 NCHRP study, original research studies and vendor-based estimates based on locational data were compared to traditional surveys and models for the Boston metropolitan area. The analyses focused on typical elements of model outputs including total regional travel, distribution of travel by time of day and by purpose, and O-D patterns at different levels of detail. This contrast highlighted the strengths and weaknesses of locational and traditional data sources.

Furthermore, the study produced a guidebook with practical advice to transportation practitioners on potential uses of locational data based on their strengths and weaknesses. Summary tables are used to compare traditional surveys to locational data as potential data sources for different types of models.

This report discusses the extent to which call detail record data, GPS data, and cell phone app-based data can augment or replace traditional means of data collection. The pros and cons of each method are discussed (as shown in Table 1) including sample size, sample representativeness, and the richness of each data source.

Table 1. Properties of Different Location-Based Data Sources
Data Property	CDR Data	Personal GPS Derived Data	Smartphone Surveys	Custom Bluetooth Data
CDR Data in Raw Form	Raw data likely not available due to privacy concerns.		Raw data are available to data analysts.
Processed CDR Data Available to Analyst	Processing methodology is not known to analyst.		Methodology can be shared with the analyst.	Limited data processing is possible.
Zonal Size and Spatial Resolution	Low spatial accuracy. Zone size and number of zones affect pricing.	Spatial accuracy greater than CDR data.	Spatial accuracy similar to personal GPS data.	Data can be used to support corridor traffic analysis.
External Zones and External Stations	External travel may be obtained.		Depends on survey methodology and participant travel.	Yes but it depends on survey locations.
Trip Purposes	Activities and purposes are inferred. Three purposes are available—HBW, HBO and NHB.		Detailed trip purposes through prompted recall.	Not possible.
Socioeconomics	Not available.		Available.	Not available.
Technology	Advances in technology will yield more accurate data. More frequent data points. Greater spatial accuracy.	Standardized technology. Potential to improve pulse rates vs. battery life.		Standardized technology
Time Periods Temporal Resolution	Depends on cell utilization & interaction with network.	Depends on level of interaction with network.	Very detailed resolution.	Possible to summarize data by time of day.
Commercial and Passenger Travel	Not possible to differentiate between vehicle classes.	Able to differentiate between vehicle classes.		Not possible to differentiate.
Sample Expansion	Expansion is driven by population and geography. No socioeconomic or market segment data. Vendor-driven methods are used.		Customized expansion by socioeconomics and geographic detail.	Expansion can be made to vehicle counts.
Path Traces	Unreliable path traces. Infrequent transactions. Low spatial accuracy.	Unreliable traces for slow data transaction rate.	Very reliable path traces.	Not possible.

(Source: Visual aids distributed during peer review)

The report also discusses how different individual components of a regional model (trip generation, distribution and mode choice) or aspects of regional models (estimation, validation, and calibration) can be supported by each of these new forms of data.

4.2 It’s Scary How Much We Know—And How Much We Don’t

The second presentation summarized some of the emerging data sources, how they have been used in practice, issues that come up with these datasets, and considerations when purchasing these data. The key data types that were summarized and contrasted include the following:

Cell tower signaling datasets;
Location-based services datasets;
GPS-based datasets; and
Bluetooth datasets.

Each dataset type has particular advantages and disadvantages. Sample penetration is a major issue with these data sources. The data are not necessarily ready to use out of the box because the samples are not guaranteed to be representative of the population. The data often carry locational, trip length, or demographic biases, which means that some additional budget will be required for data analysis and expansion. Moreover, a larger sample size does not necessarily correct these issues. For the Charlotte region, Figure 2 illustrates the expansion factors that were developed to correct for biases in locational data as functions of trip distance and area type. The first two series of the figure show that as distance increases, the factors used to expand the trips decrease for both residents and visitors. The last two series show that as area type changes from rural to urban, the factors used to expand the trips increase for both residents and visitors.

The figure depicts expansion factors used in Charlotte as a function of trip distance and as a function of area type. For each curve, separate expansion factors were developed for residents of the region and for visitors.
Figure 2. Charlotte Example Expansion Factor Curves
(Source: Presentation slides from peer review)
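
The kind of bias correction described above can be sketched as a lookup of an expansion factor by traveler type, trip distance band, and area type. The following is a minimal illustration only. The distance bands, the factor values, and the choice to multiply the distance and area type factors together are all assumptions made for this sketch; they are not the Charlotte values or method, and were chosen only to mirror the direction of the curves (factors falling with distance and rising from rural to urban).

```python
from bisect import bisect_right

# Hypothetical distance band upper bounds in miles: <5, 5-20, 20-50, 50+.
DISTANCE_BREAKS = [5.0, 20.0, 50.0]

# Hypothetical factors (NOT the Charlotte values); they decrease with distance.
DISTANCE_FACTORS = {
    "resident": [1.6, 1.2, 1.0, 0.8],
    "visitor": [2.0, 1.5, 1.1, 0.9],
}

# Hypothetical factors (NOT the Charlotte values); they increase from rural to urban.
AREA_FACTORS = {
    "resident": {"rural": 0.9, "suburban": 1.0, "urban": 1.2},
    "visitor": {"rural": 0.8, "suburban": 1.0, "urban": 1.3},
}


def expansion_factor(traveler, distance_mi, area_type):
    """Return the combined factor for one trip record.

    Multiplying the two factors is an assumption of this sketch.
    """
    band = bisect_right(DISTANCE_BREAKS, distance_mi)
    return DISTANCE_FACTORS[traveler][band] * AREA_FACTORS[traveler][area_type]


def expand_trips(trips):
    """Sum expanded trips over (traveler, distance_mi, area_type, sample_trips) records."""
    return sum(n * expansion_factor(t, d, a) for t, d, a, n in trips)


if __name__ == "__main__":
    sample = [
        ("resident", 3.0, "urban", 100),
        ("resident", 60.0, "rural", 100),
        ("visitor", 10.0, "suburban", 50),
    ]
    print(expand_trips(sample))
```

In practice, factors of this kind would be estimated against independent control data such as traffic counts or survey totals rather than set by hand.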

Several uses of these datasets were demonstrated through examples from projects around the U.S. For instance, in a couple of Tennessee examples, origin-destination (O-D) data covering roughly 30 percent of the O-D pairs in the models were purchased to get a better sense of the O-D patterns than the standard household travel survey (which covers 2 percent or less of O-D pairs) can provide. In Ohio, a smartphone app is being used to collect disaggregate data, which have been compared to the disaggregate data available from location-based service (LBS) data providers. That comparison identified the possibility of data gaps, particularly for short trips.
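
The coverage comparison mentioned above can be made concrete with a short calculation. The sketch below assumes that coverage means the share of all zone-to-zone pairs (including intrazonal pairs) with at least one observed trip; the function name and the toy flows are hypothetical and are not taken from the Tennessee projects.

```python
def od_pair_coverage(observed_flows, n_zones):
    """Share of all zone-to-zone pairs with at least one observed trip.

    observed_flows maps (origin_zone, destination_zone) to a trip count.
    Intrazonal pairs are counted in the denominator (an assumption).
    """
    if n_zones <= 0:
        raise ValueError("n_zones must be positive")
    observed_pairs = {pair for pair, trips in observed_flows.items() if trips > 0}
    return len(observed_pairs) / (n_zones * n_zones)


if __name__ == "__main__":
    # Toy three-zone example with hypothetical flows.
    survey_flows = {(0, 1): 2}
    passive_flows = {(0, 1): 40, (1, 0): 35, (1, 2): 12, (2, 2): 0}
    print(od_pair_coverage(survey_flows, 3))
    print(od_pair_coverage(passive_flows, 3))
```

Higher coverage does not by itself imply better data, since the observed flows may still carry the biases discussed earlier in this section.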

Looking forward, changes in travel behavior have been documented to be one of the biggest reasons for travel forecast errors. The emergence of passively collected data provides an opportunity to see a larger picture and identify shifts in travel behavior more quickly.

4.3 Big (and Semi-Big) Data in the Colorado Statewide Model

The third presentation focused on the use of Big Data in the development of the Colorado statewide model. Four sources of Big Data were used in the model, including the following:

QCEW;
Addresses;
GPS-trace O-D data; and
GPS speed data.

The QCEW was used in conjunction with state demographer estimates to enhance the richness of the overall land use data used by the model. A number of issues arise in using the address data, including distinguishing between residential and commercial addresses. The GPS-trace O-D data were used to validate travel patterns for areas of the state that were not included in the household travel survey data, which were limited to only four MPOs in the state. GPS speed data were used in validation. The main issue that arises in using these data is conflating the network on which the speed data are reported with the model network, so that roadway speed data are attributed to the correct links in the travel model network.
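
The conflation issue described above can be illustrated with a simplified matching rule. The sketch below is not the method used for the Colorado model. It assumes that each speed data segment and each model link can be reduced to a midpoint in projected coordinates plus a heading in degrees, and it matches a segment to the nearest link that runs in a similar direction. The distance and heading tolerances are hypothetical.

```python
import math


def heading_diff(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def conflate(segments, links, max_dist=50.0, max_heading_diff=30.0):
    """Match each speed segment to a model link, or to None if nothing fits.

    segments and links map an id to (x, y, heading_degrees), where x and y
    are midpoint coordinates in a projected system (an assumption).
    """
    matches = {}
    for seg_id, (sx, sy, sh) in segments.items():
        best_id, best_dist = None, max_dist
        for link_id, (lx, ly, lh) in links.items():
            # Skip links running the wrong way, such as the opposing
            # direction of a divided roadway.
            if heading_diff(sh, lh) > max_heading_diff:
                continue
            dist = math.hypot(sx - lx, sy - ly)
            if dist <= best_dist:
                best_id, best_dist = link_id, dist
        matches[seg_id] = best_id
    return matches


if __name__ == "__main__":
    links = {"EB": (0.0, 0.0, 90.0), "WB": (0.0, 10.0, 270.0)}
    segments = {"s1": (5.0, 4.0, 268.0), "s2": (500.0, 500.0, 90.0)}
    print(conflate(segments, links))
```

In the usage example, segment s1 is slightly closer to the eastbound link but is matched to the westbound link because of its heading, which is the kind of misattribution that a distance-only match would produce.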

4.4 Data-Driven Planning with a Passive Data Model

The last presentation made by experts focused on a new and innovative travel model framework. Unlike traditional trip-based models and newer activity-based models, which rely on a local household travel survey and a set of sequential steps with the goal of mimicking trip-making patterns, a passive data model (PDM) is fundamentally different. It starts with archetypal daily patterns from the National Household Travel Survey and feeds these patterns into a simulation that probabilistically joins O-D matrices, network data, and household and firm data from commercial providers. The result of this simulation is a synthetic population with synthetic travel diaries that feeds into a network assignment model.
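
The general idea of probabilistically joining daily patterns with O-D data can be sketched in a few lines. The following is a loose illustration and not the presenter's passive data model: it uses only an O-D matrix, ignores network, household, and firm data, and assumes that each archetypal pattern starts at home. The function name, the pattern labels, and the toy O-D values are hypothetical.

```python
import random


def synthesize_diary(home_zone, pattern, od_trips, rng):
    """Build one synthetic diary as a list of (activity, zone) stops.

    pattern is a list of activity labels assumed to start with "home".
    od_trips maps origin zone to {destination zone: trips}; each
    non-home destination is drawn in proportion to those trips.
    """
    diary = [(pattern[0], home_zone)]
    current = home_zone
    for activity in pattern[1:]:
        if activity == "home":
            current = home_zone
        else:
            dests = list(od_trips[current])
            weights = [od_trips[current][d] for d in dests]
            current = rng.choices(dests, weights=weights, k=1)[0]
        diary.append((activity, current))
    return diary


if __name__ == "__main__":
    # Hypothetical three-zone O-D trip table.
    od = {1: {2: 80, 3: 20}, 2: {1: 10, 3: 90}, 3: {1: 50, 2: 50}}
    rng = random.Random(42)  # fixed seed so repeated runs match
    print(synthesize_diary(1, ["home", "work", "other", "home"], od, rng))
```

A fuller implementation would also draw activity start times and durations from the archetypal pattern and would condition destination choice on household and firm locations.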

For NCHRP Report 184, an investigation was performed to determine whether a passive data model would be transferable between regions, with only the input data needing to be updated. In Seattle and Atlanta, the synthetic diaries compared favorably with the results of recent household travel surveys on multiple dimensions, indicating that the synthetic diary was a plausible substitute for the household survey for at least some measures. Figure 3 shows one example measure used to compare the PDM across regions and evaluate how transferable the model is. The figure shows that at the start and end of the day, most individuals are located at home. In the middle of the day, people leave home for work, travel, and other activities. Overall, the report suggested that the passive data model had good transferability properties across regions.

The figure depicts the percentage of the population's time use devoted to each of four activity types (home, work, travel, and other) as a function of time of day. The Atlanta ABM is depicted side by side with National Household Travel Survey (NHTS) data, Seattle Household Travel Survey data, the PDM for Atlanta, and the PDM for Seattle.
Figure 3. Time Use of Population: Observed Data vs. Regional ABMs vs. PDM

(Source: Presentation slides from peer review. Notes: Starting from the bottom of each graph depicted in the figure, the first activity type depicted is ‘Home’, the second is ‘Other’, the third is ‘Travel’, and the fourth is ‘Work’.)
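
A time-use measure like the one in Figure 3 can be tabulated from travel diaries, whether observed or synthetic. The sketch below assumes that each diary is a list of (start_hour, end_hour, activity) episodes and reports, for each hour, the share of people in each of the four activity types. The data layout and the function name are assumptions of this sketch and are not taken from the presentation.

```python
from collections import Counter

ACTIVITIES = ("home", "other", "travel", "work")


def time_use_shares(diaries, hours=range(24)):
    """Share of people in each activity at the start of each hour.

    diaries is a list of person diaries; each diary is a list of
    (start_hour, end_hour, activity) episodes with start <= hour < end.
    """
    shares = {}
    for hour in hours:
        counts = Counter()
        for diary in diaries:
            for start, end, activity in diary:
                if start <= hour < end:
                    counts[activity] += 1
                    break  # one activity per person per hour
        total = sum(counts.values())
        shares[hour] = {a: (counts[a] / total if total else 0.0) for a in ACTIVITIES}
    return shares


if __name__ == "__main__":
    # Two hypothetical people: a commuter and someone at home all day.
    diaries = [
        [(0, 8, "home"), (8, 9, "travel"), (9, 17, "work"),
         (17, 18, "travel"), (18, 24, "home")],
        [(0, 24, "home")],
    ]
    shares = time_use_shares(diaries)
    print(shares[3], shares[12])
```

Stacking these shares by hour yields a chart of the same form as the figure, which allows diaries from different sources to be compared on a common basis.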

In Asheville, the trip diaries were assigned to the highway network by both static user equilibrium assignment and microsimulation (using MATSim), and the forecasted traffic volumes were compared to those from a recently calibrated four-step travel model for the region. The PDM assignment results were within acceptable error margins and could be improved with elementary calibration techniques.

The passive data model can be quite useful for planning purposes as well. Several examples were illustrated, including a bridge closure example, a peak hour tolling example, and an autonomous vehicles example, with each case demonstrating the reasonableness of the approach.

5.0 Peer Review Discussion

Following the agency and expert panel presentations on the morning of the first day of the peer review, the group discussed a variety of topics over the next day and a half. This section documents the key points that were discussed during the meeting.

5.1 MPO Stakeholders

To provide context for the subsequent discussion, one topic of discussion involved the key questions being asked of MPOs by their respective stakeholders and the types of policies and goals that they want their travel model to be capable of addressing. Some common themes emerged and others were unique to the agencies.

Connected and autonomous vehicles (CAVs) were important to all of the MPOs. Stakeholders are hearing more and more about CAVs and how they are likely to impact the way people travel. In response, stakeholders want information about what those impacts are likely to be.
Changing demographics (including the aging population and the increase in retirees) are another important topic to stakeholders across the agencies.
A couple of the agencies also noted the importance of being able to forecast where housing and jobs will grow in the region, and how the transportation system will be impacted in areas where growth is expected to occur.
Forecasting transit ridership is another important area to the MPO stakeholders, who specifically want to ensure that the model is appropriately sensitive to transit policy and can reasonably predict changes in transit ridership.
Met Council noted the importance to stakeholders of changes and volatility in the economy and understanding how the shared economy and online retail will bring changes to the transportation environment.

Credibility of the model with stakeholders, more generally, was noted as a key challenge in answering stakeholder questions. This is particularly the case in an environment where technological changes occur rapidly, often making 10-year-old data irrelevant to the analysis of key policies in which stakeholders are now interested.

5.2 Themes and Priorities Discussion

Several themes and priorities for the peer review meeting were highlighted early in the discussion period, as noted below:

How do we know passive data are good enough for planning and modeling applications? One peer review participant expanded on this, noting that travel patterns are changing and this fact will make it difficult to ensure that data represent true travel patterns. Another participant noted that passive data could be a key resource in terms of evolving travel patterns due to its longitudinal characteristics and the speed with which it can be collected and processed, something that is missing in traditional data sources. Another avenue for utilizing the data lies with machine learning and ensemble forecasting methods, and moving to a suite of forecasting tools (rather than a single travel model) could help us learn a great deal.
How do we make decisions about what data to use for different purposes? One participant highlighted the need to use the correct tool for each application, including the travel model and the regional household travel survey; the survey is typically not explored enough and should be used for more than model development. Another participant noted that passive data currently are not useful for building an activity-based model, since the socioeconomic characteristics of the travelers are unknown.
What is the cost of data? SEMCOG noted that the cost of the Streetlight dataset for their agency was about $1 million. Met Council has a unique arrangement to use Streetlight data at the state level for one year for about $700,000. Met Council noted that under their new travel survey data collection procedure, they will be spending about $1.25 million per year on household travel survey data for 10 years.
What skills are needed to work with new passive data sources? One peer review member emphasized the importance of data analytical skills. Agencies need to understand the data sources fully. This would allow them to build their own processing algorithms or vet the results instead of relying exclusively on the tools from data vendors, where transparency of the processing methods is usually lacking.

5.3 Big Data Uses and Needs

Representativeness of Big Data Samples
Several peer review members noted the need for transparency in the processing of raw passive datasets into the data products sold by vendors. One panel member suggested that while open source data and software can be very appealing, it is often very challenging for private companies. A middle ground solution might make sense, for instance, where pseudo-code is provided to an agency with the data so that the general approach and methods used by passive data vendors can be evaluated. Another panelist agreed and suggested that asking some basic questions of the data vendors during the purchasing process would be warranted (e.g., what is the sample size, what are the data processing procedures, what are the heuristics). Another participant commented on the importance of transparency in the data products that allows agencies to speak to the reliability of the forecasts they are providing. Met Council has taken the approach of requiring consultants to provide transparency in the methods that are used so that someone else could later replicate the process. For that agency, this comes down more to a contracting issue.

In addition to transparency in the methods, one peer review member suggested that there should be data validation standards. Vendors should be able to demonstrate that the results they are finding are valid, and this is probably as important as transparency.

Big Data Issues—Defining Guidelines
SEMCOG has been using origin-destination (OD) data from Airsage and Streetlight and has found false locations in the datasets. The first dataset they were provided was a GPS-based sample. SEMCOG mapped the data and compared them to observed travel patterns from the household travel survey, finding large percentage errors in the distribution of trips, particularly in rural areas. When SEMCOG brought this issue to Streetlight’s attention, Streetlight found that moving to a more detailed spatial resolution would have helped, but that option would also have been more expensive. One peer review member noted that having guidelines for what to expect with passive datasets is a good start, but nothing is going to substitute for actually working with the data to understand their merits and flaws.
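
A comparison of this kind can be sketched as follows. This is only an illustrative example, not SEMCOG's actual procedure: the area types, the trip counts, and the function names are all hypothetical. The sketch converts each dataset to shares of total trips (since a passive sample and a survey are on different scales) and reports the percentage error of the passive share relative to the survey share.

```python
def trip_shares(trips_by_area):
    """Convert trip counts by area type into shares of total trips."""
    total = sum(trips_by_area.values())
    if total <= 0:
        raise ValueError("total trips must be positive")
    return {area: count / total for area, count in trips_by_area.items()}


def percent_error_in_shares(passive, survey):
    """Percentage error of passive-data trip shares relative to survey shares."""
    p = trip_shares(passive)
    s = trip_shares(survey)
    return {
        area: 100.0 * (p.get(area, 0.0) - s[area]) / s[area]
        for area in s
        if s[area] > 0
    }


# Made-up counts for illustration only (not SEMCOG data).
survey = {"urban": 800, "suburban": 150, "rural": 50}
passive = {"urban": 8500, "suburban": 1400, "rural": 100}

errors = percent_error_in_shares(passive, survey)
for area, err in errors.items():
    print(f"{area}: {err:+.1f}%")
```

In this made-up example the rural share is badly understated even though the urban and suburban shares are close, which is the general pattern an agency would look for when checking a vendor product against survey data.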

The panel discussed whether it would make sense to define guidelines for what MPOs and other agencies should expect and demand from passive O-D data vendors. Several metrics for sample penetration were discussed, including vehicle-miles of travel (VMT), person-miles of travel (PMT), and sightings of unique vehicles per day. The panel noted that whatever metric is used, it should be measured both before and after the processing of the data, since large chunks of raw data are typically not used. The effective sampling frame, after data cleaning, is critical, and the best practices in expanding the data may require the use of multiple datasets to address key areas where the data may be biased, including locational, demographic, and duration biases.
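
The idea of measuring penetration both before and after processing can be shown with a short sketch. The metric names, the control totals, and the function `penetration_report` are assumptions made for illustration; the panel did not prescribe a formula. Here penetration is taken to be the sample quantity divided by an independent regional control total for the same metric.

```python
def penetration_report(raw, cleaned, control):
    """Compare sample penetration before and after vendor processing.

    raw, cleaned, and control are dicts keyed by metric name (for
    example "vmt"), holding the raw sample total, the total remaining
    after cleaning, and a regional control total, respectively.
    """
    report = {}
    for metric, total in control.items():
        if total <= 0:
            raise ValueError(f"control total for {metric} must be positive")
        report[metric] = {
            "before": raw[metric] / total,
            "after": cleaned[metric] / total,
            # Share of the raw sample that survives processing.
            "retained": cleaned[metric] / raw[metric] if raw[metric] else 0.0,
        }
    return report


# Made-up numbers for illustration only.
control = {"vmt": 1_000_000, "unique_vehicles": 200_000}
raw = {"vmt": 80_000, "unique_vehicles": 20_000}
cleaned = {"vmt": 50_000, "unique_vehicles": 9_000}

report = penetration_report(raw, cleaned, control)
for metric, values in report.items():
    print(metric, {k: round(v, 3) for k, v in values.items()})
```

Reporting the "retained" share alongside the two penetration rates makes it visible how much raw data is discarded in processing, which is the concern raised in the discussion.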

One participant noted that there are clear advantages to the fact that there are no prescribed rules from the federal agencies, but federal guidance will often carry a lot of weight. Another panel member noted the importance for the travel modeling community to gain a clear and objective understanding of these data quickly, because many agencies are already purchasing the data without a baseline of knowledge about data specifics. To this end and as discussed above, transparency in the methods used to process data is one clear item the panel believed was important. In addition, having conversations with the data vendors and communicating with planning partners is important.

The panel discussed the merits of disaggregate versus aggregate Big Data. Several panel members suggested that for O-D data, getting disaggregate data really was very important and should be highly favored over aggregate data products. It is important to understand that Big Data are not collected for transportation planning purposes, and therefore, understanding the data at the disaggregate level is a critical step for effectively customizing the data to various transportation planning applications. For instance, disaggregate data are needed for environmental justice analyses. Moreover, travel is inherently an individual or group activity, so it is only possible to understand the behavioral aspects of travel with disaggregate data. Another participant noted the importance of being able to segment data in a variety of ways, which can be important for certain analyses and/or validation exercises. One panelist suggested that technical professionals really need to be looking at disaggregate data to ensure they pass simple sense checks, but was concerned about privacy implications.

Dealing with Big Data Vendors
One challenge that a couple of participants noted was that the salespeople for these data products often do not fully understand how the data are used in travel demand modeling and forecasting. Some of the data vendors are starting to learn that the transportation community requires more technical detail about the source and pre-processing of proprietary data products. The transportation community must continue to demand that vendors fully disclose their data collection procedures and the methods by which the final data products are prepared.

Another participant noted that the market for passive datasets is continually evolving and prices are constantly changing. As such, it would be valuable for travel modelers to agree on what the value of these data really is, to help evaluate whether a passive data purchase makes sense in different circumstances. Agencies should also be able to define a clear need for the data before deciding to purchase these data products. Another panelist noted that when the companies get bigger and refine their products, their willingness to provide custom data products (as they often do currently) may erode.

Several peer review members were interested in how contract negotiations for data purchases are typically handled and how they should be handled. One participant noted that state departments of transportation typically have legal departments, procurement protocols, and expert negotiators who can be very helpful in contracting for data purchases, while many MPOs do not have these resources. On the other hand, states do not deal in these types of data procurements as much, and certain skills and knowledge at the MPO level are really needed in the procurement process. Another panel member noted that data standards defined by the transportation community could be an important element in reducing prices of passive data products in the future.

A number of problem areas with passive O-D data were discussed. Several participants noted that when issues are identified in the passive datasets, the data vendors often ask for the MPO’s data to enhance the product. In cases where MPO validation data are used to enhance a vendor data product, the panel was concerned about the value of the underlying travel patterns in the data and the methods used to arrive at these estimates. It also creates difficulty for the MPO in conducting an independent external validation of the vendor data. A couple of panel members noted that the data vendors are still in the process of understanding their data, and they may not be as knowledgeable about the methods and approaches that are acceptable to adjust and/or expand data for transportation planning purposes.

Filling Data Needs over Traditional Data Sources
One area of travel behavior where Big Data has great potential to influence our knowledge and/or modeling approach is visitor and tourism travel. Demographic data for visitors are often thinner than the corresponding data for resident travel, and tourism agencies often collect different types of information than are useful to travel models. In the case of the Colorado statewide model, for instance, the new travel model (which was still under development at the time of the meeting) will likely not perform well in resort towns in the state, where very little data existed for model calibration or validation. However, the Colorado Department of Transportation would like to have the capacity to do project-level analyses in those areas. One panel member suggested this may be a case where project-specific approaches are more appropriate than trying to use the statewide travel model. While a consistent project evaluation platform can be very useful, other tools may be warranted in certain circumstances. Special events are another area where Big Data may be incredibly useful for identifying the locations where individuals are coming from.

One participant noted the importance of obtaining data for rural areas, especially at a state level, and using these data for model validation in rural areas. This is a particular area where Big Data may be of great value. On the other hand, very little other data are available for many rural areas, which means both that getting any information is valuable (as long as it is somewhat reliable) and that there may not be any good ways in which to validate these data.

Panel members suggested that one traditional data source could be retired in favor of new Big Data sources. The panel agreed that external intercept surveys are no longer needed, noting some recent studies. However, a couple of participants noted there may be special circumstances where intercept surveys are still warranted and feasible, like border crossings.

5.4 Future Data Needs for Travel Forecasting

One discussion item of the panel where several opinions emerged was forecasting with Big Data. A big concern with Big Data sources is the missing socioeconomic characteristics of the traveler, which are a key driver of current behavioral models. One panelist cautioned that unless the Big Data could be used to inform the behavior of travel models (i.e., individual cause/effect determinants), then all Big Data would be useful for is replicating current conditions, not forecasting the impacts of transportation system policies. A new behavioral paradigm that can be used to develop models with Big Data sources is needed. Another panelist remarked that the longitudinal nature of the data will be extremely useful in the future to understand changing travel patterns (whether the current behavioral paradigm persists or not). Another suggestion was to expect change in the data and methods used for forecasting and to adapt with them.

Several panelists pointed out the need to understand the limitations of the models. For instance, forecasts made 20 years ago looking forward to the current day would not have been able to foresee the changes that have transpired, but that does not mean that the forecasts were not useful. Understanding the merits and limitations of models is also important to identifying ways to make incremental model improvements, possibly informed by Big Data sources.

Several panelists opined on how machine learning could be one important tool for using Big Data to inform new models of travel behavior. One participant suggested that classical statistics were never particularly well-suited for travel modeling due to the high dimensionality of transportation problems, whereas machine learning is designed for such problems. To use those methods, valid data are essential, just as they are in classical statistics. Another peer review member noted that machine learning tools will need more frequent or continuous updating of observed data, which will require new data contracting mechanisms and business practices at MPOs.

5.5 Household Travel Surveys

Collecting Household Travel Survey Data
On the topic of the frequency with which household travel survey data are collected, several panel members suggested that more frequent surveying is almost always better, even if it means smaller sample sizes. This approach, which is similar to the American Community Survey (ACS) paradigm, has recently been adopted by some MPOs. Met Council is moving to such a longitudinal surveying approach, collecting household survey data every couple of years using a smartphone-based prompted-recall survey approach. They understand that certain demographic groups will be underrepresented in the sample (e.g., those that do not have smartphones), but they believe that controlling for this underrepresentation will be manageable. One benefit of the approach will be the longitudinal nature of the data, but years can also be combined to form a more robust, larger sample.

The practices in Europe and Australia were brought up by a couple of panelists as examples where a great deal more emphasis is placed on data collection than in the U.S. In Europe and Australia, there is an order of magnitude difference in the resources allocated to data collection, compared to the U.S. In Australia, continuous household surveys have become popular. One participant noted that the longitudinal data can be used to better track how travel behaviors change over time. This will become increasingly important as travel patterns have been and likely will continue to rapidly change in the face of emerging technologies. In addition, large data collection efforts that take years to complete may render the data obsolete before they are even usable. From an agency budget planning perspective, a couple of participants offered that a continuous, or otherwise more frequent, data collection plan could be more manageable, since it keeps a line item in the MPO budget each year rather than requiring a single big expense every 8 to 10 years. A panel survey component was also suggested as a way to add even more value to more frequent data collection, but it was noted that in the few examples in travel survey data collection, attrition of respondents has been a major issue that may outweigh the additional value provided by this option.

There was some interest among participants in looking for indicators that could be used to prompt agencies to know when it is time to collect new data. For instance, SEMCOG compared 2005 to 2015 household travel survey data to see how travel patterns changed, finding that some patterns changed (such as the increase in percentage of individuals working from home) while other patterns did not.

Combining Big Data and Household Travel Survey Data
One panelist noted a tradeoff between traditionally collected data and Big Data. Several participants suggested that the data types were not interchangeable. One possibility is that new methods could be used with Big Data, but another might be that the data sources are used as complements to one another.

There was interest among panel members in identifying specific ways to combine Big Data and traditional household survey data. One panel member suggested that the household travel survey could be used to validate passive O-D data if full GPS traces are included in the household travel survey. If not, more aggregate validation may be all that is possible. MARC, in its partnership with Sidewalk Labs, will be obtaining synthetic household survey data. MARC will also be conducting a household travel survey in the next year, and they plan to examine the synthetic household survey data quality to decide whether it can be used to complement a traditional survey, perhaps allowing for a smaller traditional survey dataset. One panel member commented that Big Data will continue to require validation, and the household travel survey could perhaps provide that function, and in the future, that may evolve as its primary function.

Several panel members emphasized that a household survey remains a critical data item with the current forecasting methods being utilized, and that it can also be very useful for modeling activities outside of model development. One peer review member noted that the full travel pattern information that comes from household surveys is actually split into small snippets of information used to develop individual model components and wondered whether something similar could be done using passively collected data. The passive data model (PDM) discussed in the previous section, in fact, does precisely this, but under a different modeling paradigm than most modern travel models. None of the data vendors in the passive data market are currently interested in summarizing the data in a way that would be useful for developing a modern activity-based model.

5.6 Dealing with Uncertainty

The panel had two discussions about uncertainty in data and uncertainty in travel forecasting. In the first, one participant noted the importance of acknowledging the uncertainty in survey data (particularly due to the small sample sizes) and suggested that travel modelers could better convey margins of error when reporting information directly from surveys. While several participants agreed with the sentiment, one panel member disagreed, noting that MPOs often have trouble simply defending their point forecasts. Attempting to provide an explanation of the uncertainty in travel survey data could obscure the value that travel modelers and transportation data can provide to policy makers. Of course, Big Data does not solve the problem of uncertainty since the data are not necessarily representative of the population.
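
As a rough illustration of what conveying a margin of error could involve, the sketch below uses the standard normal-approximation formula for a survey proportion. It assumes simple random sampling unless a design effect is supplied; the function name, the `design_effect` parameter, and the example numbers are hypothetical and are not drawn from any agency's survey.

```python
import math


def proportion_margin_of_error(p, n, z=1.96, design_effect=1.0):
    """Approximate margin of error for a survey proportion.

    p: estimated share (0 to 1); n: number of respondents;
    z: critical value (1.96 for roughly 95 percent confidence);
    design_effect: multiplier on the variance for complex samples.
    Uses the normal approximation, which is weak for very small n
    or for shares very close to 0 or 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(design_effect * p * (1.0 - p) / n)


# Made-up example: a 5 percent share estimated from 400 respondents.
moe_small = proportion_margin_of_error(0.05, 400)
# The same share estimated from 1,600 respondents.
moe_large = proportion_margin_of_error(0.05, 1600)

print(f"n=400:  5.0% +/- {100 * moe_small:.1f} points")
print(f"n=1600: 5.0% +/- {100 * moe_large:.1f} points")
```

Quadrupling the sample size halves the margin of error, which is one way to make the cost of small samples concrete when reporting low-incidence shares.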

In the second discussion, one panelist suggested that examining the accuracy of forecasts would be a very useful exercise for demonstrating their value to the public and policy makers. A couple of participants noted that forecasting horizons are often too long (e.g., 20 or 30 years) for this type of analysis to be useful due to the high levels of uncertainty. Instead of estimating bounds on one forecast, it could be more helpful to run many different scenarios and compare scenario outcomes. However, there are forecasts made with shorter forecasting horizons, and a few research papers exist looking at this very issue, specifically related to toll and transit forecasts. A couple of panel members noted that public agencies do some of this, specifically when new transit lines or extensions open.

5.7 Data Visualization & Communicating with Stakeholders

The peer review participants had wide-ranging experiences in terms of visualizing data. For instance, one panel member’s agency has had success with data visualization tools, a key driver of which was a focus on data-driven analysis and solutions, an area where visualization can be very useful. One key element of the success at this agency was data integration. Met Council is another agency that has done a lot of data visualization. Some of the visualization tools they are using were demonstrated during the discussion, including visualization of transit trips by line, heat maps, and transit boarding and alighting station graphics. Another participant mapped accessibility measures to universities and hospitals to garner public support for projects in locations with poor accessibility. It was also noted that data visualization can be useful for debugging models and identifying issues. At another agency, a great deal was invested in data visualization (as opposed to model quality) with limited success. Another panelist noted that some data visualization tools require that data be pushed into the public sphere (at least if the tools are going to be made publicly available), which can present key challenges for agencies.

Some concern was expressed about whether policy makers can properly interpret data visualization products without active coaching. Several participants agreed that like most modeling results, data visualizations need clear explanations to accompany them. Data visualization tools are particularly useful to catch stakeholder attention and influence stakeholder perception of important travel forecasting results. However, they cannot replace critical thinking and interpretation by transportation modelers. Other panel members emphasized the need to tailor any discussion of data and results to the audience. Some audiences may be technical, but others may very much not be, and perhaps planners at MPOs could help bridge the gap. Having different visualization tools to communicate with stakeholders could also be effective, since people learn and understand things in different ways. Consistency in the presentation of results might also be important so that stakeholders see a certain type of information multiple times.

Interfacing with the public was another topic that was discussed. Agencies often have separate geographic information systems (GIS) departments and communications departments, the latter of which are responsible for interfacing with the public. It is critical for the modeling staff to communicate with these other individuals in an organization to ensure that the messages emanating from the modeling effort are consistent with other agency messages. A couple of panel members noted that the public is often misinformed or has misconceptions about transportation systems, and communicating nuanced themes and concepts can be a challenge. Visualization can be a useful tool for explaining complex concepts and showing the importance of good data.

A couple of participants were interested in the types of skills that others look for in hiring modeling staff at MPOs and whether MPOs should be looking for staff with non-modeling skills over those with modeling skills. One panel member gave an example of hiring a person with a GIS and programming background, but with little modeling background. The thinking was that transportation modelers are better positioned to teach modeling skills than other skills. On the other hand, other peer review members highlighted that interdisciplinary skills are needed, including knowledge of modern programming languages, statistical software, and basic modeling skills. Another panelist agreed and stressed the importance of a variety of skills, including interpreting data, formulating mathematical models, and communicating with decision makers and the public, among others.

5.8 On-Board Surveys

Best practices related to on-board surveys were discussed in relation to the types of information that would be useful for ABM development and validation. One participant asked whether a tour-based on-board survey would be the best approach. Another panel member suggested that the right approach might be to ask a couple of questions about the tour in order to establish a better sense of the context of the tour, but was not sure that a full-blown tour-based approach would be appropriate. In their most recent on-board survey, Met Council obtained some information about the tours.

5.9 Activity-Based Models

The discussion on ABMs was more structured in that each of the participating MPOs had an opportunity to talk about why they considered (or are currently considering) an ABM and what factors enter into that decision.

Participating Agency Considerations
SEMCOG is still in the planning stages in terms of ABM development. They currently have a trip-based model and are considering moving to an ABM in the future. There are several reasons why they are thinking of developing an ABM, mostly related to the types of policy questions they are facing as an MPO. These include wanting to be able to analyze the effects of an aging population, doing a better job of forecasting non-motorized travel, forecasting the potential impacts of CAVs, forecasting the impacts of the stagnant regional economy, and understanding issues related to transit. In addition, SEMCOG is concerned about model credibility in the future, given that most large MPOs in the country already have ABMs or are in the process of developing ABMs. SEMCOG was also interested in hearing about the policies and questions that other agencies are addressing with their ABMs.

MARC currently has a trip-based model, but is thinking about what the next version of the travel model will look like. There is a desire from the director to improve the model, but no consensus on the best way of doing so. The top question being asked in the region has to do with the effect of CAVs, something that the current model has no capability of addressing. They are also interested in the impact of capacity projects on land use patterns, and the integration of the travel model with the land use model would improve MARC’s capabilities. Other considerations for MARC included analyzing transit policies and peak hour travel patterns.

Met Council moved to an ABM a couple of years ago for various reasons. Met Council wanted the ability to analyze congestion pricing policies and the effect of congestion on the timing of travel. More broadly, Met Council staff found that more of the questions being asked could not be addressed by the trip-based model. They were also worried about the agency's travel modeling credibility without an ABM that could address many of those questions in some fashion. While they have used the ABM to perform CAV-related analyses recently, this capability was not part of their decision to move to an ABM structure. Agency staff emphasized that critical thinking about model results is perhaps even more important with an ABM than with a trip-based model due to the larger number of model components. Moreover, they believe that a large survey dataset is needed due to the low incidence of several of the choice elements and market segments in the model.

Panel Member Experiences
After the participating agencies gave their views on the factors they have thought about in the context of transitioning to activity-based models, three of the panel members commented on their experience having each worked at MPOs that made the transition to ABMs.

At the Chicago Metropolitan Agency for Planning (CMAP), many of the considerations already discussed also applied. Four-step models were considered to provide very reliable and consistent answers, but they are not based in behavioral theory, which was considered a major drawback. When the management at CMAP changed, a new emphasis was placed on understanding the regional economy and land use and collecting data to make more informed decisions. Equity planning also came to the forefront with an interest in identifying the groups that would benefit from projects. There was also an emphasis on exploring non-motorized and transit modes, including quantifying the impacts of premium transit attributes (e.g., quality of ride, Wi-Fi, etc.) and bike to transit. However, the most important consideration for transitioning to an ABM in Chicago was the ability to analyze highway pricing, which was something that the trip-based model could not adequately do.

At the Denver Regional Council of Governments (DRCOG), before transitioning to their ABM, they were conducting comprehensive analyses looking at land use, transit and highway. However, the four-step model was insensitive to many of the inputs and the model was not able to answer many of the new questions that were being asked. Once DRCOG moved to the ABM, it was very rare that the model could not answer a question at least to some extent. DRCOG was particularly concerned about model run times during model development and, over time, invested in model performance to reduce the model run times.

At the Puget Sound Regional Council (PSRC), issues common to other agencies emerged: the trip-based model could not answer many of the policy questions being asked. Such questions involved congestion pricing, transit and non-motorized modes, and the integration of land use and transportation. This led to the development of a strategic model improvement plan through a deliberative planning process. That process ultimately built consensus that an ABM was the appropriate direction for the region, which then pursued and built an ABM.

General Discussion
Following the more structured format, a number of topics were covered in more detail. One panel member suggested that one of the main reasons an ABM can answer a lot more questions is that it is based in behavioral theory. Moreover, because ABMs are disaggregate, the model's travel pattern predictions are clear, which can make it easier to explain model results. It was also argued that a disaggregate model is almost always better than an aggregate one. For instance, even if there is not 100 percent certainty about the accuracy of parcel data within a traffic analysis zone (TAZ), an aggregate model makes its own implicit assumptions about the location of parcels (they are all assumed to be at the centroid of the TAZ).

Two questions on transitioning to ABM from trip-based models were raised by participants. First, in terms of the actual transition process from a trip-based model to an ABM, one panel member stressed the importance of maintaining the trip-based model for a period of time so that a gradual transition can be made. Another panelist suggested that the process of ABM development should engage MPO staff, allow them to contribute to the ABM, and use the experience as a training exercise. Second, in terms of survey data requirements, one participant suggested that a sample of at least 10,000 households is likely needed in order to provide sufficient sample size for all model components. Another panel member noted a sample size of 7,000 households in the development of another ABM, but suggested that some model components could be borrowed between agencies, something that is common with four-step models (at least in terms of model sensitivities). In terms of data collection, one panelist suggested that over-sampling certain hard-to-reach populations may be a good idea not only because reaching these populations is becoming harder, but also because ABMs use many more groupings of households. Therefore, the travel behavior of distinct market segments needs to be represented well, making it even more important to reach all types of households with the survey. Another panel member suggested that, more than in the past, survey firms are better equipped to respond to potential issues in under-sampling populations in real time to ensure responses are obtained from hard-to-reach segments of the population.

One panel member remarked that the choice of ABM vs. trip-based model is a false choice because there is a spectrum of model designs and many hybrid models exist. Moreover, moving from one end of the spectrum to the other does not necessarily say anything about model accuracy, but rather about the types of analyses that can be performed. For instance, travel demand management scenarios, pricing scenarios, non-motorized modeling, and environmental justice analyses all require an ABM. However, a trip-based model may be better suited for certain types of analyses, like project-level analyses or air quality analysis, because trip-based models are deterministic.

Several panel members commented on differences between land use and network data input needs of ABMs compared to trip-based models. For current conditions, there was general agreement that the land use and network data inputs were similar. A couple of panel members commented that cleaning and preparing these data items already constitute one of the most time-consuming and intensive components of developing the model. On the other hand, one panelist commented that development of forecast year land use data can be much more challenging when ABM geographies use higher resolution than the TAZ. Another peer review member discussed that land use forecasts for the future need not be that much more challenging if simpler tools are employed. Simpler tools may require making additional assumptions, but this should not be a key factor in the decision to move to an ABM. Another participant suggested that it is really a matter of weighing the costs versus the benefits of adding detail to the model.

Several panelists noted the importance of an ABM for integration with dynamic traffic assignment (DTA) models. A holistic model system with DTA cannot be achieved with a trip-based demand model. On the other hand, an integrated ABM and DTA is not currently practical due to run times, which would be measured in weeks. One participant noted that static assignment methods used with ABMs make many of the potential benefits of ABMs irrelevant.

Forecast uncertainty as it relates to ABMs was mentioned by a couple of panel members, both of whom commented that ABMs are time consuming to run. Since addressing uncertainty requires many model runs, ABMs may not be well-suited to in-depth uncertainty analysis. A couple of panel members also remarked that validation of an ABM is more time consuming due to the larger number of model components and model parameters. Also, model validation should be viewed as an ongoing process. Looking to the future, one participant opined that because of the lengthy development cycles of ABMs and because the newest questions being asked of travel modelers cannot be answered due to the absence of relevant data (e.g., CAVs, generation preferences, and shared mobility), new, more responsive tools may be needed to utilize data faster as they come online to meet the rapidly evolving planning environment.

6.0 Peer Review Recommendations

On the last half day of the meeting, the peer review panel spent time discussing specific recommendations for the three agencies (and for agencies in general) regarding the use of Big Data. This section of the report details the recommendations of the panel, organized by topic.

6.1 Informed Purchasing—Questions to Ask Big Data Vendors

The panel made a number of recommendations for the types of information that agencies can ask of Big Data vendors to better inform the agency on the data they will receive. The following are questions and discussion items the peer review panel thought were important and could be asked of most Big Data vendors.

One of the main points made by the peer review panel was the importance of understanding exactly what is included in the data they receive (and what is not).
- Ask about the sampling frame and source of the data. What types of devices does the data come from? When data come from multiple device types or services, ask how duplicate records (e.g., when an individual has multiple devices) are handled. Vendors should be able to provide a clear explanation of the data source(s).
- Ask the data vendor to provide the methodology for data processing. Specifically, what assumptions are used in data processing, and how is information imputed when it can’t be observed directly? Are there any groups of collected data that are suppressed? Actual source code is not necessary, but answers to such questions can provide useful insights.
- Ask data vendors about the data expansion process and how it is applied to ensure a representative sample. Ask the data vendors to identify population segments that are missing from the data, reflecting potential income level, trip length/duration, and locational biases. In what ways may the data be biased?
- Get access to a sample of the data to better understand what will be delivered, if possible. Alternatively, ask what the format of the delivered data set will be and what it will look like before making a purchase decision.
Information about the sample size, composition, and metrics should be provided.
- Ask about the persistence of devices in the data.
- Ask about the percentage of VMT/PMT in the sample.
- Geographic precision of the data is important and should be identified. Precision criteria may depend on whether the data is network or O-D data.
- Persistence of the data should be provided: number of people persistent in the data for a month, number of sightings per day (median), or median length of device ID age.
Other considerations/questions for agencies included the following.
- Specifically ask for data that come from your region, instead of national averages.
- Give the appropriate context of questions to data vendors so they understand why they are important. Tiered questioning may be useful.
- Have a conversation with data vendors. Tell vendors exactly how the data will be used. Press vendors to clarify the source and processing methods of data items that are critical to model development and validation. Open-ended questions are useful for screening how the data will fill agency needs.

The panel emphasized that data vendors may not be able to provide complete answers to all of these questions, and thus, data acquisition decisions may have to be made on incomplete information. That does not necessarily mean data should not be purchased. The utility of the data (and their limitations) should be weighed against costs. The panel emphasized that communication with the data vendors will also help them to further improve their data products.

It is also important to identify staff resources that the agency can devote for a better understanding of the data and getting the data into the ultimate desired format.

6.2 Guidance on Metrics

The panel made several recommendations on guidelines that can be used for key metrics of Big Data sources.

For O-D data, the sample should represent at least two to three percent of the population's total VMT or PMT. Another rule of thumb that could be used is a sample about 10 times the size of the household travel survey. Otherwise, the benefit, in terms of filling gaps in the household travel survey, is too limited.
Another metric is the percent of the O-D space that is filled by the data. In a typical household travel survey, about 5 percent of the O-D cells are filled. A rule-of-thumb for passively collected O-D data could be 15 to 35 percent.
Persistence of data was identified as a key metric, using the median number of days a device ID appears in the data or the median number of sightings of a device ID per day. No specific criterion was identified for this metric at this time.
Geographic precision requirements will depend on the application. For O-D datasets, 50 to 100 m precision is sufficient. For network linking, a finer precision of around 10 m is required.
The sample should use a timeframe and format that is consistent with other data sources.
Market Segmentation
- Demographic segmentation is not possible currently.
- Segmentation by vehicle class (e.g., truck type) should be possible and would provide the ability to isolate passenger travel patterns. This could be asked of the data vendor.
- Segmentation by residents versus visitors might be possible.
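
The rules of thumb above on sample share, O-D coverage, and persistence could be checked against a vendor sample with a short script. The following is a minimal sketch only: the function names, the input layout (a dictionary of O-D trip counts and a dictionary of observation days per device ID), and the example numbers are hypothetical, and the thresholds simply restate the lower bounds suggested by the panel (two percent of VMT and 15 percent of O-D cells).

```python
from statistics import median


def od_fill_share(od_trips, n_zones):
    """Share of the n_zones x n_zones O-D matrix with at least one observed trip."""
    filled = {pair for pair, trips in od_trips.items() if trips > 0}
    return len(filled) / (n_zones * n_zones)


def median_days_observed(device_days):
    """Median number of distinct days on which a device ID appears."""
    return median(len(set(days)) for days in device_days.values())


def screen_od_sample(sample_vmt, regional_vmt, od_trips, n_zones, device_days,
                     min_vmt_share=0.02, min_fill=0.15):
    """Compare a passive O-D sample with the panel's rule-of-thumb lower bounds."""
    vmt_share = sample_vmt / regional_vmt
    fill = od_fill_share(od_trips, n_zones)
    return {
        "vmt_share": vmt_share,
        "vmt_share_ok": vmt_share >= min_vmt_share,
        "od_fill": fill,
        "od_fill_ok": fill >= min_fill,
        # No acceptance threshold: the panel did not set one for persistence.
        "median_days_observed": median_days_observed(device_days),
    }


# Hypothetical example: 4 zones, 3 devices, made-up VMT totals.
example_od = {(1, 2): 30, (2, 1): 25, (1, 3): 0, (3, 4): 12, (4, 4): 5}
example_devices = {"a": [1, 2, 3], "b": [1, 1], "c": [5, 6, 7, 8, 9]}
report = screen_od_sample(
    sample_vmt=2.5e6,
    regional_vmt=100e6,
    od_trips=example_od,
    n_zones=4,
    device_days=example_devices,
)
for name, value in report.items():
    print(f"{name}: {value}")
```

Persistence is reported without a pass/fail flag because no criterion was identified for it; an agency would need to choose its own threshold based on the intended application.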

6.3 Uses of Big Data

The panel provided guidance on the purposes for which Big Data should be used and those for which they should not be used. These run the gamut from model development and validation uses to other planning processes.

Passively Collected Origin-Destination Data
- Use for validation of internal person trip tables.
- Use to develop external trip tables.
- “Fuse” data by combining passively collected O-D data with household travel survey to create more robust datasets, but be cognizant that it is unknown who is missing from the Big Data.
- Use to fill gaps in the household travel survey dataset with the understanding that Big Data may not be representative.
- Use to revise/update model sensitivities (rather than using judgment).
Travel Time and Speed Data
- Use for calibrating model speeds.
- Use as initial skims to the travel model (in feedback loop process).
- Complete skim matrices using the data.
- Examine origin-destination reliability metrics.
- Calibrate parameters of volume-delay functions (e.g., decomposing travel time into free flow and delay times).
- Use to identify free flow speeds.
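
As an illustration of calibrating a volume-delay function from observed travel times, the sketch below decomposes each observed time into a free-flow component and a delay component and fits the delay to the volume-to-capacity ratio. It assumes the widely used BPR form, t = t0 * (1 + alpha * (v/c)^beta), which the panel did not name specifically; the function name, the input layout, and the synthetic observations are hypothetical, and the free-flow time t0 is assumed to have been identified separately (for example, from uncongested periods in the speed data).

```python
import math


def fit_bpr(volumes, capacity, travel_times, free_flow_time):
    """Estimate alpha and beta of t = t0 * (1 + alpha * (v/c)**beta).

    Taking logs of the delay share gives a straight line,
    log(t/t0 - 1) = log(alpha) + beta * log(v/c),
    which is fitted here by ordinary least squares.
    """
    xs, ys = [], []
    for volume, time in zip(volumes, travel_times):
        vc_ratio = volume / capacity
        delay_share = time / free_flow_time - 1.0  # delay relative to free flow
        if vc_ratio > 0 and delay_share > 0:       # logs need positive values
            xs.append(math.log(vc_ratio))
            ys.append(math.log(delay_share))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


# Synthetic observations generated from alpha = 0.15, beta = 4 (made-up link).
t0 = 10.0          # free-flow travel time, minutes
cap = 2000.0       # link capacity, vehicles per hour
vols = [400.0, 800.0, 1200.0, 1600.0, 2000.0, 2400.0]
times = [t0 * (1 + 0.15 * (v / cap) ** 4) for v in vols]

alpha_hat, beta_hat = fit_bpr(vols, cap, times, t0)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```

With real speed data the observations would be noisy, and periods at or below free-flow time are dropped by the filter, so a fit of this kind is best treated as a starting point to be reviewed against assignment validation results.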

6.4 Other Recommendations

Recommendations on several other topics were discussed, including research topics.

Weekend data are important and worthwhile to obtain for some types of analyses (e.g., tourist areas). However, it may not be important enough to spend resources on a complete weekend model in most situations.
Passive O-D data could be very useful (and potentially inexpensive) for special event analyses. However, there is almost no experience in this area currently.
The Census Bureau needs to improve LEHD data quality.

6.5 Research Topics

User information for rideshare modes (e.g., Uber and Lyft). One panelist would like to see a detailed analysis of the Transportation Network Company (TNC) data that is supposed to be shared. The panel discussed the experiment done by San Francisco County Transportation Authority (SFCTA) for collecting information from the Uber API. Another panelist suggested examining taxi data from airports over time to see how Uber has impacted the taxi economy for airport travel. Another panel member noted that there will be a paper presented at TRB that discusses data on TNCs in Austin. Gathering and collating the studies that actually have information would be useful. The Boston MPO has some Uber data, but it includes aggregate speed/travel time data; they may also have obtained Zip Code level O-D information from Uber.
Agency experience in acquiring data from passive sources. Synthesizing information about Big Data that have been purchased around the nation and understanding procurement procedures and prices, as well as what types of data have been used where, would be useful. One panelist noted a recent research project through NCHRP on this topic.
Data fusion approaches. Are there data fusion approaches that may be useful? One agency is considering combining 2005 and 2015 surveys. Other panelists noted that agencies need to be careful about how data are weighted and about controlling for different timeframes in model estimation.

Appendix A: List of Peer Review Panel Participants

This section lists all individuals who attended the meetings, including panel members, agency staff, and peer review support staff.

Table 2. Peer Review Panel Members
Panel Member	Affiliation
Vince Bernardin	Resource Systems Group (RSG)
Chris Johnson	Portland Metro
Josie Kressner	Transport Foundry
Kimon Proussaloglou	Cambridge Systematics, Inc.
Erik Sabina	Colorado Department of Transportation (CDOT)
Kermit Wies	Northwestern University

Table 3. Met Council, MARC, and SEMCOG Agency Staff
Panel Member	Affiliation
Thomas Bruff	Southeast Michigan Council of Governments (SEMCOG)
Paul Bushore	Mid-America Regional Council (MARC)
Jilan Chen	Southeast Michigan Council of Governments (SEMCOG)
Jonathan Ehrlich	Metropolitan Council (Met Council)
Dennis Farmer	Metropolitan Council (Met Council)
Liyang Feng	Southeast Michigan Council of Governments (SEMCOG)
Guangyu Li	Southeast Michigan Council of Governments (SEMCOG)
Saima Masud	Southeast Michigan Council of Governments (SEMCOG)
Eileen Yang	Mid-America Regional Council (MARC)

Table 4. TMIP Peer Review Support Staff
Panel Member	Affiliation
Cemal Ayvalik	Cambridge Systematics, Inc.
Jason Lemp	Cambridge Systematics, Inc.
Thomas Rossi	Cambridge Systematics, Inc.
Sarah Sun	Federal Highway Administration (FHWA)

Appendix B: Peer Review Panel Meeting Agenda

Table 5. October 31, 2017 Agenda
Time	Description
8:30 a.m.	Welcome and introductions
8:45 a.m.	Summary of objectives for and logistics of the meeting —Thomas Rossi
9:00 a.m.	Presentation: “What is Big Data?”—Kimon Proussaloglou and Cemal Ayvalik
9:45 a.m.	Break
10:00 a.m.	Presentations by the three agencies: MARC, Met Council, and SEMCOG
11:00 a.m.	Presentations by experts from the peer review panel Vince Bernardin—“It’s Scary How Much We Know—And How Much We Don’t” Erik Sabina—“Big (and Semi-Big) Data in the Colorado Statewide Model” Josie Kressner—“Data-Driven Planning with a Passive Data Model”
11:45 a.m.	Working lunch—Summary of morning discussions and identification of themes and priorities for further discussion
12:45 p.m.	Agency and panel discussion on Big Data: what Big Data sources and products are useful for MPOs; best practices for use of Big Data; data validity and limitations; travel time calibration and the data needed (Big Data, INRIX/HERE, and/or traditional data)
2:45 p.m.	Break
3:00 p.m.	Agency and panel discussion on Big Data / conventional data: tradeoffs / substitution between conventional (i.e., survey) and Big Data; survey data collection timing (how often, continuous data collection, use of NHTS); use of data for modeling external travel (Big Data versus conventional external station surveys)
5:00 p.m.	Adjourn

Table 6. November 1, 2017 Agenda
Time	Description
8:30 a.m.	Recap of Tuesday discussions
9:00 a.m.	Agency and panel discussion on other data topics: other uses of data beyond modeling; data visualization
10:00 a.m.	Break
10:15 a.m.	Agency and panel discussion on other modeling topics: how policy boards / stakeholders reach planning decisions with modeling and technical analysis; what new or additional technical skills MPOs are considering in recruitment of technical staff; dynamic traffic assignment
11:45 a.m.	Working lunch—Discussion of activity-based modeling
12:45 p.m.	Summary of recommendations for the agencies, and discussion with the agencies about the recommendations
2:00 p.m.	Break
2:15 p.m.	Discussion on recommendations for additional research into data collection and use of “Big Data” in modeling and planning
3:00 p.m.	Discussion of next steps to be taken by the agencies
4:00 p.m.	Adjourn

Appendix C: Peer Review Panel Member Biographies

C.1 Vince Bernardin, Resource Systems Group (RSG)

Vince Bernardin, Ph.D., is Director of RSG’s Travel Forecasting Group and manages their Indiana office. Vince has project experience in over twenty states and abroad developing and applying statewide, urban, and corridor-level travel forecasting models for both plan development and major project studies. Vince has been working actively with large-scale, anonymous, passively collected data to support travel forecasting for over seven years. He was the first to use truck GPS data to support statewide freight modeling over six years ago, the first to combine GPS and cell-phone datasets, the first to incorporate anonymous Big Data in an activity-based model, and the first to use disaggregate trace auditing for expanding LBS data. Vince has worked with every major source of mobile device data including Cuebiq, AirSage, ATRI, Streetlight, NPMRDS, INRIX, and HERE, with applications in more than ten states. Vince holds a BA in Philosophy from the University of Notre Dame, and an MS and Ph.D. in Transportation Engineering from Northwestern University.

C.2 Chris Johnson, Portland Metro

Chris Johnson currently manages the travel demand modeling team at Portland Metro. Chris earned his B.S. (History) and M.S. (Urban and Regional Planning) from the University of Wisconsin—Madison. Chris has spent the bulk of his professional career in the public sector, working for Metropolitan Planning Organizations in Wisconsin (Madison), Washington (Seattle), and Oregon (Portland). In addition to providing analytical support to a broad range of regional planning projects and studies, he has overseen several advanced travel demand and land use model development initiatives during the course of his career.

C.3 Josie Kressner, Transport Foundry

Josie Kressner started Transport Foundry in 2014 to enable transportation planners to use “big” data. Her efforts focus on new ways to utilize passively collected data. In particular, the National Science Foundation and the Transportation Research Board have funded projects to synthesize travel diaries from multiple passive data sources, including consumer and mobile phone data. This has morphed into a new modeling approach, passive data modeling. She has a Ph.D. in Transportation Systems Engineering from Georgia Tech, B.S. in Civil Engineering from Washington University in St. Louis, and a B.A. in Architecture from Washington University in St. Louis.

C.4 Kimon Proussaloglou, Cambridge Systematics, Inc.

Kimon Proussaloglou is an Executive Vice President with Cambridge Systematics. He leads its travel demand modeling and market research practice and is the director of the Chicago office. He received both a Doctorate and an M.Sc. in Civil Engineering from Northwestern University and a B.Sc. from Aristotelian University, Greece. He has worked for over 25 years with federal, state, regional planning and public transportation agencies. He has designed and analyzed dozens of customized surveys and has applied rigorous analytical and market segmentation techniques. He has integrated data from different sources to assess the size of travel markets, develop sophisticated statistical models, and inform the decision-making process of transportation agencies.

C.5 Erik Sabina, Colorado Department of Transportation (CDOT)

Erik Sabina, P.E. is the manager of the Information Management Branch at the Colorado Department of Transportation, where he is leading a project to develop an advanced statewide travel model, helping to direct CDOT’s Chief Data Office initiative, and is conducting early planning activities for a statewide travel survey to be conducted in 2020. In his management role, he directs GIS, travel data acquisition, and mobility analysis activities for the DOT. In the course of his career he has led several leading-edge model development and data projects, including: the project to develop an activity-based travel model for the DRCOG region; the first regional travel survey to cover the entire Colorado Front Range area; and initial phases of the development of an implementation of the UrbanSIM land use model for the Denver region. Mr. Sabina has published numerous papers on travel surveying, activity-based model development and related topics, and has frequently served as an invited speaker and panelist throughout the US, including recently participating as an invited participant on the FHWA’s DOT—MPO Data Coordination Peer Exchange (Portland, OR, 2016), as a panelist on the NCHRP project 08-95, “Cell Phone Location Data for Travel Behavior Analysis”, and most recently as an invited speaker at the Department of Energy’s “Smart Mobility Modeling and Simulation Tools” (Oak Ridge, Tennessee, 2016) and “Designing Innovative Transportation Systems Solutions: Starting with the Data” (Berkeley, CA, 2017) workshops. He is also a member of the TRB committee on Urban Transportation Data and Information Systems (ABJ30). Mr. Sabina holds a BS degree in Aerospace Engineering from the University of Colorado, and an MS in Transportation from the Massachusetts Institute of Technology.

C.6 Kermit Wies, Northwestern University

Kermit Wies, Ph.D., is a Senior Research Fellow and Adjunct Professor with Northwestern University Transportation Center. Prior to joining Northwestern, Dr. Wies served over 30 years with the Chicago Area Transportation Study (CATS) and the Chicago Metropolitan Agency for Planning (CMAP). During his tenure in MPO practice, he served in a variety of technical management and leadership capacities including travel demand modeling, long-range planning, survey research and socioeconomic forecasting.

Appendix D: References

The peer review made use of several references related to the Big Data topic. Some of these were referred to the meeting participants beforehand while others were presented or referenced during the meeting. The list below includes the publicly available reference materials, some of which have been updated from drafts since the meeting date.

Adler, T., V. Bernardin, J. Dumont, L. Flake, and H. Sadrsadat. The Promise and Limitations of Locational App Data for Origin-Destination Analysis: A Case Study. Final Draft, prepared for Federal Highway Administration, October 2017.
Adler, T., V. Bernardin, R. Chamberlin, M. Outwater, X. Ban, C. Chen, M. Bradley, C. Daniels, M. Fowler, J. Freedman, K. Shabani, C. Smith, and W. Woodford. Executive Summary on Understanding Origin-Destination Data of TMIP Bridging Data Gaps Series. Final draft, prepared for Federal Highway Administration, October 2017.
Bernardin, V., M. Bradley, R. Chamberlin, C. Daniels, M. Fowler, J. Freedman, M. Outwater, K. Shabani, C. Smith, and W. Woodford. Understanding Traditional Origin-Destination Data: A Survey. Final draft, prepared for Federal Highway Administration, October 2017.
Cambridge Systematics, Inc. Model Estimation and Validation Report, 2010 Travel Behavior Inventory. Prepared for Metropolitan Council, July 2015.
Cambridge Systematics, Inc. and Massachusetts Institute of Technology. Cell Phone Location Data for Travel Behavior Analysis, NCHRP Project 08-95. Draft report, prepared for National Academy of Sciences, June 2017.
Cambridge Systematics, Inc. “StreetLight Data Processing, SEMCOG E7 Model Update.” Prepared for SEMCOG, October 2017.
Chen, C., X. Ban, F. Wang, J. Wang, N. Siddique, R. Fan, J. Lee. Understanding GPS and Mobile Phone Data for Origin-Destination Analysis. Prepared for Federal Highway Administration, October 2017.
McAtee, S. “Validating Trip Distribution Using GPS Data in Southeast Michigan.” Presentation to the 2017 Transportation Planning and Applications Conference, Raleigh, N.C., May 2017.
Mid-America Regional Council Transportation Department. Travel-Time Study 2012 Report. October 2012.
RSG and SRF Consulting Group. Strategic Transportation Modeling and Data Collection Program Study. Prepared for Metropolitan Council, July 2015.

