# EXECUTIVE SUMMARY

According to the 2001 National Household Travel Survey (NHTS), Americans take 2.6 billion long-distance trips (defined in the NHTS as 50 miles or more one-way) per year, or 7.2 million trips per day. About 90 percent of long-distance trips are taken by personal vehicle, while ten percent use public/commercial transportation modes. Over seven percent of long-distance trips are taken by air, while two percent are by bus. Train travel represents almost one percent of long-distance trips (BTS, 2006). The Office of Highway Policy Information is interested in learning more about what factors influence the choice of travel mode for long-distance trips. Thus, the objective of this research is to develop quantitative mathematical methods to analyze how long-distance passenger travelers make their modal choices. The factors evaluated included, but were not limited to, social, economic, and demographic characteristics; trip length; trip purpose; available infrastructure facilities such as rail, airports, and highways; and available modal choices (air, train, bus, and personal passenger vehicles). Prior to model development, a comprehensive literature and practice review was conducted to assess current knowledge on long-distance multimodal passenger travel modeling.

Overall, literature and practices that explore the following topics were reviewed:

- Research performed on mathematical techniques for long distance passenger travel modal choice modeling;
- Data sources used for long distance passenger travel modeling that could supplement the NHTS data; and
- Factors that were found to influence long-distance passenger travel mode choice.

## Mathematical Techniques for Long-Distance Passenger Travel Modal Choice Modeling

The literature review showed that analyses of long-distance multimodal passenger travel mode choice range in complexity from simple summary statistics and cross tabulations to more sophisticated mathematical modeling techniques such as nested multinomial logistic regression. There have been numerous research projects conducted on U.S.-based long-distance travel mode choice using the 2001 NHTS and the 1995 American Travel Survey (ATS) that utilize descriptive techniques such as summary statistics, cross-tabulations, and graphical representations to understand the relationship between mode choice and attributes such as socioeconomic and demographic factors as well as trip aspects (distance, duration, purpose).

The more complicated analyses involve discrete choice models, which are statistical procedures that model choices made by people among a finite set of alternatives. In terms of long-distance travel, discrete mode choice models consider the travel mode that travelers choose for a particular long-distance trip based on certain attributes of the traveler or the trip to be taken. Although discrete choice models can take many forms, the majority of the mode choice models encountered in the literature review are based on some form of logistic regression. Logistic regression models are used to predict the probabilities of the different possible outcomes of a categorical dependent variable (mode choices), given a set of independent variables (socioeconomic characteristics, trip purpose, trip length, etc.). Various forms of logit models were encountered in the literature review. Examples include the basic binary logit model, which models a dichotomous choice of mode (e.g., airplane vs. train); the multinomial logit model, which generalizes binary logistic regression by allowing more than two discrete outcomes (e.g., airplane, train, bus, automobile); and other forms of the multinomial logit such as nested or mixed logit models.
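As a point of reference, the multinomial logit described above is commonly written in the following form. The notation is an assumption introduced here for illustration and is not taken from the reviewed studies: $\mathbf{x}_i$ is the vector of independent variables for trip $i$, $M$ is the set of available modes, and $\boldsymbol{\beta}_j$ is the coefficient vector for mode $j$, with one mode usually fixed as the reference category ($\boldsymbol{\beta}_{\text{ref}} = \mathbf{0}$).

$$
P_i(j) = \frac{\exp\left(\mathbf{x}_i^{\top} \boldsymbol{\beta}_j\right)}{\sum_{m \in M} \exp\left(\mathbf{x}_i^{\top} \boldsymbol{\beta}_m\right)}, \qquad j \in M
$$

When $M$ contains only two modes, this expression reduces to the binary logit, $P_i(j) = 1 / \left(1 + \exp(-\mathbf{x}_i^{\top} \boldsymbol{\beta}_j)\right)$, with the other mode as the reference.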

## Data Sources Used for Long-Distance Passenger Traveler Modeling

The lack of available U.S.-based data on long-distance travel is the main hindrance to long-distance travel research. Most U.S.-focused research identified in this literature review has used either the relatively recent 1995 ATS and 2001 NHTS surveys or their precursors, the National Travel Surveys (NTS) conducted by the U.S. Census Bureau. According to the Bureau of Transportation Statistics (BTS), the NHTS provides the only authoritative source of information at the national level on the relationships between the characteristics of personal travel and the demographics of the traveler. Even though the 1995 ATS and 2001 NHTS are the richest and most used data sources on domestic long-distance travel, they have some drawbacks. First, the surveys do not contain information on level-of-service variables such as travel time and travel cost. Second, geographical information at the origin and destination of trips is aggregated to protect the confidentiality of respondents. To compensate for these shortcomings, researchers have either turned to external data sources, such as published fare and schedule guides for airlines, railroads, and buses, to account for travel cost and time, or limited their analysis to travel to and from a Metropolitan Statistical Area (MSA). Although other travel surveys contain some data on long-distance travel, the literature review found that they lack the richness and size of the ATS or NHTS. National surveys (e.g., versions of the NTS) prior to the 1995 ATS were not reviewed in detail given the time elapsed since they were performed.

The American Automobile Association (AAA) has developed an approach for forecasting actual domestic travel volumes based on macroeconomic drivers such as unemployment, output, household net worth, asset prices including stock indices, interest rates, and housing market indicators. The report also includes variables related to travel and tourism, including prices of gasoline, airline travel and hotel stays as well as historical travel volume estimates from travel survey databases.

## Factors That Influence Mode Choice

In many studies of long-distance travel, the focus has been primarily on the impact of socioeconomic factors at the individual and household levels. These studies examined factors such as age, income, gender, and household location (urban vs. rural). Another area of focus for long-distance travel studies is the incorporation of land-use factors. Research has found that land-use factors have a significant impact on travel mode choice. For example, Algers (1993) found that the total number of trips over 100 kilometers was sensitive to the characteristics of the destination, including population size and number of jobs. Using the 1989 Netherlands National Travel Survey, Limtanakool et al. (2006) studied the effects of land-use attributes such as population density, proximity to infrastructure, and land-use diversity on travel mode choice and concluded that the spatial configuration of land use and transport infrastructure has a significant impact even when socioeconomic characteristics and travel time are taken into account. Other studies have found that travel time and travel costs heavily influence mode choice.

One constant across all the research encountered is that the relationship between mode choice and certain factors varies by trip purpose. For example, the mode share for automobiles is higher for personal or social trips, while air travel is the preferred method for business travel. This finding as well as all others from the literature and practice review presented here was used to develop the mode choice models presented below.

## Mathematical Models for Predicting Long-Distance Passenger Mode Choice

The research team selected the 2001 NHTS as the primary data source for this modeling effort. The 2001 NHTS is a national survey of daily and long-distance travel. The survey includes demographic characteristics of households, people, and vehicles, as well as detailed information on long-distance travel for all purposes by all modes. NHTS survey data are collected from a sample of U.S. households and expanded to provide national estimates of trips and miles by travel mode, trip purpose, and a host of household attributes. The NHTS collected travel data from a national sample of the civilian, non-institutionalized population of the United States. There were approximately 66,000 households in the final 2001 NHTS dataset. The final datasets contained about 45,000 long-distance trips.

Predictive factors from the NHTS that were used in the modeling included characteristics of the traveler (age, race, employment status, frequency of internet use, frequency of public/commercial transportation use), characteristics of the trip (distance, number of nights away, number of people traveling, and whether it included a weekend), and household and land-use characteristics (household income, number of vehicles, population density, and urban/rural status).

Although the NHTS gives detailed demographic information at the individual and trip levels, several variables from external data sources were included in the model. These variables account for economic and environmental factors that other studies identified as determinants of individual travel mode choice but that are not present in the NHTS data. In particular, they are intended to capture two main factors governing individual choice of travel mode: the economic burden of particular modes of travel, and the availability of and access to transportation infrastructure. Along with demographic information, this additional information can increase the resolution of predictions about travel mode choice based on observed data. Economic variables include the Research and Innovative Technology Administration (RITA) Air Travel Price Index and the Consumer Price Index (CPI) Private and Public Transportation Components. The number of different types of transportation sites within a 25-mile radius of the traveler’s origin was also used. These include airports, bus depots, light and transit rail stations, and standard rail stations.

Discrete choice models are statistical procedures that model choices made by people among a finite set of alternatives. Specifically, discrete choice models statistically relate the choice made by each person to the attributes of the person and the attributes of the alternatives available to the person. In terms of long-distance travel, discrete mode choice models consider the travel mode that travelers choose for a particular long-distance trip based on certain attributes about the traveler or the trip to be taken. Although discrete choice models can take many forms, the majority of the mode choice models involving transportation are logit based. For this research, logistic regression models were used to predict the probabilities of the different possible outcomes of a categorical dependent variable (mode choices of personal vehicle, air, bus, and train), given a set of independent variables (characteristics of the traveler, trip, and household, land-use factors, economic variables, and availability of transportation infrastructure).

A separate model was developed for each trip purpose: business, pleasure, and personal business. The 2001 NHTS provides an analysis weight for each long-distance trip. The weight is defined at the person trip/travel period level. These weights reflect the selection probabilities and adjustments to account for nonresponse, undercoverage, and multiple telephones in a household. Point estimates of population parameters, as well as coefficients of predictors, are affected by the value of the analysis weight for each observation. To obtain estimates that are minimally biased, the analysis weights were applied in all estimation.
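The effect of analysis weights on a point estimate can be illustrated with a minimal Python sketch that computes weighted mode shares. The function name, record layout, and weight values below are hypothetical and chosen only for illustration; they are not NHTS variables or values.

```python
from collections import defaultdict


def weighted_mode_shares(trips):
    """Return the weighted share of each mode.

    trips: iterable of (mode, analysis_weight) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for mode, weight in trips:
        totals[mode] += weight  # each trip counts in proportion to its weight
    grand_total = sum(totals.values())
    return {mode: total / grand_total for mode, total in totals.items()}


# Made-up trip records: (mode, analysis weight)
sample_trips = [
    ("personal_vehicle", 1200.0),
    ("air", 300.0),
    ("personal_vehicle", 400.0),
    ("bus", 100.0),
]
shares = weighted_mode_shares(sample_trips)
print(shares)
```

In this toy example, half of the four sampled trips are by personal vehicle, yet the weighted personal vehicle share is 80 percent, which shows how the weights change the estimate relative to a raw count.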

Coefficients associated with each predictive factor were estimated by maximum likelihood using the SAS® (version 9.3) statistical software package. The SURVEYLOGISTIC procedure was used to take into account the complex nature of the 2001 NHTS sample design. Model coefficients for the predictor variables, as well as marginal probability effects, were estimated from the model.
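To make the notion of a marginal probability effect concrete, the following Python sketch uses the standard analytic expression for a continuous predictor in a multinomial logit: the effect on mode j is P_j multiplied by the difference between that mode's coefficient and the probability-weighted average coefficient. This is not the SAS code used in the research, and the report's marginal effects may have been computed differently (for example, averaged over the sample). All function names and coefficient values are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities from a list of mode utilities."""
    largest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_effects(intercepts, slopes, x):
    """Change in each mode's probability per unit change in predictor x.

    Uses dP_j/dx = P_j * (b_j - sum_k P_k * b_k), where b_j is the
    coefficient on x in the utility of mode j.
    """
    utilities = [a + b * x for a, b in zip(intercepts, slopes)]
    probs = mnl_probabilities(utilities)
    average_slope = sum(p * b for p, b in zip(probs, slopes))
    return [p * (b - average_slope) for p, b in zip(probs, slopes)]


# Made-up coefficients for (personal vehicle, air, bus, train);
# personal vehicle is the reference mode, so its coefficients are zero.
intercepts = [0.0, -2.0, -3.5, -4.0]
distance_slopes = [0.0, 0.004, 0.0005, 0.001]
effects = marginal_effects(intercepts, distance_slopes, x=500.0)
print(effects)
```

Because the probabilities always sum to one, the marginal effects across modes sum to zero: any gain in one mode's probability is offset by losses in the others.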

Validation of the long-distance passenger travel modal choice models was conducted by testing the models on long-distance travel survey data. The same 2001 NHTS dataset used for model calibration was used for model validation. *K*-fold cross-validation is a statistical technique for assessing how the results of a statistical model will generalize to an independent dataset. The data are first partitioned into *k* equally (or nearly equally) sized segments, or folds. Then, *k* iterations of calibration and validation are performed such that, within each iteration, a different fold of the data is held out for validation while the remaining *k*-1 folds are used to calibrate the model. For this research, 10-fold cross-validation was conducted separately to validate each of the three multinomial mode choice models (one for each trip purpose). In each iteration, the fitted model was applied to the validation dataset (i.e., predicted probabilities for each mode of transportation were calculated for each trip in the validation dataset). Aggregate mode shares were calculated by summing the calculated probabilities for each trip record in the validation dataset. These were compared against the observed aggregate mode shares of the validation dataset in order to observe how well the model could replicate the observed mode shares. This process was repeated nine more times, each time choosing a different segment of the data to be held out as the validation dataset. Once all iterations were complete, the comparisons of predicted versus observed aggregate mode shares were combined across the ten iterations, and statistics summarizing the predictive ability of the model were calculated.
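The validation loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the stand-in "model" simply predicts the calibration-sample mode shares for every trip (an intercept-only model) instead of a fitted multinomial logit, analysis weights are ignored, and the data are randomly generated. All names are hypothetical.

```python
import random
from collections import Counter

MODES = ["personal_vehicle", "air", "bus", "train"]


def k_fold_indices(n, k, seed=0):
    """Shuffle record indices and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_shares(modes):
    """Stand-in for model calibration: estimate one probability per mode."""
    counts = Counter(modes)
    return {m: counts[m] / len(modes) for m in MODES}


def cross_validate(observed_modes, k=10):
    """Return, per fold, predicted vs. observed trip counts for each mode."""
    results = []
    for held_out in k_fold_indices(len(observed_modes), k):
        held = set(held_out)
        calibration = [m for i, m in enumerate(observed_modes) if i not in held]
        validation = [observed_modes[i] for i in held_out]
        model = fit_shares(calibration)
        # Sum predicted probabilities over validation trips for each mode.
        predicted = {m: sum(model[m] for _ in validation) for m in MODES}
        observed = Counter(validation)
        results.append({m: (predicted[m], observed[m]) for m in MODES})
    return results


# Randomly generated trips with made-up mode frequencies.
sample_modes = random.Random(1).choices(MODES, weights=[90, 7, 2, 1], k=200)
results = cross_validate(sample_modes, k=10)
for fold in results:
    print({m: (round(p, 1), o) for m, (p, o) in fold.items()})
```

Each fold yields a predicted and an observed count per mode, and these pairs can then be combined across the ten folds into summary statistics of predictive ability.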

Major findings from this research are as follows:

- Summary statistics and model results provide evidence that mode choice varies by trip purpose and that separate models are warranted;
- A much greater number of factors were found to significantly influence mode choice across trip purpose types for personal vehicle and air travel outcomes than for bus and train outcomes. This is due, in part, to the low frequency of bus and train trips in the NHTS;
- Characteristics of the survey respondents who were taking the trips tended to be more significant predictors of travel mode choice than the characteristics of the trips themselves. Specifically, familiarity with public/commercial transportation systems through frequent usage resulted in a large decrease in the likelihood of taking personal vehicles for business travel (eight percent) as well as a smaller but still significant decrease in the likelihood of taking personal vehicles for pleasure travel (three percent). High public/commercial transportation use was also a highly statistically significant predictor of increased air travel (four percent for business, 1.2 percent for pleasure). For business travel, frequent web use also increased the chances of taking air travel by about 4.5 percent. Income was also a strong predictor of travel mode choice for both business and pleasure travel: lower-income travelers were more likely to take personal vehicles and less likely to take air travel. Of the two, the lower likelihood of air travel as income decreases was the more statistically significant trend, which reinforces the hypothesis that fixed attributes like income are much stronger determinants of travel mode. Overall, income and behavioral variables displayed the highest statistical significance in the model results. This indicates that people’s travel mode choices may be driven largely by fixed attributes that revolve around residence and demographics rather than by consideration of the dynamic costs and benefits of different modes of travel;
- Variables describing trip characteristics other than distance tended to have mixed marginal effects across travel mode outcomes. A weekend trip had a statistically significant marginal effect for personal vehicle and air travel for the two largest travel purpose types (business and pleasure). For both business and pleasure travel, there was a two to three percent decrease in the probability of taking a personal vehicle and a two percent increase in the probability of taking air travel if the trip included a weekend. The number of persons on the trip also significantly affected the likelihoods of different mode choices: for business travel, each additional person corresponded to a 0.5 percent decrease in the chances of taking a personal vehicle and a 0.5 percent increase in the chances of taking air travel, while for pleasure travel, each additional person increased the chances of taking bus travel by 0.12 percent. Lastly, for pleasure travel, each additional night away increased the probability of taking personal vehicles by 0.19 percent and decreased the probability of taking bus travel by 0.15 percent;
- The results suggest that respondents’ demand for different modes of travel is relatively decoupled from cost considerations such as the price of airfares or gasoline and that the preference set may be fairly inelastic in the short run – that is, not responsive to changes in price;
- Available transportation infrastructure appeared to be influential only for business travel. The number of airports within a 25-mile radius increased the chances of taking air travel by 1.7 percent per airport. Other existing transportation infrastructure did not appear to play a significant role in travel choice, but this could also be a product of the large number of respondents in the data set who chose personal vehicle as the primary mode of transport and thus do not display any preference for certain types of existing networks;
- One of the most consistently significant variables in predicting mode choice was route distance of a trip from origin to destination. The probability of choosing to travel in a personal vehicle decreases exponentially with travel distance while the probability of choosing air travel increases exponentially with travel distance; and
- The model predicts very well for the personal vehicle and air modes but loses some predictive power for the bus and train modes. The relative lack of predictive power for bus and train modes indicates that the survey data may not be sufficient to accurately assess some outcomes and that alternative sampling techniques should be explored in future national travel surveys that provide more data for bus and train trips.