U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations
This report is an archived publication and may contain dated technical, contact, and link information.
Publication Number: FHWA-RD-98-166
Date: July 1999
Guidebook on Methods to Estimate Non-Motorized Travel: Supporting Documentation
2.5 Discrete Choice Models
Descriptive Criteria: What is It?
A discrete choice model predicts a decision made by an individual (choice of mode, choice of route, etc.) as a function of any number of variables, including factors that describe a bicycle or pedestrian facility improvement or policy change. The model can be used to estimate the total number of people who change their behavior in response to an action. As a result, the change in both non-motorized and motorized trips and distance of travel can be estimated. The model can also be used to derive elasticities, i.e., the percent change in bicycle or pedestrian travel in response to a given change in any particular variable.
For a general discussion of discrete choice modeling principles and methods, see Ben-Akiva and Lerman (1985) and Horowitz, Koppelman, and Lerman (1986). Key points are summarized here.
A discrete choice model is a mathematical function that predicts an individual's choice based on the utility, or relative attractiveness, of competing alternatives (for example, bike or drive). The logit function is a common mathematical form used in discrete choice modeling (1).
The model generally includes characteristics of the individual (e.g., age, gender, and income) and relative attributes of competing choices (e.g., cost and time of auto vs. bike travel). It also might include environmental factors, personal attitudes, or other factors which are thought to influence the choice in question. The model is developed from a data set containing individual trip decisions, characteristics of alternative choices for the trip, geographical characteristics, and characteristics of the individual.
A simple discrete choice model, for example, might be used to predict the probability of taking a trip by bicycle vs. by car, based on three factors:
1. Time difference between the two modes for the trip.
2. Whether the respondent is male or female.
3. Whether or not bicycle lanes are available.
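The three-factor example above can be sketched as a binary logit function. The coefficient values and the variable names are hypothetical, chosen only to illustrate the mechanics; they are not estimates from any study cited here.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not estimated from data).
COEFFICIENTS = {
    "constant": -1.5,          # baseline preference for bicycle relative to car
    "time_difference": -0.08,  # per minute that the bike trip is longer than the car trip
    "male": 0.4,               # 1 if the respondent is male, 0 if female
    "bike_lanes": 0.6,         # 1 if bicycle lanes are available, 0 otherwise
}

def bicycle_probability(time_difference_min, male, bike_lanes, coef=COEFFICIENTS):
    """Binary logit probability of choosing bicycle over car."""
    utility = (coef["constant"]
               + coef["time_difference"] * time_difference_min
               + coef["male"] * male
               + coef["bike_lanes"] * bike_lanes)
    return 1.0 / (1.0 + math.exp(-utility))

# Same traveler and trip, evaluated without and with bicycle lanes.
without_lanes = bicycle_probability(10, male=1, bike_lanes=0)
with_lanes = bicycle_probability(10, male=1, bike_lanes=1)
print(f"P(bicycle) without lanes: {without_lanes:.3f}")
print(f"P(bicycle) with lanes:    {with_lanes:.3f}")
```

Because the bike-lane coefficient is assumed to be positive, the predicted probability rises when lanes are available; in a real application the sign and size of each weight would come from model estimation.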
The estimated coefficients (or weights for each factor) can be used to derive elasticities. Elasticities indicate the percent change in the variable being predicted (i.e., probability of choosing a mode) for a given change in one of the independent variables, holding the other variables constant. While transferring elasticities to real-world situations involves a number of assumptions, the elasticities may be used to estimate the change in users as a function of a given change in a facility or policy variable.
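As a minimal sketch of how an elasticity follows from a coefficient: for a logit model, the direct point elasticity of the choice probability P with respect to a continuous variable x with coefficient beta is beta * x * (1 - P). The single-variable model and its coefficient values below are hypothetical; the finite-difference calculation is included only to check the formula.

```python
import math

def bicycle_probability(time_difference_min, beta_time=-0.08, constant=-1.0):
    """Binary logit with a single variable; coefficients are hypothetical."""
    utility = constant + beta_time * time_difference_min
    return 1.0 / (1.0 + math.exp(-utility))

def point_elasticity(time_difference_min, beta_time=-0.08, constant=-1.0):
    """Direct point elasticity of a logit probability: beta * x * (1 - P)."""
    p = bicycle_probability(time_difference_min, beta_time, constant)
    return beta_time * time_difference_min * (1.0 - p)

def numerical_elasticity(time_difference_min, rel_step=1e-6):
    """Percent change in P per percent change in x, by finite difference."""
    x0 = time_difference_min
    x1 = x0 * (1.0 + rel_step)
    p0 = bicycle_probability(x0)
    p1 = bicycle_probability(x1)
    return ((p1 - p0) / p0) / rel_step

analytic = point_elasticity(10.0)
numeric = numerical_elasticity(10.0)
print(f"analytic: {analytic:.4f}, numeric: {numeric:.4f}")
```

Note that the elasticity depends on both x and P, so a value computed at one point does not necessarily hold for other travelers or for large changes.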
Ideally, instead of simply using elasticities, the model is applied to the entire affected population to estimate the total number of people who will change their behavior as a result of an improvement. To do this, an affected population must be defined. Examples of such a population might be residents in a census tract or transit users who access a particular transit station. The population must be defined in groups for which either an average value or a distribution is known for every variable in the model.
There are three alternative methods for aggregating results for the population (Horowitz, Koppelman, and Lerman, 1986):
1. The "naive" method. Average values are assumed for each variable except the one of interest. In the current example, an average trip time difference and an average gender value (such as 50 percent male/50 percent female) are used, and the probability of choosing the bicycle mode is compared with and without bike lanes. Significant errors may be introduced, however, by using single aggregate values for population variables.
2. The "market segmentation" method. The population is divided into groups (e.g., by gender and by travel distance). For each group, a mode choice probability is estimated and multiplied by the population of that group, and the results are summed across all groups. This is repeated with and without bicycle lanes, and the total numbers of bicycle trips for the two alternatives are then compared. This reduces, but does not eliminate, aggregation errors. The method is widely used in practice (see Wilbur Smith Associates, 1996).
3. The "sample enumeration" method. This method takes a random sample of the total population, estimates a mode choice probability for each person in the sample, and averages the sample probabilities to estimate a mode share for the entire population. This method is the most accurate of the three but is also the most difficult to apply.
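The three aggregation methods can be compared on a small synthetic example. Everything in the sketch below is an assumption made for illustration: the model coefficients, the randomly generated population, the 10-minute segment bands, and the sample size of 1,000.

```python
import math
import random

def p_bike(time_diff_min, male, bike_lanes):
    """Binary logit with hypothetical coefficients (illustration only)."""
    u = -1.5 - 0.08 * time_diff_min + 0.4 * male + 0.6 * bike_lanes
    return 1.0 / (1.0 + math.exp(-u))

# Synthetic population: (time difference in minutes, male indicator).
random.seed(1)
population = [(random.uniform(0, 40), random.randint(0, 1)) for _ in range(10_000)]

def naive(pop, bike_lanes):
    """Evaluate the model once at the population-average values."""
    mean_time = sum(t for t, _ in pop) / len(pop)
    mean_male = sum(m for _, m in pop) / len(pop)
    return len(pop) * p_bike(mean_time, mean_male, bike_lanes)

def market_segmentation(pop, bike_lanes, bin_width=10):
    """Group people by gender and time-difference band, then sum over groups."""
    segments = {}
    for t, m in pop:
        segments.setdefault((m, int(t // bin_width)), []).append(t)
    total = 0.0
    for (m, _), times in segments.items():
        segment_mean_time = sum(times) / len(times)
        total += len(times) * p_bike(segment_mean_time, m, bike_lanes)
    return total

def sample_enumeration(pop, bike_lanes, sample_size=1_000):
    """Average individual probabilities over a random sample, then scale up."""
    sample = random.sample(pop, sample_size)
    share = sum(p_bike(t, m, bike_lanes) for t, m in sample) / sample_size
    return len(pop) * share

for method in (naive, market_segmentation, sample_enumeration):
    before = method(population, bike_lanes=0)
    after = method(population, bike_lanes=1)
    print(f"{method.__name__:20s} {before:8.0f} -> {after:8.0f} bicycle trips")
```

Because the logit curve is nonlinear, evaluating it at average values (the naive method) generally does not give the average of the individual probabilities; segmenting the population narrows that gap.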
Discrete choice models developed from stated-preference surveys can be calibrated/validated using models developed from data on actual (revealed) behavior (see "Inputs/Data Needs").
Discrete choice models are developed from data sets containing individual trip decisions, including characteristics of the individual and of alternative choices for the trip. Two types of data, revealed-preference and stated-preference, may be used, as described below:
1. "Revealed-preference" data, or data on actual behavior. This may be collected from a travel survey, which determines characteristics of a trip (origin and destination, mode, travel time, etc.) as well as characteristics of the individual and other influencing factors. Observations on trip decisions by 1,000 to 3,000 people are required (Horowitz et al., 1986).
For this type of data to be useful for predicting non-motorized travel, the data set must include the following:
Stated-preference surveys are discussed further under the entry on Preference Surveys.
Potential Data Sources:
Mode choice models including bicycling and/or walking are usually developed directly from the results of special data collection efforts. The most common approach is to conduct a stated-preference survey of users and potential users. Respondents are asked to choose between alternatives with different attributes. The results of these choices are then combined with information about the respondent, his or her current choice among the presently available alternatives, and other environmental factors, to develop a predictive model.
Household travel surveys are a potential existing source of revealed-preference data. These surveys are conducted routinely by many Metropolitan Planning Organizations (MPOs), although not all have included non-motorized travel in the past. Metropolitan travel surveys, however, generally suffer from two major limitations for developing discrete choice models of non-motorized travel:
1. Characteristics of non-motorized alternatives for each trip are generally not collected, so the effects of changing policies or improving facilities cannot be evaluated.
2. Most surveys do not include enough observations of non-motorized trips to develop predictive models for these modes.
It is possible that the first limitation could be overcome by collecting additional data describing existing bicycle/pedestrian facilities or environments, so that these factors can then be related to the locations of survey respondents and used as a predictor of travel decisions. However, the effort involved in collecting this data could be considerable.
The Nationwide Personal Transportation Survey (NPTS) is another potential source of individual travel behavior data. However, the amount of geographic detail in the survey is limited, and collecting local facility or environmental data for inclusion in a model based on this source would again be difficult.
Surveys of transit access mode are sometimes conducted by transit agencies and have also been used as a data source for predicting access mode choice (Wilbur Smith Associates, 1996; Loutzenheiser, 1997). An advantage of these surveys is that they contain a significant percentage of non-motorized trips and generally distinguish between walk and bicycle access. Transit access surveys may need to be supplemented with additional site-specific data collection on bicycle and/or pedestrian facility factors to evaluate the effects of these factors.
Models can also be developed from special revealed-preference data collection efforts, which relate information from counts and/or surveys of users to descriptors of the facilities or travel environment encountered (for a discussion, see Hunt and Abraham, 1997).
Discrete choice models can be estimated using a desktop microcomputer with specialized software such as ALOGIT.
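Estimation software of this kind typically fits the model coefficients by maximum likelihood. The sketch below illustrates that general idea for a two-parameter binary logit using Newton-Raphson iterations in plain Python; it is not a description of how ALOGIT works internally. The data are synthetic, and the "true" coefficients used to generate them are hypothetical.

```python
import math
import random

def logit_prob(utility):
    """Logit probability for a given systematic utility."""
    return 1.0 / (1.0 + math.exp(-utility))

def fit_binary_logit(x, y, iterations=25):
    """Newton-Raphson maximum-likelihood fit of P(y=1) = logit(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logit_prob(b0 + b1 * xi)
            residual = yi - p
            weight = p * (1.0 - p)
            g0 += residual              # gradient of the log-likelihood
            g1 += residual * xi
            h00 += weight               # negative Hessian (information matrix)
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic revealed-preference data: 3,000 trips with a known (hypothetical) model.
random.seed(0)
TRUE_CONSTANT, TRUE_TIME_COEF = -1.0, -0.1
time_diff = [random.uniform(-10, 30) for _ in range(3000)]
chose_bike = [1 if random.random() < logit_prob(TRUE_CONSTANT + TRUE_TIME_COEF * t) else 0
              for t in time_diff]

b0, b1 = fit_binary_logit(time_diff, chose_bike)
print(f"estimated constant: {b0:.3f}, estimated time coefficient: {b1:.3f}")
```

Specialized packages add features this sketch omits, such as multiple alternatives, standard errors, and goodness-of-fit statistics.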
A knowledge of statistical analysis and discrete choice modeling techniques is required, in addition to familiarity with sources and methods of collecting survey data.
Discrete choice models assume that choices made by individuals can be predicted based on a limited set of quantifiable factors and that people are essentially rational decision-makers who seek to make choices that maximize their utility. Furthermore, the relationship between the underlying factors and the probability of the individual choosing a particular alternative is assumed to bear a particular functional form (typically a logit function).
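These assumptions can be written compactly in standard random-utility notation. The symbols below ($U$, $V$, $\varepsilon$, $\beta$, $x$, $C_n$) do not appear in the original text and are introduced only for illustration. The logit form follows under the additional assumption that the unobserved terms $\varepsilon$ are independent and identically Gumbel (type I extreme value) distributed.

```latex
% Utility of alternative i for individual n: observed part plus unobserved part
U_{in} = V_{in} + \varepsilon_{in}, \qquad V_{in} = \beta^{\top} x_{in}

% Individual n chooses the alternative with the highest utility in choice set C_n
P_n(i) = \Pr\left(U_{in} \ge U_{jn} \;\; \forall j \in C_n\right)
       = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}

% Two alternatives (bike, car): the S-shaped binary logit curve
P_n(\text{bike}) = \frac{1}{1 + e^{-(V_{\text{bike},n} - V_{\text{car},n})}}
```

Only differences in utility between alternatives affect the predicted probabilities, which is why the binary form depends on $V_{\text{bike},n} - V_{\text{car},n}$ alone.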
Facility Design Factors:
A range of facility design factors can be included in a discrete choice model. The inclusion of design factors is generally limited by:
1. For stated-preference surveys, the need to keep hypothetical alternatives simple and understandable to the respondent.
2. For revealed-preference surveys, the resources required to collect data describing existing facilities. Also, design factors are limited to those that currently exist in the real world.
Possible outputs include:
Work Trip Mode Choice:
Kocur, Hyman, and Aunet (1982) describe the development of work-trip mode choice models for the Wisconsin Department of Transportation (WisDOT). In the late 1970s and early 1980s, WisDOT developed a series of mode-choice models to consistently assess transportation policy issues across urban areas in the State. Work-trip logit mode choice models are developed for four sets of metropolitan areas in Wisconsin based on the results of stated and revealed-preference surveys. Bicycle and walk are included as separate mode choices. Bicycle facility variables include distance to work, existence of a bike lane (yes or no), street surface (smooth or rough), and traffic (busy or quiet). Pedestrian facility variables include distance to work, presence of sidewalks, and season (summer or winter). The models are used to estimate the effects of various policies on mode split. Addition of marked bicycle lanes to all streets in the cities studied was estimated to increase total summertime bicycle trips by 39 percent. Allowing pavement to deteriorate from smooth to rough was estimated to reduce summertime bicycle work trips by 42 percent.
Transit Access Mode:
Discrete choice models have been developed in a number of areas to predict transit access mode. A study for the Regional Transportation Authority in Chicago (Wilbur Smith Associates, 1996) estimates the effects on transit access mode choice of various improvements to bicycle and pedestrian facilities in station areas, based on estimation of a discrete mode choice model from both revealed-preference and stated-preference survey data. For more information, see separate entry on "Discrete Choice Models - Transit Access."
Taylor and Mahmassani (1996) developed a discrete choice model based on a hypothetical-choice stated-preference survey to assess preferences for work trip mode choice (auto, park-and-ride, or bike-and-ride). Facility factors include on-street bicycle facility type, bicycle parking facility type, and access distance to transit. Only relative utilities are reported, and the model is not used to predict changes in total mode use as a result of facility changes.
Loutzenheiser (1997) developed a discrete choice model of transit access mode choice based on Bay Area Rapid Transit (BART) passenger surveys and station area characteristics. Urban design and station area characteristics were found to be secondary to individual characteristics in determining the choice to walk. Station area variables include nearby arterials and freeways, grid pattern, population density, and type and mix of land uses; descriptors were developed using GIS techniques.
Inclusion of Attitudinal and Perception Factors:
Additional studies have focused on determining the effect of users' attitudes and perceptions on the choice to walk or bike. These studies have also included rudimentary variables describing the quality of bicycle and/or pedestrian facilities.
Katz (1996) modeled demand for commuter bicycle use in two steps: (1) the choice to participate (bicycle) is modeled, through factor analysis and logit regression, based on attitudes and personal characteristics; and (2) mode choice is modeled through discrete choice (logit) models which include attitudes, personal characteristics, and structural factors (cost, distance, etc.). Bicycle facility measures include bicycle cost, trip distance, availability of showers and parking at the trip end, and percent of trip on a bike path. Elasticities for the bicycle mode are -0.88 for trip distance, +0.58 for percent of trip on bike path, and +0.26 for car cost. Inclusion of attitudinal factors is found to significantly improve model fit. Data are based on telephone and in-person surveys and choice experiments. An extensive discussion and literature review of the behavior modeling issues and techniques relevant to bicycle travel modeling is also included.
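As a worked illustration of how reported elasticities such as these might be applied, the sketch below multiplies each elasticity by an assumed 10 percent change in the corresponding variable. The elasticity values are those quoted above from Katz (1996); the 10 percent change and the linear (first-order) approximation are assumptions, and the approximation is reasonable only for small changes.

```python
# Elasticities for the bicycle mode as reported by Katz (1996).
ELASTICITIES = {
    "trip_distance": -0.88,
    "percent_on_bike_path": 0.58,
    "car_cost": 0.26,
}

def approx_percent_change(variable, percent_change_in_variable):
    """First-order estimate: % change in bicycle choice = elasticity * % change."""
    return ELASTICITIES[variable] * percent_change_in_variable

# Assumed scenario: each variable increases by 10 percent, one at a time.
for variable in ELASTICITIES:
    change = approx_percent_change(variable, 10.0)
    print(f"10% increase in {variable}: {change:+.1f}% change in bicycle mode choice")
```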
Kitamura, Mokhtarian, and Laidet (1997) conducted stated-preference surveys to determine the relative influence of socioeconomic, attitudinal, and neighborhood characteristics on travel behavior. Discrete choice models were developed to predict mode choice and total number of trips by mode. Facility variables included presence of sidewalks and bike paths as well as perceptions of whether streets are pleasant for walking or bicycling.
Noland and Kunreuther (1995) developed multinomial logit models which relate use of a mode to perceptions of risk and convenience of that mode (perceptions of cost, comfort, and relevant personal variables are also included). Risk and convenience perceptions were measured based on surveys of bicyclists and of the general population. Modes include auto, transit, bicycle, and walk. The model was used to evaluate the general effect of policy variables on mode split. Elasticities were developed with respect to bicycle convenience, comfort, parking availability, competency, and lack of shoulders, as well as auto cost, convenience, and comfort. Sample enumeration was used to predict future mode splits as a result of policy changes. "Short-run" and "long-run" elasticities and mode splits were developed, which assume that many people do not have a choice of modes in the short run, but that in the long run different urban form policies and residential location decisions could allow everyone a choice of modes.
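A multinomial logit model with these four modes can be sketched as follows. The utility values and the size of the assumed policy effect are hypothetical and are not taken from the study; the sketch only shows how probabilities across several modes respond when the utility of one mode changes.

```python
import math

def multinomial_logit(utilities):
    """Return choice probabilities exp(V_i) / sum_j exp(V_j) for each alternative."""
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(u - max_u) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}

# Hypothetical systematic utilities for one traveler (illustration only).
utilities = {"auto": 1.2, "transit": 0.3, "bicycle": -0.5, "walk": -1.0}
base = multinomial_logit(utilities)

# A policy that raises perceived bicycle convenience is represented here
# as a +0.5 increase in bicycle utility (an assumed value).
improved = multinomial_logit({**utilities, "bicycle": utilities["bicycle"] + 0.5})

for mode in utilities:
    print(f"{mode:8s} {base[mode]:.3f} -> {improved[mode]:.3f}")
```

In this model form, the new bicycle trips are drawn from the other modes in proportion to their existing shares, so the ratio between any two unchanged modes stays the same.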
Mode Choice in Travel Demand Models:
Discrete choice models have been widely used to predict mode choice for work trips and other types of trips in the development of regional travel models. However, these models rarely include bicycle or walking trips as separate modes. For a discussion of models that do include bicycle and walking trips, see entry for "Regional Travel Models."
Route Choice Models:
Discrete choice models have also been applied to predicting route choice or facility preference as a function of route/facility characteristics (see "Discrete Choice Models - Route Choice," Method 2.6).
Ben-Akiva, M. and S.R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. Cambridge, MA: The MIT Press, 1985.
Horowitz, Joel L.; Frank S. Koppelman and Steven R. Lerman. A Self-instructing Course in Disaggregate Mode Choice Modeling. Prepared for the Urban Mass Transportation Administration (now Federal Transit Administration), Washington, DC, December 1986.
Katz, Rod. Demand for Bicycle Use: A Behavioural Framework and Empirical Analysis for Urban NSW, Doctoral Thesis, The Graduate School of Business, The University of Sydney, Sydney, NSW, Australia, December 1996.
Kitamura, Ryuichi; Patricia L. Mokhtarian and Laura Laidet. A Micro-Analysis of Land Use and Travel in Five Neighborhoods in the San Francisco Bay Area. Transportation Vol. 24, No. 2, May 1997.
Loutzenheiser, David R. Pedestrian Access to Transit: A Model of Walk Trips and their Design and Urban Form Determinants Around BART Stations. Transportation Research Board, 76th Annual Meeting, Washington, DC, January 1997.
Kocur, George; William Hyman and Bruce Aunet. Wisconsin Work Mode-Choice Models Based on Functional Measurement and Disaggregate Behavioral Data. Transportation Research Record 895, 1982.
Noland, Robert B. and Howard Kunreuther. Short-Run and Long-Run Policies for Increasing Bicycle Transportation for Daily Commuter Trips. Transport Policy, Vol. 2, No. 1, 1995.
Taylor, Dean and Hani Mahmassani. Analysis of Stated-Preferences for Intermodal Bicycle-Transit Facilities. Transportation Research Record No. 1556, 1996.
Wilbur Smith Associates. Non-Motorized Access to Transit: Final Report. Prepared for Regional Transportation Authority, Chicago, IL, July 1996.
Evaluative Criteria: How Does It Work?
Kocur, Hyman, and Aunet (1982), in calibrating their behavior models based on actual behavior, found that the calibration coefficients are "larger than we would ideally like to see, but they indicate a relatively good correspondence between the experimental models and actual behavior."
The performance of the other models discussed here has not been evaluated.
Use of Existing Resources:
Development of a discrete choice model usually requires new data collection efforts. In some cases, it may be possible to transfer coefficients from a model developed in one area to other areas, eliminating the need for local data collection. However, this implies that the two situations are similar with respect to factors not included in the model.
Travel Demand Model Integration:
Discrete choice models are widely used to predict mode choice in existing travel demand models. It is a logical extension of existing practices to include non-motorized travel in this step. The added complication and data requirements, however, have so far limited the inclusion of non-motorized travel in most models.
Applicability to Diverse Conditions:
Determining the variables to include in a model and the required data collection efforts represents a tradeoff. The more specific the variables to the improvement being analyzed, the more accurate the results in analyzing that improvement. On the other hand, the model will be less applicable in different situations, and if a different improvement is to be analyzed, new data collection and modeling efforts may be required. Models with general environment or facility descriptors may have broader applicability but will be less suited for analyzing the impacts of a particular improvement. As an example, a model of bicycle choice may be estimated regionally using a variable of "miles of bicycle lanes available." Such a model may be of general use for evaluating and comparing facility improvement policies. For evaluating the effects of a specific improvement, however, it may not be as accurate as a model based on a survey in one locality which includes as a variable "bike lanes from point A to point B on street X."
Kocur, Hyman, and Aunet compared coefficients among the four sets of cities of varying sizes for which they were developed. They found that "most of the coefficients show relatively little variation across cities, which suggests that transferability of these coefficients among urban areas is a possibility."
Usage in Decision-Making:
No information is available.
Ability to Incorporate Changes:
See "Applicability to Diverse Conditions" above.
A knowledge of discrete choice modeling techniques is required, in addition to familiarity with sources and methods of collecting survey data.
(1) The logit function is an "S-shaped" function relating one or more independent variables, such as the difference in auto and bicycle travel times, to the probability of making a specific choice, such as choosing to bicycle.