Office of Planning, Environment, & Realty (HEP)

# An Introduction to Panel Surveys in Transportation Studies

## 5. WEIGHTING PANEL DATA

Panel samples, like other survey samples, usually need to be weighted to produce unbiased population estimates. Weights are typically applied for three reasons:

• to account for differences in the selection probabilities of individual cases,
• to compensate for differences in response rates across subgroups, and
• to adjust for chance or systematic departures of the sample's composition from that of the population.

In a panel survey, weights are often computed in two stages. First, a weight is developed for the initial wave following standard procedures for cross-sectional samples. Then, the weights from the initial wave are adjusted to produce longitudinal panel weights. The sections below provide an overview of the steps involved in the process. The procedures and computational formulas are discussed in detail in the Appendix.

### 5.1 WEIGHTS FOR THE INITIAL WAVE

Weights for the first wave of a panel survey are usually calculated in three steps. In the first step, each unit in the sample is assigned a base weight to compensate for differences in the selection probabilities of the individual units. In some cases, these differences arise by design. The PSTP, for example, deliberately oversampled transit users. As a result, transit users had a higher chance of selection into the sample than other sample members. In other cases, the differences in selection probabilities are a byproduct of the sampling process. In telephone surveys, for example, households with multiple telephone lines have a greater chance of selection into the sample than households with a single line. In either case, population statistics derived from the data will be biased unless they are appropriately weighted to adjust for unequal selection probabilities.
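As a minimal sketch of this first step, the base weight can be taken as the inverse of each unit's selection probability. The sampling rates, field names, and the `base_weight` helper below are hypothetical. Treating a household's chance of selection as proportional to its number of telephone lines is a simplifying assumption for illustration, not the procedure given in the Appendix.

```python
def base_weight(p_select: float, phone_lines: int = 1) -> float:
    """Inverse-probability base weight.

    Assumes (for illustration only) that a household reachable on k
    telephone lines is about k times as likely to be selected as a
    household with a single line.
    """
    if not 0.0 < p_select <= 1.0:
        raise ValueError("p_select must be in (0, 1]")
    if phone_lines < 1:
        raise ValueError("phone_lines must be at least 1")
    return 1.0 / (p_select * phone_lines)


# Hypothetical design: transit users oversampled at 1 in 50, others at 1 in 200.
sample = [
    {"id": "A", "p_select": 1 / 50, "phone_lines": 1},   # transit user
    {"id": "B", "p_select": 1 / 200, "phone_lines": 1},  # other household
    {"id": "C", "p_select": 1 / 200, "phone_lines": 2},  # two telephone lines
]
base_weights = {u["id"]: base_weight(u["p_select"], u["phone_lines"]) for u in sample}
```

Under these assumed rates, the oversampled transit user receives a smaller weight than the other single-line household, and the two-line household receives half the weight of a comparable one-line household.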

The second step adjusts the weights for differences in subgroup participation rates. In most surveys, certain groups of individuals tend to participate at lower rates than other groups. In transportation surveys, the underrepresented groups usually include the elderly, the less well-educated, urban dwellers, families with young children, and young adults. Such differences in participation rates can introduce nonresponse bias into the results. Weighting for nonresponse can help reduce those biases.
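One common way to carry out this step is a weighting-class adjustment: within each subgroup, the weights of respondents are inflated so that they also represent the sampled nonrespondents in that subgroup. The sketch below assumes that approach; the class labels, field names, and the `nonresponse_adjust` helper are hypothetical, and the Appendix may specify a different method.

```python
from collections import defaultdict


def nonresponse_adjust(units):
    """Return respondents with weights inflated by the inverse of the
    weighted response rate in their weighting class."""
    sampled_total = defaultdict(float)     # weight of all sampled units per class
    respondent_total = defaultdict(float)  # weight of responding units per class
    for u in units:
        sampled_total[u["cls"]] += u["base_weight"]
        if u["responded"]:
            respondent_total[u["cls"]] += u["base_weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = sampled_total[u["cls"]] / respondent_total[u["cls"]]
            adjusted.append({**u, "nr_weight": u["base_weight"] * factor})
    return adjusted


# Hypothetical sample: young adults respond at a lower rate than older adults.
units = (
    [{"cls": "young", "base_weight": 100.0, "responded": i < 2} for i in range(4)]
    + [{"cls": "older", "base_weight": 100.0, "responded": i < 3} for i in range(4)]
)
respondents = nonresponse_adjust(units)
```

In this example the two responding young adults each end up with a weight of 200, so the class still represents its full sampled total. A class with no respondents at all would simply drop out of the result, so in practice sparse classes are usually collapsed with similar ones before adjusting.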

The third step compensates for differences between the composition of the sample and the composition of the population. These differences may occur purely by chance or because the sampling frame omits a portion of the population. Telephone surveys, for example, omit the portion of the population without telephones. Weighting the data to compensate for this omission helps reduce the bias in population estimates.
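This third step is often implemented as a post-stratification adjustment, in which the weights are scaled so that weighted sample totals match known population totals within each group. The sketch below assumes that approach; the groups, population counts, and the `poststratify` helper are hypothetical and do not come from any survey discussed here.

```python
from collections import defaultdict


def poststratify(respondents, population_totals):
    """Scale weights so each group's weighted total equals its population total."""
    weighted_total = defaultdict(float)
    for r in respondents:
        weighted_total[r["group"]] += r["weight"]

    return [
        {**r, "final_weight": r["weight"]
              * population_totals[r["group"]] / weighted_total[r["group"]]}
        for r in respondents
    ]


# Hypothetical respondents and external population counts (e.g. from a census).
respondents = [
    {"id": 1, "group": "owner", "weight": 150.0},
    {"id": 2, "group": "owner", "weight": 150.0},
    {"id": 3, "group": "renter", "weight": 100.0},
]
population_totals = {"owner": 360.0, "renter": 140.0}
final = poststratify(respondents, population_totals)
```

After the adjustment, the weighted count of owners sums to 360 and the weighted count of renters to 140, matching the assumed population totals.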

### 5.2 PANEL WEIGHTS

While calculating cross-sectional weights for the first wave is relatively straightforward, calculating household-level longitudinal weights raises special problems because sample households can change over the life of the panel survey. For example, a married couple that initially formed a single household may divorce, creating two "new" single-person households. There are several different ways to treat households that split up or that add new members over the course of the survey, and decisions about how to handle such changes affect the computation of longitudinal weights. Thus, an essential first step in weighting longitudinal data involves deciding how households will be defined for weighting purposes.

Another decision affecting the computation of the weights concerns the rules for defining responding and nonresponding households over time. In most panel surveys, households are classified as respondents if they participated in all rounds of data collection. However, in certain circumstances other definitions may be useful as well. Suppose, for example, an analyst wanted to compare data from the first and most recent rounds of data collection. In this case, it makes sense to classify households as respondents if they completed these two rounds of data collection. In many cases, it may be necessary to define responding households in more than one way to meet the analysis needs of the survey. In such situations, a separate set of weights is generated for each definition.
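The two respondent definitions described above can be sketched as a simple check of which rounds each household completed. The household identifiers, the four-round panel, and the `is_respondent` helper are hypothetical.

```python
def is_respondent(completed_rounds, required_rounds):
    """A household counts as a respondent if it completed every required round."""
    return set(required_rounds) <= set(completed_rounds)


# Hypothetical four-round panel: rounds completed by each household.
panel = {
    "hh1": {1, 2, 3, 4},
    "hh2": {1, 4},
    "hh3": {1, 2},
}

all_rounds = {1, 2, 3, 4}   # definition 1: participated in all rounds
first_and_last = {1, 4}     # definition 2: first and most recent rounds only

respondents_all = sorted(h for h, done in panel.items() if is_respondent(done, all_rounds))
respondents_first_last = sorted(
    h for h, done in panel.items() if is_respondent(done, first_and_last)
)
```

Here only `hh1` qualifies under the first definition, while both `hh1` and `hh2` qualify under the second, which is why each definition needs its own set of weights.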

Once these definitional issues have been resolved, the calculation of longitudinal weights is straightforward, following the same basic steps as those used to calculate cross-sectional weights. The steps involved in this process are discussed in detail in the Appendix.
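As a rough illustration of how the first-wave weights might be carried forward, the sketch below inflates the wave-1 weights of households that meet a chosen respondent definition so that they also represent the households lost to attrition in the same class. This is a generic attrition adjustment under stated assumptions, not the formulas from the Appendix: the classes, weights, and the `longitudinal_weights` helper are hypothetical, and the handling of households that split or gain members is left out entirely.

```python
from collections import defaultdict


def longitudinal_weights(households, required_rounds):
    """Return {household id: longitudinal weight} for one respondent definition."""
    required = set(required_rounds)
    wave1_total = defaultdict(float)   # wave-1 weight of all wave-1 respondents
    stayer_total = defaultdict(float)  # wave-1 weight of longitudinal respondents
    for hh in households:
        wave1_total[hh["cls"]] += hh["wave1_weight"]
        if required <= set(hh["completed"]):
            stayer_total[hh["cls"]] += hh["wave1_weight"]

    weights = {}
    for hh in households:
        if required <= set(hh["completed"]):
            factor = wave1_total[hh["cls"]] / stayer_total[hh["cls"]]
            weights[hh["id"]] = hh["wave1_weight"] * factor
    return weights


# Hypothetical three-round panel with final wave-1 weights already computed.
households = [
    {"id": "hh1", "cls": "transit", "wave1_weight": 60.0, "completed": {1, 2, 3}},
    {"id": "hh2", "cls": "transit", "wave1_weight": 60.0, "completed": {1}},
    {"id": "hh3", "cls": "other", "wave1_weight": 200.0, "completed": {1, 2, 3}},
    {"id": "hh4", "cls": "other", "wave1_weight": 200.0, "completed": {1, 3}},
]
weights_all_rounds = longitudinal_weights(households, {1, 2, 3})
weights_first_last = longitudinal_weights(households, {1, 3})
```

Calling the function once per respondent definition yields a separate set of weights for each, and within every class the weights of the retained households sum to the class's wave-1 total.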

Updated: 3/25/2014