Date: Thursday, October 24, 2013
Transportation professionals must rely heavily on statistical evidence to make important planning and infrastructure decisions. The advent of traffic analysis zones and other spatial analysis units is partly an artifact of this, and thousands of planning decisions have already been made based on their geographic convenience. However, these analysis zones and other aggregated geographic definitions can create statistical problems that many transportation professionals either do not know about or ignore.
A typical approach with polygon data of varying size or population density is to normalize the information so more appropriate comparisons can be made. For example, a common strategy is to divide the value of interest by the polygon's population, area, intersection count, or any other seemingly appropriate measure. Once this is done, the mistake is to presume the resulting normalized list can be compared without trouble. To make matters worse, everyday software tools make this (and only this) normalization procedure readily accessible. These lists are commonly ranked, and planning decisions are then made from the rankings. However, an important statistical property of this procedure is that low density areas, such as the more rural census blocks or TAZs, will be overrepresented in the tails of the calculated distribution. This is because a rate computed from fewer observations has higher variance. Low density zones are equivalent to small samples and, thus, the typical concerns with small samples are particularly relevant.
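The small-sample effect described above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not material from the talk: the zone populations (50 and 5,000), the shared underlying rate of 2%, the tail size of 20, and the function name `simulate_zone_rates` are all hypothetical choices made for illustration. Every zone is given the same true rate, so any difference in the ranked list comes from sampling noise alone.

```python
import random


def simulate_zone_rates(populations, true_rate, seed=0):
    """Return (population, observed_rate) for each zone.

    Every zone shares the same underlying rate; each person in a zone
    independently produces an event with probability true_rate.
    """
    rng = random.Random(seed)
    results = []
    for pop in populations:
        events = sum(1 for _ in range(pop) if rng.random() < true_rate)
        results.append((pop, events / pop))
    return results


# Hypothetical study area: 200 low density zones and 200 high density zones.
SMALL_POP = 50
LARGE_POP = 5000
populations = [SMALL_POP] * 200 + [LARGE_POP] * 200

zones = simulate_zone_rates(populations, true_rate=0.02)

# Rank zones by their normalized value, as a planner might.
ranked = sorted(zones, key=lambda zone: zone[1])

# Look at the 20 lowest and 20 highest ranked zones.
tails = ranked[:20] + ranked[-20:]
small_in_tails = sum(1 for pop, _ in tails if pop == SMALL_POP)

print(f"Small zones are 50% of all zones but "
      f"{small_in_tails} of the {len(tails)} zones in the tails.")
```

Because the small zones make up only half of the zones, a count well above half in the tails reflects the higher variance of their rates rather than any real difference between zones.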
Kevin will discuss why this simple algebraic step can produce erroneous patterns and lead to misguided planning decisions. In addition, he will discuss ways to avoid a related visualization issue: the common use of choropleth maps to visualize these data leads to disproportionate focus on the areas of the map with larger polygons, which in many cases are the same polygons that receive the most color variation due to the statistical problem described above.
Kevin M. Hathaway is a Vice President at RSG. He has been with the firm for thirteen years, working in statistical modeling and GIS across a number of the firm's disciplines, including land use modeling, travel demand, public health research, housing policy, survey research, and environmental impact. He has led large federal, state, and academic research studies in applied statistics and the use of GIS technologies for data collection. He holds an M.S. from Dartmouth with a focus on applied statistics.