U.S. Department of Transportation
Federal Highway Administration
1200 New Jersey Avenue, SE
Washington, DC 20590
202-366-4000



Federal Highway Administration Research and Technology
Coordinating, Developing, and Delivering Highway Transportation Innovations

 
REPORT
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-13-098
Date: January 2014

 

Human Factors Assessment of Pedestrian Roadway Crossing Behavior

Predictive Model

It was hoped that the results of this study might be used to develop a model that would predict whether a pedestrian would cross at a marked intersection or an unmarked non-intersection based on the features of the environment. Several approaches were taken to model the data.

Location 3 contained a substantial proportion of crossings at an unmarked intersection, which made it different from the other seven data collection areas, none of which included this type of crossing. As a result, Location 3 was excluded from all modeling presented in this section.

Training and testing data sets were created from the raw data. The training data were used to fit the models, and the testing data were used to evaluate the models' predictions. (All data were drawn from the original raw data set of 65,725 crossings.) To generate the training data set, a stratified random sample without replacement was drawn using PROC SURVEYSELECT in SAS®. Crossing location (i.e., marked intersection or unmarked non-intersection) was the stratifying variable. Approximately 70 percent of the raw data were selected for the training data set, and the remaining 30 percent were assigned to the testing data set.
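A rough Python sketch of the same idea (a stratified simple random sample without replacement, split 70/30 within each stratum) follows. It assumes each crossing is a record with a 0/1 location field; the function `stratified_split` and the toy data are hypothetical and stand in for, rather than reproduce, PROC SURVEYSELECT.

```python
import random
from collections import defaultdict


def stratified_split(records, stratum_of, train_fraction=0.7, seed=None):
    """Split records into training and testing sets by shuffling each
    stratum separately and taking the first train_fraction of it, so every
    record lands in exactly one set (sampling without replacement)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    train, test = [], []
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


# Hypothetical toy data: loc 1 = marked intersection, 0 = unmarked non-intersection.
crossings = [{"id": i, "loc": 1 if i % 14 else 0} for i in range(1400)]
train, test = stratified_split(crossings, lambda r: r["loc"], seed=42)
print(len(train), len(test))
print(sum(r["loc"] == 0 for r in train), sum(r["loc"] == 0 for r in test))
```

Because each stratum is split separately, the rare unmarked crossings appear in both sets in the same proportion as in the full data.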

Of the 42,231 observations selected for the training data set, about 93 percent (39,338) involved pedestrians crossing at marked intersections, and only 2,893 involved crossings at unmarked non-intersections. Unmarked non-intersection crossings were therefore treated as rare events. One technique for handling rare events is to oversample them in the training data set, and this approach was used here.

Ten subsets of the training data set were created. Each subset was designed so that 50 percent of the observations involved marked intersection crossings and the remaining 50 percent of the observations involved unmarked non-intersection crossings. All 2,893 unmarked non-intersection crossings from the original training set were included in each training subset. To generate the remaining observations for each subset, 2,893 observations were randomly selected using a simple random sample without replacement from the 39,338 marked intersection crossings in the original training set. This process did not involve stratifying by location. Each subset contained a different set of marked intersection crossings. Thus, each training subset contained 5,786 observations.
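The subset construction can be sketched as follows. Assumptions: the hypothetical `balanced_subsets` function draws each subset's marked crossings independently from the full pool (the report states only that each subset contained a different set), and the record lists are placeholders sized to match the report's counts. Keeping every rare case and sampling the common class down to the same size is what over-represents the rare class relative to its population share.

```python
import random


def balanced_subsets(rare, common, n_subsets=10, seed=None):
    """Return n_subsets lists, each holding every rare record plus an
    equal-sized simple random sample, without replacement, of the common
    records. Each subset's sample is drawn independently of the others."""
    rng = random.Random(seed)
    k = len(rare)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(common, k)  # no record repeats within a subset
        subsets.append(list(rare) + sampled)
    return subsets


# Placeholder records sized to the report's training-set counts.
unmarked = [("unmarked", i) for i in range(2_893)]
marked = [("marked", i) for i in range(39_338)]
subsets = balanced_subsets(unmarked, marked, seed=1)
print(len(subsets), [len(s) for s in subsets][:3])
```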

PROC GLMSELECT was used to model the pedestrian crossing location. Although the crossing location was a binary variable (i.e., only two possible outcomes), previous research has shown that appropriate linear model selection techniques can produce models whose predictive capabilities are competitive with those of logistic models.(27) Each training subset was used once to model the pedestrian crossing location, producing 10 models, and the testing data set was used to test each model. Within each model run, one-third (33.33 percent) of the training subset was reserved for model validation. Stepwise selection was performed, with the adjusted r-square used as both the selection and the stopping criterion. The average squared error on the validation data was used as the criterion for choosing the final model, to prevent overfitting of the training data. Hierarchy was imposed on all model effects: an interaction term could not enter the model unless both of its main effects were already present, and a main effect could not leave the model while any interaction involving it remained. Available predictors for each of the 10 models were A through O (see table 11; further details of each location are provided under each respective location description) and all second-order interactions, for a total of 121 effects. Predicted probabilities less than or equal to 0.5 corresponded to a predicted binary value of 0 (i.e., unmarked non-intersection); all other predicted probabilities corresponded to a predicted binary value of 1 (i.e., marked intersection).
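One tally consistent with the 121 effects (our reading; the report does not break the count down) is the intercept plus 15 main effects plus C(15, 2) = 105 two-way interactions. The sketch below also expresses the hierarchy rule as a simple check. The `"X*Y"` naming mirrors the usual SAS crossed-effect notation, but `respects_hierarchy` is a hypothetical helper, not part of GLMSELECT.

```python
from itertools import combinations
from string import ascii_uppercase

# The 15 candidate main effects, labeled A through O as in Table 11.
main_effects = list(ascii_uppercase[:15])

# All two-way (second-order) interactions, written "X*Y".
interactions = [f"{a}*{b}" for a, b in combinations(main_effects, 2)]

# Intercept + 15 main effects + 105 interactions.
n_effects = 1 + len(main_effects) + len(interactions)
print(len(interactions), n_effects)


def respects_hierarchy(model_effects):
    """Return True if every interaction "X*Y" in the model is accompanied
    by both of its main effects X and Y."""
    present = set(model_effects)
    for effect in present:
        if "*" in effect:
            a, b = effect.split("*")
            if a not in present or b not in present:
                return False
    return True


print(respects_hierarchy(["C", "D", "C*D"]))  # both main effects present
print(respects_hierarchy(["C", "C*D"]))       # D missing, so the rule is violated
```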

Table 11. Predictors and their respective descriptions used for the models.

| Label | Description | Coding |
|---|---|---|
| A | Distance to the next marked crosswalk | Distance in ft |
| B | AADT (annual average daily traffic) | Expressed in thousands and rounded to the nearest 100 |
| C | One-way or two-way street | One-way (1) or two-way (2) |
| D | Presence of physical barriers that might prevent a pedestrian from crossing the roadway | No barrier (0), partial barrier (1), or mostly blocked/large barrier (2) |
| E | Presence of a bus stop | None (0), bus exit near marked intersection (1), or bus exit at non-intersection (2) |
| F | Range of the number of trip originators/destinations | Range from very few (1) to a lot (5) |
| G | Presence of parking along the roadway | Yes (1) or no (0) |
| H | Presence of a center turning lane | Yes (1) or no (0) |
| I | Presence of a right-turn-only lane | Yes (1) or no (0) |
| J | Length of walk phase | Time in s |
| K | Length of don't walk phase | Time in s |
| L | Curb-to-curb distance | Distance in ft |
| M | Presence and type of median | No median (0), soft (1), hard (2), or median only on one side of crosswalk (3) |
| N | Presence of cross streets between marked crosswalks | No cross street, light-controlled cross street, or not light-controlled cross street |
| O | Far marked crosswalk is light controlled | Yes (1) or no (0) |

Note: Values in parentheses are the values assigned to categorical variables.

 

The following subsections present the 10 statistically chosen models. For categorical variables, the second subscript denotes the value (level) of the variable, and the term acts as a 0/1 indicator for that value. For instance, xC,1 represents value 1 of predictor C (i.e., a one-way street).
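Read this way, each chosen model is a linear predictor built from 0/1 indicators for the listed levels of categorical predictors and the raw values of continuous predictors (such as A and J), followed by the 0.5 cutoff described earlier. In our own notation (the symbols β, p, v, and q are introduced here, not taken from the report):

```latex
$$
\widehat{\mathrm{CrossingLoc}}
  = \beta_0 + \sum_{(p,v)} \beta_{p,v}\, x_{p,v} + \sum_{q} \beta_q\, x_q,
\qquad
x_{p,v} =
\begin{cases}
1 & \text{if categorical predictor } p \text{ equals } v,\\
0 & \text{otherwise;}
\end{cases}
$$

$$
\text{predicted location} =
\begin{cases}
1 \ (\text{marked intersection}) & \text{if } \widehat{\mathrm{CrossingLoc}} > 0.5,\\
0 \ (\text{unmarked non-intersection}) & \text{if } \widehat{\mathrm{CrossingLoc}} \le 0.5.
\end{cases}
$$
```

For example, a term in xC,1 contributes its coefficient on a one-way street (C = 1) and nothing on a two-way street (C = 2); levels without a listed term contribute nothing.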


 

Model 1

·         Chosen Effects: Intercept, XC, XD, XI, and XM.

·         Adjusted r-square: 0.0551.

·         Model:
CrossingLoc = 0.1932 – 0.0504xC,1 + 0.1385xD,0 + 0.4082xD,1 – 0.1265xI,0 + 0.2863xM,0 + 0.1102xM,2.

·         Classification of Training Subset: 58.30 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 2

·         Chosen Effects: Intercept, XA, XD, XI, and XM.

·         Adjusted r-square: 0.0603.

·         Model:
CrossingLoc = 0.1491 – 0.0000xA + 0.1301xD,0 + 0.3542xD,1 – 0.0933xI,0 + 0.3138xM,0 + 0.1956xM,2.

·         Classification of Training Subset: 58.62 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 3

·         Chosen Effects: Intercept, XD, XI, XJ, and XM.

·         Adjusted r-square: 0.0614.

·         Model:
CrossingLoc = 0.8614 + 0.3906xD,0 + 0.9442xD,1 – 0.6139xI,0 – 0.0234xJ + 0.5613xM,0 + 0.2649xM,2.

·         Classification of Training Subset: 58.95 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.
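To make the equations concrete, the sketch below evaluates Model 3 exactly as printed above and applies the 0.5 cutoff. The site values are hypothetical, and the function names `model3_score` and `classify` are ours.

```python
def model3_score(d, i, j, m):
    """Linear predictor of Model 3 as printed in the report.

    d: physical barrier code (0, 1, or 2); indicators for levels 0 and 1
    i: right-turn-only lane (1 = yes, 0 = no); indicator for level 0
    j: walk phase length in seconds; enters linearly
    m: median code (0 to 3); indicators for levels 0 and 2
    """
    return (0.8614
            + 0.3906 * (d == 0) + 0.9442 * (d == 1)
            - 0.6139 * (i == 0)
            - 0.0234 * j
            + 0.5613 * (m == 0) + 0.2649 * (m == 2))


def classify(score):
    """Apply the 0.5 cutoff: 0 = unmarked non-intersection, 1 = marked intersection."""
    return 0 if score <= 0.5 else 1


# Hypothetical site: no barrier, no right-turn-only lane, no median.
for walk_s in (20, 30):
    s = model3_score(d=0, i=0, j=walk_s, m=0)
    print(walk_s, round(s, 4), classify(s))
```

At this hypothetical site, the score drops from about 0.73 with a 20-s walk phase to about 0.50 with a 30-s walk phase, which flips the predicted class. Because the model is linear, its output is not confined to the interval [0, 1]; a partial barrier, a right-turn-only lane, no median, and a 10-s walk phase give about 2.13, so the values behave more like scores than strict probabilities.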

Model 4

·         Chosen Effects: Intercept, XD, XE, and XM.

·         Adjusted r-square: 0.0524.

·         Model:
CrossingLoc = 0.0217 + 0.0806xD,0 + 0.3239xD,1 + 0.0642xE,0 + 0.3555xM,0 + 0.2333xM,2.

·         Classification of Training Subset: 58.07 percent correct.

·         Classification of Testing Data Set: 66.28 percent correct.

            o   Of those correctly assigned, 94.89 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 89.88 percent were for marked intersection crossings.

Model 5

·         Chosen Effects: Intercept, XA, XD, XI, and XM.

·         Adjusted r-square: 0.0560.

·         Model:
CrossingLoc = 0.1856 – 0.0001xA + 0.1312xD,0 + 0.3366xD,1 – 0.0812xI,0 + 0.2868xM,0 + 0.1745xM,2.

·         Classification of Training Subset: 58.14 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 6

·         Chosen Effects: Intercept, XD, XI, XJ, and XM.

·         Adjusted r-square: 0.0588.

·         Model:
CrossingLoc = 1.1831 + 0.5053xD,0 + 1.1918xD,1 – 0.8352xI,0 – 0.0354xJ + 0.7382xM,0 + 0.3473xM,2.

·         Classification of Training Subset: 57.93 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 7

·         Chosen Effects: Intercept, XC, XD, XI, and XM.

·         Adjusted r-square: 0.0584.

·         Model:
CrossingLoc = 0.1559 – 0.0453xC,1 + 0.1663xD,0 + 0.4348xD,1 – 0.1771xI,0 + 0.2877xM,0 + 0.0888xM,2.

·         Classification of Training Subset: 58.38 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 8

·         Chosen Effects: Intercept, XD, XI, XJ, and XM.

·         Adjusted r-square: 0.0476.

·         Model:
CrossingLoc = 1.4533 + 0.5920xD,0 + 1.3440xD,1 – 1.0218xI,0 – 0.0426xJ + 0.7929xM,0 + 0.3502xM,2.

·         Classification of Training Subset: 57.86 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model 9

·         Chosen Effects: Intercept, XD, XI, XJ, and XM.

·         Adjusted r-square: 0.0519.

·         Model:
CrossingLoc = 1.6741 + 0.7527xD,0 + 1.6295xD,1 – 1.2252xI,0 – 0.0524xJ + 0.9266xM,0 + 0.4021xM,2.

·         Classification of Training Subset: 58.37 percent correct.

·         Classification of Testing Data Set: 66.28 percent correct.

            o   Of those correctly assigned, 94.89 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 89.88 percent were for marked intersection crossings.

Model 10

·         Chosen Effects: Intercept, XC, XD, XI, and XM.

·         Adjusted r-square: 0.0546.

·         Model:
CrossingLoc = 0.1803 – 0.0506xC,1 + 0.1533xD,0 + 0.4221xD,1 – 0.1205xI,0 + 0.2801xM,0 + 0.1033xM,2.

·         Classification of Training Subset: 58.16 percent correct.

·         Classification of Testing Data Set: 42.60 percent correct.

            o   Of those correctly assigned, 87.91 percent were for marked intersection crossings.

            o   Of those incorrectly assigned, 97.12 percent were for marked intersection crossings.

Model Summary

Of the 121 candidate effects available to each of the 10 models, only XA, XC, XD, XE, XI, XJ, and XM were ever chosen, and no interaction terms were selected under the model restrictions described above. Table 12 summarizes each of the chosen effects and their respective parameter estimates.

Table 12. Summary of each of the model-selected effects (predictors) and their parameter estimates.

| Effect | Mean | Minimum | Maximum | Number of models |
|---|---|---|---|---|
| Intercept | 0.6058 | 0.0217 | 1.6741 | 10 |
| XA | -0.0001 | -0.0001 | 0.0000 | 2 |
| XC,1 | -0.0488 | -0.0506 | -0.0453 | 3 |
| XD,0 | 0.3041 | 0.0806 | 0.7527 | 10 |
| XD,1 | 0.7389 | 0.3239 | 1.6295 | 10 |
| XE,0 | 0.0642 | 0.0642 | 0.0642 | 1 |
| XI,0 | -0.4705 | -1.2252 | -0.0812 | 9 |
| XJ | -0.0385 | -0.0524 | -0.0234 | 4 |
| XM,0 | 0.4829 | 0.2801 | 0.9266 | 10 |
| XM,2 | 0.2290 | 0.0888 | 0.4021 | 10 |
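The summary statistics in Table 12 can be recomputed from the printed model equations. This sketch does so for the five effects selected in all 10 models; the coefficient lists are transcribed from Models 1 through 10 in model order, and the dictionary layout and `summarize` helper are our own.

```python
from statistics import mean

# Coefficients transcribed from the Model 1-10 equations, in model order,
# for the five effects that appear in every model.
estimates = {
    "Intercept": [0.1932, 0.1491, 0.8614, 0.0217, 0.1856,
                  1.1831, 0.1559, 1.4533, 1.6741, 0.1803],
    "XD,0": [0.1385, 0.1301, 0.3906, 0.0806, 0.1312,
             0.5053, 0.1663, 0.5920, 0.7527, 0.1533],
    "XD,1": [0.4082, 0.3542, 0.9442, 0.3239, 0.3366,
             1.1918, 0.4348, 1.3440, 1.6295, 0.4221],
    "XM,0": [0.2863, 0.3138, 0.5613, 0.3555, 0.2868,
             0.7382, 0.2877, 0.7929, 0.9266, 0.2801],
    "XM,2": [0.1102, 0.1956, 0.2649, 0.2333, 0.1745,
             0.3473, 0.0888, 0.3502, 0.4021, 0.1033],
}


def summarize(values):
    """Mean (rounded to 4 places), minimum, maximum, and number of models."""
    return round(mean(values), 4), min(values), max(values), len(values)


for effect, values in estimates.items():
    print(effect, *summarize(values))
```

Under this transcription, the intercept, XD,0, XD,1, and XM,0 rows reproduce Table 12 to four decimal places, whereas the XM,2 coefficients as printed average about 0.2270 rather than 0.2290, so one of those coefficients or the table entry may contain a transcription error.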

 

XD (the presence of a physical barrier) and XM (the presence and type of median) were selected for every model. Both are categorical variables, and two of each variable's levels were assigned parameter estimates in every model. For XD, value 0 (no barrier) always yielded a smaller parameter estimate than value 1 (partial barrier). For XM, value 2 (hard median) always yielded a smaller parameter estimate than value 0 (no median). XI,0 (no dedicated right-turn-only lane) was chosen in 9 of the 10 models. Parameter estimates for XC,1 (one-way street), XI,0 (no dedicated right-turn-only lane), and XJ (length of walk phase) were always negative, indicating that these effects tended to decrease the predicted probability of a marked intersection crossing (i.e., increase the probability of an unmarked non-intersection crossing). The mean intercept parameter estimate was 0.6058 (greater than 0.5), indicating that, on average, the baseline prediction was a marked intersection crossing. Although XA (distance to the next marked crosswalk) was selected twice, its influence was negligible in both cases.

Across the 10 models, correct classification of the testing data set was only 42.60 percent (eight models) or 66.28 percent (two models), about 47 percent on average. Furthermore, the largest adjusted r-square value was only 0.0614 (Model 3); in other words, even the best model explained only about 6 percent of the variance in the data. Together, these results suggest that none of the models calculated thus far is sufficient to predict whether a pedestrian is more likely to cross at a marked intersection than at an unmarked non-intersection. As a result, more detailed site- and factor-specific human factors analyses are described in the next section. It is hoped that these analyses will provide greater insight into which environmental factors influence pedestrian crossing behavior.
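The reported percentages also imply how each group of models performed on each crossing type. The sketch below backs out confusion-matrix shares of the testing set, assuming that "were for marked intersection crossings" refers to the observed crossing location of the assigned observations; `implied_rates` is a hypothetical helper.

```python
def implied_rates(accuracy, marked_share_correct, marked_share_incorrect):
    """Back out confusion-matrix shares (fractions of the testing set) from
    overall accuracy and the marked-crossing share among correct and
    incorrect assignments."""
    wrong = 1.0 - accuracy
    marked_hit = accuracy * marked_share_correct           # marked, predicted marked
    unmarked_hit = accuracy * (1 - marked_share_correct)   # unmarked, predicted unmarked
    marked_miss = wrong * marked_share_incorrect           # marked, predicted unmarked
    unmarked_miss = wrong * (1 - marked_share_incorrect)   # unmarked, predicted marked
    marked_total = marked_hit + marked_miss
    unmarked_total = unmarked_hit + unmarked_miss
    return {
        "marked_share": marked_total,
        "marked_recall": marked_hit / marked_total,
        "unmarked_recall": unmarked_hit / unmarked_total,
    }


for label, args in [("42.60% models", (0.4260, 0.8791, 0.9712)),
                    ("66.28% models", (0.6628, 0.9489, 0.8988))]:
    r = implied_rates(*args)
    print(label, {k: round(v, 3) for k, v in r.items()})
```

Under this reading, both groups imply that about 93.2 percent of the testing set consisted of marked intersection crossings, in line with the roughly 93 percent in the stratified training set. A rule that always predicted a marked intersection crossing would therefore have classified about 93 percent of the testing data correctly. The eight 42.60-percent models identify about 76 percent of unmarked non-intersection crossings but only about 40 percent of marked intersection crossings; the two 66.28-percent models identify about 67 percent of marked and about 50 percent of unmarked crossings.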