Office of Planning, Environment, & Realty (HEP)

FHWA and EPA National Near-Road Study - Detroit

6 Data Management, Analysis and Validation

Because the data collection efforts for this project produced extremely large data sets, thoroughly quality assuring the data will take a significant amount of staff time, as will data analysis. Both activities are ongoing processes.

6.1 Data Management

6.1.1 Purpose/Background

The following section identifies the processes and procedures that were used to acquire, transmit, transform, reduce, analyze, store, and retrieve data. These processes and procedures will maintain the data integrity and validity through application of the identified data custody protocols. Figure 2 shows the data flow from the shelters, lab analysis and traffic data to raw data storage, data review and analysis.

6.1.2 Data Recording

The majority of the data collected for this study were recorded electronically. Field/lab personnel/teams used EPA-provided forms and checklists or developed documents as needed to accomplish data recording (EPA/FHWA Near-Road QAPP). For electronic recording, each monitoring site was equipped with data loggers. A data logger was set up to record each air quality monitor's output, perform specific data manipulations, and format the resulting data in preparation for downloading and subsequent loading to a SAS database(s). Data collected from real-time monitors (e.g., gas analyzers, sonic anemometers) were recovered via computers on a daily or near-daily basis.

Data that required manual entry, such as those obtained from the integrated particulate samplers or MSAT canister and DNPH sampling, were entered into a custom-designed Excel spreadsheet that was used to generate sample labels, record data, and generate sample tracking forms for the integrated VOC, carbonyl, and PM2.5 samples. The spreadsheet generated the unique sample codes and labels for each sampling day, location, time period, and sample type. All sample collection parameters (e.g., pressures, flows) were hand recorded on a printed blank form by the field operator at the time of sample collection. The field operator then entered this information into the electronic data collection form, where embedded formulas made all necessary calculations and generated a summary page that was later entered into the study database. From this information, the chain-of-custody (COC)/tracking forms were generated and printed. Information recorded in the electronic datasheet included sample start and end times, pressures, and flow rates. The electronic files were copied to a dedicated flash drive that was shipped with the samples from the field to the EPA RTP facility. There, EPA staff retrieved the data files from the flash drive, verified the data entries, made necessary corrections, and delivered the corrected field files to the database administrator (DBA). All datasheet entries made by the field site operator were 100% verified at the laboratory by EPA staff, who compared the original handwritten datasheets to the field-generated electronic datasheet. This electronic datasheet formed the basis of the final EPA database for the integrated samples. The spreadsheets were designed to reduce human error and provide a simple, effective means to collect and process a large number of samples. After laboratory analysis, EPA contractor staff provided the analysis data in Excel spreadsheet format, which the DBA imported into the database. Linkages between the field data and the laboratory analysis were made using the field sample codes.
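The linkage between field records and laboratory results through a shared sample code can be sketched as follows. This is only an illustration in Python, not the spreadsheet or SAS logic used in the study: the code layout produced by `make_sample_code`, the record fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    sample_code: str
    flow_lpm: float       # hypothetical field: sampler flow rate
    duration_min: float   # hypothetical field: sampling duration


@dataclass
class LabResult:
    sample_code: str
    analyte: str
    concentration: float


def make_sample_code(date: str, station: int, period: str, sample_type: str) -> str:
    """Build one code per sampling day, location, time period, and sample type.

    The layout is an assumption; the study's actual code format is not given.
    """
    return f"{date}-S{station}-{period}-{sample_type}"


def link_results(field_records, lab_results):
    """Pair each lab result with its field record by sample code.

    Returns (linked, orphans); orphans are lab results with no field record.
    """
    by_code = {rec.sample_code: rec for rec in field_records}
    linked, orphans = [], []
    for res in lab_results:
        rec = by_code.get(res.sample_code)
        if rec is None:
            orphans.append(res)
        else:
            linked.append((rec, res))
    return linked, orphans


# Illustrative values only, not study data.
code = make_sample_code("20101001", 1, "AM", "VOC")
field = [FieldRecord(code, 0.05, 60.0)]
lab = [LabResult(code, "benzene", 1.2), LabResult("UNKNOWN", "benzene", 0.4)]
linked, orphans = link_results(field, lab)
```

Keeping unmatched lab results in a separate list, rather than dropping them, makes a mislabeled or missing field record visible during review.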

Traffic data were downloaded by EPA staff approximately every 4 weeks. These data were in the form of an ASCII text file and were transmitted to the EPA DBA for entry into a SAS dataset.

6.1.3 Field and Laboratory Data Validation

Data validation occurred at each level of data collection and reporting, with each activity recorded in laboratory notebooks. Data were conditionally validated after collection and after analysis. Conditional validation was the acknowledgment by field and laboratory staff of whether or not they noticed problems with the collection or analysis of a particular sample. Conditional validation helped identify problems during collection, storage, shipping, and analysis that may invalidate samples. Questionable data, defined as unusual values for which the DBA could find no basis for invalidation, were considered valid and annotated as such in the database. EPA is in the process of reviewing the database and making final determinations of data validity.

6.1.3.1 Instrument Performance Assessment Procedures

Each day, data were accessed using WinCollect software. Graphical reports were run to determine instrument status and data validity. Examples of these graphical reports are shown in the Appendices.

Instrument issues were identified and noted in a logbook at the computer being used to run WinCollect. The graphs and any instrument issues were reported in an email to the site operator, EPA, and contractor staff.

6.1.3.2 Laboratory Data Verification

Data validation continued with the inspection of received samples/documents and the integration of the laboratory analyses with the corresponding field monitoring data. Validation consisted of an assessment of the reasonableness of the data, determination of data completeness, and comparison to the criteria defined for each specific parameter (such as pump flow rates and sampling duration). Analytical data that did not appear to be valid or did not meet validation criteria were flagged in the database.
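A comparison of sample parameters against per-parameter criteria might look like the sketch below. The parameter names, the limits in `HYPOTHETICAL_LIMITS`, and the flag strings are placeholders chosen for illustration; the study's actual criteria and flags are defined in its QAPP and appendices, not here.

```python
# Placeholder acceptance ranges (low, high); NOT the study's criteria.
HYPOTHETICAL_LIMITS = {
    "flow_lpm": (15.0, 18.4),
    "duration_min": (1380.0, 1500.0),
}


def flag_sample(sample, limits=HYPOTHETICAL_LIMITS):
    """Return a list of flag strings; an empty list means no criterion failed."""
    flags = []
    for name, (low, high) in limits.items():
        value = sample.get(name)
        if value is None:
            # A missing parameter is a completeness problem.
            flags.append(f"{name}:MISSING")
        elif not low <= value <= high:
            flags.append(f"{name}:OUT_OF_RANGE")
    return flags


ok_flags = flag_sample({"flow_lpm": 16.7, "duration_min": 1440.0})
bad_flags = flag_sample({"flow_lpm": 12.0})
```

Returning flags instead of deleting the record mirrors the text above: data that fail a criterion stay in the database and are marked for review.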

6.1.4 Data Reduction

Original data were kept and archived as part of the project's record keeping. This archiving activity was carried out by the EPA DBA.

Data recorded on a continuous basis by data loggers were electronically retrieved on a weekly or near-weekly basis by the EPA DBA. If continuously logged data could not be transmitted electronically, the data were to be sent to the EPA DBA via DVD or other appropriate media. (This never proved necessary for the continuous analyzers; physical media were used only for video data.) Non-continuous data, such as filter, canister, or cartridge samples, were first analyzed in the laboratory. In all cases, the data were submitted to the EPA DBA for this project and entered into the SAS database(s). The only exception was the video data, which would have consumed too many network resources and were therefore maintained on external hard drives.

6.1.5 Data Related Organizational Deliverables

The Field Site operator was responsible for ensuring the data loggers, computers, and communications were in good working order so that data were retrieved on a weekly or near-weekly basis by the EPA DBA.

Continuous data that were retrieved on a weekly or near-weekly basis by the EPA DBA included:

The Field Site operator was responsible for ensuring that non-continuous samples were recorded properly in logbooks and chain-of-custody forms. These data included:

The Field Site operator was responsible for ensuring that all logbooks, chain-of-custody forms, notes, and similar records were maintained in an orderly fashion so that a complete record of the project was documented.

Laboratory analysis staff were responsible for reporting the laboratory analytical results for the canister, DNPH, and PM2.5 integrated samples to EPA. The data were provided in electronic format as Excel worksheets and were reviewed for completeness. If any changes were necessary, the data were investigated and the changes were documented in both the hardcopy and electronic files. The data were then submitted to the DBA for inclusion in the study database.

The following table (Table 10) lists the data-related deliverables, format of each deliverable, and personnel responsible.

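Table 10. Data-related deliverables.

Deliverable                                  | Custodian                    | Person Delivered To               | Format
---------------------------------------------|------------------------------|-----------------------------------|-----------
CO, NO, NO2, NOX                             | Field Site Operator          | EPA DBA                           | Electronic
BC                                           | Field Site Operator          | EPA DBA                           | Electronic
Coarse PM, PM10, PM2.5                       | Field Site Operator          | EPA DBA                           | Electronic
Meteorological Data                          | Field Site Operator          | EPA DBA                           | Electronic
DNPH Cartridge Sample Collection Information | Field Site Operator/Lab Tech | EPA DBA                           | Electronic
Canister Sample Collection Information       | Field Site Operator/Lab Tech | EPA DBA                           | Electronic
PM Filter Sample Collection Information      | Field Site Operator/Lab Tech | EPA DBA                           | Electronic
DNPH Cartridge Laboratory Data               | Laboratory Staff             | EPA WAM; EPA WAM delivers to DBA  | Electronic
Canister Laboratory Data                     | Laboratory Staff             | EPA WAM; EPA WAM delivers to DBA  | Electronic
PM Filter Laboratory Data                    | Laboratory Staff             | EPA WAM; EPA WAM delivers to DBA  | Electronic
Traffic Data                                 | Field Site Operator          | EPA DBA                           | Electronic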

6.1.6 Data Completeness

The DBA for this project developed a SAS program that produced an overview table of data completeness. The table was updated as required by the needs of the project (weekly, bi-weekly, monthly, etc.) and showed at a glance the overall instrument up-time versus instrument maintenance, failures, or other field site issues. The tables that follow (Table 11) cover the period from September 29, 2010 through mid-June 2011.

Table 11. Summary of Data Completeness by Site for Major Parameters.

Completeness (%) by Station ID

Parameter      | Total       | Station 1       | Station 2        | Station 3        | Station 4
               |             | (10 m roadside) | (100 m downwind) | (300 m downwind) | (100 m upwind)
---------------|-------------|-----------------|------------------|------------------|---------------
BC             | 98.73       | 99.76           | 98.25            | 98.69            | 98.22
CO             | 77.80       | 72.16           | 88.29            | 57.29            | 92.53
NO             | 89.53       | 95.99           | 95.12            | 70.95            | 95.34
NO2            | 89.55       | 96.00           | 95.13            | 70.97            | 95.36
NOx            | 89.55       | 96.01           | 95.13            | 70.97            | 95.36
PM10           | 88.20       | 78.25           | 99.05            | 88.22            | 87.31
PM2.5          | 86.98       | 77.72           | 99.40            | 88.13            | 82.82
PM Coarse      | 89.43       | 78.98           | 99.32            | 90.42            | 89.06
Wind Direction | 85.38       | 79.30           | 98.44            | 75.82            | 87.60
Wind Speed     | 99.11       | 99.76           | 98.85            | 98.78            | 99.05
Traffic        | > 99 (est.) |                 |                  |                  |

Black Carbon - Digital Data
Time span: 09/29/2010-06/20/2011

Site name | Distance from Road | N (hours) [a] | Completeness
----------|--------------------|---------------|-------------
Station 1 | 10 m roadside      | 6142          | 97%
Station 2 | 100 m downwind     | 6146          | 97%
Station 3 | 300 m downwind     | 6166          | 97%
Station 4 | 100 m upwind       | 6179          | 98%

[a] A complete hour of sampling was set at a minimum of 10 five-minute data points (50 min).

Sample                       | % Total
-----------------------------|--------
TO-15 canisters (VOCs)       | 88
TO-11 cartridges (aldehydes) | 90
PM2.5 Filters                | 94
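The hourly completeness rule in the black carbon table footnote (an hour counts as complete with at least 10 five-minute data points) can be expressed as the short Python sketch below. It is not the DBA's SAS program. The function names are hypothetical, and treating completeness as complete hours divided by all hours in the time span is an assumption about how the percentages were derived.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_POINTS_PER_HOUR = 10  # from the footnote: 10 five-minute points (50 min)


def complete_hours(timestamps, min_points=MIN_POINTS_PER_HOUR):
    """Count clock hours containing at least min_points distinct timestamps."""
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in set(timestamps)
    )
    return sum(1 for n in per_hour.values() if n >= min_points)


def completeness_percent(timestamps, start, end):
    """Percent of hours in [start, end) that are complete (assumed definition)."""
    total_hours = int((end - start).total_seconds() // 3600)
    if total_hours <= 0:
        raise ValueError("end must be at least one hour after start")
    in_span = [ts for ts in timestamps if start <= ts < end]
    return 100.0 * complete_hours(in_span) / total_hours


# Illustrative two-hour span: 12 valid points in hour 0, only 9 in hour 1.
start = datetime(2010, 9, 29, 0, 0)
end = start + timedelta(hours=2)
points = [start + timedelta(minutes=5 * i) for i in range(12)]
points += [start + timedelta(hours=1, minutes=5 * i) for i in range(9)]
pct = completeness_percent(points, start, end)
```

In this example only the first hour meets the 10-point minimum, so `pct` is 50.0.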

6.1.7 Data Storage and Retrieval

The EPA Project Officer will be consulted prior to disposal of records. The EPA DBA or similar designee is responsible for archiving, storage, and retrieval of all field and laboratory data files developed during the study at EPA. Copies of all study information (records/data) are retained and archived in accordance with Federal record storage guidelines.

6.1.8 Data Dictionary

The data dictionary provides a description of each database variable, including range (minimum, maximum), type (numeric, alpha), missing value codes, and error flags (see Appendix). Descriptive information required to understand or interpret variables, including calculations or other manipulation, was included for each variable as needed. The data dictionary is an ongoing effort and is refined on an as-needed basis.
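As an illustration of how the attributes listed above (range, type, missing value codes) could be represented and used to screen a value, consider the following Python sketch. The `VariableSpec` class, the `CO_PPM` entry, its limits, and the `-999` missing code are all hypothetical; the actual dictionary is in the report's appendix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VariableSpec:
    name: str
    var_type: str                     # "numeric" or "alpha"
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    missing_codes: Tuple = ()
    description: str = ""


def check_value(spec, value):
    """Classify a value as 'missing', 'wrong_type', 'out_of_range', or 'ok'."""
    if value is None or value in spec.missing_codes:
        return "missing"
    if spec.var_type == "numeric":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "wrong_type"
        if spec.minimum is not None and value < spec.minimum:
            return "out_of_range"
        if spec.maximum is not None and value > spec.maximum:
            return "out_of_range"
        return "ok"
    return "ok" if isinstance(value, str) else "wrong_type"


# Hypothetical dictionary entry; limits and missing code are placeholders.
co_spec = VariableSpec(
    name="CO_PPM",
    var_type="numeric",
    minimum=0.0,
    maximum=50.0,
    missing_codes=(-999,),
    description="Carbon monoxide concentration (illustrative entry)",
)
```

Missing value codes are checked before the range test so that a sentinel such as `-999` is reported as missing rather than as an out-of-range measurement.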

6.2 Data Review, Verification, and Validation

The purpose of this section is to identify the procedures and responsible parties for data review, verification, and validation. Data verification is the process of evaluating the completeness, correctness, and conformance/compliance of a specific data set against the method, procedural, or contractual requirements. Data validation is an analyte- and sample-specific process that extends the evaluation of data beyond method, procedural, or contractual compliance (i.e., data verification) to determine the analytical quality of a specific data set.

Verification and validation of the procedures used to collect and analyze data are critical to the goals of this project and are performed after data collection, but prior to performing the flux calculations and uncertainty determinations. Study personnel were responsible for ensuring that the sampling methods, quality control protocols, and validation methods were followed and completed.

6.2.1 Validating and Verifying Data

Ideally, data undergoing evaluation should be compared to actual events. However, exceptional field events may occur, and field and laboratory activities may negatively affect the integrity of samples. In addition, some of the QC checks may indicate that the data failed to meet the acceptance criteria. Data identified as suspect, or that did not meet the acceptance criteria, were flagged as indicated in the appendix.

While reviewing the CO data for Site 4 (100 m upwind), we observed a baseline shift in the data for the period of September 29, 2010 through December 6, 2010. After a review of the multi-point calibration datasheets, correction factors were applied. More detail may be found in the appendix.
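The report does not state how the correction factors were derived. One common approach, shown here purely as an assumption, is to fit a straight line to the multi-point calibration (instrument response against reference concentration) and invert it. The function names and calibration values below are made up for illustration.

```python
def fit_calibration(reference, response):
    """Least-squares line: response = slope * reference + intercept."""
    n = len(reference)
    if n < 2 or n != len(response):
        raise ValueError("need two or more paired calibration points")
    mean_x = sum(reference) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    if sxx == 0:
        raise ValueError("reference concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(raw, slope, intercept):
    """Invert the calibration line to recover the corrected concentration."""
    return (raw - intercept) / slope


# Made-up calibration: the instrument reads 0.5 units high at every level,
# which is what a pure baseline shift would look like.
ref = [0.0, 1.0, 2.0, 4.0]
resp = [0.5, 1.5, 2.5, 4.5]
slope, intercept = fit_calibration(ref, resp)
corrected = apply_correction(1.5, slope, intercept)
```

With a pure baseline shift the fitted slope stays at 1 and the intercept equals the offset, so the correction reduces to subtracting a constant.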

6.2.2 Verification

As the data were being compiled (continuous and non-continuous data), they were reviewed for completeness and data entry accuracy. All raw data that were hand entered from data sheets were checked prior to entry into the appropriate database. Once the data were entered, they were reviewed for routine data outliers and conformance to acceptance criteria. Unacceptable or questionable data were flagged appropriately.

6.2.3 Validation

Validation of measurement data required two stages, one at the measurement value level and the second at the batch level. Records of all invalid samples were retained in the appropriate database. Information included a brief summary of why the sample was invalidated along with the associated flags. Logbook notes and field data sheets have more detailed information regarding the reason a sample was flagged. These documents were retrieved from the field sites and are stored at EPA.

The flags listed in the Appendices were used to indicate that individual samples, or samples from a particular instrument, were invalidated.

6.3 Data Analysis

6.3.1 Statistical Analysis - Overall Project

The data analyses recommended below focused on the most basic issues of roadway emission impacts:

Additional data analyses may address additional questions such as the respective impacts of meteorological conditions, traffic volume, vehicle type, or other variables.

Given the complexity of the data set, multivariate analysis approaches using statistical analysis software such as JMP or SAS were necessary to assess the impact of various parameters of interest on pollutant dispersion. However, emphasis was placed on reporting clear and understandable results from the statistical analysis. The field studies were conducted to understand the relation of mobile source emissions to key air contaminants and to determine whether there was a statistically significant difference between the pollutant concentration measured at each site and the background concentration.

Data were analyzed using a combination of programs, including MATLAB version R2009b, Microsoft Excel 2007, JMP 8/9, and SigmaPlot 11/12. The data analysis included calculating summary statistics for each site, both for all wind conditions and for winds from the West only (downwind; within +/- 60 degrees of perpendicular to the road), estimating concentration gradients for winds from the West, and observing concentrations as a function of wind direction for all winds.
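The wind-sector selection described above depends on comparing compass bearings across the 0/360 degree wrap. The Python sketch below shows one way to do that and to summarize the selected concentrations. It is not the study's MATLAB or JMP code. The default sector center of 270 degrees assumes the perpendicular to the road points due West, and the sample records are invented.

```python
import statistics


def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees (0-180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def in_sector(wind_dir, center=270.0, half_width=60.0):
    """True if wind_dir lies within +/- half_width degrees of center.

    center=270 (due West) is an assumption about the road orientation.
    """
    return angular_difference(wind_dir, center) <= half_width


def summarize(records, center=270.0, half_width=60.0):
    """Summary statistics for concentrations measured under in-sector winds.

    records is an iterable of (wind_direction_deg, concentration) pairs.
    """
    selected = [c for d, c in records if in_sector(d, center, half_width)]
    if not selected:
        return {"n": 0, "mean": None, "median": None}
    return {
        "n": len(selected),
        "mean": statistics.mean(selected),
        "median": statistics.median(selected),
    }


# Invented records: (wind direction in degrees, concentration).
records = [(270.0, 2.0), (300.0, 4.0), (90.0, 10.0), (335.0, 6.0)]
west_summary = summarize(records)
```

Here the 90 degree and 335 degree records fall outside the sector (335 is 65 degrees from 270), so only two values are summarized.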

Updated: 6/28/2017