Office of Planning, Environment, & Realty (HEP)

FHWA and EPA National Near-Road Study Las Vegas

6 Data Management, Analysis and Validation

The data collection efforts for this project produced extremely large data sets, so thorough quality assurance of the data will take a significant amount of staff time. Data analysis will also require a significant amount of staff time. Both activities are ongoing processes.

6.1 Data Management

6.1.1 Purpose/Background

The following section identifies the processes and procedures that were used to acquire, transmit, transform, reduce, analyze, store, and retrieve data. These processes and procedures will maintain the data integrity and validity through application of the identified data custody protocols. Figure 2 shows the data flow from the shelters, lab analysis and NDOT traffic data to raw data storage, data review, and analysis.

6.1.2 Data Recording

The majority of the data collected for this study was recorded electronically. Field and laboratory personnel used EPA-provided forms and checklists, or developed documents as needed, to accomplish data recording (EPA/FHWA Near-Road QAPP). Each monitoring site was equipped with data loggers. A data logger was set up to record each air quality monitor's output, perform specific data manipulations, and format the resulting data in preparation for downloading and subsequent loading to a SAS database(s). Data collected from real-time monitors (e.g., gas analyzers, sonic anemometers) were recovered via computers on a daily or near-daily basis.

Data that required manual entry, such as those obtained from the integrated particulate samplers or from MSAT canister and DNPH sampling, were entered into a custom-designed Excel spreadsheet. The spreadsheet was used to generate sample labels, record data, and generate sample tracking forms for the integrated VOC, carbonyl, and PM2.5 samples. It generated the unique sample codes and labels for each sampling day, location, time period, and sample type. The spreadsheets were designed to reduce human error and provide a simple, effective means to collect and process a large number of samples.

All sample collection parameters (e.g., pressures, flows) were hand recorded on a printed blank form by the field operator at the time of sample collection. The field operator then entered this information into the electronic data collection form, where embedded formulas made all necessary calculations and generated a summary page that was later entered into the study database. Information recorded in the electronic data sheet included sample start and end times, pressures, and flow rates. From this information, the chain-of-custody (COC)/tracking forms were generated and printed.

The electronic files were copied to a dedicated flash drive that was shipped with the samples from the field to the EPA RTP facility. There, EPA staff retrieved the data files from the flash drive, verified the data entries, made necessary corrections, and delivered the corrected field files to the Database Administrator (DBA). All datasheet entries made by the field site operator were 100 percent verified at the laboratory by EPA staff, who compared the original handwritten datasheets to the field-generated electronic datasheet. This electronic datasheet formed the basis of the final EPA database for the integrated samples. After laboratory analysis, EPA contractor staff provided the analysis data in Excel spreadsheet format, which the DBA imported into the database. Linkages between the field data and the laboratory analysis were made using the field sample codes.
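The role of the sample codes in linking field and laboratory records can be illustrated with a minimal Python sketch. The code layout (date, station, period, sample type), the field names, and the example values are all hypothetical; the study's actual spreadsheet format is not described here.

```python
from datetime import date


def make_sample_code(day, station, period, sample_type):
    """Build a code from sampling day, location, time period, and sample type.

    The layout is a hypothetical illustration, not the study's real format.
    """
    return f"{day:%Y%m%d}-S{station}-{period}-{sample_type}"


def link_field_to_lab(field_records, lab_results):
    """Join field and laboratory records on the shared sample code."""
    linked, unmatched = {}, []
    for code, field in field_records.items():
        if code in lab_results:
            linked[code] = {**field, **lab_results[code]}
        else:
            unmatched.append(code)  # field sample with no laboratory result yet
    return linked, unmatched


# Illustrative records only; these are not study data.
code_voc = make_sample_code(date(2009, 1, 15), 1, "AM", "VOC")
code_pm = make_sample_code(date(2009, 1, 15), 1, "AM", "PM25")

field_records = {
    code_voc: {"start": "06:00", "end": "09:00"},
    code_pm: {"start": "06:00", "end": "09:00"},
}
lab_results = {code_voc: {"lab_status": "analyzed"}}

linked, unmatched = link_field_to_lab(field_records, lab_results)
print(sorted(linked))
print(unmatched)
```

Keeping the unmatched codes in a separate list makes it easy to see which field samples still lack laboratory data.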

Nevada DOT traffic data were uploaded to the FTP Science Server approximately every two weeks by an NDOT FAST staff member. These data were in the form of an ASCII text file and were transmitted to the EPA DBA for entry into a SAS dataset.

6.1.3 Field and Laboratory Data Validation

Data validation occurred at each level of data collection and reporting, with each activity recorded in laboratory notebooks. Data were conditionally validated after collection and after analysis. Conditional validation was the acknowledgment by field and laboratory staff that they did or did not notice problems with the collection or analysis of a particular sample. Conditional validation helped identify problems during collection, storage, shipping, and analysis that may invalidate samples. Questionable data, defined as unusual values for which the DBA could find no basis for invalidation, were considered valid and annotated as such in the database. EPA is in the process of reviewing the database and making final determinations of data validity.

6.1.3.1 Instrument Performance Assessment Procedures

Each day, data were accessed using WinCollect software. Graphical reports were run to determine instrument status and data validity. Examples of these graphical reports are shown in the Appendices.

Instrument issues were identified and noted in a logbook kept at the computer used to run WinCollect. The graphs and any instrument issues were reported in an email to the site operator, EPA, and contractor staff.

6.1.3.2 Laboratory Data Verification

Data validation continued with the inspection of received samples and documents and the integration of the laboratory analyses with the corresponding field monitoring data. Validation consisted of an assessment of the reasonableness of the data, determination of data completeness, and comparison to the criteria defined for each specific parameter (such as pump flow rates and sampling duration). Analytical data that did not appear to be valid or did not meet validation criteria were flagged in the database.

6.1.4 Data Reduction

Original data will be kept and archived as a part of the project's record keeping. This archiving activity was carried out by the EPA DBA.

Data recorded on a continuous basis by data loggers were electronically retrieved on a weekly or near-weekly basis by the EPA DBA. If continuously logged data could not be transmitted electronically, the data were to be sent to the EPA DBA on DVD or other appropriate media. (This never occurred for the continuous analyzers; it occurred only for video data.) Non-continuous data, such as filter, canister, or cartridge samples, were first analyzed in the laboratory. In all cases, data were submitted to the EPA DBA for this project and entered into the SAS database(s). The only exception was the video data, which would have consumed too many network resources and was therefore maintained on external hard drives.

6.1.5 Data Related Organizational Deliverables

The Field Site operator was responsible for ensuring the data loggers, computers, and communications were in good working order so that data were retrieved on a weekly or near-weekly basis by the EPA DBA.

Continuous data that were retrieved on a weekly or near-weekly basis by the EPA DBA included:

The Field Site operator was responsible for ensuring that all logbooks, chain-of-custody forms, notes, and other records were maintained in an orderly fashion so that a complete record of the project was documented.

Laboratory analysis staff was responsible for reporting the laboratory analytical results for the canister, DNPH, and PM2.5 integrated samples to EPA. The data were provided electronically as Excel worksheets and were reviewed for completeness. If any changes were necessary, the data were investigated and the changes were documented in both the hardcopy and electronic files. The data were then submitted to the DBA for inclusion in the study database.

The following table lists the data-related deliverables, format of each deliverable, and personnel responsible.

Table 14. Data-related deliverables.

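| Deliverable                                  | Custodian                    | Person Delivered To              | Format     |
|----------------------------------------------|------------------------------|----------------------------------|------------|
| CO, NO, NO2, NOX                             | Field Site Operator          | EPA DBA                          | Electronic |
| BC                                           | Field Site Operator          | EPA DBA                          | Electronic |
| Coarse PM, PM10, PM2.5                       | Field Site Operator          | EPA DBA                          | Electronic |
| Meteorological Data                          | Field Site Operator          | EPA DBA                          | Electronic |
| DNPH Cartridge Sample Collection Information | Field Site Operator/Lab Tech | EPA DBA                          | Electronic |
| Canister Sample Collection Information       | Field Site Operator/Lab Tech | EPA DBA                          | Electronic |
| PM Filter Sample Collection Information      | Field Site Operator/Lab Tech | EPA DBA                          | Electronic |
| DNPH Cartridge Laboratory Data               | Laboratory Staff             | EPA WAM (EPA WAM delivers to DBA) | Electronic |
| Canister Laboratory Data                     | Laboratory Staff             | EPA WAM (EPA WAM delivers to DBA) | Electronic |
| PM Filter Laboratory Data                    | Laboratory Staff             | EPA WAM (EPA WAM delivers to DBA) | Electronic |
| Traffic Data                                 | NDOT Staff/EPA Staff         | EPA DBA                          | Electronic |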

6.1.6 Data Completeness

The DBA for this project developed a SAS program that produced an overview table of data completeness. This table was updated as often as the needs of the project required (weekly, bi-weekly, monthly, etc.). It showed at a glance the overall instrument up-time versus instrument maintenance, failures, or other field site issues. The tables shown below cover the period from mid-December 2008 through mid-December 2009.

Table 15. Summary of Data Completeness by Site for Major Parameters.

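Data completeness (%) by Station ID

| Parameter      | Total       | Station 1 (20 m roadside) | Station 2 (100 m downwind) | Station 3 (300 m downwind) | Station 4 (100 m upwind) |
|----------------|-------------|---------------------------|----------------------------|----------------------------|--------------------------|
| CO             | 93          | 94                        | 94                         | 93                         | 92                       |
| NO             | 95          | 97                        | 96                         | 95                         | 93                       |
| NO2            | 95          | 97                        | 96                         | 95                         | 93                       |
| NOx            | 95          | 97                        | 96                         | 95                         | 93                       |
| PM10           | 87          | 88                        | 89                         | 85                         | 86                       |
| PM2.5          | 87          | 88                        | 91                         | 83                         | 83                       |
| PM Coarse      | 88          | 89                        | 88                         | 85                         | 90                       |
| Wind Direction | 92          | 93                        | 91                         | 89                         | 97                       |
| Wind Speed     | 100         | 100                       | 100                        | 100                        | 100                      |
| Traffic        | > 99 (est.) |                           |                            |                            |                          |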

Black Carbon – Digital Data

Time span: 12/15/2008-12/15/2009

| Site name | Distance from Road | N [a] (hours) | Completeness [b] |
|-----------|--------------------|---------------|------------------|
| Station 1 | 20 m roadside      | 8503          | 97%              |
| Station 2 | 100 m downwind     | 7838          | 89%              |
| Station 3 | 300 m downwind     | 7913          | 90%              |
| Station 4 | 100 m upwind       | 8755          | 100%             |

[a] A complete hour of sampling was set at a minimum of 10 five-minute data points (50 min).

[b] Completeness below 100% is largely due to a delayed start to sampling for several instruments. Incomplete time periods due to instrumentation error were generally less than 1%.

| Sample                       | % Total |
|------------------------------|---------|
| TO-15 canisters (VOCs)       | 82      |
| TO-11 cartridges (aldehydes) | 68      |
| PM2.5 Filters                | 83      |
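The hourly completeness rule in the black carbon footnote (at least 10 five-minute points per hour) can be sketched in Python as follows. This is an illustration only, not the DBA's SAS program. The denominator is an assumption: the source does not state it, and the sketch uses every hour in the stated time span (8,760 hours). Under that assumption, the reported hour counts round to the percentages shown in the black carbon table.

```python
from collections import Counter
from datetime import datetime, timedelta

# From the table footnote: an hour counts as complete with at least
# 10 five-minute data points (50 minutes of data).
MIN_POINTS_PER_HOUR = 10


def count_complete_hours(timestamps):
    """Count clock hours that contain enough five-minute data points."""
    points_per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sum(1 for n in points_per_hour.values() if n >= MIN_POINTS_PER_HOUR)


def completeness_percent(complete_hours, start, end):
    """Percent of possible hours between start and end that were complete.

    Assumption: the denominator is every hour in the span.
    """
    possible_hours = (end - start).total_seconds() / 3600.0
    return 100.0 * complete_hours / possible_hours


# Synthetic demonstration: one hour with 12 points, one hour with only 9.
t0 = datetime(2008, 12, 15, 0, 0)
demo = [t0 + timedelta(minutes=5 * i) for i in range(12)]
demo += [t0 + timedelta(hours=1, minutes=5 * i) for i in range(9)]
print(count_complete_hours(demo))

# Hour counts reported in the black carbon table.
span_start = datetime(2008, 12, 15)
span_end = datetime(2009, 12, 15)
reported_hours = {
    "Station 1": 8503,
    "Station 2": 7838,
    "Station 3": 7913,
    "Station 4": 8755,
}
for station, hours in reported_hours.items():
    print(station, round(completeness_percent(hours, span_start, span_end)))
```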

6.1.7 Data Storage and Retrieval

The EPA Project Officer will be consulted prior to disposal of records. The EPA DBA or similar designee is responsible for archiving, storage, and retrieval of all field and laboratory data files developed during the study at EPA. Copies of all study information (records/data) are retained and archived in accordance with Federal record storage guidelines.

6.1.8 Data Dictionary

The data dictionary provides a description of each database variable, including range (minimum, maximum), type (numeric, alpha), missing value codes, and error flags (see Appendix). Descriptive information required to understand or interpret variables, including calculations or other manipulation, was included for each variable as needed. This data dictionary is an ongoing effort and is refined on an as-needed basis.
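One way to picture a data dictionary entry and how it could drive automated checks is the Python sketch below. The variable names, ranges, missing value code, and flag strings are all hypothetical; the study's actual entries and flags are in the Appendix and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    """One data dictionary entry: type, range, missing code, description."""
    name: str
    kind: str                      # "numeric" or "alpha"
    minimum: Optional[float]
    maximum: Optional[float]
    missing_code: Optional[float]
    description: str


# Hypothetical entries for illustration only.
DATA_DICTIONARY = {
    "co_ppm": VariableSpec("co_ppm", "numeric", 0.0, 50.0, -999.0,
                           "Carbon monoxide (hypothetical units and range)"),
    "site_id": VariableSpec("site_id", "alpha", None, None, None,
                            "Station identifier"),
}


def check_value(variable, value):
    """Return a hypothetical flag string for a single value."""
    spec = DATA_DICTIONARY[variable]
    if spec.missing_code is not None and value == spec.missing_code:
        return "MISSING"
    if spec.kind == "numeric":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "TYPE_ERROR"
        if spec.minimum is not None and value < spec.minimum:
            return "BELOW_MIN"
        if spec.maximum is not None and value > spec.maximum:
            return "ABOVE_MAX"
    elif not isinstance(value, str):
        return "TYPE_ERROR"
    return "OK"


for variable, value in [("co_ppm", 0.4), ("co_ppm", -999.0),
                        ("co_ppm", 75.0), ("site_id", "Station 1")]:
    print(variable, value, check_value(variable, value))
```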

6.2 Data Review, Verification, and Validation

The purpose of this section is to identify the procedures and responsible parties for data review, verification, and validation. Data verification is the process of evaluating the completeness, correctness, and conformance/compliance of a specific data set against the method, procedural, or contractual requirements. Data validation is an analytical- and sample-specific process that extends the evaluation of data beyond method, procedural, or contractual compliance (i.e., data verification) to determine the analytical quality of a specific data set.

Verification and validation of the procedures used to collect and analyze data are critical to the goals of this project and are performed after data collection, but prior to performing the flux calculations and uncertainty determinations. Study personnel were responsible for ensuring that the sampling methods, quality control protocols, and validation methods were followed and completed.

6.2.1 Validating and Verifying Data

Ideally, data undergoing evaluation should be compared to actual events. However, exceptional field events may occur, and field and laboratory activities may negatively affect the integrity of samples. In addition, some of the QC checks may indicate that the data failed to meet the acceptance criteria. Data that were identified as suspect, or that did not meet the acceptance criteria, were flagged as indicated in the appendix.

6.2.2 Verification

As the continuous and non-continuous data were being compiled, a review of the data was conducted for completeness and data entry accuracy. All raw data that were hand entered from data sheets were checked prior to entry into the appropriate database. Once the data were entered, they were reviewed for routine data outliers and conformance to acceptance criteria. Unacceptable or questionable data were flagged appropriately.

6.2.3 Validation

Validation of measurement data required two stages, one at the measurement value level and the second at the batch level. Records of all invalid samples were retained in the appropriate database. Information included a brief summary of why the sample was invalidated along with the associated flags. Logbook notes and field data sheets have more detailed information regarding the reason a sample was flagged. These documents were retrieved from the field sites and are stored at EPA.

The flags listed in the Appendices were used to indicate that individual samples, or samples from a particular instrument, were invalidated.

6.3 Data Analysis

6.3.1 Statistical Analysis – Overall Project

The data analyses recommended below focused on the most basic issues of roadway emission impacts:

Additional data analyses may address additional questions such as the respective impacts of meteorological conditions, traffic volume, vehicle type, or other variables.

Given the complexity of the data set, multivariate analysis approaches using statistical analysis software such as JMP or SAS will be necessary to assess the impact of various parameters of interest on the pollutant dispersion. However, emphasis has been placed on reporting clear and understandable results from the statistical analysis. The field studies were conducted to understand the relation of mobile source emissions to key air contaminants and to determine if there is a statistically significant difference between the pollutant concentration measured at each site and the background concentration.

Data were analyzed using a combination of programs, including MATLAB version R2009b, Microsoft Excel 2007, JMP 8/9, and SigmaPlot 11/12. The analysis included calculating summary statistics for each site, both for all wind conditions and for winds from the west only (downwind conditions, within +/- 60 degrees of perpendicular); estimating concentration gradients for winds from the west; and observing concentrations as a function of wind direction for all winds.
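The wind-sector selection described above can be illustrated with a short Python sketch. It is not the study's MATLAB or JMP code. It assumes that "perpendicular" corresponds to winds from due west (a bearing of 270 degrees), and the records shown are synthetic values, not study data.

```python
import statistics


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two compass bearings, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def in_sector(wind_dir_deg, center_deg=270.0, half_width_deg=60.0):
    """True if the wind direction lies within the sector around center_deg.

    center_deg=270.0 (due west) is an assumption standing in for the
    direction perpendicular to the road.
    """
    return angular_difference(wind_dir_deg, center_deg) <= half_width_deg


def summarize(records, center_deg=270.0, half_width_deg=60.0):
    """Summary statistics for all winds and for the selected sector only.

    records is a list of (wind_direction_deg, concentration) pairs.
    """
    all_values = [conc for _, conc in records]
    sector_values = [
        conc for wd, conc in records
        if in_sector(wd, center_deg, half_width_deg)
    ]
    return {
        "n_all": len(all_values),
        "mean_all": statistics.mean(all_values),
        "n_sector": len(sector_values),
        "mean_sector": statistics.mean(sector_values) if sector_values else None,
    }


# Synthetic (wind direction, concentration) pairs for one site.
records = [(265.0, 1.2), (300.0, 0.8), (90.0, 0.4), (200.0, 0.6)]
print(summarize(records))
```

The modulo arithmetic in `angular_difference` handles the wraparound at north, so a sector that spans 0/360 degrees would also be selected correctly.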

Updated: 6/28/2017