
Concept Design for an Online Information Source for Major Surface Transportation Projects: A Discussion Paper

June 2017

5 Long-Term Enhancement & Maintenance of the Information Source

5.1 Data Lifecycle Attributes

Investment in the information source cannot be regarded as a one-time event. To remain a viable long-term resource for the industry, the information source requires a thoughtful and proactive approach to its management. This will ensure that the owning organization is able to manage the data resources according to appropriate levels of performance, protection, availability, and cost.

Establishing a high quality and high profile information source would require significant investment in order to provide relevant and useful information on project outcomes for P3 and non-P3 projects. It is envisioned that the process would include the following components:

  • Plan: identification of the data that would be compiled and how the data would be managed and made accessible throughout its lifetime - the focus of this paper.
  • Describe: data, data tables and other pertinent project information would be described thoroughly and accurately using the appropriate metadata standards - this could be the focus of a future effort that would develop and describe the web-based information source or an equivalent system for information dissemination.
  • Note: for the purposes of illuminating the discussion in this paper, it is assumed that an online relational database would be the preferred choice to store and disseminate information (a minimal illustrative sketch of such a structure follows this list). However, this presumption does not in any way limit the applicability of the paper if other choices were made to store or disseminate information.
  • Collect: data would be submitted to an appropriate long-term archive or a data center responsible for managing the online information source portal - discussed in this section of the paper.
  • Assure: quality of the data would be verified by a clearinghouse or a data oversight board and then made available through a database server - discussed in this section of the discussion paper.
  • Analyze: data would be used and analyzed by end users to achieve their research and business objectives, i.e., answer core questions.
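
To make the relational-database assumption noted above more concrete, the following is a minimal, hypothetical sketch of how project records and versioned, quality-tagged data values might be organized. The table and column names are illustrative only and are not specified in this paper; any actual schema would be defined by the information source owner.

    # Illustrative sketch only: a minimal relational structure for the information
    # source, assuming one table of projects and one table of versioned, QA-tagged
    # data values. All names are hypothetical placeholders.
    import sqlite3

    conn = sqlite3.connect("information_source.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS project (
        project_id      INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        sponsor         TEXT,              -- project sponsor/owner
        delivery_method TEXT               -- e.g., DB, P3, design-bid-build
    );

    CREATE TABLE IF NOT EXISTS data_value (
        project_id   INTEGER REFERENCES project(project_id),
        field_name   TEXT NOT NULL,        -- e.g., 'total_cost', 'claims_cost'
        value        TEXT,
        units        TEXT,
        version      INTEGER NOT NULL,     -- supports versioning and re-certification
        qa_level     TEXT,                 -- e.g., 'validated', 'reviewed', 'certified'
        PRIMARY KEY (project_id, field_name, version)
    );
    """)
    conn.close()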

Important aspects of the workflow would include the Plan, Collect, and Assure phases, during which project-based information would be planned, collected, and quality assured before being entered into a database and made available to the public through the online portal. High-level specifications for the online portal are discussed in Chapter 4.

5.1.1 Data Management Plan

The data management plan will be a critical aspect of the information source. Planning for data management would involve answering several questions about how the data would be gathered and used to populate the information source. These questions include:

  • Do the data already exist?
  • How will the data be obtained or collected?
  • What is the schedule and budget for data collection?
  • How will the data be checked and certified?
  • Can the data be made publicly available?
  • How will the data be stored, accessed, and protected?

Questions regarding data availability are addressed in Chapter 3 of this discussion paper. The larger question underpinning the data management plan involves determining who would be responsible for data collection and the associated schedule and budgetary needs. While the information platform described in Chapter 4 suggests that the import of data into the information source could be facilitated with templates, the structure of the entity or entities responsible for collecting information and importing it into the information source could take one of three forms:

  • Approach 1 -- Distributed Responsibility: As shown in Figure 3, this approach would assign to a group of data providers the responsibility for understanding the specific needs of the information source, collecting and reviewing the data for accuracy, and uploading the data to the online platform. Clear and stringent data entry protocols would be necessary, even though the information can be expected to vary somewhat in completeness and quality. The information source owner would then accept, store, protect, and make the data accessible to end users. The advantage of this approach is that it would assign the data review and population responsibilities to the data providers, who would be in the best position to serve in this function. For example, the data providers would be in charge of supplying quality data to the online source, and the information source owner would be responsible for ensuring that the data is received and delivered to end users in a seamless fashion. However, a potential disadvantage of this approach is that the data providers may not have a strong incentive to populate the information source with quality information and may not be fully attuned to the needs of end users of the information source. Additionally, multiple providers may interpret the data input requirements differently, causing entry errors. As a result, this approach could lead to a weaker information source.

Figure 3. Distributed Responsibility Data Management Plan

Source: WSP | Parsons Brinckerhoff, 2017

Text of Figure 3:
Information Source Owner
  • Responsible for data governance
  • Responsible for IT enablement, including online platform and technology resources
  • Responsible for managing user community
Data Providers
  • Understand data standards, formats, definitions
  • Ensure data quality
  • Upload data to information source
End user
  • Data discovery, analysis
  • Feedback

  • Approach 2 -- Total Ownership: As shown in Figure 4, this approach would assign total ownership of the information source to a single entity that would be responsible for providing the IT infrastructure for the information platform, coordinating and gathering data from various sources, reviewing the data, uploading all relevant data to the information platform, and making it available to the public. The information source owner would also be responsible for the lifecycle management of the online platform and the user base. The advantage of this approach is that all activities would be undertaken by a single entity that would be aware of the needs of the information source end users, as well as the data and data quality requirements and the workings of the online platform. The disadvantage of this approach is that it could be more expensive than the other strategies because of the considerable resources needed to collect data from disparate agencies.

Figure 4. Total Ownership Data Management Plan

Source: WSP | Parsons Brinckerhoff, 2017

Text of Figure 4:
Information Source Owner
  • Responsible for data governance
  • Responsible for IT enablement
  • Responsible for data collection and data quality
  • Responsible for managing user community
End user
  • Data discovery, analysis
  • Feedback

  • Approach 3 -- Hybrid: This approach would adopt a hybrid structure, as shown in Figure 5, in which one party would be responsible for providing the IT infrastructure to house the online platform and for its ongoing management. Separate liaisons of the information source owner would work with data originators (including project sponsors) to collect raw information, which the liaisons would then upload to the information source. The data liaison, or possibly a third-party expert or expert panel, would be responsible for quality review, curating the data, certifying the data, and releasing it for public use. The level of effort associated with data certification would vary and may require triangulation, which could be resource-intensive. The advantage of this approach is that the data originators would be responsible only for providing raw information; the requirements of the information source would be communicated through a liaison, reducing the burden on the data originators to align the final data submitted with the goals of the information source. In addition, since the data liaisons or third-party experts would handle quality review of the data, the burden on the data originators to understand the quality requirements of the data set would be further reduced. Finally, the IT platform and its management would be left to a competent party that would not necessarily need to be involved with the technical aspects of the data. This approach would likely cost more than Approach 1 but less than Approach 2.

Figure 5. Hybrid Data Management Plan

Source: WSP | Parsons Brinckerhoff, 2017

Text of Figure 5:
Information Source Owner
  • Responsible for data governance
  • Responsible for IT enablement
  • Responsible for managing user community
Data Providers
  • Provide raw data
Data Liaisons
  • Understand data standards and information source requirements
  • Work with data providers to collect data
  • Ensure data quality via review board
  • Upload data to information source
End user
  • Data discovery, analysis
  • Feedback

The selection of a final data management approach should involve careful deliberation of the pros and cons of each option considered. The requirements of the information platform (relational database and file management system), the data collection, quality review, and maintenance strategies, and the roles and responsibilities of the various parties involved will flow from the selected approach. In addition, given that the information for each project in the database will be developed over several years, management of the database is a long-term activity. Therefore, data collection, population, and quality assurance will be ongoing, long-term activities involving multiple parties.

5.1.2 Data Collection

For the purposes of this paper, data collection is defined as a coordinated set of activities that begins with identifying the key points of contact at the data sources (identified in Chapter 3) and continues through making data requests, receiving and collating data, and moving the data to the next steps of quality assurance and upload. Two types of data are needed to complete the information source: 1) primary data, i.e., data that are available from the data sources in a form that can be readily consumed by the information source; and 2) data discovered through secondary research such as interviews, questionnaires, detailed project reviews, and other specific information elicitation techniques. A majority of the data for Tier 2 is expected to fall into the latter category. Two types of actors will be needed to gather the necessary information, as discussed below.

Primary Data Providers: For the overwhelming majority of the information needed to address the core questions, the primary data providers are project sponsors/owners. The USDOT and FHWA are also valuable sources of project information since they collect some of the desired information as part of their monitoring and oversight activities. In a few instances, for DB and P3 projects, the DB contractor or private partner is also responsible for generating and reporting some of the primary data elements. Examples of such data include legislative information, agency capacity and policy, asset condition, and operational (performance) information. All data incorporated into the information source should undergo reliability tests (as explained below in Section 5.1.3).

Secondary Data Providers (including Research): As noted above, a majority of the data elements for Tier 2 require research that is attuned to the needs of the information source. This would likely involve additional interviews with specific parties within the project owner/sponsor and the identification and review of key internal and published project documents. While project sponsors/owners or their contractors and private partners technically have access to their own information, they will need to re-analyze it in light of the needs of the information source to make such data available in appropriate formats and with common metrics. Regulatory agencies such as the USDOT or FHWA routinely perform research on some data elements and have many datasets or project-specific reports that could provide helpful input to the information source. Other interested industry stakeholders might have similar information. However, it is expected that additional effort would be needed to research and mine the requisite information for the information source. Stringent protocols and reliability tests will be needed to maintain the quality of the data incorporated into the information source over time.

5.1.3 Data Quality Assurance

The online information source will include some data metrics that are robust and others that are more descriptive and may require judgment and interpretation by the team assembling and maintaining the information source. The information source could create an illusion of consistency by inappropriately forcing disparate data into binary or other constrained categories.

Data quality assurance is an ongoing activity lasting at least as long as the information source is active, and it is required whenever data is uploaded to the information source. Two levels of quality assurance are recommended. The first step would be to validate the data fields being uploaded to the information source to ensure that they are consistent with the backend database in terms of units, maximum and minimum values, type of data (e.g., text or numbers), etc. This could be an automated process using a validation checker that could be made available as part of the online information platform described in Chapter 4. Once the validation checks are completed, the data could be admitted into the database with the lowest level of quality assurance certification attributed to it, along with version control. A data completeness check could also be incorporated as an automated process to compare the number of fields expected to be completed at a given project stage against the number of fields actually populated.
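
As an illustration of the automated field validation and completeness checks described above, the sketch below shows one possible implementation. The field names, types, ranges, units, and example record are hypothetical placeholders; the actual specifications would be defined by the information source owner.

    # Illustrative sketch only: automated field validation and completeness check
    # for a submitted project record. Field names, types, ranges, and units are
    # hypothetical placeholders.

    FIELD_SPECS = {
        "total_project_cost": {"type": float, "min": 0, "max": 5e10, "units": "USD"},
        "construction_duration": {"type": float, "min": 0, "max": 240, "units": "months"},
        "delivery_method": {"type": str, "allowed": {"DBB", "DB", "P3"}},
    }

    def validate_record(record):
        """Return a list of validation errors for one submitted record."""
        errors = []
        for field, value in record.items():
            spec = FIELD_SPECS.get(field)
            if spec is None:
                errors.append(f"{field}: unrecognized field")
                continue
            if not isinstance(value, spec["type"]):
                errors.append(f"{field}: expected {spec['type'].__name__}")
                continue
            if "min" in spec and not (spec["min"] <= value <= spec["max"]):
                errors.append(f"{field}: outside range {spec['min']}-{spec['max']} {spec['units']}")
            if "allowed" in spec and value not in spec["allowed"]:
                errors.append(f"{field}: not one of {sorted(spec['allowed'])}")
        return errors

    def completeness(record, expected_fields):
        """Share of fields expected at this project stage that are actually populated."""
        filled = sum(1 for f in expected_fields if record.get(f) not in (None, ""))
        return filled / len(expected_fields)

    record = {"total_project_cost": 1.2e9, "delivery_method": "P3"}
    print(validate_record(record))
    print(completeness(record, list(FIELD_SPECS)))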

A second, more laborious manual data-checking process is also recommended to assure the usability and credibility of the data. This review would entail ensuring that the data submitted for publication in the information source passes a test of reasonableness and is therefore valuable to end users. Reasonableness checks could be performed on all aspects of the information submitted, but should focus mainly on the core questions that the information source is designed to address. The checks could encompass reviewing information on costs, schedule, financial performance, and quality (during design and construction and in service). The intent would be to ensure that the reasonableness checks are directly related to the data fields being collected (or their derivatives) and that their coverage is comprehensive and encompasses the entire information source. The information source could also assign a confidence level to published data using an arbitrarily selected, but well explained, rating scale.

Examples of questions to be asked to check the reasonableness of the submitted data could include:

  • Is the total cost value of claims reasonable (e.g., are the costs greater than, say, 10 percent of the engineer's estimate of the project cost)?
  • Are the net cost savings from all ATCs reasonable (e.g., are the cost savings greater than 25 percent of the total engineer's estimate of project costs)?
  • Are the net schedule savings from ATCs reasonable (e.g., are the schedule savings greater than 25 percent of the engineer's estimate of the baseline duration)?
  • Are the non-weather-related incident clearance times greater than 24 hours for a single event?
  • Is the average work zone queue length greater than typical queue lengths for this site?
  • Is the value of non-conforming work accepted greater than a small percentage (e.g., 2 to 5 percent) of the total project cost?

If the answer to any of these questions (or to other similar questions formulated during the review process) raises concerns, secondary research, including formal outreach to the data sources to verify data accuracy, may be necessary. Triangulation and replication tests provide the highest degrees of reliability and validity. Any verification processes would be formalized through proper communication protocols, including feedback and data reconciliation reports during the operational phase. Any data that is reconciled would need to be updated and the "raw" data source cleansed, with appropriate versioning and re-certification steps occurring to ensure the quality of the new information.
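
To illustrate how such reasonableness checks and a confidence rating might be automated as a first pass before manual review, a minimal sketch follows. The rules mirror the example questions above, but the specific field names, thresholds, and rating scale are hypothetical assumptions rather than prescriptions of this paper.

    # Illustrative sketch only: first-pass reasonableness checks that flag records
    # for manual review or secondary research. Field names, thresholds, and the
    # confidence scale are hypothetical and mirror the example questions in the text.

    def reasonableness_flags(rec):
        """Return a list of human-readable flags for one project record."""
        flags = []
        est = rec["engineers_estimate"]
        if rec.get("claims_cost", 0) > 0.10 * est:
            flags.append("Total cost of claims exceeds 10 percent of engineer's estimate")
        if rec.get("atc_cost_savings", 0) > 0.25 * est:
            flags.append("Net ATC cost savings exceed 25 percent of engineer's estimate")
        if rec.get("atc_schedule_savings", 0) > 0.25 * rec.get("baseline_duration", float("inf")):
            flags.append("Net ATC schedule savings exceed 25 percent of baseline duration")
        if rec.get("max_incident_clearance_hours", 0) > 24:
            flags.append("Non-weather-related incident clearance time exceeds 24 hours")
        if rec.get("nonconforming_work_cost", 0) > 0.05 * rec.get("total_project_cost", est):
            flags.append("Accepted non-conforming work exceeds 5 percent of total project cost")
        return flags

    def confidence_level(flags):
        """Map the number of flags to a simple, arbitrarily chosen rating scale."""
        if not flags:
            return "high"
        return "medium" if len(flags) == 1 else "low - refer for secondary research"

    rec = {"engineers_estimate": 500e6, "claims_cost": 80e6, "total_project_cost": 520e6}
    flags = reasonableness_flags(rec)
    print(flags, confidence_level(flags))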

5.2 Potential Institutional Structure and Responsibilities

The responsibilities for data collection will align with the approach chosen for the data management plan. If Approach 1 were chosen, the project data originators would become responsible for primary and secondary data collection as well as data completeness, accuracy and long-term population. The information source owner would provide the platform, data standards and IT enablement for uploading and use of the data. In Approach 2, data collection and quality would be the responsibility of an "outsourced" entity, which would work in cooperation with the information source owner to populate and maintain the online information source. In Approach 3, primary data collection would be assigned to the data originator, with secondary data collection outsourced, and the data quality assurance function would be housed with the information source owner along with the IT enablement function.

No matter which approach is selected, it will be important for the parties responsible for assembling the information source to engage with state and local project sponsors to obtain ongoing data updates. The information source would also benefit from a strong initial impetus focused on creating a robust online platform and building a critical database of project-related information. This initial effort is important for understanding the value proposition created by the information source, finalizing details of the management plan, attracting agency participation, and securing funding for the long-term sustenance of the information source. Initial seed funding for the information source could be provided through a pooled-fund study using state DOT research monies from multiple states, through the National Cooperative Highway Research Program, or by the U.S. Department of Transportation.
