REPORT
This report is an archived publication and may contain dated technical, contact, and link information
Publication Number: FHWA-HRT-15-072
Date: December 2016

 

State of the Practice on Data Access, Sharing, and Integration

 

CHAPTER 4. TECHNICAL INTEGRATION FORMATS

GEOGRAPHIC REFERENCING AND RESOLUTION

Nearly all transportation data can be associated with a geographic location. Therefore, GIS technology, which allows data from different sources to be displayed together on a common map, is seen as a primary data integration tool. Most, if not all, of the examples cited in the previous chapter use GIS as the platform for integrating, analyzing, and visualizing transportation data based on locational proximity.

A key factor in determining how well transportation data can be integrated is the locational referencing method used to describe where each transportation data item is located. The following are the three principal locational referencing method typologies, with each typology having numerous variations:

• Linear referencing, which locates a data item by a route identifier and a distance measure (e.g., a milepoint) along that route.
• Geographic coordinates (e.g., latitude and longitude), typically captured with global positioning system equipment.
• Predefined segment codes maintained in a location table, such as the TMC location codes used by commercial traffic data providers.

Another critical component needed for integrating transportation data using GIS technology is the geospatial database representing the physical roadway network itself. Typically, most roadway networks are depicted geospatially as linear features, where each line segment represents the approximate centerline of the roadway (or directional travel way for physically separated divided highways). The positional accuracy of the line representing the roadway centerline can vary significantly between different databases, although most State transportation agencies (and commercial navigation data developers) currently maintain roadway centerline databases with positional accuracies within 10 to 40 ft of ground truth. This level of positional accuracy is generally adequate for locating, displaying, and linking most transportation-related attributes, features, and events to the roadway centerline and through the roadway centerline to one another.

Roadway centerline databases also vary significantly with respect to which roads are included and what attribute data are included for each road segment. Historically, roadway databases developed by State transportation agencies have included centerline geometry and attributes only for those roads that were specifically maintained by the State or that were required for Federal reporting purposes (e.g., HPMS). Relatively few State transportation agencies maintain a complete, integrated centerline database for all public roads, and none of them routinely update roadway attribute data on roads that are maintained by other agencies. Commercial navigation databases do include all public (and even some private) roads and update key roadway attributes needed for vehicle navigation and routing (e.g., one-way streets, vehicle restrictions, and speeds) through a combination of primary data collection and data sharing with State and local government agencies, and even private freight transportation providers (e.g., UPS®, FedEx®).

To provide a common framework for integrating transportation data, the roadway centerline database must include certain specific attributes that facilitate translation between the various referencing methods described earlier in this section. To integrate State road inventory and condition data, the roadway centerline database must include, for each road segment feature, a route identifier and linear measures corresponding to the start and end points of the road segment along the specified route. If roadway attribute data are maintained using more than one LRS, similar attributes must be included for each LRS. To integrate traffic data that are geographically referenced using TMC codes, each road segment feature must also include the associated TMC code representing its location. While multiple road segments may share the same TMC code (e.g., a long stretch of highway between interchanges, or segments representing individual ramps within an interchange), road segments should be split wherever the TMC code changes.
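
As a minimal sketch of such a road segment record (the field names and values are hypothetical illustrations, not drawn from any particular State database):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoadSegment:
    """One centerline feature, attributed to support translation
    between linear referencing, coordinates, and TMC codes."""
    segment_id: int
    route_id: str             # LRS route identifier (e.g., "US-1")
    from_measure_mi: float    # linear measure at the segment's start point
    to_measure_mi: float      # linear measure at the segment's end point
    tmc_code: Optional[str]   # TMC location code covering this segment, if any

# Hypothetical 1.3-mi segment of US-1 associated with TMC code "101+04321".
seg = RoadSegment(segment_id=42, route_id="US-1",
                  from_measure_mi=12.4, to_measure_mi=13.7,
                  tmc_code="101+04321")
```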

Common geospatial editing and analysis functions can be used to link some data items to the roadway centerline database based on spatial proximity. For example, roadway data referenced using geographic coordinates can be “snapped” to the nearest point on the roadway centerline database. That point on the roadway centerline can be translated into a milepoint measure on a specific route, enabling its display and analysis together with other linear referenced attribute data. The critical assumption in using spatial proximity is that the positional accuracies of both the coordinate referenced data item and the roadway centerline alignment are close enough to ensure that any spatial search finds the correct match. As the relative positional accuracies of either spatial feature decline, the likelihood of matching the data item to the wrong roadway centerline increases, requiring additional validity checks, and more time to complete them.
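
As a minimal sketch of this snapping operation, using the open-source Shapely library (the coordinates, route, and 40-ft tolerance are illustrative assumptions):

```python
from shapely.geometry import LineString, Point

# Hypothetical centerline for one route, in a projected coordinate system (units: ft).
centerline = LineString([(0, 0), (5280, 30), (10560, -20)])
route_start_milepoint = 12.4  # linear measure (mi) at the start of this centerline

crash_location = Point(6200, 350)  # coordinate-referenced data item (e.g., a crash)

# Snap to the nearest point on the centerline and convert the offset to a milepoint.
distance_along_ft = centerline.project(crash_location)   # ft along the line
snapped_point = centerline.interpolate(distance_along_ft)
milepoint = route_start_milepoint + distance_along_ft / 5280.0

# A simple validity check: reject matches farther than some tolerance (e.g., 40 ft).
offset_ft = crash_location.distance(snapped_point)
if offset_ft > 40:
    print(f"No reliable match: point is {offset_ft:.0f} ft from the centerline")
else:
    print(f"Snapped to milepoint {milepoint:.3f} ({offset_ft:.0f} ft offset)")
```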

Another consideration in using roadway centerline databases as the common framework for integrating transportation data is whether a centerline depiction provides sufficient feature resolution for its intended application. Typically, a roadway centerline database does not display or explicitly identify individual travel lanes unless they are physically separated by a median or other barrier. This means that information on events or incidents that affect one or more but not all lanes of a roadway (e.g., a crash or roadway construction that blocks the outside travel lane) cannot be explicitly displayed using just a roadway centerline network. There are methods available to graphically display roads with multiple concurrent lanes, but accurately locating lane-level features and events requires additional locational attributes. For example, commercial navigation databases include lane-specific information using a standard lane identification convention (e.g., lanes are numbered consecutively, beginning with the inside travel lane (leftmost, in the direction of travel) as lane one, and proceeding outward). These conventions are used in RITIS to display detailed traffic volumes at the individual lane level.
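
As a hedged sketch of how such a convention might be carried in a lane-level event record (the field names are hypothetical, not taken from RITIS or any commercial database):

```python
# Hypothetical lane-level incident record using the convention described above:
# lanes are numbered from the inside (leftmost) travel lane outward, starting at 1.
incident = {
    "tmc_code": "101+04321",   # location on the centerline network
    "direction": "NB",
    "lanes_total": 3,
    "lanes_blocked": [3],      # outside (rightmost) travel lane is blocked
    "event_type": "crash",
}
```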

Geospatial, a Web portal for sharing geographic information (http://www.data.gov/geospatial/) developed by the Federal Government in 2003, also faced many organizational issues.(56) Geospatial is a public gateway for improving access to geospatial information and data under the Geospatial e-government initiative. Geospatial’s Web portal unifies geospatial data found among 69 Federal and 79 State and local government clearinghouses by using behind-the-scenes search tools to find and display data. Geospatial orchestrates new practices for data-sharing and system interoperability by developing an open system standard that defines a Web-enabled geospatial architecture. Metadata describing the geospatial data on the clearinghouses follow a standard reporting framework to make the information accessible to the Geospatial search tools.

TEMPORAL RESOLUTION

The importance of temporal referencing and resolution for integrating transportation data depends heavily on the specific application and the data being integrated. Most of the roadway inventory data maintained by State transportation agencies is temporally stable and may be updated only once a year or at the conclusion of a specific highway maintenance or construction project.

By contrast, traffic conditions change continuously. The frequency with which traffic condition data are updated depends on the methods by which the data are collected and transmitted, how they are processed, and the purposes for which the data are being used. For example, traffic operations centers require near real-time streaming of actual traffic conditions (e.g., volumes and incident locations) to enable traffic managers to take tactical corrective actions and to see how traffic responds to those actions. Transportation planners, however, typically want traffic data that have been summarized over various time periods (e.g., by 5-min periods over a day, daily over a week or month, or simply AADT). Creating these summaries requires substantial data storage capacity; automated procedures to select, extract, process, and display data of interest; and updating procedures to add new data to the repository on a continuous basis.

Integration of geospatially referenced data having different temporal resolution requires the development of data aggregation and visualization procedures, in close coordination with the end users, for how (near) real-time data will be summarized and displayed. For example, some users may want to view a historical replay of traffic conditions at 5-min intervals for a specific day. Other users may want to produce graphs showing the average hourly weekday traffic volumes and variances along a specific roadway over the past year. Still other users may simply want to update the AADT data items for all roads based on traffic volumes collected over the past year.
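
As a minimal sketch of one such aggregation, using pandas (the column names and values are assumptions, not a standard):

```python
import pandas as pd

# Assumed raw archive: one row per detector report, with a timestamp and a volume.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-03-01 07:00:12", "2016-03-01 07:02:45",
                                 "2016-03-01 07:06:03", "2016-03-01 07:09:58"]),
    "volume": [42, 38, 45, 51],
})

# Summarize to 5-min periods, the first resolution mentioned above.
five_min = raw.resample("5min", on="timestamp")["volume"].sum()

# Roll the 5-min summaries up to daily totals for planning-level use.
daily = five_min.resample("1D").sum()
print(five_min, daily, sep="\n\n")
```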

To support these diverse user requirements, data storage formats need to be established that enable efficient selection, extraction, and summarization of individual traffic data records. At a minimum, the data formats should standardize both geospatial and temporal references. For geospatial referencing, either geographic coordinates or TMC codes should be sufficient, provided that TMC codes are attached to the geospatial roadway centerline database. For temporal referencing, standardized representation of dates (e.g., YYYYMMDD) and time (e.g., HHMMSS, 24-h clock) should be attached to each data record. If raw incoming data are not transmitted in the standardized format, they need to be converted as part of the extract, transform, and load procedure.
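
A minimal sketch of this conversion step, assuming the raw feed delivers a nonstandard local-time string (the input format shown is hypothetical):

```python
from datetime import datetime

def standardize_timestamp(raw: str) -> tuple[str, str]:
    """Convert a raw feed timestamp (assumed 'MM/DD/YYYY HH:MM:SS AM/PM' here)
    to the standardized YYYYMMDD date and 24-h HHMMSS time strings."""
    dt = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")
    return dt.strftime("%Y%m%d"), dt.strftime("%H%M%S")

date_str, time_str = standardize_timestamp("03/01/2016 02:35:07 PM")
print(date_str, time_str)  # 20160301 143507
```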

QUALITY CONTROL AND ASSURANCE

Quality control procedures are necessary if archived data are to be usable for a variety of applications. The quality of archived data from traffic operations systems has been influenced by the following two prevailing issues:

The result has been that some managers and users of data archived from traffic operations have wrestled with data quality problems. The following quality assurance strategies are the most important when using archived operations data:

Traffic Data Quality Measurement is an excellent source for methods and tools that enable traffic data collectors and users to determine the quality of traffic data they are providing, sharing, and using.(57) The report presents a framework of methodologies for calculating the data quality metrics for different applications. The report also presents guidelines and standards for calculating data quality measures to address the following key traffic data quality issues: defining and measuring traffic data quality, quantitative and qualitative metrics of traffic data quality, acceptable levels of quality, and methodology for assessing traffic data quality.

The framework is developed based on the following six recommended fundamental measures of traffic data quality:(58)

• Accuracy.
• Completeness (availability).
• Validity.
• Timeliness.
• Coverage.
• Accessibility (usability).

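As a small sketch of how two of these measures might be computed for an archived detector dataset (the thresholds, column names, and values are illustrative assumptions):

```python
import pandas as pd

# Assumed archive of 5-min detector records; None marks missing reports.
records = pd.DataFrame({
    "volume": [42, 38, None, 51, 980, 45],   # vehicles per 5 min
})
expected_records = 8  # reports expected from this detector in the analysis window

# Completeness: share of expected records actually received.
received = records["volume"].notna().sum()
completeness = received / expected_records

# Validity: share of received records passing a simple range check.
valid = records["volume"].between(0, 500)     # the 980 value fails the check
validity = valid.sum() / received

print(f"completeness = {completeness:.0%}, validity = {validity:.0%}")
```
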
Table ES-1, taken from the FHWA report Traffic Data Quality Measurement and reproduced as table 5, shows the range of data consumers, types of data, and possible applications that are considered in developing the framework. Table 6 indicates example data quality targets.

Table 5. Types of data consumers and applications.(57)

Data Consumers | Type of Data | Applications or Users
Traffic operators (of all stripes) | Original source data; archived source data | Traffic management; incident management
Archived data administrators | Original source data | Database administration
Archived data users (planners and others) | Original source data; archived source data; archived processed data | Analysis; planning; modeling (development and calibration)
Traffic data collectors | Original source data; archived source data | Traffic monitoring; equipment calibration; data collection planning
Information service providers | Original source data (real time) | Dissemination of traveler information
Travelers | Traveler information | Pre-trip planning

 

Table 6. Data quality targets.(57)

Transportation planning applications—standard demand forecasting for long-range planning:
  • Data: Daily traffic volumes.
  • Accuracy: Freeways, 7 percent; principal arterials, 15 percent; minor arterials, 20 percent; collectors, 25 percent.
  • Completeness: At a given location, 25 percent—12 consecutive hours of a 48-h count.
  • Validity: Up to 15-percent failure rate for 48-h counts; up to 10-percent failure rate for permanent count stations.
  • Timeliness: Within 3 years of model validation year.
  • Typical coverage: 55 to 60 percent of freeway mileage; 25 percent of principal arterials; 15 percent of minor arterials; 10 to 15 percent of collectors.

Transportation planning applications—Highway Performance Monitoring System:
  • Data: AADT.
  • Accuracy (mean absolute error): Urban Interstate, 5 to 10 percent; other urban, 10 percent; rural Interstate, 8 percent; other rural, 10 percent.
  • Completeness: 80 percent for continuous counts; 70 to 80 percent for portable machine counts (24-/48-h counts).
  • Validity: Up to 15-percent failure rate for 48-h counts; up to 10-percent failure rate for permanent count stations.
  • Timeliness: Data 1 year old or less.
  • Typical coverage: 55 to 60 percent of freeway mileage; 25 percent of principal arterials; 15 percent of minor arterials; 10 to 15 percent of collectors.

Transportation operations—traveler information:
  • Data: Travel times for entire trips or portions of trips over multiple links.
  • Accuracy: 10- to 15-percent root mean squared error.
  • Completeness: 95 to 100 percent valid data.
  • Validity: Less than 10-percent failure rate.
  • Timeliness: Data required close to real time.
  • Typical coverage: 100-percent area coverage.

Highway safety—exposure for safety analysis:
  • Data: AADT and VMT by segment.
  • Accuracy (mean absolute error): Urban Interstate, 5 to 10 percent; other urban, 10 percent; rural Interstate, 8 percent; other rural, 10 percent.
  • Completeness: 80 percent for continuous count data; 50 percent for portable machine counts (24-/48-h counts).
  • Validity: Up to 15-percent failure rate for 48-h counts; up to 10-percent failure rate for permanent count stations.
  • Timeliness: Data 1 year old or less.
  • Typical coverage: 55 to 60 percent of freeway mileage; 25 percent of principal arterials; 15 percent of minor arterials; 10 to 15 percent of collectors.

Pavement management—historical and forecasted loadings:
  • Data: Link vehicle class.
  • Accuracy: Combination unit, 20 percent; single unit, 12 percent.
  • Completeness: 80 percent for continuous count data; 50 percent for portable machine counts (24-/48-h counts).
  • Validity: Up to 15-percent failure rate for 48-h counts; up to 10-percent failure rate for permanent count stations.
  • Timeliness: Data 3 years old or less.
  • Typical coverage: 55 to 60 percent of freeway mileage; 25 percent of principal arterials; 15 percent of minor arterials; 10 to 15 percent of collectors.

 

METADATA

Metadata are typically used to determine the availability of certain data, determine the fitness of data for an intended use, determine the means of accessing data, and enhance data analysis and interpretation through improved understanding of the data collection and processing procedures. Metadata provide the information necessary for data to be understood and interpreted by a wide range of users. Thus, metadata are particularly important when the data users are physically or administratively separated from the data producers. Metadata also reduce the workload associated with answering the same questions from different users about the origin, transformation, and character of the data. The use of metadata standards and formats helps users understand the characteristics and appropriate usage of the data.(58)

ASTM has published metadata standards for ITS data in ASTM E2468-05, “Standard Practice for Metadata to Support Archived Data Management Systems.” (59) As stated on the ASTM Web site, “This standard is applicable to various types of operational data collected by ITSs and stored in an ADMS. Similarly, the standard can also be used with other types of historical traffic and transportation data collected and stored in an archived data management system.” (60)

This standard adopts the Federal Geographic Data Committee’s (FGDC) existing Content Standard for Digital Geospatial Metadata (FGDC-STD-001-1998) (with minimal changes) as the recommended metadata framework for ADMSs.(60) The FGDC metadata standard was chosen as the framework because of its relevance and established reputation among the spatial data community. A benefit of using the FGDC standard is the widespread availability of informational resources and software tools to create, validate, and manage metadata (see https://www.fgdc.gov/metadata/index_html).(61)
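
As a hedged sketch only, the following builds a small fragment of a CSDGM-style record with Python's standard library; the element names follow the FGDC short tags, but the record is intentionally incomplete and not validated against the standard:

```python
import xml.etree.ElementTree as ET

# Minimal fragment of an FGDC CSDGM metadata record (identification section only).
metadata = ET.Element("metadata")
idinfo = ET.SubElement(metadata, "idinfo")
citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
ET.SubElement(citeinfo, "title").text = "Hypothetical archived freeway detector data"
ET.SubElement(citeinfo, "pubdate").text = "20161201"
descript = ET.SubElement(idinfo, "descript")
ET.SubElement(descript, "abstract").text = "5-min volumes and speeds, calendar year 2016."

print(ET.tostring(metadata, encoding="unicode"))
```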

Because the ASTM E2468-05 metadata standard is based on the FGDC standards for geographic metadata, the components of the nongeographic metadata structure, as provided in the following list, are the same except for those related to spatial data:

Additional metadata standards include the following:

• FGDC metadata standards.
• ISO 19115 metadata content standard.
• Federal Enterprise Architecture (FEA) Data Reference Model.

These standards and their requirements are discussed in the following sections.

FGDC Metadata Standards

According to Executive Order 12906, all Federal agencies are directed to use the Content Standard for Digital Geospatial Metadata (CSDGM) Version 2 (FGDC-STD-001-1998) to document geospatial data created as of January 1995.(60) Many State and local governments have adopted this standard for their geospatial metadata as well.

ISO 19115 Metadata Content Standard

The international community, through the ISO, has developed and approved an international metadata standard, ISO 19115. As a member of ISO, the United States is required to revise the CSDGM in accordance with ISO 19115. Each nation can craft its own profile of ISO 19115 with the requirement that it include the 13 core elements. The FGDC is currently leading development of a U.S. profile of the international metadata standard ISO 19115.(62) The ISO 19115 core metadata elements are listed in table 7.

Table 7. ISO 19115 core metadata elements.(62)

Mandatory elements:
  • Dataset title
  • Dataset reference date
  • Dataset language
  • Dataset topic category
  • Abstract
  • Metadata point of contact

Conditional elements:
  • Dataset responsible party
  • Geographic location by coordinates
  • Dataset character set
  • Spatial resolution
  • Distribution format
  • Spatial representation type
  • Lineage statement
  • Online resource
  • Metadata file identifier
  • Metadata standard name
  • Metadata standard version
  • Metadata language
  • Metadata character set

 

Federal Enterprise Architecture Data Reference Model

The U.S. FEA is an initiative of the U.S. Office of Management and Budget that aims to comply with the Clinger-Cohen Act and provide a common methodology for IT acquisition in the United States Federal Government.(63) It is designed to ease sharing of information and resources across Federal agencies, reduce costs, and improve citizen services.(64)

The following five reference models comprise the FEA:

• Performance Reference Model.
• Business Reference Model.
• Service Component Reference Model.
• Data Reference Model.
• Technical Reference Model.

Minnesota Department of Transportation Example

The Minnesota Department of Transportation (MnDOT) standards for metadata elements were established by the State’s Data Governance Work Team prior to implementation of data governance in 2011. The metadata standards were formally adopted by the Business Information Council in November 2009. The mandatory metadata elements and definitions (see table 8) are based on the Dublin Core Metadata Element Set and the Minnesota Recordkeeping Metadata Standard. These elements should be applied at the table level at a minimum. Ideally, they should be applied at the column level based on the customer or business need.

Table 8. MnDOT metadata element schedule.

Element | Definition | Table Level | Column Level
Title | The name given to the entity. | X | X
Point of Contact | The organizational unit that can be contacted with questions regarding the entity or accessing the entity. | X | —
Subject | The subject or topic of the entity, which is selected from a standard subject list. | X | —
Description | A written account of the content or purpose of the entity. Accuracy or quality descriptions may also be included. | X | X
Update Frequency | A description of how often the record is updated or refreshed. | X | —
Date Updated | The point or period of time at which the entity was updated. | X | —
Format | The file format or physical form of the entity. | X | —
Source | The primary source of record from which the described resource originated. | X | —
Lineage | The history of the entity; how it was created and revised. | X | —
Dependencies | Other entities, systems, and tables that are dependent on the entity. | X | —
— Indicates the element is not applicable at this level.

 

California Example

The Caltrans Library uses the MARC 21 metadata standard and CONTENTdm®, which uses the Dublin Core American National Standards Institute (ANSI) standard.(65) The GIS library has been through numerous iterations, but in general, there is still quite a bit of information without metadata, and this is becoming an issue. Caltrans Earth (an application that uses a Google Earth™ plugin) will not allow data without metadata; ISO 19115 Geographic Information—Metadata is used.(62) The Data Retrieval System (DRS) requires some metadata and is system dependent. Each metadata item has core elements and some that are specific to its category. The metadata for the DRS are established by the data administrator and the DRS Steering Committee, usually for search and retrieval purposes.(66)

Metadata Guidelines for the Research Data Exchange(67)

This document provides requirements for metadata that must be followed by registered users who submit data to the Research Data Exchange (RDE). In terms of standards, metadata must follow ASTM E2468-05, Standard Practice for Metadata to Support Archived Data Management Systems.(59) Metadata content includes seven main sections: identification information, data quality information, spatial data organization information, spatial reference information, entity and attribute information, distribution information, and metadata reference information. The document also defines the following roles and responsibilities in metadata management:

In summary, both the ASTM Standard Practice for Metadata to Support Archived Data Management Systems (E2468-05) and a modified Dublin Core metadata schema have widespread industry acceptance. Either is recommended for the VDA Framework.(59)

MIGRATION STRATEGIES

Data migration approaches abandon the effort to keep old technology working or to create substitutes that emulate or imitate it. Instead, these approaches rely on changing the digital encoding of the objects to preserve them while making it possible to access those objects using state-of-the-art technology after the original hardware and software become obsolete.

The following subsections describe a variety of migration strategies identified in NCHRP Report 754.(66)

Simple Version Migration

The most direct path for format migration, and one used very commonly, is simple version migration within the same family of products or data types. Successive versions of given formats, such as Corel WordPerfect®’s WPD or Microsoft® Excel’s XLS, define linear migration paths for files stored in those formats. Software vendors usually supply conversion routines that enable newer versions of their product to read older versions.

Format Standardization

An alternative to the uncertainties of version migration is format standardization, whereby a variety of data types are transformed to a single, standard type. For example, a textual document, such as a WordPerfect® document, could be reduced to plain ASCII. Obviously, there would be some loss if font, type size, and formatting were significant. However, this conversion is eminently practicable, and it would be appropriate in cases where the essential characteristics to be preserved are the textual content and the grammatical structure. Where typeface and font attributes are important, richer formats, such as PDF or Rich Text Format, could be adopted as standards.

Typed Object Model Conversion

Another approach to migrating data formats into the future is Typed Object Model (TOM) conversion. The TOM approach begins with the recognition that all digital data items are objects, that is, they have specified attributes, specified methods or operations, and specific semantics. All digital objects belong to one or another type of digital object, where “type” is defined by given values of attributes, methods, or semantics for that class of objects. A Microsoft® Word 6 document, for example, is a type of digital object defined by its logical encoding. An electronic mail message is a type of digital object defined, at the conceptual and logical levels, by essential data elements (e.g., To, From, Subject, or Date).

Object Interchange Format

Another approach enables migration through an object interchange format defined at the conceptual level. This type of approach is being widely adopted for e-commerce and e‑government where participants in a process or activity have their own internal systems that cannot readily interact with systems in other organizations. Rather than trying to make the systems directly interoperable, developers are focusing on the information objects that need to be exchanged to do business or otherwise interact. These objects are formally specified according to essential characteristics at the conceptual level, and those specifications are articulated in logical models. The logical models or schema define interchange formats. To collaborate or interact, the systems on each side of a transaction need to be able to export information in the interchange format and to import objects in this format from other systems. The XML family of standards has emerged as a major vehicle for exchange of digital information between and among different platforms.
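
A minimal sketch of this export/import pattern, using a hypothetical XML interchange format for a traffic count record (the element names are illustrative, not a published schema):

```python
import xml.etree.ElementTree as ET

def export_count(station_id: str, date: str, aadt: int) -> str:
    """Serialize an internal record to the shared interchange format."""
    rec = ET.Element("trafficCount")
    ET.SubElement(rec, "stationId").text = station_id
    ET.SubElement(rec, "date").text = date          # YYYYMMDD, per the convention above
    ET.SubElement(rec, "aadt").text = str(aadt)
    return ET.tostring(rec, encoding="unicode")

def import_count(xml_text: str) -> dict:
    """Parse an interchange message back into a native structure."""
    rec = ET.fromstring(xml_text)
    return {"station_id": rec.findtext("stationId"),
            "date": rec.findtext("date"),
            "aadt": int(rec.findtext("aadt"))}

# Round trip: one system exports, another imports, neither needs the other's internals.
message = export_count("S-101", "20161201", 24500)
print(import_count(message))
```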

FILE AND DATA FORMATS

This review identified the following data formats:

The Archived Data User Service (ADUS) standards of the National ITS Architecture for file and data formats are also important to consider. ADUS is concerned with storing data generated from ITS, integrating it with other data, repackaging it, and making it accessible to a wide variety of stakeholders and applications.

NCHRP Report 754 noted that one of the challenges in developing transportation information best practices is assessing the risk that the file formats chosen to store data and information may become obsolete.(66) A file format is a particular way that information is encoded for storage in a computer file. There are different kinds of formats for different kinds of information. To preserve content in digital form, data custodians must be able to distinguish between format refinements and variants because these are significant to sustainability, functionality, and quality. This can be difficult: new formats are often complex, there may be no obvious way to determine the format of a file, different formats are employed or favored at different stages of a content item’s lifecycle, and formats are often proprietary and may be tied to the creator’s available software package.

Factors to Consider When Choosing Formats

As stated in NCHRP Report 754, two types of factors come into play in choosing file formats for long-term needs: sustainability factors and quality and functionality factors.(66) Quality and functionality factors pertain to the ability of a format to represent the significant characteristics required or expected by current and future users of a given content item. These factors vary for particular genres or forms of expression. For example, significant characteristics of sound are different from those for still pictures, whether digital or not, and not all digital formats for images are appropriate for all genres of still pictures. The following seven sustainability factors influence the feasibility and cost of preserving content:(57)

• Disclosure: The degree to which complete specifications of the format are publicly available.
• Adoption: The degree to which the format is already used by the primary creators, disseminators, and users of information resources.
• Transparency: The degree to which the digital representation is open to direct analysis with basic tools.
• Self-documentation: The degree to which the format can carry metadata describing the content within the file itself.
• External dependencies: The degree to which the format depends on particular hardware, operating systems, or software.
• Impact of patents: The degree to which patents encumber the ability to sustain content in the format.
• Technical protection mechanisms: The extent to which encryption or other content protection prevents preservation actions.

DATA RESOLUTION

A recent white paper prepared for FHWA noted that data resolution needs—both temporal and spatial—for VDA vary according to the application.(68) Table 9 provides an example of how the resolution needs for volumes, speeds, and travel times vary according to the intended application.

Table 9. Example of matching temporal and spatial resolutions of data to applications.

Data Type | Temporal: Connected Vehicle | Temporal: Simulation | Temporal: RTP | Temporal: HPMS | Spatial: Connected Vehicle | Spatial: Simulation | Spatial: RTP
Volumes | Very high (subsecond) | High (5 min) | Medium (1 h) | Low (24 h) | Link | Link | Link and areawide
Speeds | — | — | Medium (1 h) | — | Link | Link | Link and areawide
Travel times | — | — | — | — | Link and corridor | Link and corridor | Link and areawide
— Indicates the data type is not applicable to the application.

 

ACTIVE VERSUS PASSIVE ARCHIVING

The VDA white paper also describes the differences between active and passive archiving; the main difference is that active archiving involves a high level of data management methods and protocols.(68) Figure 32 (taken from the white paper) shows an example of active archiving. Passive archiving is less formal and usually just means that the data are stored with minimal processing. Cost and assigning responsibility (who will be the archivist?) are the major determining factors. For VDA, the main question is whether technology has progressed to the point that active management is no longer necessary on a full-time basis—can it be done virtually as the need arises?

This diagram contains a flow chart of the quality control process, text boxes explaining calibration/collection conditions, flow charts of geographic relations and time relations, and text on error checking and editing in quality control for virtual data access (VDA). The flow chart of the quality control process is at the top of the diagram and shows the flow from source data to archives to applications. Below this are three text boxes with calibration/collection conditions. These text boxes are archive processes, supporting information and metadata, and data structure. The archive processes text box lists the order for source data as follows: Archiving Original, Source Level QC, Aggregation, Transformations, Stores 1-4, and Retrieval.
The supporting information and metadata text box lists field conditions, calibration procedures, results, alterations, level, method, statistics, source procedure, results, alterations source, definitions, external values used, process date, versioning process date, and transaction histories. The data structure text box reads as follows, “preserves geographic and time scale, but exact replica of message step not required (extraneous information).” Below this, the geographic relationships flow chart moves from point (lane to direction to station) to linear, single route (link to corridor to sub-area) to area, multiple routes (sub-area to region). Below this, the time relationships flow chart goes from 1–15 min to hour to period to day to month to year. Below this, the error checking and editing text reads as follows:
Quality Control (QC) involves both error checking and editing. Error checking is tiered to include:
1.	Range checks;
2.	Site-internal cross-checks (e.g., volume/speed/density relationships);
3.	Time-variable checks (e.g., consecutive values, rapid growth/declines in values);
4.	Lateral checks (consistency checks across lanes for utilization);
5.	Longitudinal checks (upstream/downstream conditions); and
6.	History checks (comparison to established site-specific patterns).
Editing includes replacing erroneous and missing values through Bayesian Imputation—combining available “good” data with historical information from that site plus upstream and downstream conditions.
©Cambridge Systematics, Inc.
Figure 32. Diagram. Actively managed VDA Framework, which involves several processes to make data fully usable.(68)
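
As an illustrative sketch of the first two error-checking tiers named in the figure (the thresholds are assumptions, not values from the white paper):

```python
def qc_flags(volume_vph: float, speed_mph: float, density_vpm: float) -> list[str]:
    """Apply tier-1 range checks and a tier-2 site-internal cross-check."""
    flags = []
    # Tier 1: range checks on each measurement independently.
    if not 0 <= volume_vph <= 3000:
        flags.append("volume out of range")
    if not 0 <= speed_mph <= 100:
        flags.append("speed out of range")
    # Tier 2: cross-check the fundamental relation volume = speed x density.
    if density_vpm > 0 and abs(volume_vph - speed_mph * density_vpm) > 0.25 * volume_vph:
        flags.append("volume/speed/density inconsistent")
    return flags

print(qc_flags(volume_vph=1800, speed_mph=55, density_vpm=33))  # consistent -> []
print(qc_flags(volume_vph=1800, speed_mph=20, density_vpm=10))  # inconsistent -> flagged
```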

Most of the key issues to be addressed are similar to those faced by the ADUS of the National ITS Architecture which, as noted earlier, is concerned with storing data generated from ITS, integrating it with other data, repackaging it, and making it accessible to a wide variety of stakeholders and applications. FHWA-sponsored work led to the development of the following three ASTM standards that are directly relevant for VDA:

• ASTM E2259, Standard Guide for Archiving and Retrieving Intelligent Transportation Systems-Generated Data.
• ASTM E2468-05, Standard Practice for Metadata to Support Archived Data Management Systems.(59)
• ASTM E2665, Standard Specification for Archiving ITS-Generated Traffic Monitoring Data.
Another excellent source is General Requirements for an Archived Data Management System, a report prepared by Cambridge Systematics, Inc., for FDOT District 4 in preparation for the development of a regional ADMS to store, integrate, and access the following types of data:(71)

The requirements cover types of queries (real-time queries and batch queries); user roles and rights (standard users, advanced users, and system administrators); accessing the system; creating, submitting, and retrieving query results; query and result file management; reports; data storage; data processing; data management and archiving; operational requirements (database, application, and networking environment); and nonfunctional requirements (look and feel, usability, documentation and online help, performance, portability, and security).

VERSION CONTROL

It is a near certainty that changes will be made to the calculation procedures related to data integration after reports have been publicly released, and the VDA white paper discusses the need for version control during the development of a performance monitoring program.(68) For example, consider the impacts that might occur when an improved travel time estimation algorithm is developed after several years of performance reporting. The improved algorithm is more accurate, but it also estimates travel times that are consistently shorter. However, several years of performance reports have already been published using the old algorithm, which produces longer travel times. Instead of simply overwriting the old statistics, they should be retained in the data management system as a previous version that has expired. The new statistics based on the improved algorithm would then be stored as the current version.
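
A hedged sketch of this expire-and-supersede pattern for a published statistic (the field names and values are hypothetical):

```python
from datetime import date

# Versioned store: superseded statistics are expired, never overwritten.
stats = [
    {"segment": "US-1 MP 12.4-13.7", "year": 2014, "avg_travel_time_min": 9.8,
     "algorithm": "v1", "effective": date(2015, 3, 1), "expired": date(2017, 6, 1)},
    {"segment": "US-1 MP 12.4-13.7", "year": 2014, "avg_travel_time_min": 9.1,
     "algorithm": "v2", "effective": date(2017, 6, 1), "expired": None},
]

def current_value(records: list[dict]) -> dict:
    """Return the unexpired version of a statistic; expired versions remain queryable."""
    return next(r for r in records if r["expired"] is None)

print(current_value(stats)["avg_travel_time_min"])  # 9.1, from the improved algorithm
```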

DATA TRANSFORMATION

The VDA white paper also provides guidance on data transformation, which is the act of changing data and is quite common when archiving real-time traffic data.(68) As collected, real-time traffic data are highly detailed and are stored in a database designed to allow quick access to current conditions. However, a data archive typically does not retain the original level of detail, and the archive database must be designed for quick access to a wide range of historical conditions. Transformation can be as simple as aggregating data over time and space or as complicated as creating new metrics (e.g., travel-time-based performance metrics from detector measurements). In all cases, alternative data processing procedures and assumptions can be used to arrive at the same result, and differences in these can lead to inconsistency in the final values. For example, the choice of free-flow speed or the length of a peak period will affect the final transformations. Again, the issue is whether agencies should be free to use whatever methods they choose (with adequate metadata documentation) or whether a standard set of procedures needs to be fostered.
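
As a small sketch of one such transformation, deriving a travel-time-based metric from detector speeds (the free-flow speed and link length are assumptions, and the free-flow choice is exactly the kind of decision that should be documented in metadata):

```python
FREE_FLOW_SPEED_MPH = 60.0  # assumed; a different choice changes the final metric
LINK_LENGTH_MI = 1.3        # hypothetical link length

def travel_time_index(measured_speed_mph: float) -> float:
    """Travel time index: measured travel time divided by free-flow travel time."""
    measured_tt = LINK_LENGTH_MI / measured_speed_mph
    free_flow_tt = LINK_LENGTH_MI / FREE_FLOW_SPEED_MPH
    return measured_tt / free_flow_tt

print(travel_time_index(40.0))  # 1.5: trips take 50 percent longer than at free flow
```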

APPLICABILITY TO THE VDA FRAMEWORK

The examples, standards, and best practices described in this chapter will all be considered in the design and development of the VDA Framework.

 

 
