Comprehensive Transportation Asset Management depends upon the availability of fully integrated data. The process of integrating data is complex and can be quite challenging. This is especially true when organizations are used to standalone records or database systems that rarely communicate with each other.
Where to begin? A thorough analysis of an agency's Transportation Asset Management activities is the ideal place to start. This helps an organization pinpoint needs, priorities, and existing capabilities for data integration. Before the analysis begins, however, it is wise to establish a data integration team consisting of all stakeholders in the TAM and data management processes.
Figure 2 provides a general outline of the key activities in the Data Integration Process, along with considerations for each activity. Analyzing requirements is the first step in the process, followed by data and process flow modeling, then the definition, evaluation, and selection of alternatives. After this, database design and specification can be pursued. Finally, the chosen database integration strategy can be developed, tested, and implemented.
|Figure 2: The Data Integration Process|
Evaluating the cost of the data integration process depends upon several key factors, including the availability of existing:
Change is a constant in today's transportation agency. Maintaining momentum during organizational shifts is a key ingredient for the success of a long-term process such as data integration. In fact, agencies may wish to leverage windows of opportunity presented when top management changes or new requirements are mandated by the Federal government regarding Transportation Asset Management. Once established, the data integration system quickly becomes so embedded in business processes that future changes in the management or budget environment are unlikely to undermine support for it, or its value.
The philosopher Plato famously noted that "the beginning is the most important part of the work." Fittingly, the most important stage of data integration occurs at the beginning: the requirements analysis. Depending upon the size and extent of integration, this can be a complex and time-intensive step. Several areas must be examined to develop criteria for the best integration strategy.
Integrated databases can support a variety of functions, typically: inventory, data handling, decision-support processes, and systems for creating, acquiring, or maintaining pavements, bridges, tunnels, roadway hardware, equipment, and other physical transportation assets. To begin the requirements analysis, each business process is characterized according to the types of information it uses and produces, and the individuals who must be involved to do so. A system to support sign inventory and condition assessment, for instance, would identify key types of related information. These might include location, sign type and reflectivity, sign maintenance history, sign age, and the staff involved in assessing these data items (e.g., field crews, sign managers, and district and headquarters maintenance managers).
In any system supported by data integration, the requirements of data users such as field, technical, and management staff must be considered. Cooperation and involvement at all staff levels is critical to a successful integration strategy.
Requirements analysis includes ascertaining from a variety of staff where and how they obtain data, the business processes and information systems supported by that data, and any concerns they have about integrating databases. The very act of collecting this preliminary information helps to promote cooperation. Every data user must see that the strategy includes information relevant to his or her requirements and beneficial to the work at hand.
Each transportation organization is unique and a requirements analysis should reflect the individual agency's characteristics. This includes recognition of the various groups that will be impacted by data integration. Each group's business process needs to be understood, along with factors such as the relationships between and within groups, staff skills and capabilities, availability of staff to collect additional data, and how receptive staff members are to data integration (how much they feel it will improve their effectiveness). The broad operational climate of the agency must also be taken into account. Is decisionmaking in the organization, by nature and practice, centralized or decentralized? How can an integrated data system best support either framework?
It is critical to recognize these realities and involve all stakeholders in, first, evaluating the optimum process for integrating the agency's data, then, migrating the data from traditional information structures to the integrated environment, and finally, testing and using the new data system. This creates the maximum level of trust, cooperation, and enthusiasm for the benefits of data integration. Full stakeholder involvement drives the highest possible return on the integration investment.
One critical area served by optimal stakeholder buy-in is the mapping of current information systems infrastructure. A clear view of the current picture helps the agency determine which software, hardware, and communications strategies will be required to integrate databases. From this analysis, the agency can then gauge its level of readiness for data integration. Most importantly, the analysis helps identify which potential data integration strategy can best marry the existing resources with the new infrastructure.
Useful information at this stage of the process includes an inventory of existing computer programming environments and database management or mapping software or servers, as well as computer hardware and operating systems.
Software systems are a particularly important component in planning the most efficient collection and reorganization of current data into the new structure. Many agencies use GIS software to manage a wide range of data inputs. It is important to ask what other software platforms contain information that must be identified, understood in context, and eventually harvested to build the integrated system.
Key questions to ask when analyzing existing data and database systems within an agency might include:
The objective of data and process flow modeling is to create a picture of the relationships between information and the business functions that the information supports. Data flow diagrams help database engineers and analysts determine the design specifications for the data integration system. All data and business processes identified in the requirements analysis can be captured in flow diagrams. A variety of software products exist to support this function.
To understand how data flows through an organization or agency division, analysts must know who collects the data, where it is stored, who uses it, and what levels of access users need (i.e., whether they need to modify, to view, or to update the data). It is also important to ascertain who "owns" the data, to provide guidelines or structure for its stewardship, and to establish a system of governance that protects the integrity of the data.
The current type, status, form, location, and uses of this information are first examined to determine data integration needs and opportunities. A path is mapped that allows specific information within each location or category to be accessed. Information extracted from a Bridge Inventory Record, for instance, will be maintained according to a protocol that allows it to relate to and be accessible across a wide platform of users for a broad set of purposes. In a well-designed data integration system, inventory information might flow smoothly into a set of data that helps staff assess the structure's condition, leading to decisions about maintenance and rehabilitation. In a less cohesive environment, these categories of information are unlikely to be aligned so that they can be readily synthesized to produce the accurate picture needed for sound decisions.
In the bridge example shown in Figure 3, basic information required to monitor the status of a bridge structure might reside in several general locations managed by a variety of departments.
Figure 3: Data and Process Flow: A Bridge Example
Once requirements are analyzed and the flow of data is diagrammed, feasible integration alternatives can be identified. Two general approaches are available: fused databases and interoperable databases.
Data fusion (also known as data warehousing) combines information from multiple sources into a single repository so that the data can be accessed through one integrated system. The original sources of fused data can be eliminated once the data is migrated to the centralized location, or they can continue to exist independently to serve various business processes. Ultimately, all fused data reside in a single database server with substantial processing and data storage capacity. All personal computers and terminals go through this server to access the data and perform the functions supported by the data "warehouse."
Interoperable databases (also known as federated or distributed systems) consist of a series of data sources that communicate among themselves through a multi-database query. This requires a new interface through which a data source, such as an existing database, can be viewed and manipulated. In this environment, data reside in computers or database servers located in a variety of places, but each is linked through a computer network and viewed via the master interface. With interoperable databases, one computer can access or add to another's information.
Figure 4 shows how each of these options supplies access to data in response to a specific question. In this example, an integrated data system helps the agency conduct a proactive safety analysis.
Figure 4: Data Integration Alternatives
In data fusion, the data is gathered, cleaned to remove inconsistencies, and exported to a centralized database. There it is stored in a format that replicates the way the data would be viewed in the source location. This allows users access to vast stores of data. The data fusion, or warehousing, program relies on a common user interface to organize the relevant subsets of all component databases from which it is fed, and specifies the rules for fusing the data it acquires.
Often, this requires converting a database and its applications from one format to another. Data are then shipped from the legacy system to the new one, using data reengineering or other integration methods. An agency can choose to continue to use the data in the old format after it is made available to the centralized data warehouse, or the old infrastructure can be abandoned when the warehouse is complete.
Regardless of the method used to achieve data fusion, the database management system is key: it must be able to handle the accumulation and management of a large amount of data while still ensuring that it can be accessed quickly and easily. In this sense, the interface with a fused database is something like using an Internet search engine to learn about a given topic: it delivers rapid access to information from a variety of sources, without requiring the user to have advance knowledge of each of those sources.
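The extract-clean-load flow of a fused database can be sketched in miniature. Everything here is illustrative: the table names, routes, and records are hypothetical stand-ins for real agency sources, and the 999 "invalid value" sentinel is borrowed from the IPMP metadata scheme in Table 3.

```python
import sqlite3

# Two hypothetical source systems kept by different units of an agency.
pavement_src = sqlite3.connect(":memory:")
pavement_src.execute("CREATE TABLE sections (route TEXT, milepost REAL, iri REAL)")
pavement_src.executemany("INSERT INTO sections VALUES (?, ?, ?)",
                         [("I-80", 12.4, 1.8), ("I-80", 13.1, 999.0)])  # 999 = invalid

maint_src = sqlite3.connect(":memory:")
maint_src.execute("CREATE TABLE work_orders (route TEXT, milepost REAL, activity TEXT)")
maint_src.execute("INSERT INTO work_orders VALUES ('I-80', 12.4, 'crack seal')")

# The central warehouse replicates each source in a common format.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sections (route TEXT, milepost REAL, iri REAL)")
warehouse.execute("CREATE TABLE work_orders (route TEXT, milepost REAL, activity TEXT)")

# Extract and clean: drop records carrying the 999 invalid-value sentinel.
clean = pavement_src.execute(
    "SELECT route, milepost, iri FROM sections WHERE iri <> 999.0").fetchall()
warehouse.executemany("INSERT INTO sections VALUES (?, ?, ?)", clean)
orders = maint_src.execute("SELECT route, milepost, activity FROM work_orders").fetchall()
warehouse.executemany("INSERT INTO work_orders VALUES (?, ?, ?)", orders)

# Users now query the single warehouse rather than the original sources.
fused = warehouse.execute(
    "SELECT s.route, s.milepost, s.iri, w.activity "
    "FROM sections s JOIN work_orders w "
    "ON s.route = w.route AND s.milepost = w.milepost").fetchall()
print(fused)  # the one valid section joined to its maintenance record
```

In practice the cleaning rules, formats, and load schedule come out of the requirements analysis; the point of the sketch is simply that data moves once, into a common structure, and is queried there.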
When fusing data, the variety of databases or formats, as well as sources and applications, can make it difficult to ensure the integrity of the information in each database. This complicates the task of mapping the movement of data from old systems to the new one, and it is here that skilled database managers and information technology professionals, whether agency staff or consultants, provide critical solutions.
There is no single approach to data fusion that will meet all agencies' needs. The approach to data warehousing will continue to evolve as agency experience and technology advance.
Arizona DOT faced technical, cultural, and business process challenges in its data integration effort. The agency chose a data warehousing approach; pulling data from many sources into a single repository exposed quality issues and data disconnects that had to be addressed at the source. As a result, the agency's strategy targeted cultural and process issues concurrently with technology changes.(15)
As the name implies, an interoperable, or federated, database approach is one in which a variety of databases are linked through a communications network so that all appear to form a single source. Users can access and manipulate data from a variety of original sources, without harming the integrity of each source, and without having to learn the data model or write their transactions in the language of the source. Planners, for instance, can easily access and apply to their processes information from environmental engineers within their agency whose data might be maintained in a different format.
Interoperable databases allow a user to make a query without concern for where the data resides or how it is organized at its source location. A "federated view" is somewhat akin to shopping online at a department store website: it provides access to the product a user seeks without that person having to search for or be familiar with the universe of possible suppliers.
A federated view is created when an appropriate data interface is set up to link individual databases. The integrated format hides the complexity and distribution of the underlying component databases. Such a system can support different data models and execute transactions written in various data languages. It does not require the migration of all agency data into a single format, so it leaves intact the complexity and depth of data at its source.
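A federated view can be approximated with a thin "mediator" layer that routes each query to the database that owns the data. This is a minimal sketch under stated assumptions, not a production multi-database engine: the component databases, table names, and the FederatedView class are all hypothetical.

```python
import sqlite3

# Hypothetical component databases; each keeps its own structure and location.
bridge_db = sqlite3.connect(":memory:")
bridge_db.execute("CREATE TABLE bridges (bridge_id TEXT, route TEXT, condition INTEGER)")
bridge_db.execute("INSERT INTO bridges VALUES ('B-101', 'I-80', 6)")

pavement_db = sqlite3.connect(":memory:")
pavement_db.execute("CREATE TABLE pavement (route TEXT, iri REAL)")
pavement_db.execute("INSERT INTO pavement VALUES ('I-80', 1.8)")

class FederatedView:
    """A thin mediator that routes each query to the database owning the table,
    so users need not know where the data physically resides."""
    def __init__(self, sources):
        self.sources = sources  # maps table name -> live connection

    def query(self, table, sql, params=()):
        return self.sources[table].execute(sql, params).fetchall()

view = FederatedView({"bridges": bridge_db, "pavement": pavement_db})

# A multi-database query: bridge condition and pavement roughness for one route.
bridges = view.query("bridges",
                     "SELECT bridge_id, condition FROM bridges WHERE route = ?", ("I-80",))
pavement = view.query("pavement",
                      "SELECT iri FROM pavement WHERE route = ?", ("I-80",))
print(bridges, pavement)
```

Note the design choice this illustrates: the component databases are untouched, and only the mediator knows where each table lives.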
Decentralized agencies are a natural fit for interoperable databases. While information may be structured consistently within a division, there is often great variation in data management formats across divisions. An interoperable approach works well when multiple well-organized subject databases exist (pavement, bridges, etc.), but there is a substantial need for access to the data for agency-wide applications, such as maintenance management.
In short, the advantages of interoperable over fused (or centralized) database systems are that they provide:
It is easy to imagine the myriad complications of developing a platform in which a wide variety of data sources converse in a universal language. In fact, the disadvantages of interoperable databases include:
Federation does, however, allow agencies to preserve their investment in legacy systems while substantially improving data sharing capabilities. This, in turn, improves access to the information needed to maximize service to the highway customer.
Table 2 provides a quick reference guide to the chief advantages and disadvantages of fused and interoperable databases. More generally, however, agencies will want to consider four key elements when evaluating integration alternatives:
Additional evaluation factors might arise from the requirements analysis, including the identification of unique agency needs.
Table 2: Comparison of Fused and Interoperable Databases
|Characteristics||Fused Database (Data Warehouse)||Interoperable Database|
|Number of Data Servers||One (central)||Multiple (distributed)|
|Location of Data Server(s)||Single site||Multiple sites|
|Advantages||Easy to manage and control the databases; maximum data processing power (quick access to the database); able to handle large amounts of data and processing requests; provides data security.||Can keep data in independent locations and file servers (autonomy of sites); no reliance on a single site that can become a point of failure; changes made to data at one location can propagate quickly to become visible at other locations; unified description of all data (no need to know database models); allows access to resources in the computer network.|
|Disadvantages||Requires considerable time and resources to implement; data is generally in read-only format and cannot be updated online; storage requirements can become a major problem.||Hard to support and maintain an integrated (global) data model; need to rebuild the database system every time data export protocols change; requires rigorous procedures for database access and updates.|
Moving toward its integrated Transportation Asset Management process, Michigan DOT adopted four of the guiding principles identified by the National Performance Review for gathering data. Those principles are that data gathering must be:
Once requirements are analyzed, processes are diagrammed, and the most effective data integration alternative is selected, an agency is ready to develop a detailed implementation plan. This includes a schedule for database design, development and testing, training, as well as plans for software and hardware purchases.
The process of developing the design and specifications for the database can help generate this plan. It can also help define the overall approach to the database development effort.
Database design for both fused and interoperable environments includes certain common components:
A data model defines how a database is structured and configured to serve the needs of its users, and how its data are organized. Data and process flow diagrams developed early in the integration process will help identify the appropriate models for agencies to use when integrating Transportation Asset Management data.
Typically, there are at least five different data structure options, or models.
The agency's existing models may be good candidates for use in data integration depending on the format. Relational databases (with spatial components) are the easiest to integrate given a standard location referencing system. Ideally, the data models chosen will be directly compatible with applications that will use the information. If not, a data conversion or transformation process can be developed to help applications use information that exists in a different format.
With many potential inputs, data entry personnel, and sources of data, standards must be developed to ensure that the information is optimized for use in the chosen system. Data standards are, in effect, a set of rules for representing, accessing, manipulating, transferring, and reporting information. Some of these standards may already exist within an agency (and, if so, will be identified in the Requirements Analysis stage) and some may need to be developed. New standards would be drawn from an analysis of the components of the data itself and of the processes that draw information for particular applications.
Three types of data standards are commonly used:
Each is vital to a well-functioning integrated data system. Standards help to ensure that, as data is used and manipulated on an ongoing basis, the quality of the information stored and accessed is maintained at a consistent level and cannot be degraded by a patchwork of users and practices.
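One way standards are kept from degrading is to enforce them with validation rules at data entry or migration time. The specific rules below are hypothetical examples, not prescriptions: the surface-type codes come from the Table 3 metadata scheme, while the milepost range and the 999 invalid-value sentinel stand in for whatever an agency's own standards define.

```python
# Hypothetical data standards: a domain of surface-type codes (from the
# Table 3 scheme), a plausibility range for mileposts, and the 999 sentinel.
SURFACE_CODES = {"P", "A", "C"}  # P-PCC, A-ACC, C-Combination

def validate(record):
    """Return a list of standards violations for one pavement record."""
    errors = []
    if record["surface_type"] not in SURFACE_CODES:
        errors.append(f"unknown surface type: {record['surface_type']!r}")
    if not 0 <= record["milepost"] <= 9999:
        errors.append("milepost out of range")
    if record.get("iri") == 999:
        errors.append("IRI carries the 999 invalid-value sentinel")
    return errors

good = {"surface_type": "P", "milepost": 12.4, "iri": 1.8}
bad = {"surface_type": "X", "milepost": -1, "iri": 999}
print(validate(good))  # []
print(validate(bad))   # three violations reported
```

Run against every inbound record, checks like these make a standard something the system enforces rather than something users must remember.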
Alaska DOT's work on data governance and data business plans is on the cutting edge.(17)
Reference systems are used by the database management system to link and relate separate data files. Location referencing systems, commonly used in the highway arena, can form a backbone for the organization of information in the data integration process. Route number and milepost, for instance, can be used as a common element through which to access and deliver information regarding processes as diverse as maintenance decision-making and pavement design. The coordinate reference system (involving latitude and longitude) might also be used.
A location referencing system can also be used to map and analyze data using GIS software. Administrative data, such as accounting, human resources or inventory information, on the other hand, will likely be referenced by a means not related to location.
A highway location reference system is a set of office and field procedures that includes a highway location reference method.
Confused? A location reference method is a way to identify a specific location with respect to a known point. A location reference system is a set of procedures that relate all locations to each other and includes techniques for storing, maintaining, and retrieving location information.
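The distinction can be made concrete with a minimal sketch of a linear location reference method, assuming hypothetical routes and reference posts: each location is expressed as an offset from the nearest preceding known point.

```python
# Reference posts per route: (milepost of the known point, its description).
# Routes and posts here are invented for illustration.
REFERENCE_POSTS = {
    "I-80": [(0.0, "state line"), (10.0, "Exit 10"), (20.0, "Exit 20")],
}

def locate(route, milepost):
    """Express a milepost as the nearest preceding reference post plus an offset."""
    posts = REFERENCE_POSTS[route]
    base = max((p for p in posts if p[0] <= milepost), key=lambda p: p[0])
    return {"route": route, "reference": base[1], "offset": round(milepost - base[0], 3)}

print(locate("I-80", 12.4))  # {'route': 'I-80', 'reference': 'Exit 10', 'offset': 2.4}
```

The surrounding system would add the office and field procedures for maintaining the reference posts themselves and for storing and retrieving locations expressed this way.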
Typical highway location referencing methods include those that are:
Michigan DOT decided to abandon all existing linear referencing systems and adopt a single statewide system. This allowed consistency among key data components and enabled sharing within the agency as well as among county and city agencies and the State Police. To implement, MDOT worked with the Michigan Center for Geographic Information to fund the development of a statewide GIS capability built on a GIS basemap for use by all State agencies. The multi-agency partnership eliminated many process steps that occur between data capture, integration into appropriate shared databases, and final dissemination across State government.(19)
Metadata is a set of information that is needed to best access, understand, and use other information in a database or information environment. In other words, it is data about data. Exponential growth in the Internet and other communications channels, as well as improved access to all forms of information, has challenged government, business, and others to manage ever more complex sets of data effectively. This has driven the need for standardized ways to manage information about such content, spawning the concept of metadata. The Dewey Decimal System, the way libraries traditionally organize books and reference material, is one very basic form of metadata.
"Metadata is key to ensuring that resources will survive and continue to be accessible into the future." -"Understanding Metadata," National Information Standards Organization (NISO)
Metadata sometimes refers to information that a computer or program can read and understand in order to organize the location, delivery, or storage of data. Other times it refers to records that describe information available electronically. It can involve any level of information access, from a single record to a large, aggregated database.
A variety of metadata standards and models have evolved in the highway program, and some of these have their own sub-layer of standards (taxonomy, vocabulary, thesauri, etc.) to convey additional information. Sets of metadata, often called metadata schemes, might be expressed in a variety of different programming languages, and communicated in a variety of forms or syntaxes, such as in HTML or XML.
Metadata can help organize a set of data and it can also help facilitate the migration of existing blocks of information into an integrated environment, including both interoperable and fused databases. It also aids tremendously in the recovery of information from integrated databases.
Table 3: Metadata Sample Scheme
|No.||Field Name||Data Type||Units||Description|
|1||Sec_num, Pave_ID||DECIMAL||||Unique record identifier|
|2||Ipmp_rte||CHAR||||IPMP GIS database/Iowa DOT road name|
|3||Loc_rte||CHAR||||Local agency road name|
|4||Lit_desc||CHAR||||Literal description of section location|
|5||Co||CHAR||||County number for the county where city is located|
|7||Gen_surf_t||CHAR||||Current (general) surface type from local agency (P-PCC, A-ACC, C-Combination)|
|8||Const_yr||DECIMAL||||Year constructed from local agency|
|9||Fed_fc||DECIMAL||||Federal functional classification|
|10||Avg_liri||DECIMAL||m/km||Average left IRI, 999 = invalid value|
|11||Avg_riri||DECIMAL||m/km||Average right IRI, 999 = invalid value|
|12||Avg_lrut||DECIMAL||mm||Average left rut|
|13||Avg_rrut||DECIMAL||mm||Average right rut|
|14||Allig_m||DECIMAL||m^2||Area of medium severity alligator cracking|
|15||Allig_h||DECIMAL||m^2||Area of high severity alligator cracking|
The Iowa Pavement Management Program (IPMP) offers a good example of the elements that might be included in a metadata scheme for a transportation agency.
Generally, information technology professionals develop metadata appropriate to the data integration needs of a transportation agency. Sometimes, though, basic descriptions of the information sets are provided by the creators or collectors of the data. A variety of basic tools are commonly used in the development of metadata, including templates, mark-up tools, extraction tools, and conversion tools.
Once a scheme is in place, "crosswalks" help map metadata elements, such as semantics and syntax, from one metadata scheme to another, fueling the effectiveness of even the most complex data integration strategies.
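At its simplest, a crosswalk is a field-to-field mapping table with optional unit conversions. In this sketch the source field names are taken from the Table 3 scheme, while the target field names and the millimeter-to-inch conversion are invented for illustration.

```python
# Each entry maps a source field to (target field, optional unit conversion).
CROSSWALK = {
    "Ipmp_rte": ("route_name", None),
    "Const_yr": ("year_built", None),
    "Avg_lrut": ("rut_left_in", lambda mm: round(mm / 25.4, 2)),  # mm -> inches
}

def apply_crosswalk(record):
    """Translate one record from the source scheme into the target scheme."""
    out = {}
    for src_field, (dst_field, convert) in CROSSWALK.items():
        if src_field in record:
            value = record[src_field]
            out[dst_field] = convert(value) if convert else value
    return out

print(apply_crosswalk({"Ipmp_rte": "US-6", "Const_yr": 1998, "Avg_lrut": 6.0}))
# {'route_name': 'US-6', 'year_built': 1998, 'rut_left_in': 0.24}
```

Real crosswalks also have to reconcile semantics (what a field means), not just names and units, which is why they are built from the metadata schemes rather than from the data alone.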
The information standards community as a whole continues to develop ever more effective metadata approaches and schemes, which are then applied to specific industry practices.
As metadata has grown in importance, the use of data dictionaries has progressed as well. A data dictionary is a central repository (sometimes called a metadata repository) in which the meaning, origin, use, and format of information, along with its relationship to other metadata, are described and maintained. It is basically a register that lists all files contained in each database in an integrated data system, the number of records in each file, and the names and types of fields contained in each record.
The term "data dictionary" can be used broadly to refer to a document, a component of a database management system (DBMS), or a piece of "middleware" that extends or replaces an existing data dictionary within a DBMS. Depending upon their design, data dictionaries can help facilitate various levels of access to information in an integrated data system. Examples of these functions include simplifying the input required to generate a report that results from a query, and automating the assembly of code required to use data from or enter information into a system. While a data dictionary does not contain any actual source data, it is used by a DBMS to find and use such information.
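The register described above can even be generated from a database's own catalog. This sketch uses SQLite's built-in PRAGMA table_info to record each table's fields, types, and record count; a real metadata repository would add the meaning, origin, and usage notes by hand, and the sample table here is hypothetical.

```python
import sqlite3

# A sample database standing in for one component of an integrated system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bridges (bridge_id TEXT, route TEXT, condition INTEGER)")
db.execute("INSERT INTO bridges VALUES ('B-101', 'I-80', 6)")

def build_dictionary(conn):
    """Register every table, its record count, and its field names and types."""
    register = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        fields = [(row[1], row[2])  # (field name, declared type)
                  for row in conn.execute(f"PRAGMA table_info({table})")]
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        register[table] = {"records": count, "fields": fields}
    return register

print(build_dictionary(db))
```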
Database design and specifications drive decisions about which software and hardware will best serve the integrated data strategy. Such elements as servers, network communications, data mapping, user interfaces, computer operating systems, and programming environments will be chosen to support the integrated system.
Both software and hardware choices are involved here and they are critical because this is how and where the system's databases are stored and manipulated. In making decisions about the database server, consider the following:
A variety of highly secure, customizable, Web-enabled commercial products now exist to support thousands of distributed users. Both software and hardware are becoming better, faster, and cheaper as use and technology advance.
The data integration marketplace now includes a variety of commercial software packages for database management. An agency can either purchase a commercial product and customize it to suit its needs, or build its DBMS from scratch.
Geographic information systems lend themselves beautifully to the organization of data within transportation agencies. More than a tool to pinpoint a specific location on a map, GIS software in an integrated data environment analyzes and queries information. It aids in the construction of spatial databases of transportation networks and features. Additionally, GIS software allows users to conduct a variety of analyses and applications on that data, while integrating management and decision-making information and processes.
GIS provides the common information threads that allow a variety of databases and DBMSs to communicate with each other.
In the simplest terms, GIS is the merging of cartography (graphic elements) and data (raw information) through the use of database technology. In a generic sense, GIS applications are tools that allow users to perform a variety of functions, including creating interactive queries (such as user-created searches), analyzing spatial information, editing data, creating maps, and presenting the results of all these operations in a manner that can be easily communicated to the end user. In specific terms, GIS integrates, stores, edits, analyzes, queries, shares, and displays geographic information graphically.
GIS tools can play an important role in the process of data integration for any transportation agency because they link disparate sets of information to facilitate decision-making. For instance, GIS can be used to overlay pavement condition data and vehicle crashes to determine if there is a relationship between infrastructure condition (highway roughness and friction) and safety. Other information such as traffic volumes and roadway function class can be used to determine the most appropriate pavement marking material.
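Absent full GIS software, the overlay idea can still be sketched as a tabular join through the common location reference (route and milepost). The sections, IRI values, and crash records below are made up for illustration; the point is that a shared reference lets two independently maintained datasets be combined for analysis.

```python
# Pavement sections as (route, milepost from, milepost to, average IRI).
sections = [
    ("I-80", 0.0, 10.0, 1.2),   # smooth pavement
    ("I-80", 10.0, 20.0, 3.5),  # rough pavement
]
# Crash records as (route, milepost).
crashes = [("I-80", 4.2), ("I-80", 12.9), ("I-80", 15.1)]

# Overlay: count the crashes falling within each section's milepost range.
overlay = {}
for route, mp_from, mp_to, iri in sections:
    n = sum(1 for c_route, mp in crashes
            if c_route == route and mp_from <= mp < mp_to)
    overlay[(route, mp_from, mp_to)] = {"iri": iri, "crashes": n}

print(overlay)  # the rough section (mileposts 10-20) carries two of the three crashes
```

A GIS performs the same linkage spatially, against geometry rather than milepost ranges, and adds the mapping and visualization layers on top.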
GIS also allows transportation agencies to leverage the information available through other State or local agencies, such as departments of natural resources, to inform decisions about roadway maintenance and construction. One example is a cutting-edge practice in which an agency overlays transportation and environmental spatial data to make decisions about corridor-wide mitigation options that maximize the efficiency and impact of roadway-related environmental quality investments.
Pennsylvania DOT's GIS work spanned several years and a wide range of issues. Critical success factors identified by staff include:(20)
Agencies have the option of choosing commercial off-the-shelf (COTS) software that supports the basic, generic functions of Transportation Asset Management processes and can be tailored to serve specific agency data integration routines and applications. Typically, this software ranges from enterprise-wide suites of applications (known as enterprise resource planning, or ERP, software) to products that can be used for several asset types or TAM processes. Where applicable, COTS can save time and money in the data integration process.
An ERP system is multi-module application software that supports a broad set of activities to help an agency or business manage important parts of its operations. Typically, ERP solutions operate in a client-server structure, but some work with midrange and mainframe computers as well. ERP systems are typically Windows-based, grouping functions that serve such areas as accounting, finance, manufacturing, and human resources. The systems work together to control and analyze related data, often in real time, a tremendous benefit to Transportation Asset Management. A variety of COTS products now exist to support transportation data management and integration specifically.
A final step in the database design and specification stage is the setting of responsibilities for management and administration of the integrated databases. This includes identifying those who will manage database development (programming, prototyping, and testing), procurement (software and hardware purchases), systems administration (computer network setup), and database general management (maintenance and upkeep).
The final phase in data integration involves software development and system implementation. In this stage, prototype software and use case applications are developed, computer systems and network communications are set up, and the database is populated with information.
Database models, data management applications, and communications interfaces are tested, evaluated, and modified as needed. It is now time for the appropriate personnel to be trained in the requirements and uses of the new system as well. It is wise to approach this stage in as modular and incremental a fashion as feasible since an agency will almost certainly need to make future, ongoing additions or changes to the database and many components of the integrated environment.
When creating software, developers ideally create prototypes and use case applications in tandem so that they can focus their attention on the data users and how they actually work with the system. Initially, a prototype will likely consist of major program modules designed to move data back and forth between screens, databases, reports, and inputs and outputs used to communicate with other data systems. At this stage, the software may perform only limited amounts of actual data processing. Subsequent versions of the programs that perform complete data processing functions will eventually replace the prototypes.
Pilot-level rollout programs provide the opportunity to test the validity of a system before undertaking a large-scale implementation. Virginia DOT initially implemented its system in three counties (rural, urban, and mixed rural/urban), then fine-tuned data collection processes and technologies for use throughout the rest of the State.(21)
Network communications components of the integrated data system are put in place during the programming and software development stage, once the interface between databases is ready for setup and testing. Significant effort and agency resources may be expended at this stage, depending upon the scope of integration and the type of network configuration an agency chooses. Generally, the more complex the communications requirements of the integrated database system in terms of connectivity, data access rates, data retrieval and processing, or system flexibility and reliability, the more time and money will be required.
As the new system nears its operational launch, a variety of agency staff can be educated as to the design and uses of the system and trained to operate or maintain its various functions. Training might include, at a minimum, the following:
The final-and arguably, most important-step in the development and implementation phase is, of course, the actual population of the integrated database system with data, whether new information, or material that is a legacy of the previous systems.
This Primer describes requirements analysis, data and process flow modeling, alternatives evaluation, detailed database design, software development, and implementation. Application of each of these methodologies helps ensure the most successful possible outcome for a data integration initiative. Within various stages of this complex process, a few general guidelines serve an agency well: