One of the most fundamental challenges in the process of data integration is setting realistic expectations. The term data integration conjures up visions of a perfect coordination of diverse databases, software, equipment, and personnel into a smoothly functioning alliance, free of the persistent headaches that plague less comprehensive information management systems. Think again.
The requirements analysis stage offers one of the best opportunities in the process to recognize and absorb the full complexity of the data integration task. Thorough attention to this analysis is possibly the most important ingredient in creating a system that will actually be adopted and used to its full potential.
As the field of data integration matures, other common impediments and compensating solutions will likely become easier to identify. Current integration practices have already highlighted several familiar challenges, along with strategies to address them, as outlined below.
For most transportation agencies, data integration involves synchronizing huge quantities of variable, heterogeneous data drawn from internal legacy systems that use different data formats. Legacy systems may have been built around flat-file, network, or hierarchical databases, whereas newer generations of databases use the relational model. Data in still other formats from external sources continue to be added to the legacy databases to increase the value of the information. Each generation, product, and home-grown system imposes its own requirements for storing and extracting data. As a result, data integration can involve a variety of strategies for coping with heterogeneity. In some cases, the effort becomes a major exercise in data homogenization, which may not improve the quality of the data offered.
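To make the idea of homogenization concrete, the following is a minimal sketch, assuming two hypothetical legacy sources: a fixed-width flat file and a hierarchical (route, then segments) record. Both are mapped into one common, relational-style row type. The `CrashRecord` schema, the `FLAT_LAYOUT` column positions, and the field names are all illustrative assumptions, not taken from any agency system.

```python
from dataclasses import dataclass

# Hypothetical common schema that every legacy source is mapped into.
@dataclass(frozen=True)
class CrashRecord:
    route_id: str
    milepost: float
    crash_count: int

# Hypothetical fixed-width layout of a legacy flat file: (start, end) per field.
FLAT_LAYOUT = {"route_id": (0, 6), "milepost": (6, 13), "crash_count": (13, 17)}

def from_flat_file(line: str) -> CrashRecord:
    """Slice one fixed-width line into fields, then convert each to its target type."""
    f = {name: line[start:end].strip() for name, (start, end) in FLAT_LAYOUT.items()}
    return CrashRecord(f["route_id"], float(f["milepost"]), int(f["crash_count"]))

def from_hierarchical(node: dict) -> list[CrashRecord]:
    """Flatten a parent/child record (route -> segments) into one row per segment."""
    return [
        CrashRecord(node["route"], float(seg["mp"]), int(seg["crashes"]))
        for seg in node["segments"]
    ]

# Usage: records from two differently structured sources end up in one shape.
flat_line = "US0029" + "12.40".rjust(7) + "3".rjust(4)
tree = {"route": "VA0007", "segments": [{"mp": "1.5", "crashes": "2"}, {"mp": "3.0", "crashes": 0}]}
rows = [from_flat_file(flat_line)] + from_hierarchical(tree)
for row in rows:
    print(row)
```

The mapping changes only the shape of the data. A milepost that was wrong in the legacy file is just as wrong after conversion, which is why homogenization by itself may not improve data quality.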
Data quality is a top concern in any data integration strategy. Legacy data must be cleaned up before conversion and integration, or an agency will almost certainly face serious data problems later. Legacy data impurities have a compounding effect; by nature, they tend to concentrate around high-volume data users.
If this information is corrupt, so, too, will be the decisions made from it. It is not unusual for previously undiscovered data quality problems to surface while information is being cleaned for use by the integrated system. Bad data creates a need for procedures to regularly audit the quality of the information in use. However, who holds ultimate responsibility for this job is not always clear.
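A recurring audit can be as simple as running a fixed set of validation rules over each batch of records and reporting how many fail. The sketch below assumes hypothetical field names (`route_id`, `milepost`, `crash_count`) and hypothetical rules; an agency would substitute its own business rules, and deciding who runs the audit and acts on the results remains an organizational question.

```python
from collections import Counter
from typing import Callable

# Hypothetical validation rules; each returns True when a record passes.
RULES: dict[str, Callable[[dict], bool]] = {
    "route_id present": lambda r: bool(r.get("route_id")),
    "milepost non-negative": lambda r: isinstance(r.get("milepost"), (int, float)) and r["milepost"] >= 0,
    "crash_count non-negative": lambda r: isinstance(r.get("crash_count"), int) and r["crash_count"] >= 0,
}

def audit(records: list[dict]) -> Counter:
    """Count rule failures and duplicate (route_id, milepost) keys in a batch."""
    failures: Counter = Counter()
    for rec in records:
        for name, rule in RULES.items():
            if not rule(rec):
                failures[name] += 1
    # Every occurrence of a key beyond the first counts as one duplicate.
    keys = Counter((r.get("route_id"), r.get("milepost")) for r in records)
    failures["duplicate route/milepost"] = sum(n - 1 for n in keys.values() if n > 1)
    return failures

# Usage with a small illustrative batch.
batch = [
    {"route_id": "US0029", "milepost": 12.4, "crash_count": 3},
    {"route_id": "US0029", "milepost": 12.4, "crash_count": 3},
    {"route_id": "", "milepost": -1.0, "crash_count": 2},
]
report = audit(batch)
for rule_name, count in report.items():
    print(f"{rule_name}: {count}")
```

Running the same rules on a schedule, and comparing the counts over time, turns a one-time cleanup into the kind of ongoing audit described above.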
The unanticipated need for additional performance and capacity is one of the most common challenges to data integration, particularly in data warehousing. Two storage-related requirements generally come into play: extensibility and scalability. Because the need for storage can increase exponentially once a system goes into operation, it is difficult to anticipate how much growth to plan for, and this uncertainty drives fears that storage costs will exceed the benefits of data integration. Introducing such massive quantities of data can push the limits of hardware and software. This may force developers to implement costly fixes if an architecture for processing much larger amounts of data must be retrofitted into the planned system.
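The gap between a linear planning assumption and exponential growth is easy to underestimate. The sketch below compares the two under purely illustrative assumptions: 2 TB of initial data, 5% growth per month, and 10 TB of installed capacity. None of these figures come from an actual agency; the function names are hypothetical.

```python
def project_storage(initial_tb: float, monthly_growth: float, months: int) -> float:
    """Compound (exponential) growth: initial * (1 + rate) ** months."""
    return initial_tb * (1 + monthly_growth) ** months

def months_until_full(initial_tb: float, monthly_growth: float, capacity_tb: float) -> int:
    """Step month by month until compounded storage exceeds capacity."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive, or capacity is never exceeded")
    months, size = 0, initial_tb
    while size <= capacity_tb:
        size *= 1 + monthly_growth
        months += 1
    return months

def linear_months_until_full(initial_tb: float, monthly_growth: float, capacity_tb: float) -> float:
    """Naive plan: assume the same absolute increase (rate * initial) every month."""
    return (capacity_tb - initial_tb) / (initial_tb * monthly_growth)

# Usage: the naive plan predicts 80 months of headroom; compounding exhausts it in 33.
print(linear_months_until_full(2.0, 0.05, 10.0))
print(months_until_full(2.0, 0.05, 10.0))
```

Under these assumptions, an architecture sized for the linear estimate would need to be expanded less than halfway through its expected life, which is the kind of retrofit the paragraph above warns about.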
Data integration costs are fueled largely by items that are difficult for the uninitiated to quantify, and thus predict. These might include:
It is important to note that, regardless of efforts to streamline maintenance, a fully functioning data integration system may in practice demand far more maintenance than anticipated.
Unrealistic estimates can stem from an overly optimistic budget, particularly in times of budget shortfalls and pressure to do more with less. More users, more analysis needs, and more complex requirements may drive performance and capacity problems. Limited resources may extend project timelines without commensurate funding. Unanticipated or newly emerging issues may call for expensive consulting help. The dynamic environment of today's transportation agency must also be taken into account: staff shortages, changes in business processes, hardware and software problems, and shifting leadership can all drive additional expense.
The investment in time and labor required to extract, clean, load, and maintain data can escalate if the quality of the source data is weak. It is not unusual for this to produce unanticipated labor costs that are alarmingly out of proportion to the total project budget.
User groups within an agency may have developed their own databases, sometimes independently of information systems staff, that are highly responsive to their particular needs. Naturally, the owners of these functioning standalone systems may be skeptical that the new system will support their needs as effectively.
Other proprietary interests may come into play. For example, division staff may not want the data they collect and track to be continuously visible to headquarters staff without an opportunity to explain the nuances of what the data appear to show. Owners or users may also fear that higher-ups who do not appreciate the peculiarities of a given method of operation will gain more control over how data are collected and accessed organization-wide.
In some agencies, the personnel, consultants, and financial support provided by top management may be insufficient to dispel these fears and gain cooperation. Top management must be fully invested in the project; otherwise, the strategic data integration plan and its associated resources are less likely to be approved. The additional support needed to engage staff and convey to everyone in the agency the need for, and benefits of, data integration is unlikely to come from leaders who lack awareness of, or commitment to, those benefits.
At least three conditions were required for the success of Virginia DOT's development effort:
As more transportation agencies nationwide undertake data integration, the pool of experienced personnel grows. However, because data integration is a multi-year, highly complex undertaking, even experienced leaders may lack the kind of expertise that develops only over a full project life cycle. Common problems arise at different stages of the process, and they can be better anticipated and addressed when key personnel have already managed the typical variables of each project phase.
In addition, transferring historical data from its independent source to the integrated system may benefit from the knowledge of the manager who originally captured and stored the information. High turnover in such positions, along with early retirements and other personnel shifts driven by a historically tight budget environment, may complicate the mining and preparation of these data for convergence with the new system.
When transportation agencies consider data integration, one pervasive notion is that analyzing existing information needs and infrastructure, to say nothing of organizing data into viable channels for integration, requires a monumental initial commitment of resources and staff. Resource-scarce agencies regard this perceived major upfront overhaul as "unachievable" and "disruptive." In addition, uncertainties about funding priorities and potential shortfalls can hamper efforts to move forward.