Web services converge with data integration: new ways to unlock the power of data

James Markarian

* As data integration tools produce and consume information, a service-oriented architecture presents opportunities for information exchange, data-driven process automation, and business agility.

Web services proponents like to rhapsodize about a somewhat quixotic world in which Web services orchestrates disparate components of the enterprise in a seamless, real-time symphony. There’s genuine promise in that vision. But as any IT professional worth his or her salt knows, the devil is in the details. When you begin to look under the Web services hood, it’s clear that certain crucial distinctions tend to get lost amid the froth.

In this article, we’ll drill down into one of the most compelling propositions offered by Web services–as an enabler of data-level integration. Then we’ll explore how Web services works with data integration to better support real-time information access and delivery for the business users who need it.

Leveraging the Proven Value of Data Integration Technologies

Data integration helps organizations leverage their data across a variety of integration initiatives for improved operational efficiency and business performance. Whether a project involves data warehousing, data migration, building master data hubs, improving data synchronization, or business activity monitoring, it requires reconciliation and integration of data, often on an enterprise scale, to provide a single view of customers, households, and suppliers. For more than 10 years, data integration has delivered some of the highest ROI of any technology set.

So now Web services comes along, and prompts two questions:

* What does Web services mean for data integration and business intelligence?

* What do data integration and business intelligence mean for Web services?

At first glance, they may sound like the same question. They’re not. They’re two distinct yet related issues that organizations deploying Web services and data integration technology–that is, virtually any mid-size to large enterprise–need to carefully assess.

Data-Level Integration with Web Services

Though data integration has delivered enormous value, many systems built piecemeal throughout the 1990s operate in relative isolation via proprietary APIs and hard-coded connectivity. Web services offers a way to liberate both the integration and analytic aspects of data warehousing, expose them as Web services, and embed this functionality into enterprise applications.

In the first iteration, most flavors of Web services do not address the issue of data-level compatibility among disparate systems. When we talk about Web services, it’s usually in terms of connectivity and interoperability among loosely coupled, disparate applications via such standards as XML for data exchange, SOAP for messaging, WSDL for describing services, and UDDI as a services registry.

Customers who plunge headfirst into a Web services integration initiative may soon find themselves confronted by the old apples-and-oranges dilemma–missing data, incompatible data (for example, different customer IDs between SAP and Oracle application systems that require the use of lookups and lookup tables), format discrepancies, unit of measure differences, and invalid data (data that is valid in one system but rejected by another). Data integration facilitates Web service integration by enabling application-to-application integration at the data level, thus providing solutions to all these problems via data mapping and transformation capabilities, integrated data cleansing, and automatic lookup techniques. A Web service call can invoke a data integration engine to return requested data (as XML) that is transformed from its native format and cleansed for consistency. If data is changed in an operational application, the data integration engine can function as a Web service that propagates data-level changes across target applications.
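The data-level reconciliation described above can be sketched in a few lines. In this illustration, a Web service call delegates to a transformation routine that resolves mismatched customer IDs through a lookup table, normalizes units of measure, and returns cleansed XML. The system names, lookup table, and field mappings are all hypothetical.

```python
# Sketch of the data-level reconciliation a Web service call might
# delegate to a data integration engine. All system names, lookup
# tables, and field mappings here are hypothetical illustrations.
import xml.etree.ElementTree as ET

# Cross-reference table resolving the "different customer IDs" problem
# between (hypothetical) SAP and Oracle records.
SAP_TO_ORACLE_ID = {"S-1001": "ORA-77", "S-1002": "ORA-78"}

# Unit-of-measure normalization: all quantities converted to kilograms.
TO_KG = {"kg": 1.0, "lb": 0.45359237}

def transform_order(sap_record: dict) -> str:
    """Map a native SAP-style record to cleansed, canonical XML."""
    oracle_id = SAP_TO_ORACLE_ID.get(sap_record["customer_id"])
    if oracle_id is None:
        # Invalid data: valid in the source system, rejected here.
        raise ValueError(f"unmatched customer ID {sap_record['customer_id']}")
    qty_kg = sap_record["qty"] * TO_KG[sap_record["unit"]]

    order = ET.Element("order")
    ET.SubElement(order, "customer").text = oracle_id
    ET.SubElement(order, "quantityKg").text = f"{qty_kg:.2f}"
    return ET.tostring(order, encoding="unicode")

print(transform_order({"customer_id": "S-1001", "qty": 200, "unit": "lb"}))
# → <order><customer>ORA-77</customer><quantityKg>90.72</quantityKg></order>
```

A production engine would, of course, drive these mappings from centralized metadata rather than hard-coded tables; the point is that the lookup, transformation, and cleansing happen at the data level, beneath the SOAP envelope.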

For example, a wholesale customer phones an inbound call center to order a large quantity of your company’s product. With a data integration engine in the mix, the sales agent can compare the quantity of the customer order to historical averages and the amount in inventory. If the quantity is significantly different, the data integration engine can effect changes in the inventory system or notify the head of procurement about the situation.

With a Web services call to a shipping system, the data integration engine can capture that data, transform it, and propagate it to, say, an SAP financial application and an Oracle customer transaction history record. In turn, the data integration engine can extract, cleanse, transform, and aggregate data from those applications and load it into a data warehouse. Depending on predefined business rules, the system may push out an alert of a large order to the BI dashboard of a sales manager. These sequences are accomplished in minutes or less–in effect, in real time.
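The fan-out-and-alert sequence above can be reduced to a small sketch: a captured event is propagated to each downstream target, and a predefined business rule decides whether to push an alert to a dashboard. The target names and the order threshold are assumptions for illustration.

```python
# Sketch of the propagation-and-alert sequence: a captured shipping
# event fans out to downstream targets, and a predefined business rule
# raises a dashboard alert for unusually large orders. Target names
# and the threshold are hypothetical.

LARGE_ORDER_THRESHOLD = 10_000  # assumed business rule

targets = {"sap_financials": [], "oracle_history": [], "warehouse": []}
alerts = []

def propagate(event: dict) -> None:
    for name, store in targets.items():
        store.append(event)  # in practice: transformed per target system
    if event["amount"] > LARGE_ORDER_THRESHOLD:
        alerts.append(f"Large order {event['order_id']}: {event['amount']}")

propagate({"order_id": "O-9", "amount": 25_000})
print(alerts)  # one alert raised for the large order
```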

Data Integration Evolution: From Tool to Web Services Platform

This is an evolutionary step in the data integration market. Until recently, data integration tended to mean extraction, transformation, and loading, or ETL. It was also associated exclusively with data warehousing and executed in an isolated batch fashion. Gradually, the acronym is giving way to the more comprehensive term data integration. Data integration platforms are engineered to execute five key elements among multiple sources and multiple targets–movement, transformation, aggregation, cleansing, and profiling. They are also now engineered to operate in real time and feature high-performance engines that fully exploit SMP and grid architectures to offer fast performance on large data volumes.

A key enabler for data integration is Web services, and the connectivity and interoperability its standards provide to loosely coupled applications. In the past year, leading data integration vendors have incorporated support for XML, SOAP, WSDL, and UDDI in their products, and delivered open Web services APIs within software development kits. These help support:

* Real-time updates across disparate applications and databases

* A stand-alone data transformation service

* Bidirectional data-level integration

* Ease of development, implementation, and maintenance of a data integration service-oriented architecture
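The "stand-alone data transformation service" in the list above can be pictured as an endpoint that accepts an XML request body (as a SOAP listener would), dispatches to a registered transformation, and returns XML. This is a minimal sketch; the operation name and payload shape are invented for illustration.

```python
# Minimal sketch of a stand-alone data transformation service: the
# handler accepts an XML request body, invokes a registered
# transformation, and returns an XML response. Operation names and
# payload shapes here are hypothetical.
import xml.etree.ElementTree as ET

TRANSFORMS = {"upperName": lambda s: s.upper()}  # registry of operations

def handle_request(body: str) -> str:
    req = ET.fromstring(body)
    op = req.get("operation")
    value = req.findtext("value")
    result = TRANSFORMS[op](value)
    return f"<response><result>{result}</result></response>"

print(handle_request('<request operation="upperName"><value>acme</value></request>'))
# → <response><result>ACME</result></response>
```

In a real deployment the service would be described in WSDL and invoked over SOAP; only the dispatch-and-transform core is shown here.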

But support for Web services standards in a data integration platform is only the start. Companies that look to couple data integration platforms with Web services should not take for granted that their vendor supplies the functionality necessary for fully optimized systems.

Prospective buyers should look for centralized metadata management support, configuration management, robust security, broad data connectivity (including mainframe and AS/400 systems), and change data capture.

Metadata Management

Metadata–or data about data–was once something of an afterthought in the larger world of data management. But as systems and data have grown in volume and complexity, more companies are implementing metadata systems for visibility into integration and business intelligence processes, allowing for significant benefits in terms of reuse, productivity improvements, and reduced coordination costs.

Similarly, a metadata repository can serve as a central source for tracking Web services publication and invocation mechanisms executed via WSDL, UDDI, and SOAP. Metadata provides a means to answer such questions as, “Where did this Web service originate? What’s the source of this data? What does it do, for whom? What formulas are applied to get it here?” It’s a powerful and incisive tool for keeping straight what would otherwise become a spaghetti-like snarl of services.

In a practical sense, suppose you expose a currency conversion program as a Web service. Millions of dollars may be at stake–but what is your assurance that the conversion rate is accurate and the Web service valid? Metadata enables you to monitor and validate the system. Similarly, metadata provides lineage and versioning to data exposed as a Web service that can be crucial in establishing data accountability and monitoring compliance with such mandates as Sarbanes-Oxley and HIPAA.
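The currency conversion example lends itself to a small lineage sketch: each invocation records its inputs, the provenance of the rate, and the formula applied, so the questions “what’s the source of this data?” and “what formulas are applied?” can be answered after the fact. The service name, rate source, and record layout are assumptions.

```python
# Sketch of lineage metadata captured alongside a Web-service-exposed
# currency conversion. The repository is reduced to a list of records;
# service name, rate source, and record fields are illustrative.
from datetime import datetime, timezone

lineage = []  # stand-in for a metadata repository

def convert_currency(amount: float, rate: float, rate_source: str) -> float:
    result = round(amount * rate, 2)
    lineage.append({
        "service": "currencyConversion",
        "inputs": {"amount": amount, "rate": rate},
        "rate_source": rate_source,   # provenance of the conversion rate
        "formula": "amount * rate",   # what was applied to get here
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return result

usd = convert_currency(100.0, 1.1, rate_source="ECB daily fix")
print(usd, lineage[-1]["rate_source"])
```

It is this kind of record, versioned over time, that supports the accountability and compliance monitoring the article describes.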

Best of all, it’s not necessary to redevelop Web service–based access to metadata systems. A metadata-driven data integration platform will automatically publish service-related metadata with negligible cost or risk.

Open Interfaces and APIs

A data integration platform should feature broad support for open standards (such as COM, XMI, and LDAP) and APIs to plug into applications from any number of vendors. This support (along with backwards compatibility) is also important in liberating legacy and mainframe applications for Web services.

Given that Web services is frequently about interoperability, buyer organizations will be well served by a data integration vendor that boasts a long history of support for standards and APIs. The last thing you want is a data integration platform that’s going to require hard-coding around proprietary interfaces. That can negate any productivity gains you achieve through Web services.

Robust Security

SOAP over an SSL transport is necessary, but it’s not enough to ensure data protection, particularly in multi-hop Web services systems. For security that’s suitable for the majority of enterprise applications, look for a data integration platform that supports authentication and authorization as well as in-transit encryption. This is an emerging area and involves acceptance and validation of authentication tokens, as well as encryption, decryption, signing and verification of SOAP messages.

The best data integration platforms will accommodate LDAP, the Lightweight Directory Access Protocol, and support leading LDAP servers, such as Sun Java System Directory Server from Sun Microsystems or Active Directory from Microsoft. LDAP will govern which individuals and applications have access to which data, and where.
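The layering described here–authentication of the caller’s token, then directory-driven authorization, on top of transport encryption–can be sketched as follows. The tokens, directory groups, and data set names are all hypothetical stand-ins for what a real token service and LDAP directory would provide.

```python
# Sketch of authentication and authorization beyond SSL transport:
# validate the caller's token, then consult directory-style group
# membership (as an LDAP query would) before serving data. Tokens,
# groups, and data sets are hypothetical.

VALID_TOKENS = {"tok-abc": "jsmith"}                 # token -> user
DIRECTORY_GROUPS = {"jsmith": {"finance-readers"}}   # stand-in for LDAP
ACCESS_RULES = {"gl_balances": "finance-readers"}    # data set -> group

def fetch(token: str, dataset: str) -> str:
    user = VALID_TOKENS.get(token)
    if user is None:
        raise PermissionError("authentication failed")
    if ACCESS_RULES[dataset] not in DIRECTORY_GROUPS.get(user, set()):
        raise PermissionError("not authorized for " + dataset)
    return f"{dataset} for {user}"  # a real engine returns the data itself

print(fetch("tok-abc", "gl_balances"))
```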

Change Data Capture

This technology is crucial to deploying a data integration platform in a service-oriented architecture. Change data capture is a technique by which only data that has changed since an application’s last call is fetched and propagated to the requestor.

As such, it can reduce by orders of magnitude the impact on both a source application and the network in retrieving requested data. That’s a critical consideration when architecting a real-time Web services infrastructure to support an initiative such as business activity monitoring (BAM).
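The mechanism reduces to a watermark: the requester remembers the point of its last call, and the source returns only rows modified since then. A minimal sketch, with an invented table shape and integer timestamps:

```python
# Sketch of change data capture: the requester keeps a watermark (the
# time of its last call) and the source returns only rows modified
# since then. Table shape and timestamps are illustrative.

rows = [
    {"id": 1, "modified": 100, "name": "Acme"},
    {"id": 2, "modified": 250, "name": "Globex"},
    {"id": 3, "modified": 300, "name": "Initech"},
]

def changes_since(watermark: int):
    """Return changed rows plus the new watermark for the next call."""
    changed = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in changed), default=watermark)
    return changed, new_watermark

changed, wm = changes_since(200)
print([r["id"] for r in changed], wm)  # → [2, 3] 300
```

Only two of three rows cross the wire; at enterprise scale, that difference is the orders-of-magnitude reduction in source and network load described above.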

Building Blocks for Visibility

Business intelligence (BI) has proven crucial in enabling organizations to analyze data on customers, sales and marketing, finance, supply chain, and other operational areas. Eager to build on that success, the demand from companies–and the focus of many BI vendors–is increasingly on systems that knit those various elements into a broad analytic fabric across the enterprise.

BI platforms have evolved incrementally. Over the years, companies have deployed tactical BI systems to attack key subject areas. From an architectural standpoint, the data warehousing infrastructure is closely coupled with the operational systems from which it sources data. However, it depends on fairly rigid processes for data access, collection, and analysis.

Web services offers an opportunity for greater fluidity between operational and analytical systems. Along with accepted Web services standards, a subset of protocols aimed at analytics and data management, such as XML for Analysis (XMLA), the Common Warehouse Metamodel (CWM), and the Java OLAP API, pave the way for developers to link operations and analytics through cost-effective component reuse.

In addition, Web services’ ease of development and deployment gives IT organizations new mechanisms to rapidly extend analytics to meet short-term tactical goals. From a strategic standpoint, this dovetails with objectives for faster and more precise insights into business performance:

* Real-time monitoring of business conditions

* Metrics-driven alerts with analytic drill-through

* Better visibility into dynamics among interdependent processes

These goals are encapsulated in BAM. A key premise of BAM is its real-time monitoring of changes in business data and notification to decision makers. A service-oriented architecture using trigger-driven business rules neatly meets this goal, as in the following example.

A product manager would want to know immediately when a new product release triggers a flood of complaint calls to a call center, so the root cause can be identified and corrective action taken. With many implementations of present-day technology, it may take a week or more before a pattern is discerned, and more time still before corrective actions are executed.

With the Web services approach, developers can readily couple the transactional data associated with the complaints to a business rule threshold defined in a BI system. The BI system’s business rules function as an intelligent agent that continuously “listens” for anomalies and delivers alerts to a manager’s desktop. This BI Web service may be described in WSDL, registered in a UDDI library, and invoked by the requesting application via SOAP messaging, with data exchanged in XML format.
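The “listening” business rule can be sketched as a counter with a threshold: each call center event updates the count for its product, and crossing the threshold fires an alert. The product ID and the threshold value are assumptions standing in for rules defined in the BI system.

```python
# Sketch of the BAM pattern above: a business rule "listens" to call
# center events and alerts when complaints about a product cross a
# threshold. Product ID and threshold are hypothetical stand-ins for
# rules defined in a BI system.
from collections import Counter

COMPLAINT_THRESHOLD = 3  # assumed rule
complaints = Counter()
alerts = []

def on_call(product_id: str, is_complaint: bool) -> None:
    if not is_complaint:
        return
    complaints[product_id] += 1
    if complaints[product_id] == COMPLAINT_THRESHOLD:
        alerts.append(f"Complaint spike on {product_id}")

for _ in range(3):
    on_call("widget-2.0", True)
print(alerts)  # one alert after the third complaint
```

In the architecture the article describes, this rule would run continuously as a service, with the alert delivered over SOAP to the manager’s desktop rather than appended to a list.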

In a related area, because a service-oriented architecture separates presentation and application layers, Web services enable developers to embed BI functionality within an operational application. For instance, a call center agent working from a Siebel call center interface can summon information on customer affinity program status and predefined cross-sell and up-sell opportunities stored in a BI system.

Dividends from a Three-Pronged Convergence

To date, much of the attention paid to Web services has focused on its role as an enabler of enhanced enterprise interoperability and more efficient, cost-effective deployment of services. Naturally, CIOs, IT managers, and developers are focused on building simple Web service request/response systems. At the same time, they are assessing the long-term viability of Web services standards, as well as the competing J2EE and .NET platforms.

Web services itself, though, is not an “if”–it’s a when. Already it’s evident that Web services will evolve in much the same way data management software evolved throughout the 1980s and ’90s. First, companies deployed relational databases and ERP applications to crunch numbers, record transactions, and track customers. Once those systems were up and running, organizations turned their attention to data integration systems to analyze and understand the raw data.

Though Web services are bound to be deployed in phases, it’s not too early to examine how to build in data integration capabilities at the ground floor. Ultimately, however, the open architecture of Web services will enable developers to plug and play data integration functionality as business needs and IT strategy dictate.

James Markarian is the CTO of Informatica, where he leads their product strategy, defining the key technologies and themes that are instrumental for Informatica’s industry-leading data integration platform. Prior to joining Informatica, James spent 10 years at Oracle Corporation, where he held a variety of positions including senior architect of the Oracle Tools division and development manager of the Oracle Forms product.

jmarkarian@informatica.com

COPYRIGHT 2004 Sys-Con Publications, Inc.