Software Magazine

NT data marts: start small, think big – companies deploying smaller Windows NT-based departmental data warehouses – includes related articles on upcoming products, scalability issues


Barbara Francett

Does size really matter? Just ask companies like Oakley, Scientific-Atlanta, and GTE Wireless, which are deploying smaller, NT-based data marts and letting their IS departments run the show.

Talk about serendipity: Windows NT steamrolls its way into popularity as a low-end alternative to Unix. At the same time, IS managers turn to data marts as tactical decision-support solutions. For many, the result is — for once — a harmonious mesh of technology and business need, augmented by the proliferation of Web browsers and the use of the Internet for distributing applications.

Just a couple of years ago, IS managers were scrambling to design and build data warehouses that could house a plethora of information catering to every facet of the business. The result? Many of these top-down, enterprise-scale efforts failed because they were too extensive, too expensive, and simply took too long.

“We had a bad experience [with] a data warehouse,” says Jon Krause, director of IT at Oakley Inc., a $220 million sunglasses manufacturer in Foothill Ranch, Calif. “First, you have to define the model, then figure out how to populate the data warehouse. That takes a couple of years, and by then, requirements change.”

Switching to a bottom-up approach allowed Oakley to take a sales and financial data mart from design to implementation in a matter of months. The company started the project last November, and the mart has been in operation since May. “The need for better access to information and better use of our data was clear, and we wanted to provide quick value to our users,” Krause says.

Currently handling only a gigabyte of data, the data mart uses PowerPlay, a multidimensional OLAP tool from Cognos, and Broadbase, an NT-based data mart and tool suite from Broadbase Information Systems. Though the mart is considered a tactical solution now, the long-term strategy is to evolve it into a multi-subject data warehouse as it expands to encompass customer service and manufacturing data, Krause says.

Oakley will continue to base all the subject areas on a single data model. “It’s important to keep a ‘single instance’ of data to hit against,” he says. Third-party data from retail customers may also be included in the warehouse, so Oakley’s IT group is now working on defining a format to ensure that “external data matches internal data,” Krause says.

Exterminating Skunks

From an IS perspective, data marts are quick and cheap to design, populate, access, and maintain — which translates into development flexibility, quick payback, and happy end users. Moreover, by driving data mart development themselves, rather than reacting to departmentally grown marts, IS can prevent standalone, “skunkworks” databases, which lack consistency or a common infrastructure, from gaining a foothold.

“We managed to kill about six skunkworks projects with this [mart],” says Tim McCutcheon, director of information management at Bell Canada, referring to the firm’s financial reporting data mart, which is based on Arbor’s Essbase OLAP database running on NT. Not only were these rogue projects producing some 16,000 reports and costing the company about $1 million annually, “when we put the data into our SQL Server [data warehouse], nothing balanced,” McCutcheon says.

One of the factors fueling the growth of these IS-driven data marts is the upward march of the Windows NT operating system. At Sun Chemical Corp., the world’s largest manufacturer of commercial printing inks, plates, and film, NT is an integral part of the company’s Sun 2000 project. Sun 2000 goals are to update core logistics, manufacturing support, and accounting systems, replace all legacy systems, and develop and implement appropriate data marts, all by the year 2000. The project also sets standards for all divisions worldwide.

“A pillar of [this] standardization is the NT platform on Intel and DEC Alpha hardware for database and file servers,” says David Fritz, manager of data warehousing and reporting for the Ft. Lee, N.J.-based firm. Manufacturing and logistics, as well as financial operational systems, use SQL Server databases on NT platforms. “From these we create separate reporting environments” using the Sagent Data Mart suite of tools for extracting, cleansing, and moving data from the transactional to financial and sales data marts, he says.

According to Fritz, NT is an attractive platform because it is less expensive than a Unix-based solution and goes “hand-in-hand” with SQL Server. “I like the integration. It makes the database easier to manage,” he says.

Another benefit is NT’s ease of use and administration, which also allows companies to get data marts up and running quickly. At Scientific-Atlanta Inc., a $1.2 billion manufacturer of radio-frequency antennas and cable, broadband, and satellite communications equipment, a seven-member IT group took a test-results data mart from design to operation in four months. “It would have been impossible to do [the same job] with Unix with seven people, and we would have needed a Unix system administrator,” says Jim Kirchner, data mining engineer.

The firm designed the data mart for large-scale queries, using Informatica’s PowerMart design and query tool suite. Embedded Web pages make it accessible from corporate intranets. The data mart currently holds 9Gb and will grow to about 15Gb by year-end, Kirchner says.

Unlike other marts and warehouses, the test-results data mart cannot use aggregate data because test results are variable by definition. “We have to track all those variables,” Kirchner says. “Our statisticians need real values, [not aggregates]. That’s why our data mart implementation is different.”

As products are tested, results are recorded by automated test stations and forwarded to a central server. This server migrates the data to a SQL Server OLTP database running on a multiprocessor Compaq PC. The data is then moved into the Oracle data mart on an NT-based IBM quad-processor optimized for query and retrieval. Managers and statisticians can pull records in real-time, “without having to submit queries to a large IS staff,” Kirchner says.
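The staged flow Kirchner describes, raw rows captured in an OLTP store and then moved unaggregated into a query-optimized mart, can be pictured in miniature. In this hypothetical Python sketch, sqlite3 in-memory databases stand in for the SQL Server OLTP store and the Oracle mart, and the table and column names are invented for illustration; the point is that rows reach the mart as real values, not summaries.

```python
import sqlite3

# Stand-ins for the real systems: sqlite3 replaces the SQL Server OLTP
# store and the Oracle mart. Table and column names are hypothetical.
oltp = sqlite3.connect(":memory:")
mart = sqlite3.connect(":memory:")

oltp.execute("CREATE TABLE test_results (station TEXT, part_id TEXT, measured REAL)")
mart.execute("CREATE TABLE test_results (station TEXT, part_id TEXT, measured REAL)")

# Automated test stations write raw measurements to the OLTP database.
rows = [("ST-1", "P100", 3.14), ("ST-1", "P101", 3.09), ("ST-2", "P100", 3.15)]
oltp.executemany("INSERT INTO test_results VALUES (?, ?, ?)", rows)

def move_to_mart(oltp, mart):
    """Copy raw rows into the mart unaggregated, since the statisticians
    need real values rather than summaries."""
    moved = oltp.execute("SELECT station, part_id, measured FROM test_results").fetchall()
    mart.executemany("INSERT INTO test_results VALUES (?, ?, ?)", moved)
    mart.commit()
    return len(moved)

count = move_to_mart(oltp, mart)
```

In the real system the movement step runs between two different database products on separate machines; the sketch only shows the shape of the transfer.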

In Sync with the Customer

Another large firm that’s opted for an NT-based data mart is GTE Wireless in Honolulu. GTE implemented the mart to provide realtime billing data to cellular-phone rental companies. With this database, says IT Manager Benny Mateo, “these customer sites become an extension of our network.”

This need to connect with their phone-rental company partners drove GTE to choose SQL Server/NT as the platform for the data mart. “We had a customer lined up who said they were going to build their system with NT and SQL Server,” says Mateo. “Since we were starting from scratch, we decided to use what they were using.”

The result has been an application unduplicated by any GTE Wireless competitors, according to Mateo. In tourist-magnet Hawaii, cell-phone rental companies provide phones to tourists and tour groups. Before the tourists leave the island, GTE bills the rental company for any calls made from those phones. Then the rental company bills the tourists directly.

“The cellular switch produces a billing tape,” Mateo explains. “We take a realtime feed off the switch and send all the data into an Informix database.” Relevant call data — for example, any calls made to preprogrammed numbers on the cell phones — is moved into the data mart via Platinum Technology’s Info-Pump data movement product.

“We give the rental companies data that this phone made this call at this time for this duration, so they can bill their customers,” Mateo explains. “Once the data is on the NT server, we do some database replication through a frame relay network to customers’ servers, where the invoices are created.” The rental company itself receives a separate monthly statement.
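The per-call detail Mateo describes, which phone made which call, at what time, for what duration, amounts to a filtering and shaping step on the switch feed. This Python sketch uses invented record shapes and phone numbers; the real system moves the data between Informix and SQL Server with Info-Pump.

```python
# Hypothetical call records taken from the switch feed; the field names
# and the preprogrammed-number list are assumptions for illustration.
calls = [
    {"phone": "808-555-0101", "dialed": "808-555-9000", "start": "10:02", "secs": 95},
    {"phone": "808-555-0101", "dialed": "800-555-0000", "start": "10:20", "secs": 40},
    {"phone": "808-555-0102", "dialed": "808-555-9000", "start": "11:05", "secs": 310},
]
preprogrammed = {"808-555-9000"}  # numbers preprogrammed on the rental phones

def mart_records(calls, relevant_numbers):
    """Select the relevant calls and shape them as billing detail:
    which phone made which call, at what time, for what duration."""
    return [
        {"phone": c["phone"], "dialed": c["dialed"],
         "start": c["start"], "secs": c["secs"]}
        for c in calls
        if c["dialed"] in relevant_numbers
    ]

records = mart_records(calls, preprogrammed)
```

The selected records are what would be replicated out to the rental companies' servers, where the invoices are created.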

According to Mateo, the link between networks is essential to ensure the timeliness of the data. The interval between the time the billing data leaves GTE Wireless and the rental company generates the invoice ranges from two to eight minutes. “Thirty minutes would not be acceptable,” Mateo says.

As valuable as these data marts have been as point solutions, their success is threatened by the inevitable growth they will undergo. How much growth and how soon it will occur are longer-term factors that IS needs to figure into its Windows NT-based data mart decisions.

“The Intel environment is acceptable for small to medium-sized databases,” says Jim Gamm, senior director, sales automation, at the Sabre Group in Ft. Worth, Texas. Currently, Gamm is developing applications based on NT and Sybase IQ data mart software that Sabre Group’s airline customers will use for analyzing booking data. “Intel boxes max out at four processors,” he says. “I don’t recommend supporting a database over 50Gb on an NT/Intel platform.”

Growing Pains?

But some IS managers aren’t worried about data mart growing pains yet. GTE’s Mateo has run system stress tests and estimates the current system can handle data for up to 20,000 phones “without a hiccup.” Fritz at Sun Chemical is likewise unconcerned about pushing the limits of NT’s capabilities. “I’ll start worrying when [one of the data marts] gets above 20Gb. But we’ve got a long way to go.”

Oakley’s Krause is confident Microsoft will offer more multiprocessing options so NT can scale to accommodate growth. “By the time we get to NT 5.0, it’ll be running on 16-way machines,” he predicts. Nevertheless, he’s hedging his bets. “Even if that isn’t the case, we have architectural options to break the data back into data marts.”

For Scientific-Atlanta’s test-results data mart, NT should be able to handle up to 60Gb of data with the aid of an external RAID 5 storage array, says Kirchner. However, as the data mart grows, or evolves into multiple marts, another solution may be necessary.

“Hundreds of Gbs of data may require a clustered environment. If Wolfpack [Microsoft’s clustering solution] becomes more stable, we’ll stick with NT,” Kirchner says. “Clustering has been a boon to the Unix world. We haven’t been able to do that with NT yet.”

RELATED ARTICLE: Upcoming Products

Don’t Throw Out Those Unix Boxes Yet

Traditional Unix-based database vendors like Oracle, Informix, and Sybase agree that the Windows NT platform is their fastest-growing market segment. Yet Unix sales still dominate their business, and will continue to do so for some time. Essentially, just as Unix-based products co-opted a chunk of the low-end mainframe database market, NT is claiming a piece of the low-end Unix market.

“Customers [adopting NT now] are those who had small Unix boxes, one or two processors, for first-generation, departmental client/server applications,” says Mike Regan, VP and general manager of Sybase’s enterprise database group. “Now they’re doing the next generation of those applications, and they’re looking at the NT platform.”

In July, Sybase announced two data mart products for NT platforms, QuickStart ReportMart and QuickStart DataStore, both based on its Sybase IQ data mart product. The vendor plans to apply lessons learned from its NT development work, adding more ease-of-use features, for example, to its Unix-based product development as well, Regan says.

For Informix, NT data marts present a “big opportunity,” says Brett Bachman, director of enterprise products, “but we won’t abandon Unix.” Informix aims to answer the knock NT takes for limited scalability by leveraging the parallelism of its Dynamic Scalable Architecture (DSA), a move it hopes will make NT an option for mid-sized enterprise applications, including data marts and warehouses, supporting 100 to 500 users.

For NT to succeed on an enterprise scale, it needs to develop more Unix-like features, he adds, such as a lightweight threading model, parallel data management, and cluster management features. Informix supports VIA (Virtual Interface Architecture), a specification for high-speed communications interfaces for server and workstation clusters that is also supported by hardware suppliers Amdahl, Compaq, Hewlett-Packard, and Dell.

Oracle, true to its long-standing tradition of leaving no platform unsupported, offers data mart software suites for both Unix and NT platforms. Integration of the suite components (visual design, graphical data extraction, and query, reporting, and analysis tools, as well as the database and a Web server) is key, according to Neil Mendelson, senior director of data warehousing.

The suite for NT, announced in March, is available now for Intel machines. A DEC Alpha version, as well as versions for Unix flavors HP-UX, Sun Solaris, and AIX, will follow in the first half of next year, Mendelson says.


RELATED ARTICLE: Scalability Issues

The Issue That Won’t Go Away

For NT-based data marts, scalability is the “elephant in the living room”: Sooner or later, the issue has to be dealt with, because data marts inevitably grow.

According to a recent market study from Informatica Inc., the vast majority of the current data mart market (almost 90%) has fewer than 50 users per mart.

“That’s why there’s such a big opportunity for NT,” says Paul Albright, Informatica’s VP of marketing. The study also shows that in the next 18 to 24 months, only 25% of data marts are expected to have fewer than 50 users, while 56% will have more than 100 users.

However, according to experiences reported by Informatica’s PowerMart customers, the number of data mart users increases by roughly 10 times in the first year of use. At the same time, the amount of data in the mart increases by 300% to 400%.
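Those rates add up quickly over a first year. A rough Python calculation, using a hypothetical starting point of 25 users and 5Gb, and reading an increase of 300% to 400% literally as four to five times the data:

```python
# Hypothetical starting point for a new mart; the growth rates are the
# ones Informatica customers report: roughly 10x users, +300% to +400% data.
users, data_gb = 25, 5

users_after = users * 10                  # roughly ten times the users
data_after_low = data_gb * (1 + 3.0)      # an increase of 300% means 4x
data_after_high = data_gb * (1 + 4.0)     # an increase of 400% means 5x
```

Even a modest departmental mart, on these figures, ends its first year with hundreds of users and several times the data.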

The Internet is a big part of why those user levels increase so substantially. As their data marts grow, say customers, their goals are to maintain a centralized view and control of the data mart while increasing distributed functionality, including optimized change data capture, incremental aggregation, and read/writes over the network. “This speaks to the performance and scalability issue,” Albright says.
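Incremental aggregation, one of the distributed-functionality goals customers name, means folding only newly arrived rows into running totals rather than rescanning the whole mart. A minimal Python sketch, with hypothetical data shapes:

```python
# Minimal incremental aggregation: keep a running (count, total) per key
# and fold in only newly arrived rows, instead of recomputing from scratch.
aggregates = {}  # key -> [count, total]

def fold_in(aggregates, new_rows):
    """Update running aggregates from one incremental batch of rows."""
    for key, value in new_rows:
        count_total = aggregates.setdefault(key, [0, 0.0])
        count_total[0] += 1
        count_total[1] += value
    return aggregates

fold_in(aggregates, [("east", 10.0), ("west", 4.0), ("east", 6.0)])
fold_in(aggregates, [("east", 2.0)])  # a later, incremental batch

avg_east = aggregates["east"][1] / aggregates["east"][0]
```

The running counts and totals are enough to derive averages on demand, so each refresh touches only the new batch.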

Not only can individual, or independent, data marts rapidly expand in size, but multiple subject-specific data marts have a tendency to proliferate as well. At some point, users will need to consider linking these subject areas and making them coordinated data marts dependent upon a centralized data store, an effort requiring shared database schemas, logic, and metadata.

“The problem with independent data marts,” says Dave Gleason, director, information management consulting at Platinum Technology, “is that you can get data marts in different parts of the enterprise with conflicting business rules.” Creating a dependent data mart infrastructure requires what Gleason calls “enterprise reference data,” or processes for common data definitions; an enterprise-wide definition of shared codes and decodes; and “dimension decomposition,” a hierarchical definition and breakdown of business components.
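The shared codes and decodes Gleason describes can be pictured as a single reference table that every mart consults, so the same code never carries conflicting business meanings in different parts of the enterprise. A minimal Python sketch; the domains and codes shown are invented:

```python
# One shared code/decode table consulted by every mart, so a code cannot
# be redefined locally with a conflicting meaning. Codes are hypothetical.
REFERENCE_CODES = {
    ("order_status", "01"): "open",
    ("order_status", "02"): "shipped",
    ("region", "NE"): "Northeast",
}

def decode(domain, code):
    """Resolve a code through the shared enterprise table; an unknown
    code fails loudly rather than being guessed at locally."""
    try:
        return REFERENCE_CODES[(domain, code)]
    except KeyError:
        raise KeyError(f"no enterprise definition for {domain!r} code {code!r}")
```

The failure on unknown codes is the point: a mart that cannot decode a value escalates to the enterprise definition instead of inventing its own business rule.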

Most NT-based data marts won’t push the limits of their systems for another year or two. The current limit of four-way multiprocessing hasn’t crimped SQL Server’s business at all, says Joe Brown, data warehouse manager at Microsoft. “Customers can start modestly and do incremental increases,” he says. The SQL Server Enterprise product, due to be announced in November, should address six- and eight-way multiprocessing, Brown adds.

Nevertheless, users should make a long-term plan now, emphasizing a common data model, says Lance Miller, vice president of technology at Customer Insight Co., a marketing services and methodology provider. “They must also plan for the synchronization and replication issues” that will arise in a linked environment, he says. “They have to determine who synchronizes with whom and which is the system of record. Otherwise, it can be an administrative nightmare.”

COPYRIGHT 1997 Wiesner Publications, Inc.

COPYRIGHT 2004 Gale Group