How large corporations use data mining to create value

Thomas G. Calderon, John J. Cheh, and Il-woon Kim

MOST COMPANIES HAVE DEPLOYED DATA MARTS AND DATA WAREHOUSES, BUT ABOUT 35% SAY THEY DO NOT USE DATA MINING. WE REVIEW THE DATA-MINING PROCESS, WITH A REPORT ON A SURVEY OF PRACTICES AMONG FORTUNE 500 COMPANIES AND AN EXAMPLE OF A DATA-MINING TASK.

Recent literature describes several cases where managers are tapping into their corporate databases and transforming their raw data into knowledge that provides significant business intelligence and competitive advantage. This process, which companies such as Johnson & Johnson, GE Capital, Fingerhut, Procter & Gamble, and Harrah’s Casino have used very effectively to create competitive intelligence, is known as data mining. (1) The data-mining process allows a company to harness data generated from normal business processes to create knowledge for solving business problems. One study reports that the payoff from an effective data-mining project can be as high as $24 million in certain companies. (2) Despite the potential of data mining, management accountants are not very familiar with its concepts, and it is seldom among the repertoire of tools used to create value for their employers.

We will briefly review the data-mining process, report on a survey of data-mining practices among Fortune 500 companies, and provide an example of a data-mining task. Included are insights into (a) the functional areas within Fortune 500 companies that use data-mining techniques, (b) the reasons companies give for not using data mining, (c) the types of data-mining software companies use, (d) the data-mining techniques companies use, (e) the data sources companies use for data mining, and (f) the types of business applications for which companies use data mining. We also offer an example that shows how management accountants can use a data-mining technique to create highly intuitive guidelines to evaluate the financial health of a business. We conclude with a set of recommendations for financial professionals who want to use data mining to create business intelligence and value for their employers.

THE DATA-MINING PROCESS

Whether a project is simple or complex, effective data mining originates with a need for business intelligence and culminates with the creation of such intelligence. Figure 1 shows an overview of the data-mining process.

[FIGURE 1 OMITTED]

In practice, the goal of data mining is often modest. A successful data miner looks for a solution to a well-defined business problem, and data mining provides necessary business intelligence to solve it. After defining the problem, the data miner determines the type and scope of the data needed. Data may come from normal operational activities such as sales and marketing, procurement and logistics, production, and accounting. Data may also come from external, nonroutine sources such as government statistical sources, surveys, and commercial databases. Usually, the data miner must pool the data into a single usable data repository and then build the necessary data-mining databases from scratch or prepare the data from existing business databases.

Management accountants may store data for data mining in either informal or formal data repositories. Informal data repositories include the ad hoc data files they keep to facilitate their projects. These data files typically enable a very restricted function and are maintained by a single person in an end-user-oriented application such as Microsoft Access or Excel. Formal repositories include data warehouses and data marts that managers intend to use as part of the organization’s documented knowledge base. A data warehouse is a systematic repository of large volumes of data that serves as a knowledge base for a company’s business decisions. Unlike operational databases that support ongoing business transactions, a data warehouse includes integrated data for customers, vendors, products, events, and transactions that span several years. Data in a data warehouse may be in summary or detailed form and can come from a variety of operational sources across the organization. Sometimes an enterprise may acquire and store data for a more specialized purpose. Experts refer to this smaller, more focused data repository as a data mart. The common thread running through all those data sources is that data miners use them to create business intelligence that supports management decisions.

Because the data may be scattered in different locations, exist in different formats, or be stored in various languages, the data miner must prepare the data for mining. Known as preprocessing, this data preparation activity is often a major task that can take several months to complete, depending on the size of the project. Once this phase is complete, the data miner is ready to build the model that the organization will use to provide intelligence for solving a business problem. Model building is typically a computer-intensive activity that requires an understanding of both the business problem and the data-mining methodology for building the model. The goal of data mining is to identify hidden patterns in a data set that a management accountant may use to solve a business problem. The individual can mine a data set in search of those hidden patterns and produce vital intelligence for creating a competitive advantage.

Management accountants may use several competing data-mining software packages as tools in the mining process. Experts often classify these tools on a continuum in terms of their level of sophistication, ranging from low-end data-mining tools to high-end. More sophisticated data-mining products provide multiple methods and algorithms to enable complex data-mining tasks. They include wizards and editors for data preparation and incorporate scalability and automation to handle complex projects. Low-end tools provide easy-to-use capability to query, summarize, classify, and categorize data, but they do not provide either a high degree of functionality or sophisticated pattern-recognition methodologies.

Financial professionals can use data-mining tools in conjunction with very simple or highly complex database systems. In the simplest cases, they may use a pivot table with data in an MS Excel file. In the most complex cases, a data miner might interface with an enterprise-scale database management system, such as Oracle or DB2, and a high-end data-mining tool, such as Clementine, Enterprise Miner, CART, or Oracle Darwin.

Once someone builds a data-mining model, it is necessary to evaluate and validate that model to assess the likelihood that it will work. Validation involves testing the performance of a data-mining model on data that were not used in building the model. Data miners must do this type of assessment because the accuracy found during the model-building phase applies only to the data used to build the model. There are many models and algorithms that data miners commonly use, such as artificial neural networks, automatic clustering detection, decision trees, link analysis, market-basket analysis, memory-based reasoning (MBR), multivariate adaptive regression splines (MARS), rule induction, logistic regression, discriminant analysis, generalized additive models (GAM), boosting, and genetic algorithms. Each model and algorithm can produce different results. Because the fundamental goal of a data-mining project is to build a model that can classify or predict an activity or event accurately (such as fraud, bankruptcy, product failure, and warranty claim), the best model is the one that is likely to perform best in the field. Model validation tries to determine whether a model will continue to predict or classify with a high degree of accuracy when the data miner uses new, previously unseen data to test the model.
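The idea of choosing the model that performs best on unseen data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two candidate "models" are hypothetical cutoff rules, the holdout cases are made up, and the helper names `accuracy` and `pick_best` are ours rather than part of any data-mining package.

```python
from typing import Callable, Dict, List, Tuple

# (predictor value, actual outcome: 1 = event occurred, 0 = it did not)
Case = Tuple[float, int]


def accuracy(model: Callable[[float], int], cases: List[Case]) -> float:
    """Share of cases for which the model's prediction matches the outcome."""
    hits = sum(1 for x, actual in cases if model(x) == actual)
    return hits / len(cases)


def pick_best(models: Dict[str, Callable[[float], int]],
              holdout: List[Case]) -> str:
    """Return the name of the model with the highest holdout accuracy."""
    return max(models, key=lambda name: accuracy(models[name], holdout))


# Two hypothetical candidate models: each flags an event at or below a cutoff.
candidates = {
    "cutoff_0.10": lambda x: 1 if x <= 0.10 else 0,
    "cutoff_0.30": lambda x: 1 if x <= 0.30 else 0,
}

# Made-up holdout cases that were NOT used to build either model.
holdout_cases = [(0.05, 1), (0.08, 1), (0.20, 0),
                 (0.25, 0), (0.40, 0), (0.12, 1)]

best = pick_best(candidates, holdout_cases)
print(best, round(accuracy(candidates[best], holdout_cases), 3))
```

In practice the candidates would be fitted models from competing techniques, and the comparison would use the error measures that matter for the business problem rather than simple accuracy alone.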

DATA-MINING PRACTICES

We mailed a two-page survey to CFOs of all Fortune 500 companies. The instrument contained 12 questions, which we subdivided into two major sections: (a) general information about respondent and company and (b) technical information about data mining. In the general information section, we asked respondents to identify the functional area in which they worked, the size of their company, and why their company may or may not use data mining. In the technical information section, we asked what data-mining tools and data-mining technique(s) their companies used, and we also requested additional information about their data-mining and data-warehousing activities.

We pretested the survey instrument among accounting professors at a midwestern university and pilot-tested it at a large corporation, which we do not include among the respondents. We made several refinements to the instrument as a result of the pretest and pilot.

We mailed the survey initially in fall 2000 and then followed this first mailing with two follow-up requests addressed to persons who did not respond. The response rate was 8.6% (43 of the Fortune 500 companies). Because of the low response rate, nonresponse bias is a distinct possibility in this study. It is very likely that more nonuser entities than user entities were among the nonrespondents. Nonetheless, our survey provides insight into the use of data mining among large corporations.

WHO USES DATA MINING?

Table 1 shows that individuals holding several different titles responded. Most came from accounting and finance, and some also came from information systems, marketing, and purchasing. The various functional areas represented in Table 1 suggest that organizations do not restrict data mining to the domain of either accounting or information systems. All functional areas with a need for business intelligence apply data-mining techniques. This broad-based use of data mining is logical because proponents claim that effective data mining can produce net benefits ranging between $20 million and $24 million. Given this high potential payoff, it seems intuitive that use of data mining will cut across functional lines in business and industry.

Sixty-five percent of the companies that responded to the survey are using data mining. Though this percentage is relatively high, we expect that many more companies will use data mining as the tools become easier and more transparent and as company personnel become more aware of the product and process differentiation opportunities that data mining and knowledge management can facilitate.

WHAT REASONS DO COMPANIES GIVE FOR NOT USING DATA MINING?

Among companies that do not use data mining, a relatively large percentage (36%) say the reason is that they have more pressing business problems (see Table 2, Panel C). This is intriguing because companies use data mining when managers have pressing questions that can be answered through deep analysis of their own data and related information. Thus, there are likely additional issues that impede the use of data mining in large corporations. Resource constraints seem to be a major factor: even Fortune 500 companies face limits on management’s ability to deploy resources for data mining. As we investigated this issue further, we observed that smaller companies tended not to use data mining because they considered it less urgent than more pressing business problems they faced. Nonusers that cited pressing business needs reported average annual sales of $10,688 million, compared with $17,835 million for all nonusers (see Table 2, Panel A).

Cost is another factor that limits the use of data mining. Training, hardware, software, time needed to build and interpret models, and consulting are among the main drivers of data-mining cost. Thus, it is not surprising that several survey respondents view data mining as an expensive activity with uncertain benefits. Many (18%) do not consider it to be cost effective. This contrasts sharply with previous surveys that suggest the value added from data-mining projects can be as high as $24 million.

Lack of top-management support and concerns about the relevance of data mining are some of the other factors that limit its use in large entities. It is possible that many managers are not aware of the benefits of data mining relative to its perceived cost.

WHAT DATA-MINING SOFTWARE DO COMPANIES USE?

Companies report that they use a wide variety of data-mining software. Oracle’s Darwin, SAS’s Enterprise Miner, and IBM’s Intelligent Miner are the dominant players, with SPSS’s Clementine being used by a smaller number of companies. We observed two user clusters: entities that use Business Miner, Darwin, and Intelligent Miner (cluster 1) and entities that use Enterprise Miner, MineSet, and Clementine (cluster 2). Companies in cluster 1 reported median sales of $4.2 billion, while companies in cluster 2 reported median sales of $9.65 billion. This led us to conclude that, in general, larger companies tend to use a combination of Enterprise Miner, MineSet, and Clementine, while smaller ones use a combination of Business Miner, Darwin, and Intelligent Miner. Most companies use at least two data-mining software tools. This might indicate that the type of business intelligence that Fortune 500 companies are trying to obtain from their knowledge bases cannot be culled by using a single data-mining tool or that no single tool is sufficient to satisfy the diverse user groups that exist in many large corporations. Whatever the underlying reasons, it is evident that large companies use multiple data-mining tools to extract knowledge from their databases.

WHAT SPECIFIC DATA-MINING TECHNIQUES DO COMPANIES USE?

When asked about the types of data-mining techniques used, respondents indicated that they most often use decision trees and regression modeling, followed by clustering and discriminant analysis (see Table 3, Panel B). The popularity of regression modeling may stem from its ease of use and general familiarity. Unlike many other modeling and analysis tools, regression models are often intuitive, and managers can relate to the uncomplicated equations and statistical output that regression (particularly, linear regression) analysis produces.

Management accountants often use regression analysis to develop cost estimation equations and to identify appropriate cost drivers. This type of information is valuable in activity-based costing and in cost estimation in general. Regression modeling is a logical process that accountants can readily relate to through either heuristics or theory. In contrast to decision trees and clustering techniques, which enable identification of hidden patterns existing in a database, regression modeling assumes the patterns exist based on an existing hypothesis or theory. For example, when estimating costs, management accountants often make implicit or explicit assumptions about the factors that drive cost, and their cost estimation equations confirm and quantify their beliefs.
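To make the cost-estimation case concrete, the following Python sketch fits a one-driver cost equation by ordinary least squares. It is an illustration only: the choice of machine hours as the cost driver, the data values, and the function name `fit_cost_equation` are assumptions made for the example and do not come from the survey.

```python
from typing import List, Tuple


def fit_cost_equation(driver: List[float],
                      cost: List[float]) -> Tuple[float, float]:
    """Fit cost = intercept + slope * driver by ordinary least squares."""
    n = len(driver)
    mean_x = sum(driver) / n
    mean_y = sum(cost) / n
    # Sum of cross-products and sum of squared deviations of the driver.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(driver, cost))
    sxx = sum((x - mean_x) ** 2 for x in driver)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up monthly data: machine hours (assumed driver) and overhead cost.
machine_hours = [100.0, 200.0, 300.0, 400.0]
overhead_cost = [2500.0, 3500.0, 4500.0, 5500.0]

fixed_cost, cost_per_hour = fit_cost_equation(machine_hours, overhead_cost)
print(f"overhead = {fixed_cost:.0f} + {cost_per_hour:.0f} x machine hours")
```

The accountant supplies the hypothesis (that machine hours drive overhead), and the regression only confirms and quantifies it, which is the contrast with pattern-discovery techniques drawn above.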

WHAT DATA SOURCES DO COMPANIES USE FOR DATA MINING?

Table 3, Panel C shows that the majority of respondents (86%) use data marts and data warehouses. Given this high percentage, it seems plausible that companies that use data mining do so at many levels and across several departments. It is also evident that users of data mining use data marts to create business intelligence for specific processes and departments within their organizations. On the other hand, given that about 35% of respondents say they do not use data mining, it seems that many companies with data warehouses or data marts did not engage in formal data-mining activities at the time of the survey. A possible explanation is that many companies with data warehouses and data marts are among those who do not consider data mining cost effective.

FOR WHAT BUSINESS APPLICATIONS DO COMPANIES USE DATA MINING?

When asked about the types of business applications for which they use data mining, corporations said they most often use data mining for financial management and for customer and sales management (see Table 4). Trend analysis (7.5%), direct marketing (6%), database marketing (7.5%), customer retention (6.5%), customer relationship management (6%), customer acquisition (4%), cross-selling (4.5%), and sales analysis (8%) account for 50% of the use.

Table 4 also provides insight into the types of tools corporations use for different business applications. Overall, the table shows that CFOs use cluster 1 software (Business Miner, Darwin, and Intelligent Miner) primarily for activity-based costing, cost-benefit analysis, and credit analysis. They use cluster 2 software (Enterprise Miner, MineSet, and Clementine) for most of the other business applications, including trend analysis, customer retention, and product/market analysis.

USING CART AS A DATA-MINING TOOL

While the preceding discussion provides insight into data-mining practices among large corporations, it does not offer the specifics on what a data-mining task might look like. We will fill that void by using a financial risk assessment task to illustrate the data-mining process.

Management accountants who need to assess the financial risk of a business entity can choose from several different data-mining tools. One such tool is Classification and Regression Trees (CART), a data-mining technique developed by researchers at Stanford University and the University of California, Berkeley. We chose this technique because management accountants can use it to construct intuitive decision rules to predict, among other things, the likelihood that a business entity will fail. This type of analysis, with the decision rules it produces, is useful in assessing the financial health and overall business risk of trading partners, corporate affiliates, investment partners, and takeover targets.

Consistent with the data-mining methodology outlined in Figure 1, we define the business problem as predicting the likelihood that an entity will fail within one year from the date of our analysis. We believe that five financial ratios, originally used by Edward Altman for bankruptcy prediction, might be useful in making this assessment. These ratios are working capital to total assets (WCAPAT), retained earnings to total assets (REAT), earnings before interest and taxes to total assets (EBITAT), market value of equity to total debt (MKVALFLT), and sales to total assets (SALESAT). We obtained data for the analysis from a Standard & Poor’s research data warehouse called Research Insight. This data repository contains 20 years of financial data for all U.S. companies with publicly traded securities. It also contains similar data for many non-U.S. companies.
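The five ratios can be computed from standard financial statement items, as the Python sketch below illustrates. The `Financials` record and its field names are hypothetical and do not reflect the Research Insight data layout. We also assume that the denominator of MKVALFLT is total liabilities, and the sample figures are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Financials:
    """Hypothetical input record; field names are illustrative only."""
    current_assets: float
    current_liabilities: float
    retained_earnings: float
    ebit: float                 # earnings before interest and taxes
    market_value_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_ratios(f: Financials) -> Dict[str, float]:
    """Return the five predictor ratios keyed by the article's variable names."""
    working_capital = f.current_assets - f.current_liabilities
    return {
        "WCAPAT": working_capital / f.total_assets,
        "REAT": f.retained_earnings / f.total_assets,
        "EBITAT": f.ebit / f.total_assets,
        # Assumption: market value of equity over total liabilities.
        "MKVALFLT": f.market_value_equity / f.total_liabilities,
        "SALESAT": f.sales / f.total_assets,
    }


# Made-up company, in millions of dollars.
example = Financials(current_assets=400.0, current_liabilities=250.0,
                     retained_earnings=200.0, ebit=50.0,
                     market_value_equity=600.0, total_liabilities=500.0,
                     sales=1500.0, total_assets=1000.0)
ratios = altman_ratios(example)
print(ratios)
```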

We used the relevant financial data for companies in three major stock exchanges (NYSE, AMEX, and NASDAQ) for fiscal years 1992 to 1996. We assumed that there were hidden patterns in the data set that our data-mining analysis would identify and exploit as the basis for distinguishing between entities that will go bankrupt within one year and entities that will survive. We used CART as the enabling tool to extract the hidden patterns embedded in the data and to develop an accurate decision tree.

With CART, a user must first build the model. Our model is simple. It states that the five financial ratios identified previously will correctly distinguish between bankrupt and nonbankrupt entities. Figure 2 shows the model specifications as represented in CART. The target variable (or variable we are attempting to predict) is bankruptcy. The variables we use to predict bankruptcy are the five ratios mentioned above: WCAPAT, REAT, EBITAT, MKVALFLT, and SALESAT.

[FIGURE 2 OMITTED]

In CART, as well as other data-mining tools, the data miner usually builds the model in two phases. In phase one, we built the model by using a data sample referred to as the learning sample. In phase two, we tested the model to ensure that it would work on new data. Testing the model requires a different data set (known as the testing sample) from the learning sample. Both learning and testing samples must contain cases of failed as well as surviving companies. Our learning and testing samples contained 12,794 and 4,659 cases, respectively. Each sample contained cases of bankrupt and nonbankrupt companies that span the period 1992 to 1996, with data from 1992 to 1995 being used as the learning sample and data from 1996 being used as the testing sample.
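The split into learning and testing samples by fiscal year can be sketched as follows. This is a minimal Python illustration, not the procedure CART itself runs: the `CompanyYear` record, the helper names, and the observations are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class CompanyYear(NamedTuple):
    """One company-year observation; field names are illustrative."""
    company: str
    fiscal_year: int
    bankrupt: bool


def split_by_year(records: List[CompanyYear],
                  last_learning_year: int = 1995
                  ) -> Tuple[List[CompanyYear], List[CompanyYear]]:
    """Years up to last_learning_year form the learning sample; later years, the testing sample."""
    learning = [r for r in records if r.fiscal_year <= last_learning_year]
    testing = [r for r in records if r.fiscal_year > last_learning_year]
    return learning, testing


def has_both_classes(sample: List[CompanyYear]) -> bool:
    """True when the sample holds failed as well as surviving companies."""
    return {r.bankrupt for r in sample} == {True, False}


# Made-up observations, not drawn from Research Insight.
records = [
    CompanyYear("A", 1992, False),
    CompanyYear("B", 1993, True),
    CompanyYear("C", 1995, False),
    CompanyYear("D", 1996, True),
    CompanyYear("E", 1996, False),
]

learning_sample, testing_sample = split_by_year(records)
print(len(learning_sample), len(testing_sample))
print(has_both_classes(learning_sample), has_both_classes(testing_sample))
```

Splitting by year rather than at random means the testing sample comes from a later period than the learning sample, which is closer to how a manager would apply the model to new cases.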

Results generated by CART include prediction success of the model and logical rules in the form of a decision tree that a management accountant can use to predict an entity’s financial risk. Data miners judge a model’s prediction success by examining the rate at which it incorrectly classifies entities that went bankrupt (referred to as type 2 error rates) as well as the rate at which it misclassifies entities that did not go bankrupt (referred to as type 1 error rates). We examine type 1 and type 2 error rates for both the learning sample and the testing sample. Type 1 and type 2 error rates for the learning sample provide useful information for building the model. On the other hand, error rates for the testing sample tell how well the model is likely to predict bankruptcy when a manager uses it for assessing financial risk in real-world cases.
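Under the definitions used here, the two error rates can be computed from lists of actual and predicted outcomes, as in this short Python sketch. The function name `error_rates` and the six sample outcomes are made up for illustration.

```python
from typing import List, Tuple


def error_rates(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Return (type 1, type 2) error rates; True means bankrupt.

    Type 1: share of nonbankrupt entities that were misclassified.
    Type 2: share of bankrupt entities that were misclassified.
    """
    pairs = list(zip(actual, predicted))
    nonbankrupt = [p for a, p in pairs if not a]
    bankrupt = [p for a, p in pairs if a]
    type1 = sum(1 for p in nonbankrupt if p) / len(nonbankrupt)
    type2 = sum(1 for p in bankrupt if not p) / len(bankrupt)
    return type1, type2


# Made-up outcomes for six entities.
actual_outcomes = [True, True, False, False, False, False]
predicted_outcomes = [True, False, False, False, True, False]

type1_rate, type2_rate = error_rates(actual_outcomes, predicted_outcomes)
print(type1_rate, type2_rate)
```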

LOGICAL RULES AND PREDICTION SUCCESS

Figure 3 shows the logical rules we obtained for predicting bankruptcy based on the data used to illustrate the data-mining process. The rules indicate that the ratio of earnings before interest and taxes to total assets (EBITAT) is the first variable to examine when predicting bankruptcy. If EBITAT is less than or equal to 2.5%, then we examine the ratio of fiscal year-end market value to total liabilities (MKVALFLT). If EBITAT is less than or equal to 2.5% and MKVALFLT is greater than 9.335, then the company is unlikely to go bankrupt within the next 12 months. If EBITAT is less than or equal to 2.5% and the value of MKVALFLT is less than or equal to 9.335, then we examine retained earnings to total assets (REAT). The threshold REAT is 21.5%. If REAT is less than or equal to 21.5%, then the company is likely to go bankrupt within the next 12 months. Companies that exceed this threshold are unlikely to go bankrupt. Finally, companies with EBITAT greater than 2.5% are likely to go bankrupt within the next 12 months only if their ratio of working capital to total assets (WCAPAT) is 17.5% or less.
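These rules translate directly into nested conditions. The Python sketch below is one way to write them, assuming that the percentage thresholds are entered as decimals (2.5% as 0.025) and that the MKVALFLT cutoff of 9.335 is used exactly as stated. The function name `likely_bankrupt` is ours.

```python
def likely_bankrupt(ebitat: float, mkvalflt: float,
                    reat: float, wcapat: float) -> bool:
    """Apply the decision rules described above; True means likely to fail within 12 months."""
    if ebitat <= 0.025:
        # Low earnings: a high market value relative to liabilities signals survival.
        if mkvalflt > 9.335:
            return False
        # Otherwise the outcome depends on retained earnings to total assets.
        return reat <= 0.215
    # Higher earnings: failure is likely only when working capital is thin.
    return wcapat <= 0.175


# Made-up ratio values for two hypothetical companies.
print(likely_bankrupt(ebitat=0.01, mkvalflt=1.0, reat=0.10, wcapat=0.30))
print(likely_bankrupt(ebitat=0.10, mkvalflt=1.0, reat=0.50, wcapat=0.30))
```

Sales to total assets (SALESAT) does not appear in the function because none of the rules described above refers to it.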

[FIGURE 3 OMITTED]

Table 5, Panel B (based on the testing sample) shows that the decision tree in Figure 3 is relatively accurate. Although Panels A and B present the same type of information, Panel B is more relevant because it is based on data not used in building the model and, therefore, gives an idea of how well the model might work in the field. Based on Panel B, the type 1 error rate is a little under 22%, and the type 2 error rate is less than 5%. This means that the model correctly classified about 78% of the companies that did not go bankrupt within 12 months and about 95% of the companies that did (22 of 23). These classification rates compare favorably with a fair coin toss, which would produce a type 1 error rate of 50% and a type 2 error rate of 50%.
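The error percentages in Table 5 follow directly from the reported counts of cases and misclassified cases. The short Python check below reproduces them; the helper name `percent_error` is ours, and the counts are taken from Table 5, Panels A and B.

```python
def percent_error(misclassified: int, cases: int) -> float:
    """Misclassified cases as a percentage of all cases in the class."""
    return round(100.0 * misclassified / cases, 2)


# (misclassified cases, number of cases) from Table 5.
learning = {"bankrupt": (12, 78), "nonbankrupt": (2588, 12716)}
testing = {"bankrupt": (1, 23), "nonbankrupt": (1008, 4636)}

learning_type2 = percent_error(*learning["bankrupt"])
learning_type1 = percent_error(*learning["nonbankrupt"])
testing_type2 = percent_error(*testing["bankrupt"])
testing_type1 = percent_error(*testing["nonbankrupt"])

print(learning_type2, learning_type1)  # 15.38 20.35
print(testing_type2, testing_type1)    # 4.35 21.74

# Both testing-sample rates are well below the 50% of a fair coin toss.
print(testing_type1 < 50.0 and testing_type2 < 50.0)
```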

Table 5, Panel C shows that although we used five variables to predict bankruptcy, only four were important in distinguishing bankrupt from nonbankrupt companies (fiscal year-end market value/total liabilities; earnings before interest and taxes/total assets; retained earnings/total assets; working capital/total assets). The fifth variable (sales/total assets) is not useful in distinguishing between bankrupt and other companies when we consider the other four variables.

COMPLICATED, BUT USEFUL

We have provided an overview of the data-mining process and presented the results of a survey of data-mining practices. We have examined which companies use data mining, why they use or do not use it, the software they use, the specific techniques and data sources they use, and the business applications for which managers use data mining. Our survey found that many companies use data mining and that clustering, decision trees, and regression modeling are the most popular data-mining techniques among the companies surveyed.

The data-mining exercise shows how a management accountant might use a data-mining tool to create intuitive business rules for assessing a company’s financial risk. Management accountants who want to use data mining must be aware that there are many competing methodologies and that most successful data-mining projects use a multistep process that begins with understanding the business problem. While other steps in the process are vital, it is this first step that determines whether a data-mining model might offer the deliverables that can add value to a company.

Management accountants who want to use data mining as a tool for creating competitive business intelligence should be aware of several important guidelines. First, they should begin the process with a modest goal. A successful data miner looks for a solution to a well-defined business problem. Second, the management accountant should ensure that data are in a suitable format for data mining. Although a company may have a data warehouse or data mart, the data are seldom in a suitable format for data mining. Preparing the data is frequently a time-consuming but highly necessary activity. Third, it is mandatory to create both a learning sample (used directly in building the model) and a testing sample (used to evaluate the model) for data mining. The learning sample size should be as large as possible but must not be smaller than 10 times the number of variables used to predict an event or activity.

Fourth, the management accountant must have some basic knowledge of the model-building process. Very few data-mining tools allow an absolute novice to build an effective model. Model building typically is a computer-intensive activity that requires both an understanding of the business problem and the data-mining methodology for building the model. Management accountants should start with a low-end tool that provides an easy-to-use visual interface. Some good tools for beginners include those available as add-ons to familiar electronic spreadsheets and tools that use highly intuitive graphical user interfaces. The management accountant should begin by using techniques that are more familiar and easier to interpret and use, such as clustering, regression models, and decision trees.

Fifth, after building a data-mining model, management accountants must evaluate and validate it to assess the likelihood that it will work in practice. They should base their final assessment of the efficacy of a model on type 1 and type 2 errors for the testing sample. Finally, because there are competing techniques, a management accountant should attempt to compare the effectiveness of different techniques and select the one that produces the most accurate results.

Table 1: Respondent Information

PANEL A: CURRENT PROFESSIONAL POSITION

Professional Position              Frequency   Percent
Vice Chairman/Senior VP/Exec VP            5       12%
Vice President                             8       19%
Chief Financial Officer                    9       21%
Chief Information Officer                  4       10%
Director                                   7       17%
Manager                                    3        7%
Others *                                   6       14%
Total                                     42      100%

* These include business analyst, architect, comptroller, data administrator, deputy CFO, and ex-director.

PANEL B: FUNCTIONAL AREA

Functional Area                    Frequency   Percent
Accounting                                14       24%
Finance                                   25       42%
Information Systems                       11       19%
Marketing                                  3        5%
Purchasing                                 1        2%
Engineering                                0        0%
Other                                      5        8%
Total                                     59      100%

Table 2: Company Information in General

PANEL A: ANNUAL SALES COMPARISON

                                                                        Number of   Average Annual Sales
Description                                                             Companies   (million dollars)
Companies that reported sales                                                  41   $13,740
Companies that use data mining *                                               28   $11,839
Companies that did not use data mining                                         15   $17,835
Companies that did not use data mining due to pressing business needs           8   $10,688
Companies that did not use data mining because it was not viewed
  as being cost efficient                                                       4   $10,000

* The average usage period for companies that use data mining was found to be 4.7 years.

PANEL B: PERCENTAGE OF USERS BY INDUSTRY

Industry                                     Percent
Communications and Information Technology        24%
Energy and Public Utilities                      16%
Financial Services and Insurance                 40%
Healthcare and Pharmaceuticals                    4%
Manufacturing                                     4%
Retail                                            8%
Service                                           4%

PANEL C: REASONS FOR NOT EMPLOYING DATA MINING

                                                   Number of times   Percent of times
Reasons                                            mentioned *       mentioned
Faced with more pressing business problems                       8                36%
Not viewed as cost effective                                     4                18%
Lack of familiarity                                              2                 9%
Lack of skilled personnel                                        2                 9%
Lack of top-management support                                   1                 5%
Not viewed as relevant for our kind of business                  1                 5%
Lack of funding                                                  1                 5%
Other                                                            3                14%
Total frequency                                                 22               100%

* A single company may have multiple reasons.

Table 3: Technical Aspects of Data Mining

PANEL A: USE OF DATA-MINING SOFTWARE

                          Number of times   Percent of times
Data-Mining Software      mentioned *       mentioned
Darwin                                 10                20%
Intelligent Miner                      12                24%
Enterprise Miner                        9                18%
Business Miner                          5                10%
Clementine                              3                 6%
MineSet                                 3                 6%
KnowledgeStudio                         1                 2%
Scenario                                1                 2%
DataCruncher                            0                 0%
Others                                  5                10%
Total frequency                        49               100%

* A single company may use multiple data-mining software tools.

PANEL B: USE OF DATA-MINING TECHNIQUES

                            Number of times   Percent of times
Data-Mining Technique       mentioned *       mentioned
Decision trees                           16                23%
Regression modeling                      16                23%
Clustering                               13                19%
Discriminant analysis                     9                13%
Neural networks                           7                10%
Genetic algorithms                        2                 3%
Bayesian belief networks                  1                 1%
Fuzzy logic                               1                 1%
Others                                    4                 6%
Total frequency                          69               100%

* A single company may use multiple techniques.

PANEL C: USE OF DATA MARTS AND DATA WAREHOUSES

                     Number of     Percent
                     Respondents   of users
Data Marts                    28        86%
Data Warehouses               28        86%

Table 4: Frequency of Application Use

                                   User Frequency of   Frequency    Use         Use
Data-Mining Application            Application (+)     Percentage   Cluster 1 * Cluster 2 *
Sales analysis                                    16        8.00%         22%         78%
Database marketing                                15        7.50%         30%         70%
Trend analysis                                    15        7.50%          0%        100%
Customer retention                                13        6.50%         14%         86%
Customer relationship management                  12        6.00%         23%         77%
Direct marketing                                  12        6.00%         20%         80%
Financial analysis                                11        5.50%         34%         66%
Cross-selling                                      9        4.50%         43%         57%
Fraud detection                                    9        4.50%         17%         84%
Portfolio management                               9        4.50%         43%         57%
Product/market analysis                            9        4.50%          0%        100%
Customer acquisition                               8        4.00%         25%         75%
Activity-based costing                             7        3.50%         75%         25%
Forecasting in financial markets                   7        3.50%         25%         75%
Product mix                                        7        3.50%          0%        100%
Risk management                                    6        3.00%         40%         60%
Variance analysis                                  6        3.00%         25%         75%
Cost-benefit analysis                              5        2.50%        100%          0%
Credit analysis                                    5        2.50%         67%         33%
Process control                                    1        0.50%          0%        100%
Survey analysis                                    1        0.50%          0%        100%
Theory of constraints                              1        0.50%          0%        100%
Other                                             16        8.00%          0%        100%
Total Frequency                                  200      100.00%

(+) A single company may use data mining for multiple business applications.

* Cluster 1 and cluster 2 refer to clusters of software tools used for the business applications listed in the table. Cluster 1 data-mining tools include Business Miner, Darwin, and Intelligent Miner. Cluster 2 data-mining tools include Enterprise Miner, MineSet, and Clementine. For example, the table shows that companies that use data mining for sales analysis, database marketing, and trend analysis primarily use software tools in cluster 2.

Table 5: Data-Mining Results

PANEL A: MISCLASSIFICATION FOR LEARNING SAMPLE

                                   Number of
Class          Number of Cases     Misclassified Cases   Percent Error
Bankrupt                    78                      12   15.38 (Type 2)
Nonbankrupt             12,716                   2,588   20.35 (Type 1)

PANEL B: MISCLASSIFICATION FOR THE TESTING SAMPLE

                                   Number of
Class          Number of Cases     Misclassified Cases   Percent Error
Bankrupt                    23                       1    4.35 (Type 2)
Nonbankrupt              4,636                   1,008   21.74 (Type 1)

PANEL C: VARIABLE IMPORTANCE SCORES

Variable                                            Score *
Fiscal year-end market value/total liabilities       100.00
Earnings before interest & taxes/total assets         89.51
Retained earnings/total assets                        87.61
Working capital/total assets                          82.43
Sales/total assets                                     0.00

* This is an indicator of the importance of a variable. Scores range from zero to 100. The higher the score, the more important a variable is in distinguishing between bankrupt and nonbankrupt companies.

(1) Attar Software Limited, “Striking commercial gold with data mining at GE Capital,” Available online: http://www.attar.com/ (Accessed June 27, 2000); Christina Binkley, “Casino Chain Mines Data on Gamblers, and Strikes Pay Dirt with Low-Rollers,” Wall Street Journal, May 4, 2000, p. A1; Nanette Byrnes, Alison Rea, Geoffrey Smith, and Linda Himelstein, “On the Cutting Edge: Finance firms duke it out,” Business Week, October 28, 1996, pp. 134-138; Peter Coy, “Data Visualization,” Business Week, October 28, 1996, p. 150; Robert Groth, Data Mining: Building Competitive Advantage, Prentice Hall PTR, Upper Saddle River, N.J., 2000; John Verity, “A ‘Spreadsheet on Steroids’,” Business Week, October 28, 1996, p. 131; Phillip L. Zweig, John Verity, Stephanie Anderson Forrest, Greg Burns, Rob Hof, and Nicole Harris, “Beyond Bean-Counting,” Business Week, October 28, 1996, pp. 130-132.

(2) Julie Smith David and Paul John Steinbart, Data Warehousing and Data Mining: Opportunities for Internal Auditors, the Institute of Internal Auditors Research Foundation, Altamonte Springs, Fla., 2000.

Thomas G. Calderon, Ph.D., is professor of accounting at the University of Akron in Akron, Ohio. His e-mail address is tcalderon@uakron.edu, and his phone number is (330) 972-6099.

John J. Cheh, Ph.D., is associate professor of accounting and information systems at the University of Akron. He can be reached at (330) 972-6091 or cheh@uakron.edu.

Il-woon Kim, Ph.D., is professor of accounting at the University of Akron. His e-mail address is ikim@uakron.edu.

COPYRIGHT 2003 Institute of Management Accountants

COPYRIGHT 2003 Gale Group