DATA MINING TECHNOLOGY FOR PROBLEM SOLVING AND KNOWLEDGE CREATION



By
Sunil Kumar,
Hari Babu,
Shankar Kumar Choudhary,
Vinit Kumar Mathur
and
Hridayeshwar Jha

 

 


Background


A Data Warehouse (DWH) was recently installed at the LD2 and Slab Caster Shop to make data available to plant engineers in a user-friendly manner and to support analysis linked to improvement projects. The DWH is designed to store large volumes of data generated on different computer (hardware / software) platforms in several sections of the shop, including hot metal pretreatment, primary and secondary steelmaking, slab casting and conditioning. The Data Warehouse facility was developed using IBM technology, including DB2 UDB for Windows, DB2 Warehouse Manager, DB2 OLAP Server and Intelligent Miner. COGNOS Impromptu was employed to develop standard reports. COGNOS Impromptu and PowerPlay are end-user tools for viewing reports and analysis cubes respectively. Thus, in addition to storing vast volumes of data, the DWH facility is equipped with a number of special tools for conducting data analysis. The focus of this paper is Data Mining Technology, which makes available a wide range of special techniques for problem solving and knowledge creation.

What is Data Mining?


Data Mining, in simple terms, is the extraction of previously unknown and potentially useful information from a historical database. Data Mining algorithms are based on techniques such as hypothesis generation, classification, prediction, association rules or affinity grouping, and clustering. A brief description of these techniques is presented in Table 1 for reference. The algorithms are quite powerful and can process large quantities of data to discover meaningful patterns and rules about a business process. The real power of these techniques is their inherent ability to reveal “hidden” patterns or relationships even in complex data-sets that are associated with a high degree of uncertainty.
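As an illustration of one of the techniques named above, association rules, the following minimal Python sketch computes the support and confidence of a candidate rule. The attribute names (`high_sulphur`, `long_treatment`) and the records are hypothetical and chosen only for illustration; they are not plant data, and this is not the algorithm used by any particular mining tool.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical records: each set lists the attributes observed for one heat.
records = [
    {"high_sulphur", "long_treatment"},
    {"high_sulphur", "long_treatment"},
    {"high_sulphur"},
    {"low_sulphur"},
]

rule_support = support(records, {"high_sulphur", "long_treatment"})
rule_confidence = confidence(records, {"high_sulphur"}, {"long_treatment"})
```

A rule is usually considered interesting only when both measures exceed thresholds chosen by the analyst; those thresholds are a judgment call and are not specified here.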

The methodology followed for Data Mining analysis is presented schematically in Figure 1. The steps are not clear-cut, and a number of iterations are required.

Before Data Mining analysis can start, it is necessary to describe the data at hand: summarize its statistical attributes (mean and standard deviation values), visually review the data using charts and graphs, and look for potentially meaningful links among variables.
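The data-description step can be sketched in a few lines of Python. This is a minimal example using only the standard library; the variable names (`tap_temp_c`, `oxygen_nm3`) and their values are hypothetical, and the Pearson correlation coefficient is assumed here as one simple way to look for a link between two variables.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return the mean and sample standard deviation of one variable."""
    return {"mean": mean(values), "std": stdev(values)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical heat records (illustrative values only, not plant data).
heats = {
    "tap_temp_c": [1650.0, 1660.0, 1670.0, 1680.0, 1690.0],
    "oxygen_nm3": [100.0, 102.0, 104.0, 106.0, 108.0],
}

# Statistical attributes of each variable.
summary = {name: summarize(vals) for name, vals in heats.items()}

# A value near +1 or -1 suggests a linear link worth investigating further.
link = pearson(heats["tap_temp_c"], heats["oxygen_nm3"])
```

A strong correlation found at this stage is only a pointer for the later mining steps; it does not by itself establish a causal relationship.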

Data Mining and Other Analysis Techniques


Data Mining is different from other analysis techniques such as OLAP (on-line analytical processing) and statistical methods. Nevertheless, these tools are complementary to each other and when applied


Fig. 1: Methodology of Data Mining Analysis