together in an intelligent manner, can significantly enhance the outcome of data analysis.

Data Mining and OLAP Techniques:

Traditional query and report tools describe what is in a database. OLAP goes a step further: it is used to answer why certain things are the way they are. The user first forms a hypothesis about a relationship and then verifies it with a series of queries against the data. For example, a user might want to determine the factors that lead to loan defaults. The user might initially hypothesize that people with low incomes are bad credit risks and analyze the database with OLAP to verify (or disprove) this assumption. If the hypothesis is not supported by the data, the user will try other parameters, or combinations of parameters, as the determinant of risk. The process is repeated until a reasonable solution emerges.

In simple terms, through OLAP the user generates a series of hypothetical patterns and relationships and uses queries against the database to verify or disprove them. OLAP analysis is essentially a deductive process. But when the number of variables being analyzed is large (in the dozens or even hundreds), it becomes cumbersome and time-consuming to find a good hypothesis and verify it against the data.
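The hypothesis-then-query loop described above can be sketched in a few lines of Python. This is only an illustration: the loan records, the field names (`income`, `employment`, `defaulted`) and the helper `default_rate_by` are all hypothetical, and a real OLAP tool would run the roll-up against a multidimensional store rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical loan records; every value here is made up for illustration.
LOANS = [
    {"income": "low", "employment": "salaried", "defaulted": True},
    {"income": "low", "employment": "self-employed", "defaulted": True},
    {"income": "low", "employment": "salaried", "defaulted": False},
    {"income": "low", "employment": "salaried", "defaulted": True},
    {"income": "high", "employment": "salaried", "defaulted": False},
    {"income": "high", "employment": "self-employed", "defaulted": False},
    {"income": "high", "employment": "salaried", "defaulted": False},
    {"income": "high", "employment": "self-employed", "defaulted": True},
]


def default_rate_by(records, dimension):
    """Roll up the default rate along one dimension chosen by the analyst."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for record in records:
        key = record[dimension]
        totals[key] += 1
        defaults[key] += int(record["defaulted"])
    return {key: defaults[key] / totals[key] for key in totals}


# The analyst supplies the hypothesis: low-income borrowers default more often.
rates = default_rate_by(LOANS, "income")
hypothesis_supported = rates["low"] > rates["high"]
print(rates, hypothesis_supported)
```

If the hypothesis were not supported, the analyst would call `default_rate_by` again with another dimension, which is exactly the manual iteration that becomes unwieldy as the number of variables grows.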

Data Mining differs from OLAP in that it is essentially an inductive process. It uses the data itself to uncover patterns, rather than verifying hypothetical patterns as OLAP does. For example, if the user applied Data Mining to identify the risk factors for loan default, it would discover the parameters, or combinations of parameters, that are associated with bad credit risks. In addition, the mining analysis might well reveal a pattern in the data that the user did not anticipate.
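To make the inductive direction concrete, the sketch below lets the data rank the candidate risk factors instead of testing one chosen by the user. It is a deliberately naive stand-in (an exhaustive single-attribute scan), not a real mining algorithm, and the records, field names and the function `rank_attributes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical loan records; every value here is made up for illustration.
LOANS = [
    {"income": "low", "employment": "salaried", "owns_home": "no", "defaulted": True},
    {"income": "low", "employment": "self-employed", "owns_home": "no", "defaulted": True},
    {"income": "low", "employment": "salaried", "owns_home": "yes", "defaulted": False},
    {"income": "low", "employment": "self-employed", "owns_home": "yes", "defaulted": False},
    {"income": "high", "employment": "salaried", "owns_home": "no", "defaulted": True},
    {"income": "high", "employment": "self-employed", "owns_home": "no", "defaulted": True},
    {"income": "high", "employment": "salaried", "owns_home": "yes", "defaulted": False},
    {"income": "high", "employment": "self-employed", "owns_home": "yes", "defaulted": False},
]


def rank_attributes(records, target="defaulted"):
    """Score every attribute by the gap between its highest and lowest default rate."""
    scores = {}
    attributes = [name for name in records[0] if name != target]
    for attribute in attributes:
        totals = defaultdict(int)
        hits = defaultdict(int)
        for record in records:
            totals[record[attribute]] += 1
            hits[record[attribute]] += int(record[target])
        rates = [hits[value] / totals[value] for value in totals]
        scores[attribute] = max(rates) - min(rates)
    # Largest gap first: the attribute most strongly associated with default.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_attributes(LOANS)
print(ranking)
```

In this made-up data set the top-ranked attribute is `owns_home` rather than `income`, which illustrates how a scan over the data can surface a factor the user had not hypothesized.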

As mentioned earlier, Data Mining and OLAP can complement each other. For example, before acting on a pattern revealed by Data Mining, the user needs to know the financial implications of using the discovered pattern to decide who is ultimately eligible for credit. An OLAP tool allows users to answer these types of questions. In addition, OLAP is complementary in the early stages of the knowledge discovery process because it can help explore the data, for instance by focusing on important variables, identifying exceptions, or finding interactions.
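As a rough illustration of the follow-up question about financial implications, the sketch below compares the outcome of approving every applicant with the outcome of approving only those who pass a mined rule. The rule, the applicant records and the profit and loss figures are all assumptions chosen to keep the arithmetic easy to follow.

```python
# Hypothetical figures: profit on a repaid loan and loss on a defaulted one.
PROFIT_PER_REPAID = 100
LOSS_PER_DEFAULT = 500

# Hypothetical past applicants with known outcomes.
APPLICANTS = [
    {"income": "low", "defaulted": True},
    {"income": "low", "defaulted": False},
    {"income": "low", "defaulted": True},
    {"income": "high", "defaulted": False},
    {"income": "high", "defaulted": False},
    {"income": "high", "defaulted": True},
]


def portfolio_profit(records):
    """Sum the profit or loss over a set of approved loans."""
    return sum(
        -LOSS_PER_DEFAULT if record["defaulted"] else PROFIT_PER_REPAID
        for record in records
    )


def discovered_rule(record):
    """Hypothetical mined rule: approve only high-income applicants."""
    return record["income"] == "high"


approve_all = portfolio_profit(APPLICANTS)
approve_by_rule = portfolio_profit([r for r in APPLICANTS if discovered_rule(r)])
print(approve_all, approve_by_rule, approve_by_rule - approve_all)
```

A comparison like this is the kind of what-if query an analyst would typically slice further, for example by region or product, before adopting the rule.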

Data Mining and Statistical Techniques:

The differences between Data Mining and statistical techniques are as follows:

(a) Data Mining is used for hypothesis generation, while statistical techniques are used for hypothesis validation,

(b) Data Mining can process larger amounts of data than statistical techniques, and

(c) Statistical techniques are more theory-based, while Data Mining integrates both theory and heuristics, i.e., Data Mining algorithms utilize advances made in the fields of both artificial intelligence (AI) and statistics.

It must be mentioned that Data Mining does not replace traditional statistical techniques; rather, it is an extension of them. Statistical analysis is basically concerned with primary data analysis, i.e., data is collected with a particular question or set of questions in mind. The various sub-disciplines of statistical analysis, such as experimental design and survey design, have evolved mainly to facilitate the efficient collection of data for the purpose of answering the given questions. Data Mining, on the other hand, can be considered a process of secondary analysis of large databases that aims at finding “unexpected” or “hidden” relationships which are of interest or value to the database owners.

Data Mining is relatively less concerned with identifying the specific relationships between the variables involved. For example, uncovering the nature of the underlying functions or the specific types of interactive, multivariate dependencies is not the main objective of Data Mining. Instead, the focus is on producing a solution that is meaningful.