
together in an
intelligent manner, can significantly enhance the outcome of data
analysis.
Data Mining and OLAP Techniques:
The traditional query and report tools describe what is in a database.
OLAP goes a step further: it is used to answer why certain things are
the way they are. The user first forms a hypothesis about a relationship
and then verifies it with a series of queries against the data. For
example, a user might want to determine the factors that lead to loan
defaults. The user might initially hypothesize that people with low
incomes are bad credit risks and analyze the database with OLAP to verify
(or disprove) this assumption. If the data do not support the hypothesis,
the user tries other parameters, or combinations of parameters, as the
determinant of risk. The process is repeated until a reasonable solution
emerges.
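The hypothesis-and-query loop described above can be sketched in a few lines of Python. This is a minimal illustration, not an OLAP tool: the loan records, their field layout, and the helper `default_rate_by` are all hypothetical, and each grouping stands in for one GROUP BY style query an analyst might issue. The first query tests the low-income hypothesis; because the made-up data do not support it, a second query tries employment type instead.

```python
from collections import defaultdict

# Hypothetical loan records: (income_band, employment_type, defaulted)
LOANS = [
    ("low", "salaried", False),
    ("low", "salaried", False),
    ("low", "self-employed", True),
    ("high", "self-employed", True),
    ("high", "salaried", False),
    ("high", "self-employed", True),
    ("medium", "salaried", True),
    ("medium", "self-employed", True),
    ("medium", "salaried", False),
]


def default_rate_by(records, key):
    """Group records by key(record) and return {group: default rate},
    similar in spirit to one GROUP BY query issued from an OLAP tool."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        if record[2]:  # third field is the defaulted flag
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}


# Hypothesis 1: low-income applicants are bad credit risks.
by_income = default_rate_by(LOANS, key=lambda r: "low" if r[0] == "low" else "other")
hypothesis_1_supported = by_income["low"] > by_income["other"]

# Hypothesis 1 fails on this sample, so the analyst tries another parameter.
by_employment = default_rate_by(LOANS, key=lambda r: r[1])

print(by_income, hypothesis_1_supported)
print(by_employment)
```

On this made-up sample, low-income applicants default less often than the rest (about 0.33 versus 0.67), so the first hypothesis is rejected, while grouping by employment type points to self-employed applicants. Each step still depends on the analyst choosing what to query next.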
In simple terms, through OLAP the user generates a series of
hypothetical patterns and relationships and uses queries against the
database to verify or disprove them. OLAP analysis is essentially a
deductive process. But when the number of variables being analyzed is
large (dozens or even hundreds), it becomes cumbersome and
time-consuming to find a good hypothesis and verify it against the data.
Data Mining differs from OLAP because it is essentially an inductive
process. It uses the data itself to uncover patterns, rather than
verifying hypothesized patterns as OLAP does. For example, if the user
applied Data Mining to identify the risk factors for loan default, it
would discover the parameters, or combinations of parameters, that are
associated with bad credit risks. In addition, the mining analysis may
well reveal a pattern in the data that the user did not anticipate.
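By contrast, the inductive approach can be sketched as a search in which no hypothesis is supplied in advance. The code below is a deliberately simple stand-in for real mining algorithms (such as association-rule or decision-tree learners): it scores every single attribute value and every pair of attribute values by default rate and keeps those well above the overall rate. The records, the function `discover_risk_patterns`, and the thresholds `min_support` and `min_lift` are hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical applicant records; "defaulted" is the outcome being explained.
RECORDS = [
    {"income": "low", "employment": "salaried", "age": "young", "defaulted": False},
    {"income": "low", "employment": "salaried", "age": "older", "defaulted": False},
    {"income": "low", "employment": "self-employed", "age": "older", "defaulted": False},
    {"income": "high", "employment": "self-employed", "age": "young", "defaulted": True},
    {"income": "high", "employment": "salaried", "age": "older", "defaulted": False},
    {"income": "medium", "employment": "self-employed", "age": "young", "defaulted": True},
    {"income": "medium", "employment": "salaried", "age": "young", "defaulted": False},
    {"income": "high", "employment": "self-employed", "age": "older", "defaulted": False},
]


def discover_risk_patterns(records, target="defaulted", min_support=2, min_lift=1.5):
    """Score every single attribute value and every pair of attribute values
    by default rate; keep those whose rate is at least min_lift times the
    overall rate and that cover at least min_support records."""
    attributes = [a for a in records[0] if a != target]
    overall = sum(r[target] for r in records) / len(records)
    if overall == 0:
        return []  # nothing to explain: no defaults in the data
    patterns = []
    for size in (1, 2):
        for combo in combinations(attributes, size):
            # Every distinct value assignment observed for this attribute combination
            for values in sorted({tuple(r[a] for a in combo) for r in records}):
                matched = [r for r in records if tuple(r[a] for a in combo) == values]
                if len(matched) < min_support:
                    continue
                rate = sum(r[target] for r in matched) / len(matched)
                if rate >= min_lift * overall:
                    patterns.append((dict(zip(combo, values)), len(matched), rate))
    # Highest default rate first, then larger coverage
    return sorted(patterns, key=lambda p: (-p[2], -p[1]))


patterns = discover_risk_patterns(RECORDS)
for condition, support, rate in patterns:
    print(condition, support, round(rate, 2))
```

On this sample the strongest pattern is the combination of young and self-employed applicants, a pairing nobody had to propose in advance. With dozens of attributes the number of combinations grows quickly, which is why practical mining algorithms prune the search rather than enumerate it exhaustively.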
As mentioned earlier, Data Mining and OLAP can complement each other.
For example, before acting on a pattern revealed by Data Mining, the
user needs to know the financial implications of using that pattern to
decide who is finally eligible for credit. An OLAP tool allows users to
answer these types of questions. OLAP is also complementary in the early
stages of the knowledge discovery process, because it can help explore
the data, for instance by focusing on important variables, identifying
exceptions, or finding interactions.
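Checking the financial implications of a discovered pattern before acting on it amounts to aggregating a measure over dimensions, which is the kind of roll-up an OLAP tool performs. A minimal Python sketch, assuming hypothetical loan records with a region dimension and an amount measure, a hypothetical rule `is_flagged_by_rule`, and (for simplicity) that a defaulted loan loses its full amount, might summarize the rule's impact per region:

```python
from collections import defaultdict

# Hypothetical loans: (region, employment_type, age_band, amount, defaulted)
LOANS = [
    ("north", "self-employed", "young", 10_000, True),
    ("north", "salaried", "young", 8_000, False),
    ("north", "self-employed", "young", 12_000, False),
    ("south", "self-employed", "young", 5_000, True),
    ("south", "self-employed", "older", 20_000, False),
    ("south", "salaried", "older", 15_000, False),
]


def is_flagged_by_rule(loan):
    """Hypothetical rule from a mining step: decline young, self-employed applicants."""
    _, employment, age, _, _ = loan
    return employment == "self-employed" and age == "young"


def impact_by_region(loans, rule):
    """Roll up, per region, the loan volume the rule would decline and the part
    of that volume that actually defaulted (assumed to be a full loss avoided)."""
    summary = defaultdict(lambda: {"declined": 0, "loss_avoided": 0})
    for loan in loans:
        region, *_, amount, defaulted = loan
        if rule(loan):
            summary[region]["declined"] += amount
            if defaulted:
                summary[region]["loss_avoided"] += amount
    return dict(summary)


impact = impact_by_region(LOANS, is_flagged_by_rule)
print(impact)
```

In this made-up example the rule would decline 22,000 of lending in the north, of which 10,000 later defaulted; the remaining 12,000 is sound business turned away. Weighing that trade-off, region by region, is the sort of question the OLAP step answers.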
Data Mining and Statistical Techniques:
The differences between Data Mining and statistical techniques are as
follows:
(a) Data Mining is used for hypothesis generation, while statistical
techniques are used for hypothesis validation;
(b) Data Mining can process larger amounts of data than statistical
techniques; and
(c) Statistical techniques are more theory-based, while Data Mining
integrates both theory and heuristics, i.e., Data Mining algorithms draw
on advances made in both artificial intelligence (AI) and statistics.
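Point (a) can be made concrete: a pattern produced by mining, for instance that applicants with a certain profile default more often, can then be validated with a standard statistical test on fresh data. The sketch below computes Pearson's chi-square test of independence for a 2x2 table using only the Python standard library; the function name `chi_square_2x2` and the counts are hypothetical. In practice a library routine such as `scipy.stats.chi2_contingency` would typically be used; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ slightly from this uncorrected version.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence (no continuity correction) for

                    defaulted  repaid
        pattern         a        b
        no pattern      c        d

    Returns (statistic, p_value) with 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a non-zero total")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # A chi-square variable with 1 degree of freedom is the square of a standard
    # normal Z, so P(X >= s) = P(|Z| >= sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts from a fresh sample not used to discover the pattern
stat, p = chi_square_2x2(a=30, b=20, c=40, d=160)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

With these counts the statistic is about 31.7 and the p-value is far below 0.05, so the association would be judged unlikely to be due to chance. Testing on data that were not used to find the pattern matters: a pattern selected from many candidate combinations can look significant on the same data purely by chance.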
It must be mentioned that Data Mining does not replace traditional
statistical techniques; rather, it is an extension of them. Statistical
analysis is basically concerned with primary data analysis, i.e., the
data is collected with a particular question or set of questions in
mind. The various sub-disciplines of statistical analysis, such as
experimental design and survey design, have evolved mainly to facilitate
the efficient collection of data for answering the given questions. Data
Mining, on the other hand, can be considered a process of secondary
analysis of large databases that aims at finding “unexpected” or
“hidden” relationships that are of interest or value to the database
owners.
Data Mining is relatively less concerned with identifying the specific
relationships between the variables involved. For example, uncovering
the nature of the underlying functions, or the specific types of
interactive, multivariate dependencies, is not the main objective of
Data Mining. Instead, the focus is on producing a meaningful solution.