Use Domain Knowledge to Improve Data Mining Performance of Very Large Datasets via Clustering

Data mining is a computationally intensive task. It differs from data querying, in which information is retrieved from a data repository. Data mining involves exhaustive computation to uncover information hidden in the data, specifically information that represents patterns in that data [1]. The task is therefore, to a great extent, unbounded. Using statistical analysis methods, data mining tools analyze the data and compute the relationships among its attributes (also called "features"), seeking strong correlations that may be evidence of new and important information [2, 3].
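To make the idea of computing relationships among attributes concrete, the following is a minimal sketch, not the report's method. It computes the Pearson correlation for every pair of attributes and keeps the pairs above a threshold. The attribute names, the sample values, the function names, and the 0.8 threshold are all illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strong_correlations(table, threshold=0.8):
    """Return (attr_a, attr_b, r) for every attribute pair with |r| >= threshold.

    `table` maps an attribute name to its column of values. The threshold
    is an arbitrary illustrative cutoff, not a value taken from the report.
    """
    results = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            results.append((a, b, r))
    return results


# Hypothetical toy records; the attributes are invented for illustration.
records = {
    "age": [30, 40, 50, 60, 70],
    "systolic_bp": [110, 120, 130, 140, 150],
    "visits": [3, 1, 4, 1, 5],
}

strong = strong_correlations(records)
for a, b, r in strong:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

Every pair of attributes is examined, so the work grows quadratically with the number of attributes and linearly with the number of records. This is one reason a smaller dataset makes such an analysis cheaper.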

We present methods for using domain knowledge, particularly in the medical domain, to reduce the dataset size for further data mining analysis.

By: Uri Shani; Simona Cohen

Published in: H-0239 in 2006


