Deep Data Mining Blog: Data Mining Components

Saturday, September 01, 2012

Data Mining Components

We identify four components, or layers, in a data mining engagement, as shown in the figure below. The two abstract layers, the business problem and the data mining algorithms, are on top. The two physical layers, the data mining tools and data management, are at the bottom.

The business problems that we want to solve make up one of the most important layers. A problem could be predicting fraud (bank card, check, or medical claims), a new customer's lifetime value at the point of sale, online ad click rates, or creditworthiness, or it could be segmenting customers. We can address these business problems using various data mining algorithms, such as logistic regression, neural nets, support vector machines, and K-means clustering.

The data mining tools layer contains commercial or open source software such as SAS, S-PLUS, R, Weka, SPSS, StatSoft, and Oracle Data Mining. Common data mining algorithms can be found in almost all of these packages. The data management (storage) layer consists of relational databases such as Oracle, SQL Server, and MySQL, or simply files such as SAS files or text files.
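As a rough illustration of the data management layer, the sketch below reads the same records once from a relational table and once from a plain text file. It is only a minimal example under stated assumptions: an in-memory SQLite database stands in for Oracle, SQL Server, or MySQL, and the `loans` table, its columns, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical loan records: (income in thousands, late payments, repaid flag).
ROWS = [(55.0, 0, 1), (23.5, 3, 0), (41.0, 1, 1)]

# The same hypothetical records as a comma-separated text file.
CSV_TEXT = "income,late_payments,repaid\n55.0,0,1\n23.5,3,0\n41.0,1,1\n"


def load_from_database():
    """Read the records from a relational table (SQLite is a stand-in here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loans (income REAL, late_payments INTEGER, repaid INTEGER)"
    )
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT income, late_payments, repaid FROM loans ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


def load_from_text(text):
    """Read the records from text, converting each field to its numeric type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (float(r["income"]), int(r["late_payments"]), int(r["repaid"]))
        for r in reader
    ]


db_rows = load_from_database()
file_rows = load_from_text(CSV_TEXT)
print(db_rows == file_rows)
```

Whatever sits in the layers above only sees a list of records, so it does not need to know which kind of storage they came from.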

For example, we can predict whether a new customer will repay a car loan using a logistic regression model implemented in SAS, with the data stored in Oracle. Or we can solve the same problem using a decision tree model implemented in R, with the data stored in SQL Server. It is important to realize that the items within each layer are often exchangeable. We can solve a given business problem with a variety of data mining algorithms, implemented in commercial or open source tools, with the data stored in any database. Thus it is a misconception that neural nets are the best at predicting credit card fraud. An experienced data miner can build a decision tree model that predicts credit card fraud as well as a neural net does. We can select the combination of data mining models, tools, and databases that suits our needs.
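The idea that algorithms are exchangeable can be sketched in a few lines of plain Python. The example is a toy under stated assumptions, not a real credit model: the data is synthetic, the single feature is a hypothetical standardized score, the logistic regression is fitted with simple gradient descent, and the "decision tree" is a one-split stump. Both models expose the same `fit` and `predict` methods, so either one can be dropped into the same evaluation.

```python
import math
import random


def make_data(n, seed):
    """Synthetic stand-in for loan records: (score, repaid) pairs.

    Assumption: customers with a score above 0 usually repay; 5% of the
    labels are flipped to imitate noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < 0.05:
            y = 1 - y
        data.append((x, y))
    return data


class LogisticModel:
    """One-feature logistic regression fitted by batch gradient descent."""

    def fit(self, data, lr=0.5, epochs=300):
        self.w, self.b = 0.0, 0.0
        n = len(data)
        for _ in range(epochs):
            grad_w = grad_b = 0.0
            for x, y in data:
                p = 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))
                grad_w += (p - y) * x
                grad_b += p - y
            self.w -= lr * grad_w / n
            self.b -= lr * grad_b / n
        return self

    def predict(self, x):
        return 1 if self.w * x + self.b > 0 else 0


class StumpModel:
    """One-split decision tree: predicts 1 when the score exceeds a threshold."""

    def fit(self, data):
        best_errors, best_threshold = None, None
        for t, _ in data:  # try every observed score as a split point
            errors = sum((1 if x > t else 0) != y for x, y in data)
            if best_errors is None or errors < best_errors:
                best_errors, best_threshold = errors, t
        self.threshold = best_threshold
        return self

    def predict(self, x):
        return 1 if x > self.threshold else 0


def accuracy(model, data):
    """Fraction of records the model classifies correctly."""
    return sum(model.predict(x) == y for x, y in data) / len(data)


train = make_data(400, seed=0)
holdout = make_data(400, seed=1)

logit = LogisticModel().fit(train)
stump = StumpModel().fit(train)

for name, model in [("logistic regression", logit), ("decision stump", stump)]:
    print(name, round(accuracy(model, holdout), 3))
```

On data this simple, the two models should score about the same on the holdout set, which is the point: the choice between them can rest on convenience, interpretability, or the tools at hand.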

Four Components or Layers in Data Mining

