DEV Community

Ketan Patil

Data Mining

What is data mining?

Data mining is the process of extracting useful information from a large set of raw data, or of discovering patterns in large data sets. Data mining is also known as knowledge discovery in data (KDD).

Uses of data mining

• Automatic summarization of data
• Extracting useful information
• Discovering patterns in raw data

Applications of data mining

• Relational marketing
• Fraud detection
• Risk evaluation
• Text mining
• Web mining

Steps in data mining

Data gathering and integration: Once the objectives have been defined, data gathering begins. Because the data comes from different sources, it may require integration. Data integration is the process of combining all gathered data into a single view.
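As a rough illustration of combining sources into a single view, here is a minimal sketch in plain Python. The two sources, the field names, and the `integrate` helper are all hypothetical, and a real project would more likely use a database join or a dataframe library.

```python
# Hypothetical records from two separate sources, both keyed by customer_id.
crm_records = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chen"},
]
billing_records = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 3, "total_spent": 45.5},
]


def integrate(primary, secondary, key):
    """Left-join two lists of dicts on `key` into one combined view."""
    lookup = {row[key]: row for row in secondary}
    combined = []
    for row in primary:
        merged = dict(row)
        # Fields from the secondary source are added only when a match exists.
        merged.update(lookup.get(row[key], {}))
        combined.append(merged)
    return combined


single_view = integrate(crm_records, billing_records, "customer_id")
for row in single_view:
    print(row)
```

Customers with no billing record keep only their original fields, which is exactly the kind of gap the next phase is meant to surface.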

Exploratory analysis: This is the third phase of the data mining process. In this phase, the integrated data is investigated and its main characteristics are summarized. This helps to identify errors and understand patterns in the data before any assumptions are made.
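A small sketch of what summarizing the main characteristics might look like, using only Python's standard `statistics` module. The `ages` column and the `summarize` helper are made up for illustration.

```python
import statistics

# Hypothetical column from the integrated data; None marks a missing value.
ages = [34, 29, None, 41, 38, 290, 33]


def summarize(values):
    """Return basic summary statistics, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


summary = summarize(ages)
print(summary)
```

Here the maximum of 290 and the large gap between mean and median hint at a data-entry error, and the missing count shows how incomplete the column is.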

Attribute selection: This is the process of selecting attributes from the integrated and summarized data. Attributes that are of little use are removed to cleanse the dataset. In addition, new attributes are added where required, derived from the original attributes.
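The two halves of this step, removing attributes of little use and deriving new ones, could be sketched as follows. The dataset, the "constant column" rule, and the BMI attribute are assumptions chosen for illustration; real attribute selection usually relies on more careful criteria.

```python
# Hypothetical dataset: "country" is identical in every row, so it carries
# no information for distinguishing records.
rows = [
    {"height_m": 1.70, "weight_kg": 65.0, "country": "IN"},
    {"height_m": 1.82, "weight_kg": 80.0, "country": "IN"},
    {"height_m": 1.60, "weight_kg": 50.0, "country": "IN"},
]


def drop_constant_attributes(rows):
    """Remove attributes that take the same value in every row."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def add_bmi(rows):
    """Derive a new attribute (body mass index) from two original ones."""
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in rows]


selected = add_bmi(drop_constant_attributes(rows))
for row in selected:
    print(row)
```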

[Figure: phases of the data mining process, with dotted arrows marking feedback cycles]

Model development and validation: Once a high-quality dataset with the newly added attributes is obtained, models are developed. In this phase, the data is split into two subsets: training and testing.
The training set, which may be a relatively small sample of the full dataset, is used to fit the learning model, and the testing set is used to assess the accuracy of the model generated from the training set.
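To make the split-then-validate idea concrete, here is a minimal sketch with a toy one-attribute threshold model. The data, the 75/25 split, the fixed seed, and the helper names are all assumptions for illustration, not a recommended modelling approach.

```python
import random

# Hypothetical labelled data: (hours_studied, passed).
data = [(hours, hours >= 5) for hours in range(1, 21)]


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Fraction of rows where the rule `x >= threshold` matches the label."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


def fit_threshold(train):
    """Pick the observed value that best separates the training labels."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


train, test = train_test_split(data, test_fraction=0.25, seed=0)
threshold = fit_threshold(train)
print("learned threshold:", threshold)
print("test accuracy:", accuracy(threshold, test))
```

The model only ever sees `train` while it is being fitted, so the accuracy measured on `test` estimates how it behaves on data it has not seen.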

Prediction and interpretation: This is the final phase of the data mining process, where the developed models are implemented and used to achieve the goals.

The data mining process includes feedback cycles, represented by dotted arrows in the figure, which indicate a return to a previous phase depending on the outcome of a subsequent phase.

Major issues in data mining:

• Efficiency of data mining algorithm
• Relational and complex types of data
• Poor data quality
• Presentation and visualization of mined data
• Interactive mining of knowledge

Data mining tools:

• Oracle Data Miner
• RapidMiner
• IBM SPSS Modeler
• Weka
• Many more...

Hope you found it informative :)
