Data is widely available in many forms.
Data in every form keeps growing and piling up, to the point where we need a way to process the raw data and transform it into useful information. On its own, raw data is of little use.
This is where data mining comes in. It is the process of extracting useful information or knowledge from raw data. That information is then used for analysis, fraud detection, exploration, and so on.
Data mining involves three basic steps:
Exploration of Data: The nature of the data is determined. The data is cleaned and transformed into another form.
Pattern Identification: A pattern is chosen that can predict useful information from the data.
Deployment: The finalized patterns are deployed to get the final outcome.
Data Mining Techniques
1. Association
Here we associate two or more items that follow the same pattern, or correlate items based on the patterns they share. For example, if we order two or three items at a restaurant, we are likely to place a similar order on our next visit. This is where association plays a major role in examining and forecasting behavior.
There are three types of association rules.
- Multilevel Association Rule
- Multidimensional Association Rule
- Quantitative Association Rule
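As a rough illustration of the restaurant example, association rules are commonly scored by support (how often items appear together) and confidence (how often the second item appears when the first does). The sketch below uses made-up orders and hypothetical helper names (`support`, `confidence`); it is not tied to any particular library.

```python
# Hypothetical restaurant orders; each order is a set of items.
orders = [
    {"pizza", "cola", "salad"},
    {"pizza", "cola"},
    {"burger", "cola"},
    {"pizza", "salad"},
    {"pizza", "cola", "fries"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

pizza_cola_support = support({"pizza", "cola"}, orders)
pizza_to_cola_conf = confidence({"pizza"}, {"cola"}, orders)
print(pizza_cola_support, pizza_to_cola_conf)
```

With this sample data, pizza and cola appear together in three of five orders, and cola appears in three of the four orders that contain pizza.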
2. Classification
Classification is the process of assigning items to categories based on their attributes. For example, cars can be categorized into different variants by attributes such as number of seats, car shape, and driven wheels. We can apply the same principle to customers by classifying them by age and social group.
There are different types of classification models.
- Classification by decision tree induction
- Bayesian Classification
- Neural Networks
- Support Vector Machines
- Classification Based on Associations
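To make the car example concrete, here is a minimal sketch of one of the listed approaches, Bayesian classification, written as a small naive Bayes classifier over categorical attributes. The training rows, category names, and function names (`train`, `classify`) are invented for illustration, and add-one smoothing is an assumption of this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (seats, shape, driven_wheels) -> category.
training = [
    (("2", "coupe", "rear"), "sports"),
    (("2", "coupe", "all"), "sports"),
    (("5", "sedan", "front"), "family"),
    (("5", "hatchback", "front"), "family"),
    (("7", "suv", "all"), "utility"),
    (("5", "suv", "all"), "utility"),
]

def train(rows):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    # value_counts[label][attribute_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each attribute
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def classify(features, model):
    """Return the label with the highest naive Bayes score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, 0.0
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(features):
            # Add-one smoothing so an unseen value does not zero the score.
            seen = value_counts[label][i][value]
            score *= (seen + 1) / (count + len(vocab[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
predicted = classify(("2", "coupe", "rear"), model)
print(predicted)
```

The "naive" part is the assumption that attributes are independent given the category, which is why the per-attribute probabilities are simply multiplied.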
3. Clustering
Clustering groups similar items into clusters based on their attributes. At a simple level, clustering uses one or more attributes as the basis for identifying a cluster of correlating results. E.g., we can group people based on their age and income.
There are different types of clustering methods:
- Partitioning Methods
- Hierarchical Agglomerative Methods
- Density Based Methods
- Grid Based Methods
- Model Based Methods
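As a sketch of a partitioning method, the following groups people by age and income with a plain k-means loop. The data points, the hand-picked starting centroids, and the function name `k_means` are assumptions made for this example. It needs Python 3.8 or later for `math.dist`.

```python
import math

# Hypothetical people as (age, yearly income in thousands).
people = [(22, 25), (25, 30), (27, 28), (45, 80), (50, 85), (48, 90)]

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clusters are stable
        centroids = new_centroids
    return assignments, centroids

# Starting centroids are chosen by hand here; real implementations
# usually pick them randomly or with a scheme such as k-means++.
labels, centers = k_means(people, centroids=[(22, 25), (45, 80)])
print(labels, centers)
```

Because age and income are on different scales, income dominates the distance in this sketch; in practice the attributes are usually normalized first.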
4. Prediction
We can predict an event by analyzing past events or instances of it. Prediction is a wide topic that runs from predicting the failure of components to identifying fraud. It involves analyzing trends, classification, pattern matching, and relationships. E.g., we can use it to validate the authorization of a credit card transaction.
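One very simple way to illustrate the credit card case is to compare a new transaction amount against the cardholder's past amounts and flag large deviations. This is only a sketch: the history values, the function name `looks_unusual`, and the threshold of three standard deviations are all assumptions, and real fraud systems use far richer signals.

```python
from statistics import mean, stdev

# Hypothetical past transaction amounts for one card.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 44.5, 39.75, 49.0]

def looks_unusual(amount, past_amounts, threshold=3.0):
    """Flag an amount that lies more than `threshold` sample standard
    deviations away from the mean of the past amounts."""
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    if sigma == 0:
        return amount != mu  # all past amounts identical
    return abs(amount - mu) / sigma > threshold

normal = looks_unusual(45.0, history)
suspicious = looks_unusual(900.0, history)
print(normal, suspicious)
```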
5. Decision Trees
Within a decision tree, we start with a question that has two or more answers. Each answer leads to a further question that helps classify or identify the data, until the data can be categorized or a prediction can be made based on the answers.
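The question-and-answer structure can be sketched as a nested dictionary that is walked from the root to a leaf. The tree below is hand-built with hypothetical questions and labels; it only shows how a finished tree is used, not how one is learned from data.

```python
# A hand-built tree (hypothetical questions and labels). Internal nodes
# ask a question about one attribute; leaves are plain strings.
tree = {
    "question": "seats",
    "branches": {
        "2": "sports",
        "5": {
            "question": "driven_wheels",
            "branches": {"front": "family", "all": "utility"},
        },
        "7": "utility",
    },
}

def decide(node, record):
    """Follow the branch matching each answer until a leaf is reached."""
    while isinstance(node, dict):
        answer = record[node["question"]]
        node = node["branches"][answer]
    return node

result = decide(tree, {"seats": "5", "driven_wheels": "front"})
print(result)
```

An answer that has no matching branch raises a `KeyError` in this sketch; a fuller version would need a default branch or explicit error handling.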
6. Sequential Patterns
Sequential patterns are a useful method for identifying trends or regular occurrences of similar events. For example, from customer data we can identify that customers buy a particular collection of products together at different times of the year.
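A minimal sketch of the idea is to count, across customers, how often one product is bought before another and keep the pairs that recur. The purchase histories and the function name `frequent_sequences` are made up for this example, and it only looks at ordered pairs rather than longer sequences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each in chronological order.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

def frequent_sequences(sequences, min_support):
    """Count ordered pairs (a, b) where a is bought before b, once per
    customer, and keep pairs seen in at least `min_support` histories."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so a always precedes b.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_sequences(histories, min_support=2)
print(patterns)
```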
7. Visualization
Visualization is used to discover data patterns, typically at the beginning of the data mining process. It presents raw data in graphical form, which helps reveal hidden patterns.
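Even a text-only chart can show the shape of a dataset before any mining starts. The sketch below bins some made-up customer ages into decades and prints a simple histogram; the data and the function name `text_histogram` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical customer ages.
ages = [23, 25, 31, 34, 35, 38, 41, 42, 44, 47, 52, 67]

def text_histogram(values, bin_width=10):
    """Return one line per non-empty bin, with a '#' for every value in it."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | {'#' * bins[start]}")
    return lines

chart = text_histogram(ages)
print("\n".join(chart))
```

In this sample the chart makes it easy to see that most customers fall in their thirties and forties.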
Here we covered what data mining is and the techniques available for it. We will cover each technique and how to implement it in the coming sections.