What is Data Mining?
“The non-trivial extraction of implicit, previously unknown, and potentially useful information from data”
- Data mining uncovers valuable patterns that are hidden in data.
- It analyzes data with software techniques to discover patterns in a data set.
- It finds patterns that satisfy the rules and features of the data.
- Just as gold mining strikes gold in unexpected places, data mining extracts patterns that were not previously known and are therefore valuable.
- Mining analogy:
o In data mining, large volumes of data are processed to extract valuable patterns.
o In a mining operation, large amounts of low-grade material are sifted through to find something of value.
Data Mining Techniques
- Prediction Tasks
o Use some variables to predict unknown or future values of other variables.
- Description Tasks
o Find human-interpretable patterns that describe the data.
- Classification [Predictive]: Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. E.g., classifying countries based on climate, or classifying cars based on gas mileage.
Classification consists of two steps:
(1) Building the classifier (model):
Training instances and their class labels are analyzed to build a model, or classifier.
(2) Using the classifier for classification:
The classifier is applied to predict the unknown class labels of test data.
Classification methods include Bayesian classification, rule-based classification, associative classification, support vector machines, backpropagation, decision trees, and k-nearest-neighbor.
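The two steps above can be sketched with k-nearest-neighbor, one of the methods listed. This is only a minimal illustration: the function names (build_classifier, classify), the value k=3, and the gas-mileage figures are assumptions made up for the example, not part of the original notes.

```python
from collections import Counter


def build_classifier(training_instances):
    """Step 1: build the model. For k-NN this simply stores the labeled instances."""
    return list(training_instances)


def classify(model, features, k=3):
    """Step 2: predict a label by majority vote among the k nearest training instances."""
    by_distance = sorted(
        model,
        key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], features)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (miles per gallon,) paired with a class label.
training = [
    ((12.0,), "low"), ((15.0,), "low"), ((18.0,), "low"),
    ((30.0,), "high"), ((34.0,), "high"), ((38.0,), "high"),
]

model = build_classifier(training)
print(classify(model, (14.0,)))  # car with unknown label, 14 mpg
print(classify(model, (33.0,)))  # car with unknown label, 33 mpg
```

With this toy data the 14 mpg car falls among the "low" instances and the 33 mpg car among the "high" ones. Other methods from the list (for example decision trees) follow the same build-then-apply pattern, but the build step does more work.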
- Clustering [Descriptive]: Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. The clustering technique itself defines the classes and assigns objects to them. Clustering methods can be classified as partitioning, hierarchical, density-based, grid-based, model-based, and constraint-based methods.
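As a rough illustration of a partitioning method, the sketch below uses k-means, a well-known member of that family (the notes themselves do not name a specific algorithm). The function name kmeans, the sample points, and the choice of starting centroids are all assumptions for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Partition 2-D points into len(centroids) clusters (a plain k-means sketch)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: place each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical objects described by two numeric characteristics.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from one point in each apparent group; real use would pick starts more carefully.
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
print(centroids)
```

No class labels are supplied here: the groups emerge from the similarity of the points alone, which is what separates clustering from classification.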
- Prediction [Predictive]: Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. That is, it is used to predict missing or unavailable numerical data values rather than class labels. It has been studied extensively in statistics and in the neural network field. For example: predicting the sales amount of a new product based on advertising expenditure, or predicting wind velocity as a function of temperature, humidity, air pressure, etc.
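One simple way to model a continuous-valued function is a least-squares straight line. The sketch below applies it to the sales example; simple linear regression is chosen here only for illustration, and the function name fit_line and all figures are invented for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: advertising expenditure and resulting sales amounts.
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(advertising, sales)

# Predict a numerical value (not a class label) for an unseen expenditure.
predicted_sales = slope * 5.0 + intercept
print(slope, intercept, predicted_sales)
```

The output is a number on a continuous scale, in contrast to the discrete label a classifier returns.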
- Association Rule Mining [Descriptive]: Given a set of records, each of which contains some number of items from a given collection, it produces dependency rules that predict the occurrence of an item based on the occurrences of other items. It is used in marketing and sales promotion.
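A dependency rule of the form "A implies B" is commonly scored by its support (the fraction of records containing both items) and its confidence (the fraction of records containing A that also contain B). These two measures are not defined in the notes above; they are introduced here as the usual way to rank rules. The sketch below checks only single-item rules by brute force, and the function name, thresholds, and transactions are hypothetical.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_both = sum(1 for t in transactions if a in t and b in t)
        support = count_both / n            # how often a and b occur together
        confidence = count_both / count_a   # how often b occurs when a does
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical market-basket records.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

rules = association_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data only the bread/milk pair passes both thresholds. Practical algorithms avoid enumerating every candidate rule, which this brute-force version does not attempt.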
Applications of data mining:
- Banking: loan/credit card approval
o Predict good customers based on old customers.
- Customer relationship management:
o Identify those who are likely to leave for a competitor.
- Targeted marketing:
o Identify likely responders to promotions.
- Fraud detection: telecommunications, financial transactions
o Identify fraudulent events in an online stream of events.
- Manufacturing and production:
o Automatically adjust knobs when process parameters change.