Book contents
- Frontmatter
- Dedication
- Contents
- List of Figures
- List of Tables
- Preface
- Acknowledgments
- 1 Beginning with Machine Learning
- 2 Introduction to Data Mining
- 3 Beginning with Weka and R Language
- 4 Data Preprocessing
- 5 Classification
- 6 Implementing Classification in Weka and R
- 7 Cluster Analysis
- 8 Implementing Clustering with Weka and R
- 9 Association Mining
- 10 Implementing Association Mining with Weka and R
- 11 Web Mining and Search Engines
- 12 Data Warehouse
- 13 Data Warehouse Schema
- 14 Online Analytical Processing
- 15 Big Data and NoSQL
- Index
- Colour Plates
5 - Classification
Published online by Cambridge University Press: 26 April 2019
Summary
Chapter Objectives
✓ To comprehend the concept, types and working of classification
✓ To identify the major differences between classification and regression problems
✓ To become familiar with the working of classification
✓ To introduce the decision tree classification system with concepts of information gain and Gini Index
✓ To understand the workings of the Naïve Bayes method
Introduction to Classification
Databases are now widely used to support intelligent decision making. Two forms of data analysis, classification and regression, are used to predict future trends by analyzing existing data. Classification models predict a discrete value, or class, while regression models predict a continuous value. For example, a classification model can be built to predict whether India will win a cricket match, while a regression model can be used to predict the number of runs India will score in a forthcoming match.
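The distinction shows up in the type of value each model returns. The following is a minimal Python sketch, not a method from this chapter: the match data is invented for illustration, and the two predictors (a majority-class baseline and a mean-value baseline) are deliberately simple stand-ins whose names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical past matches as (runs scored, result); invented for illustration only.
past_matches = [(250, "win"), (180, "lose"), (310, "win"), (220, "lose"), (275, "win")]


def predict_class(matches):
    """Baseline classifier: return the most frequent class label (a discrete value)."""
    labels = [result for _, result in matches]
    return Counter(labels).most_common(1)[0][0]


def predict_value(matches):
    """Baseline regressor: return the mean of past runs (a continuous value)."""
    return mean(runs for runs, _ in matches)


predicted_result = predict_class(past_matches)  # one of the labels "win" or "lose"
predicted_runs = predict_value(past_matches)    # a number of runs

print("Classification output:", predicted_result)
print("Regression output:", predicted_runs)
```

The classifier can only ever answer with one of the known labels, whereas the regressor may return any number, including one that never appeared in the data.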
Classification is a classical method used by machine learning researchers and statisticians to predict the outcome of unknown samples. It categorizes objects (or things) into a given, discrete number of classes. Classification problems are of two types: binary and multiclass. In binary classification, the target attribute can have only two possible values. For example, a tumor is either cancerous or not, a team will either win or lose, the sentiment of a sentence is either positive or negative, and so on. In multiclass classification, the target attribute can have more than two values. For example, a tumor can be type 1, type 2 or type 3 cancer; the sentiment of a sentence can be happy, sad, angry or loving; news stories can be classified as weather, finance, entertainment or sports news.
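As a rough illustration of the binary versus multiclass distinction, the sketch below counts the distinct values of a target attribute. The function name and the sample labels are hypothetical, and the check assumes the sample contains every class that can occur.

```python
def classification_type(target_values):
    """Return 'binary' for two distinct classes and 'multiclass' for more than two."""
    classes = set(target_values)
    if len(classes) < 2:
        raise ValueError("a classification target needs at least two classes")
    return "binary" if len(classes) == 2 else "multiclass"


# Hypothetical target attributes drawn from the examples in the text.
tumor_labels = ["cancerous", "not cancerous", "cancerous"]
news_labels = ["weather", "finance", "entertainment", "sports", "finance"]

print(classification_type(tumor_labels))  # binary
print(classification_type(news_labels))   # multiclass
```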
Some examples of business situations where the classification technique is applied are:
• To analyze the credit history of bank customers to identify if it would be risky or safe to grant them loans.
• To analyze the purchase history of a shopping mall's customers to predict whether they will buy a certain product or not.
In the first example, the system predicts a discrete value representing either risky or safe, while in the second example it predicts yes or no.
Some more examples to distinguish the concept of regression from classification are:
• To predict how much a given customer will spend during a sale.
- Data Mining and Data Warehousing: Principles and Practical Techniques, pp. 65-127. Publisher: Cambridge University Press. Print publication year: 2019.