Cluster Analysis

Parteek Bhatia

doi:10.1017/9781108635592.008

Chapter Objectives

✓ To comprehend the concept of clustering, its applications, and features.

✓ To understand various distance metrics for clustering of data.

✓ To comprehend the process of K-means clustering.

✓ To comprehend the process of hierarchical clustering algorithms.

✓ To comprehend the process of the DBSCAN algorithm.

Introduction to Cluster Analysis

Large datasets are often unlabeled, because labeling a large number of records requires a great deal of human effort. Such unlabeled data can be analyzed with the help of clustering techniques. Clustering is an unsupervised learning technique, which means it does not require a labeled dataset.

Clustering is defined as grouping a set of objects into classes, or clusters, of similar objects. In other words, cluster analysis groups the data so that records within the same cluster (intra-cluster) are highly similar to one another, while records in different clusters (inter-cluster) are highly dissimilar, as shown in Figure 7.1.
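To make the intra-cluster versus inter-cluster idea concrete, the following Python sketch measures both quantities for a small, hypothetical grouping of two-dimensional records. It assumes Euclidean distance as the dissimilarity measure; the function name `intra_inter` and the data points are illustrative only and are not taken from the chapter. For a good clustering, the mean intra-cluster distance should be small and the mean inter-cluster distance large.

```python
import math
from itertools import combinations


def intra_inter(clusters):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    `clusters` is a list of clusters; each cluster is a list of points
    (tuples of numeric attribute values). Euclidean distance is assumed.
    """
    # Distances between every pair of points inside the same cluster.
    intra = [math.dist(p, q)
             for pts in clusters
             for p, q in combinations(pts, 2)]
    # Distances between every pair of points taken from two different clusters.
    inter = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical 2-D records already split into two groups (illustrative values).
clusters = [
    [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
]
intra, inter = intra_inter(clusters)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

Here the points within each group lie close together and the two groups lie far apart, so the intra-cluster mean comes out much smaller than the inter-cluster mean.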

The similarity of records is determined from the values of the attributes that describe the objects. Cluster analysis is also an important human activity. The first human beings, Adam and Eve, actually learned through the process of clustering. They did not know the name of any object; they simply observed each one. Based on the similarity of their properties, they identified these objects in groups or clusters. For example, one group or cluster was named trees, another fruits, and so on. They further classified the fruits on the basis of properties such as size, colour, shape, and taste. Later, people assigned labels or names to these objects, such as mango, banana, and orange, until finally all objects were labeled. Thus, we can say that the first human beings used clustering for learning: they grouped physical objects into clusters based on the similarity of their attributes.
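As a rough illustration of grouping objects purely by the similarity of their attribute values, the sketch below uses a simple leader (threshold) scheme, not any of the algorithms covered later in this chapter. The object names, attribute values, the `threshold` parameter, and the helper `leader_clustering` are all hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math


def leader_clustering(records, threshold):
    """Group records with a simple leader (threshold) scheme.

    Each record joins the first existing group whose leader (its first member)
    lies within `threshold` Euclidean distance; otherwise it starts a new group.
    """
    groups = []  # each group is a list of (name, attributes) pairs
    for name, attrs in records:
        for group in groups:
            leader_attrs = group[0][1]
            if math.dist(attrs, leader_attrs) <= threshold:
                group.append((name, attrs))
                break
        else:
            # No sufficiently similar group exists, so this record leads a new one.
            groups.append([(name, attrs)])
    return groups


# Hypothetical objects described by two numeric attributes
# (e.g., a size and a taste score); the values are illustrative only.
records = [
    ("obj1", (10.0, 8.0)),
    ("obj2", (18.0, 6.0)),
    ("obj3", (10.5, 7.5)),
    ("obj4", (17.5, 6.5)),
    ("obj5", (7.0, 4.0)),
]
groups = leader_clustering(records, threshold=2.0)
for i, group in enumerate(groups, start=1):
    print(i, [name for name, _ in group])
# prints:
# 1 ['obj1', 'obj3']
# 2 ['obj2', 'obj4']
# 3 ['obj5']
```

Objects with similar attribute values end up in the same group without any labels being supplied. Note that with this scheme the outcome depends on both the threshold and the order in which the records are processed.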

Applications of Cluster Analysis

Cluster analysis has been widely used in various important applications such as:

• Marketing: It helps marketers discover distinct groups within their customer bases, and this knowledge helps them improve their targeted marketing programs.

• Land use: Clustering is used for identifying areas of similar land use from the databases of earth observations.

• Insurance: Clustering is helpful for recognizing clusters of insurance policyholders with a high regular claim cost.
