No CrossRef data available.
Article contents
A PROBABILISTIC ℓ1 METHOD FOR CLUSTERING HIGH-DIMENSIONAL DATA
Published online by Cambridge University Press: 05 April 2021
Abstract
In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a probabilistic, distance-based, iterative method for clustering data in very high-dimensional space, using the ℓ1-metric that is less sensitive to high dimensionality than the Euclidean distance. For K clusters in ℝn, the problem decomposes to K problems coupled by probabilities, and an iteration reduces to finding Kn weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
- Type
- Research Article
- Information
- Probability in the Engineering and Informational Sciences , Volume 36 , Issue 2 , April 2022 , pp. 433 - 448
- Copyright
- Copyright © The Author(s), 2021. Published by Cambridge University Press