Facility Location and k-Means Clustering

Rahul Vaze

doi:10.1017/9781009349178.012

Introduction

In this chapter, we consider two related combinatorial online problems with wide applications in operations research and machine learning: the facility location problem and the k-clustering problem. In the facility location problem, requests arrive sequentially, and the location of each request belongs to a metric space. On the arrival of a new request, the decision to be made is whether to assign it to one of the currently open facilities or to open a new facility. The cost of assigning a request to an open facility is equal to the distance between the location of the request and the location of that facility, while opening a new facility incurs a fixed cost. The cost of an online algorithm is the sum of the assignment costs of all requests plus the total facility-opening cost, and the objective is to find online algorithms that minimize the competitive ratio.
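The cost accounting described above can be illustrated with a minimal sketch. It assumes points in the Euclidean plane and assumes a new facility is always opened at the location of the arriving request; the names `total_cost`, `decisions`, and `opening_cost` are hypothetical and chosen only for illustration.

```python
from math import dist


def total_cost(requests, decisions, opening_cost):
    """Cost of one online run: assignment distances plus opening costs.

    decisions[i] is None to open a new facility at request i's location
    (an assumption of this sketch), or the index of an already open
    facility to which request i is assigned.
    """
    facilities = []  # locations of facilities opened so far
    cost = 0.0
    for point, choice in zip(requests, decisions):
        if choice is None:
            # Open a facility at the request itself: pay the fixed cost,
            # and the request is served at distance zero.
            facilities.append(point)
            cost += opening_cost
        else:
            # Assign to an open facility: pay the distance to it.
            cost += dist(point, facilities[choice])
    return cost


requests = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
decisions = [None, 0, None]  # open, assign to facility 0, open
example_cost = total_cost(requests, decisions, opening_cost=2.0)
```

In this run the two openings cost 2.0 each and the single assignment costs the distance 5.0, so `example_cost` is 9.0.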

The facility location problem is a rich object and captures important problems such as where to install charging stations for electric vehicles, accounting for routing and infrastructure costs. In this chapter, we first derive lower bounds on the competitive ratios of both deterministic and randomized algorithms, and show that the best competitive ratio possible for any online algorithm is at least, where n is the number of requests. On the positive side, we present a randomized algorithm whose competitive ratio is O(log n). We also consider a secretarial input setting, where the order of arrival of requests is uniformly random, for which the same randomized algorithm is at most 8-competitive. Finally, we present a deterministic algorithm with a competitive ratio of O(log n) for a more general setting, where the facility-opening cost depends on the location.
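To give a concrete feel for what a randomized online rule can look like, here is a minimal sketch of one well-known rule from the literature, usually attributed to Meyerson: open a facility at the arriving request with probability min(d/f, 1), where d is the distance to the nearest open facility and f is the opening cost, and otherwise assign the request to that nearest facility. This summary does not state which algorithm the chapter analyses, so this should be read as an illustrative assumption, not as the chapter's method. Euclidean points and the function name are also assumptions of the sketch.

```python
import random
from math import dist


def randomized_facility_location(requests, opening_cost, rng):
    """Meyerson-style rule (illustrative sketch; opening_cost must be > 0).

    Returns the list of opened facility locations and the total cost.
    """
    facilities = []
    cost = 0.0
    for point in requests:
        # Distance to the nearest open facility (infinite if none is open).
        nearest = min((dist(point, f) for f in facilities),
                      default=float("inf"))
        # Open here with probability min(nearest / opening_cost, 1).
        if rng.random() < min(nearest / opening_cost, 1.0):
            facilities.append(point)
            cost += opening_cost
        else:
            cost += nearest
    return facilities, cost


# The first request always opens a facility; repeats at the same point
# are then served at distance zero and never trigger a new opening.
opened, run_cost = randomized_facility_location(
    [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)], 5.0, random.Random(0)
)
```

The rule needs no knowledge of future requests: a request far from every open facility almost surely opens a new one, while a nearby request is usually just assigned.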

Next, we consider a related problem called the k-clustering problem, where requests arrive online and the objective is to partition the set of requests into at most k clusters so as to minimize the total cost, defined as follows. The cost of each cluster is the total distance of all requests that belong to the cluster from its centroid (called the centre), and the total cost is the sum of the costs of all the clusters. The k-clustering problem essentially models the classification problem, a fundamental object in machine learning.
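The clustering cost defined above can be sketched as follows, assuming Euclidean points and non-empty clusters. The function names are hypothetical. The optional `squared` flag is an addition of this sketch: the classical k-means objective sums squared distances, whereas the description above speaks only of distance.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def clustering_cost(clusters, squared=False):
    """Sum over clusters of the distances from each point to its centroid.

    With squared=True the distances are squared before summing, as in the
    classical k-means objective (an assumption, not stated in the text).
    """
    total = 0.0
    for cluster in clusters:
        centre = centroid(cluster)
        for point in cluster:
            d = dist(point, centre)
            total += d * d if squared else d
    return total


clusters = [[(0.0, 0.0), (4.0, 0.0)], [(5.0, 5.0)]]
plain_cost = clustering_cost(clusters)                  # 2 + 2 + 0
squared_cost = clustering_cost(clusters, squared=True)  # 4 + 4 + 0
```

Here the first cluster has centroid (2, 0), so each of its two points lies at distance 2 from the centre, and the singleton cluster contributes nothing.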

11 - Facility Location and k-Means Clustering
