Hostname: page-component-cd9895bd7-gxg78 Total loading time: 0 Render date: 2024-12-26T20:41:28.538Z Has data issue: false hasContentIssue false

Partitioning strategies for distributed association rule mining

Published online by Cambridge University Press:  07 July 2006

FRANS COENEN
Affiliation:
Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK. E-mail: [email protected], [email protected]
PAUL LENG
Affiliation:
Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK. E-mail: [email protected], [email protected]

Abstract

In this paper a number of alternative strategies for distributed/parallel association rule mining are investigated. The methods examined make use of a data structure, the T-tree, introduced previously by the authors as a structure for organizing sets of attributes for which support is being counted. We consider six different approaches, representing different ways of parallelizing the basic Apriori-T algorithm that we use. The methods focus on different mechanisms for partitioning the data between processes, and for reducing the message-passing overhead. Both ‘horizontal’ (data distribution) and ‘vertical’ (candidate distribution) partitioning strategies are considered, including a vertical partitioning algorithm (DATA-VP) which we have developed to exploit the structure of the T-tree. We present experimental results examining the performance of the methods in implementations using JavaSpaces. We conclude that in a JavaSpaces environment, candidate distribution strategies offer better performance than those that distribute the original dataset, because of the lower messaging overhead, and the DATA-VP algorithm produced results that are especially encouraging.

Type
Research Article
Copyright
© 2006 Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)