Universität Düsseldorf: Theoretische Untersuchung praxisrelevanter Fragestellungen der Clusteranalyse

Theoretical analysis of practically important aspects of cluster analysis

Heiko Röglin
Melanie Schmidt
Anna Arutyunova
Julian Wargalla

The area of cluster analysis is concerned with the search for unknown structure in data. The goal is to find hidden clusters, i.e., groups of points that belong together. Clustering has many applications in the area of data analysis and has thus been studied in various variations, leading to a huge body of clustering research.
However, clustering also suffers from a large gap between theory and practice, even with respect to the objective functions that are studied. In theory, combinatorial problems like k-center and k-median are extremely well studied.

In practice, methods for very different types of problems play a much larger role, e.g., for k-means clustering or for hierarchical clustering. Additionally, there can be additional demands on the solution, leading to constrained clustering problems. Our goal is the theoretical analysis of practically relevant questions in the area of cluster analysis. This includes the study of the approximability of more practical variants of clustering as well as the analysis of popular heuristics, e.g., under structural assumptions on the input data. Our project consists of two parts.

Clustering with constraints (Melanie Schmidt)

In the first part, we consider the effect of constraints on clustering problems. Examples for constraints are capacities, lower bounds, fairness constraints and outliers. First, we want to develop better approximation algorithms for clustering problems with constraints. Second, we want to analyze popular heuristics and develop practically efficient algorithms. In both cases, we are also interested in combining different constraints. Third, we want to study clustering problems with constraints in data streams.

Hierarchical clustering (Heiko Röglin)

The second part considers hierarchical clustering. Instead of fixing a specific number of clusters k, hierarchical clustering computes a hierarchy of clusterings. Hierarchical clusterings play a major role in applications, yet they have been studied very little in theory. First, we want to analyze the approximation ratio of the very popular greedy heuristics for hierarchical clustering. Second, we want to develop approximation algorithms with small approximation ratios, which are not yet known for most of the variants of hierarchical clustering. We are also interested in lower bounds on the ratios of algorithms and lower bounds on the approximability itself. Third, we want to develop global objective functions for hierarchical clustering.

Projektseite bei der DFG

Verantwortlichkeit: