We present a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. The two objectives are combined either as a ratio or with linear weights. The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm. This problem (HNC) is closely related to the NP-hard problem of normalized cut that is often used in image segmentation and for which heuristic solutions are generated with the eigenvector technique (spectral method).
It is demonstrated that HNC is a form of relaxation of normalized cut and its generalization to "q-normalized cut". We study the relationship between this HNC relaxation and the spectral method and demonstrate a number of advantages for the combinatorial algorithm. These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems.
HNC can be utilized as a supervised or unsupervised machine learning technique. It has been used for data mining, and its comparison to leading machine learning techniques on datasets selected from the UCI data mining benchmark and other benchmarks indicates that its use of pairwise comparisons is powerful in improving performance accuracy. HNC is currently the leading neuron segmentation algorithm in the Neurofinder benchmark for cell identification in calcium imaging movies. Time permitting, we will discuss our recently developed methods employed to make the family of HNC algorithms scalable for massive scale data sets.
Dorit S. Hochbaum is a full professor and Chancellor chair at UC Berkeley's department of Industrial Engineering and Operations Research (IEOR). Her research interests are in the areas of design and analysis of computer algorithms, approximation algorithms and discrete and continuous optimization. Her recent work focuses on efficient techniques related to network flows with novel applications in ranking, pattern recognition, data mining and image segmentation problems. Professor Hochbaum is the author of over 150 papers that appeared in the Operations Research, Management Science and Theoretical Computer Science literature. Prof. Dorit S. Hochbaum was awarded an honorary doctorate of sciences by the University of Copenhagen recognizing Hochbaum's ground-breaking achievements and leadership in optimization in general and in the field of approximation algorithms for intractable problems in particular. Hochbaum was awarded the title of INFORMS fellow in fall 2005 for her extensive contributions to Operations Research, Management Science and design of algorithms. She is the winner of the 2011 INFORMS Computing Society prize for best paper dealing with the Operations Research/Computer Science interface. Professor Hochbaum is a fellow of the Society of Industrial and Applied Mathematics (SIAM) since 2014.