PLinkage: Density Ascending Clustering via Pairwise Linkage


Xiaohu Lu, Jian Yao*, Li Li, Yahui Liu and Wei Zhang

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei, P.R.China

*EMail: jian.yao@whu.edu.cn

*Web: http://www.scholat.com/jianyao http://cvrs.whu.edu.cn


[] The C++ source codes are released.download (15/4/2016).


1. Abstract

In this paper, we present a general density ascending clustering algorithm named Pairwise Linkage (PLinkage), which can be used for clustering any dimensional data. The proposed method is based on the assumption that: a data point should be in the same cluster with its closest neighboring point (CNP) which is more likely to be a cluster center. For each data point, a global feature value which represents the quality of the cluster center is calculated at first, for example, the densities for 2D data points. Then, for each data point, a pairwise linkage is created between itself and its closest neighboring point. After that, the clusters can be quickly discovered by searching along the linkages in a simple way. The linkage based property of PLinkage makes it very efficient to be applied on incremental clustering with low computational complexity. To sufficiently verify the efficiency of our proposed PLinkage clustering algorithm in specific applications, we propose a PLinkage based algorithm for image segmentation. Experimental results on various public data sets and different dimensional synthetic data from 2D to 4D sufficiently demonstrate the efficiency and robustness of the proposed PLinkage clustering algorithm, and extensive experimental results on the BSDS dataset illustrate the effectiveness of the PLinkage based image segmentation method.

2. Pairwise Linkage

Based on the assumption that: a data point should be in the same cluster with its closest neighboring point (CNP) that is more likely to be a cluster center, we propose a novel density ascending clustering algorithm named Pairwise Linkage (PLinkage) which can discover the clusters in a simple and efficient way. There are generally three steps of the PLinkage method on a Gaussian distributed data set:

  • Firstly, the density of each data point is calculated by applying a Gaussian kernel.
  • Then, a pairwise linkage procedure is applied to link each data point to its closest neighboring point with a higher density. If the density of a data point is local maximal, it is recorded as a cluster center.
  • Finally, all the clusters can be discovered by searching along the pairwise linkages starting from each cluster center.

Fig.1 shows an illustration of some basic ideas of the proposed PLinkage clustering method, from which we can observe that there are two clusters in blue and red, respectively. For each data point, the pairwise linkage is formed by searching for the closest neighboring point whose density is greater than its own. For example, p8 is first linked to p27, p27 is then linked to p4, p4 is further linked to p13, and p13 is finally linked to p1 with the greatest density in its neighborhood. In this way, the complete linkage is found as p8 -> p27 -> p4 -> p13 -> p1, and thus all of these five data points are classified into the same cluster whose cluster center is p1. The same procedure occurs for all the data points in blue as a separated cluster. Similarly the red cluster can be formed by this way. As to the data points on the boundary between two clusters, the pairwise linkage can still be applied. Taking the data point p33 for example, p19, p7, p6, and p28 are its neighboring points, p6 is its closest neighboring point (CNP), and thus p33 is classified into the blue cluster, which is quite reasonable. The four data points in black, p26, p18, p17, and p16, are classified as outliers because there exists no CNP in their neighborhoods, nor the densities of their corresponding cluster centers are not high enough.

   

Fig. 1. An illustration of the density ascending clustering procedure. The data points p1 and p34 are the cluster centers, the symbol '->' indicates the pairwise linkage, and the big circle in green denotes the neighborhood set of a data point with a cutoff distance dc.

Algorithm 1 Pairwise Linkage.

   

3. Incremental Pairwise Linkage

There are three updating steps for the incremental PLinkage as Algorithm 2 describes:

  • density updating, the densities and neighbors of the newly added data points are obtained by searching their neighborhood defined by the cutoff distance dc in all the data points. What calls for special attention is that the densities of the former data points that are in the neighborhood of the newly added data points, which are also known as the "connected data points", should also be updated.
  • linkage updating, the CNPs of all the newly added data points and their neighbors are recalculated, and the corresponding linkages are created or adjusted as the same as the formal PLinkage.
  • cluster updating, all the clusters are collected by searching the linkages once again as the same as the formal PLinkage.

Algorithm 2 Incremental Pairwise Linkage.

   

3. Applications

3.1. Image Segmentation

There are three steps for the PLinkage image segmentation method as Fig. 2 shows:

  • Density Calculating: For each pixel, the LAB density of it is calculated.
  • Linkage Building: Then for each pixel pi, the PLinkage is applied to linkage it to the neighboring point who has higher density than pi and is closest to pi.
  • Segments Merging: For each initial segment, the adjacent ones to it is searched and merged iteratively if the distance between the average LAB values of two segments is smaller than hr.
original image density map initial segments final segments

Fig. 2. An illustration of the frame work of the PLinkage image segmentation method.

3. Experimental Results

3.1. Evaluation on PLinkage Clustering

  • S-sets The S-sets is made up of 4 synthetic 2-d subsets with N=5000 vectors and M=15 Gaussian clusters with different degree of cluster overlapping. The ground truth of each data point's label is also provided. In all the 4 subsets, scale=8.0.
    s1 s2 s3 s4

    Fig. 3. Clustering results of PLinkage on the S-sets.

  • Shape-sets The Shape-sets is made up of 8 synthetic 2-d subsets with different shapes, including the Aggregation, the Compound, the Pathbased, the Spiral, the D31, the R15, the Jain and the Flame. We set scale = 5.0 for all the subsets.
    aggregation compound D31
    flame R15 spiral

    Fig. 4. Clustering results of PLinkage on the Shape-sets.

  • DIM-sets (high) The DIM-sets (high) contains 6 high-dimensional data sets with N=1024 and M=16 in Gaussian clusters, they are 32d, 64d, 128d, 256d, 512d and 1024d, respectively. To test the ability of the proposed PLinkage on handling the high dimensional data sets, we run it on the DIM-sets (high) with scale=5.0 for all the cases.

Table 1. Statical results of PLinkage on the DIM-sets (high).

Subset D32 D64 D128 D256 D512 D1024
Measure t c p t c p t c p t c p t c p t c p
PLinkage 16 16 0.9980 16 16 0.9951 16 16 0.9824 16 16 1.0000 16 16 1.0000 16 16 1.0000

3.2. Evaluation on PLinkage Image Segmentation

To test the robustness of the proposed image segmentation algorithm, we apply it on the BSDS500 dataset, and compare it with the Mean shift based image segmentation algorithm We set hs = 10 and hr = 6.5 for both the PLinkage and Mean shift methods.

Fig. 5. Image segmentation results of the PLinkage based method and the Mean shift based method. From left to right are the original image, the result of PLinkage and the result of Mean shift.

Dataset and Codes

Citation

  • Xiaohu Lu, Jian Yao*, Li Li, Yahui Liu, and Wei Zhang. PLinkage: Density Ascending Clustering via Pairwise Linkage, Submitted to Pattern Recognition, April 2016.
  • Xiaohu Lu, Jian Yao*, Li Li, Yahui Liu, and Wei Zhang. Supplemental Material on PLinkage: Density Ascending Clustering via Pairwise Linkage, Submitted to Pattern Recognition, April 2016.