[] The C++ source codes are released.download (15/4/2016).
1. Abstract
In this paper, we present a general density ascending clustering algorithm named Pairwise Linkage (PLinkage), which can be used for clustering any dimensional data.
The proposed method is based on the assumption that: a data point should be in the same cluster with its closest neighboring point (CNP)
which is more likely to be a cluster center. For each data point, a global feature value which represents the quality of the cluster center is calculated at first,
for example, the densities for 2D data points. Then, for each data point, a pairwise linkage is created between itself and its closest neighboring point.
After that, the clusters can be quickly discovered by searching along the linkages in a simple way. The linkage based property of PLinkage makes it very efficient
to be applied on incremental clustering with low computational complexity. To sufficiently verify the efficiency of our proposed PLinkage clustering algorithm in
specific applications, we propose a PLinkage based algorithm for image segmentation. Experimental results on various public data sets and different dimensional
synthetic data from 2D to 4D sufficiently demonstrate the efficiency and robustness of the proposed PLinkage clustering algorithm, and extensive experimental
results on the BSDS dataset illustrate the effectiveness of the PLinkage based image segmentation method.
2. Pairwise Linkage
Based on the assumption that: a data point should be in the same cluster with its closest neighboring point (CNP) that is more likely to be a cluster center,
we propose a novel density ascending clustering algorithm named Pairwise Linkage (PLinkage) which can discover the clusters in a simple and efficient way.
There are generally three steps of the PLinkage method on a Gaussian distributed data set:
Firstly, the density of each data point is calculated by applying a Gaussian kernel.
Then, a pairwise linkage procedure is applied to link each data point to its closest neighboring point with a higher density.
If the density of a data point is local maximal, it is recorded as a cluster center.
Finally, all the clusters can be discovered by searching along the pairwise linkages starting from each cluster center.
Fig.1 shows an illustration of some basic ideas of the proposed PLinkage clustering method, from which we can observe that there are two clusters
in blue and red, respectively. For each data point, the pairwise linkage is formed by searching for the closest neighboring point whose density is greater than its own.
For example, p_{8} is first linked to p_{27}, p_{27} is then linked to p_{4}, p_{4} is further linked
to p_{13}, and p_{13} is finally linked to p_{1} with the greatest density in its neighborhood. In this way,
the complete linkage is found as p_{8} -> p_{27} -> p_{4} -> p_{13} -> p_{1}, and thus all
of these five data points are classified into the same cluster whose cluster center is p_{1}. The same procedure occurs for all the data points in blue
as a separated cluster. Similarly the red cluster can be formed by this way. As to the data points on the boundary between two clusters, the pairwise linkage can
still be applied. Taking the data point p_{33} for example, p_{19}, p_{7}, p_{6}, and p_{28}
are its neighboring points, p_{6} is its closest neighboring point (CNP), and thus p_{33} is classified into the blue cluster,
which is quite reasonable. The four data points in black, p_{26}, p_{18}, p_{17}, and p_{16},
are classified as outliers because there exists no CNP in their neighborhoods, nor the densities of their corresponding cluster centers are not high enough.
Fig. 1. An illustration of the density ascending clustering procedure. The data points p_{1} and p_{34} are the cluster centers,
the symbol '->' indicates the pairwise linkage, and the big circle in green denotes the neighborhood set of a data point with a cutoff distance d_{c}.
Algorithm 1 Pairwise Linkage.
3. Incremental Pairwise Linkage
There are three updating steps for the incremental PLinkage as Algorithm 2 describes:
density updating, the densities and neighbors of the newly added data points are obtained by searching their neighborhood defined by the cutoff distance
d_{c} in all the data points. What calls for special attention is that the densities of the former data points that are in the neighborhood of the newly
added data points, which are also known as the "connected data points", should also be updated.
linkage updating, the CNPs of all the newly added data points and their neighbors are recalculated, and the corresponding linkages are created or
adjusted as the same as the formal PLinkage.
cluster updating, all the clusters are collected by searching the linkages once again as the same as the formal PLinkage.
Algorithm 2 Incremental Pairwise Linkage.
3. Applications
3.1. Image Segmentation
There are three steps for the PLinkage image segmentation method as Fig. 2 shows:
Density Calculating: For each pixel, the LAB density of it is calculated.
Linkage Building: Then for each pixel p_{i}, the PLinkage is applied to linkage it to the neighboring point who has higher density
than p_{i} and is closest to p_{i}.
Segments Merging: For each initial segment, the adjacent ones to it is searched and merged iteratively if the distance between the average LAB values
of two segments is smaller than h_{r}.
original image
density map
initial segments
final segments
Fig. 2. An illustration of the frame work of the PLinkage image segmentation method.
3. Experimental Results
3.1. Evaluation on PLinkage Clustering
S-sets
The S-sets is made up of 4 synthetic 2-d subsets with N=5000 vectors and M=15 Gaussian clusters with different degree of cluster overlapping.
The ground truth of each data point's label is also provided. In all the 4 subsets, scale=8.0.
s1
s2
s3
s4
Fig. 3. Clustering results of PLinkage on the S-sets.
Shape-sets The Shape-sets is made up of 8 synthetic 2-d subsets with different shapes, including the Aggregation, the Compound, the Pathbased,
the Spiral, the D31, the R15, the Jain and the Flame. We set scale = 5.0 for all the subsets.
aggregation
compound
D31
flame
R15
spiral
Fig. 4. Clustering results of PLinkage on the Shape-sets.
DIM-sets (high) The DIM-sets (high) contains 6 high-dimensional data sets with N=1024 and M=16 in Gaussian clusters, they are 32d, 64d, 128d, 256d, 512d
and 1024d, respectively. To test the ability of the proposed PLinkage on handling the high dimensional data sets, we run it on the DIM-sets (high) with scale=5.0
for all the cases.
Table 1. Statical results of PLinkage on the DIM-sets (high).
Subset
D32
D64
D128
D256
D512
D1024
Measure
t
c
p
t
c
p
t
c
p
t
c
p
t
c
p
t
c
p
PLinkage
16
16
0.9980
16
16
0.9951
16
16
0.9824
16
16
1.0000
16
16
1.0000
16
16
1.0000
3.2. Evaluation on PLinkage Image Segmentation
To test the robustness of the proposed image segmentation algorithm, we apply it on the BSDS500 dataset, and compare it with the Mean shift based
image segmentation algorithm We set h_{s} = 10 and h_{r} = 6.5 for both the PLinkage and Mean shift methods.
Fig. 5. Image segmentation results of the PLinkage based method and the Mean shift based method. From left to right are the original image,
the result of PLinkage and the result of Mean shift.
Xiaohu Lu, Jian Yao*, Li Li, Yahui Liu, and Wei Zhang. PLinkage: Density Ascending Clustering via Pairwise Linkage, Submitted to Pattern Recognition, April 2016.
Xiaohu Lu, Jian Yao*, Li Li, Yahui Liu, and Wei Zhang. Supplemental Material on PLinkage: Density Ascending Clustering via Pairwise Linkage, Submitted to Pattern Recognition, April 2016.