
Clustering algorithm for imbalanced data based on nearest neighbor

Abstract: Clustering is an important task in data mining. Most clustering algorithms handle balanced datasets effectively but perform poorly on imbalanced ones. For example, K–means, a classical partition-based clustering algorithm, tends to produce a “uniform effect” on imbalanced datasets: the resulting clusters are relatively uniform in size, because small clusters “swallow” some of the data objects that belong to large clusters. As a result, the number and density of data objects tend to be the same across clusters. To address the “uniform effect” that the classical K–means algorithm produces on imbalanced data, a clustering algorithm for imbalanced data based on nearest neighbor (CABON) is proposed. First, CABON performs an initial clustering of the data objects and uses it to determine the undetermined-cluster set, defined as the set of data objects whose cluster membership in the initial result must be checked further. Then, working from the edge of this set to its center, the nearest-neighbor method reassigns each data object in the undetermined-cluster set to the cluster of its nearest neighbor, while the set itself is dynamically adjusted. This yields the final clustering result and avoids the influence of the “uniform effect” on it. The clustering results of the proposed algorithm are compared with those of K–means, the imbalanced K–means clustering method with multiple centers (MC_IK), and the coefficient of variation clustering for non-uniform data (CVCN) on synthetic and real datasets. The experimental results show that the CABON algorithm effectively reduces the “uniform effect” that K–means produces on imbalanced data, and that its clustering results are clearly superior to those of the K–means, MC_IK, and CVCN algorithms.
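The reassignment step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' CABON implementation: the abstract does not say how the undetermined-cluster set is defined or how it is adjusted, so the `margin` rule below (a point is undetermined when its nearest center is not clearly nearer than its second-nearest one) is an assumption made only for this example. The function names `kmeans` and `reassign_by_nearest_neighbor` are likewise hypothetical. The "edge to center" order is approximated by always processing the pending point that lies closest to an already settled point.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means, used here only as the initial clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


def reassign_by_nearest_neighbor(X, labels, centers, margin=0.5):
    """Relabel 'undetermined' points by their nearest settled neighbor.

    ASSUMPTION: the undetermined set is every point whose distance to its
    nearest center exceeds `margin` times the distance to its second-nearest
    center. The paper's actual definition is not given in the abstract.
    """
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    two_smallest = np.sort(dists, axis=1)[:, :2]
    undetermined = two_smallest[:, 0] > margin * two_smallest[:, 1]

    new_labels = labels.copy()
    pending = set(np.flatnonzero(undetermined).tolist())
    settled = np.flatnonzero(~undetermined).tolist()

    # Each pass settles the pending point closest to any settled point, i.e.
    # a point on the edge of the pending set; the set shrinks as it goes.
    while pending and settled:
        p = np.array(sorted(pending))
        s = np.array(settled)
        d = np.linalg.norm(X[p][:, None, :] - X[s][None, :, :], axis=2)
        i, j = np.unravel_index(d.argmin(), d.shape)
        new_labels[p[i]] = new_labels[s[j]]
        pending.remove(int(p[i]))
        settled.append(int(p[i]))
    return new_labels, undetermined


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic imbalanced data: one large, sparse cluster and one small, dense one.
    X = np.vstack([
        rng.normal([0.0, 0.0], 2.0, size=(200, 2)),
        rng.normal([7.0, 0.0], 0.5, size=(20, 2)),
    ])
    labels, centers = kmeans(X, k=2)
    new_labels, undetermined = reassign_by_nearest_neighbor(X, labels, centers)
    print("K-means cluster sizes:      ", np.bincount(labels, minlength=2))
    print("after reassignment:         ", np.bincount(new_labels, minlength=2))
    print("points treated as undetermined:", int(undetermined.sum()))
```

The sketch makes no claim to reproduce the reported results. Whether reassignment actually shrinks the "uniform effect" depends on how the undetermined set is chosen, which is exactly the part the abstract leaves to the full paper.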
