HABOS clustering algorithm for categorical data

Wu Sen (武森); Jiang Dandan (姜丹丹); Wang Qiang (王薔)

doi:10.13374/j.issn2095-9389.2016.07.018

Donlinks School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

Funding:

National Natural Science Foundation of China (71271027)

Specialized Research Fund for the Doctoral Program of Higher Education of China (20120006110037)

Corresponding author: Wu Sen, E-mail: wusen@manage.ustb.edu.cn

CLC number: TP311
Publication history:
- Received: 2016-01-05
- Published online: 2021-07-22


Abstract: The clustering algorithm based on sparse feature vector for categorical attributes (CABOSFV_C) is an efficient clustering method for high-dimensional categorical data. It uses the sparse feature dissimilarity (SFD) to measure distance and sparse feature vectors to compress the data. However, its clustering results depend on an SFD upper-limit parameter, and there is no clear guidance on how to set that parameter. To address this sensitivity, this paper proposes a heuristic hierarchical clustering algorithm of categorical data based on SFD (HABOS). Following the agglomerative hierarchical clustering approach and subject to an upper limit on the number of clusters, HABOS uses a new internal clustering validation index based on SFD (CVISFD) as a heuristic measure of the clustering results, so that the clustering level is selected automatically. Experiments on three UCI benchmark data sets compare HABOS with the traditional algorithms, and the results show that HABOS effectively improves clustering accuracy and stability.
Keywords:
- data mining
- clustering algorithms
- categorical data
- attributes
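The abstract names two mechanisms without giving their formulas: sparse feature vectors that summarise a cluster so clusters can be merged and compared without rescanning the data, and an agglomerative procedure whose clustering level is chosen by a validity index under an upper limit on the number of clusters. The Python sketch below illustrates both ideas under stated assumptions; it is not the published HABOS procedure. The encoding of each record as (attribute, value) items and the SFD form |NS| / (n·|S|) are assumptions modelled on how SFD is usually described for the CABOSFV family, so check them against the paper. The names `SFV`, `agglomerate`, `k_max`, and `stand_in_index` are hypothetical, and `stand_in_index` is a placeholder score, not CVISFD, whose definition is not given here.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Item = Tuple[int, str]  # one (attribute index, category value) pair


@dataclass(frozen=True)
class SFV:
    """Sparse feature vector of a cluster (ASSUMED structure): size n,
    items shared by every member (s), items held by some but not all (ns).
    `members` is extra bookkeeping so the example can report assignments."""
    n: int
    s: FrozenSet[Item]
    ns: FrozenSet[Item]
    members: FrozenSet[int]

    @staticmethod
    def of_record(idx: int, record: Sequence[str]) -> "SFV":
        # A single record: all of its items are shared, none are partial.
        return SFV(1, frozenset(enumerate(record)), frozenset(), frozenset([idx]))

    def merge(self, other: "SFV") -> "SFV":
        # Uses only the two summaries, never the raw records.
        s = self.s & other.s
        ns = (self.s | self.ns | other.s | other.ns) - s
        return SFV(self.n + other.n, s, ns, self.members | other.members)

    def sfd(self) -> float:
        # ASSUMED sparse feature dissimilarity: |NS| / (n * |S|);
        # infinite when the members share no item at all.
        if not self.s:
            return math.inf
        return len(self.ns) / (self.n * len(self.s))


Partition = List[SFV]


def stand_in_index(clusters: Partition) -> float:
    """HYPOTHETICAL validity score, NOT the paper's CVISFD: separation
    (smallest SFD of any pairwise merge) minus compactness (size-weighted
    mean within-cluster SFD). Larger is better."""
    if len(clusters) < 2:
        return -math.inf
    total = sum(c.n for c in clusters)
    compactness = sum(c.n * c.sfd() for c in clusters) / total
    if math.isinf(compactness):
        return -math.inf
    separation = min(a.merge(b).sfd() for a, b in combinations(clusters, 2))
    return separation - compactness


def agglomerate(records: Sequence[Sequence[str]], k_max: int,
                index: Callable[[Partition], float] = stand_in_index) -> Partition:
    """Merge the pair with the smallest merged SFD until one cluster remains,
    score every level that has at most k_max clusters, return the best one."""
    clusters: Partition = [SFV.of_record(i, r) for i, r in enumerate(records)]
    best: Optional[Partition] = None
    best_score = -math.inf
    while len(clusters) > 1:
        if len(clusters) <= k_max:
            score = index(clusters)
            if best is None or score > best_score:
                best, best_score = list(clusters), score
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: clusters[p[0]].merge(clusters[p[1]]).sfd())
        merged = clusters[i].merge(clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # No level qualified (e.g. k_max == 1): fall back to the final partition.
    return best if best is not None else clusters


if __name__ == "__main__":
    data = [["a", "x", "p"], ["a", "x", "q"], ["b", "y", "r"], ["b", "y", "s"]]
    for c in agglomerate(data, k_max=3):
        print(sorted(c.members), c.sfd())
    # prints:
    # [0, 1] 0.5
    # [2, 3] 0.5
```

Because `merge` reads only the two summaries, raw records are never revisited once each has been turned into a singleton summary, which is the data-compression role the abstract attributes to sparse feature vectors. Passing a different `index` function is where a real CVISFD implementation would plug in. The naive pair search makes this sketch roughly cubic in the number of records, so it suits small illustrations only.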