Clustering algorithm of categorical data in consideration of sorting by weight

Abstract: Some clustering algorithms are sensitive to the order in which data are input. To address this problem, a non-interference sequence index is defined, and a method is proposed that uses this index to sort categorical data by weight. Based on this method, CABOSFV_C, an efficient clustering algorithm for categorical data that is affected by the data input order, is improved to yield a clustering algorithm that considers sorting by weight (CABOSFV_CSW), which eliminates the sensitivity to the data input order. Experiments on UCI benchmark data sets show that, on categorical data, CABOSFV_CSW with ascending sorting by weight improves accuracy and significantly improves stability compared with the original CABOSFV_C algorithm and with other algorithms that are sensitive to the data input order.

Key words:
- data mining
- clustering algorithm
- sorting
- categorical data
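
The general idea in the abstract, sorting records by a data-derived weight before feeding them to an order-sensitive clustering pass, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `leader_clustering` is a generic one-pass clustering routine standing in for CABOSFV_C, and the weight in `sort_by_weight` (a sum of attribute-value frequencies) is a hypothetical stand-in for the non-interference sequence index, whose definition is not given in the abstract.

```python
import random
from collections import Counter


def leader_clustering(records, max_mismatch=1):
    """Generic one-pass clustering (NOT CABOSFV_C): each record joins the
    first cluster whose leader differs in at most max_mismatch attributes.
    The result depends on the order of the input records."""
    clusters = []  # clusters[i][0] is the leader of cluster i
    for rec in records:
        for cluster in clusters:
            leader = cluster[0]
            if sum(a != b for a, b in zip(rec, leader)) <= max_mismatch:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def sort_by_weight(records):
    """Sort ascending by a HYPOTHETICAL weight: the sum, over attributes, of
    how often the record's value occurs in that attribute. Ties are broken
    by the record itself, so the order depends only on the data."""
    n_attrs = len(records[0])
    freq = [Counter(rec[j] for rec in records) for j in range(n_attrs)]

    def weight(rec):
        return sum(freq[j][rec[j]] for j in range(n_attrs))

    return sorted(records, key=lambda rec: (weight(rec), rec))


def canonical(clusters):
    """Order-free representation of a clustering, for comparison."""
    return sorted(tuple(sorted(c)) for c in clusters)


def is_order_independent(records, trials=20, seed=0):
    """Check that sorting before clustering gives the same clusters for
    several random shuffles of the same records."""
    rng = random.Random(seed)
    reference = canonical(leader_clustering(sort_by_weight(records)))
    for _ in range(trials):
        shuffled = records[:]
        rng.shuffle(shuffled)
        if canonical(leader_clustering(sort_by_weight(shuffled))) != reference:
            return False
    return True
```

With the three records `("x", "x")`, `("x", "y")`, `("y", "y")`, the unsorted pass produces two clusters when the first record comes first and a single cluster when the second record comes first. Applying `sort_by_weight` beforehand fixes the processing order, so every shuffle of the same records yields the same clusters. Whether the resulting clusters are also more accurate depends on the weight actually used, which this sketch does not attempt to reproduce.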