<th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
<progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
<th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
<progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
<th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
<progress id="5nh9l"><noframes id="5nh9l">
Volume 36 Issue 10
Jul.  2021
Turn off MathJax
Article Contents
WU Sen, FENG Xiao-dong, YANG Jie, ZHANG Xiao-nan. Parallel clustering of very large document datasets with MapReduce[J]. Chinese Journal of Engineering, 2014, 36(10): 1411-1419. doi: 10.13374/j.issn1001-053x.2014.10.019
Citation: WU Sen, FENG Xiao-dong, YANG Jie, ZHANG Xiao-nan. Parallel clustering of very large document datasets with MapReduce[J]. Chinese Journal of Engineering, 2014, 36(10): 1411-1419. doi: 10.13374/j.issn1001-053x.2014.10.019

Parallel clustering of very large document datasets with MapReduce

doi: 10.13374/j.issn1001-053x.2014.10.019
  • Received Date: 2013-09-30
    Available Online: 2021-07-19
  • To develop fast and efficient methods to cluster mass document data is one of the hot issues of current data mining research and applications. In order to ensure the clustering result and simultaneously improve the clustering efficiency, a document clustering algorithm was proposed based on searching a document pair with minimum similarity for each other and its distributed parallel computing models were provided. Firstly a document similarity measure was presented using a vector space model (VSM); then bisecting clustering was raised combining the bisecting K-means and the proposed initial cluster center selection approach to find the optimized cluster centroids by once partitioning; finally a distributed parallel document clustering model was designed for cloud computing based on MapReduce framework. Experiments on Hadoop platform, using real document datasets, showed the obvious efficiency advantages of the novel document clustering algorithm compared to the original bisecting K-means with an equivalent clustering result, and the scalability of parallel clustering with different data sizes and different computation node numbers was also evaluated.

     

  • loading
  • 加載中

Catalog

    通訊作者: 陳斌, bchen63@163.com
    • 1. 

      沈陽化工大學材料科學與工程學院 沈陽 110142

    1. 本站搜索
    2. 百度學術搜索
    3. 萬方數據庫搜索
    4. CNKI搜索
    Article views (171) PDF downloads(7) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return
    <th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
    <progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
    <th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
    <progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
    <span id="5nh9l"><noframes id="5nh9l">
    <span id="5nh9l"><noframes id="5nh9l">
    <span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
    <th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
    <progress id="5nh9l"><noframes id="5nh9l">
    259luxu-164