Peer-Reviewed

Text Clustering Incremental Algorithm in Sensitive Topic Detection

Received: 28 August 2018    Accepted: 27 September 2018    Published: 30 October 2018
Abstract

With the rapid development of Internet technology, the influence of online consensus continues to expand. How to discover sensitive topics quickly and effectively, and then keep track of them, has recently become an important research problem. Text clustering aggregates news texts with the same or similar content and thereby discovers topics automatically. Improving clustering algorithms to suit different media types is the main research direction. Although the existing typical clustering algorithms have certain advantages, they all face constraints on data size and data characteristics in specific applications, and no existing algorithm fully adapts to these characteristics. Single-pass algorithms, which are widely applied in the Topic Detection and Tracking (TDT) field, can discover and track topics, but they suffer from poor accuracy and slow speed on massive data. Based on the dynamic evolution of online consensus, this paper proposes an incremental text clustering algorithm built on Single-pass that improves clustering accuracy and efficiency for massive news collections. Using real online news texts from the online consensus analysis system, we conduct an experiment to verify the feasibility and effectiveness of the proposed algorithm. The results show that the new algorithm is much more efficient than the original Single-pass clustering algorithm. In real applications, the new incremental text clustering algorithm basically meets the real-time demands of online topic detection and has practical value.
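
For orientation, the baseline Single-pass procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the generic textbook scheme, not the improved algorithm proposed in the paper. The function names `cosine` and `single_pass`, the whitespace tokenization, the raw term-count vectors, and the threshold of 0.5 are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.5):
    """Assign each document, in arrival order, to its most similar cluster.

    Returns one cluster id per document. The threshold is an illustrative
    assumption, not a value taken from the paper.
    """
    centroids = []  # one summed term-count vector per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        # Compare the new document with every existing cluster.
        best_id, best_sim = -1, 0.0
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim >= threshold:
            centroids[best_id].update(vec)  # fold the document into the cluster
            labels.append(best_id)
        else:
            centroids.append(Counter(vec))  # no close match: start a new topic
            labels.append(len(centroids) - 1)
    return labels


news = [
    "flood hits city center",
    "city center flood damage",
    "stock market rallies today",
    "market rallies as stock prices rise",
]
print(single_pass(news))  # [0, 0, 1, 1]
```

In this sketch every incoming document is compared against every existing cluster, so the cost per document grows with the number of topics. That is one plausible source of the slow speed on massive data that the abstract mentions.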

Published in International Journal of Information and Communication Sciences (Volume 3, Issue 3)
DOI 10.11648/j.ijics.20180303.12
Page(s) 88-95
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Topic Detection, Online Consensus, Simhash Algorithm, Text Clustering, Incremental Algorithm, Single-Pass Algorithm

Cite This Article
  • APA Style

    Yuejin Zhang, Jiajia Zhang, Dongmei Zhao. (2018). Text Clustering Incremental Algorithm in Sensitive Topic Detection. International Journal of Information and Communication Sciences, 3(3), 88-95. https://doi.org/10.11648/j.ijics.20180303.12


    ACS Style

    Yuejin Zhang; Jiajia Zhang; Dongmei Zhao. Text Clustering Incremental Algorithm in Sensitive Topic Detection. Int. J. Inf. Commun. Sci. 2018, 3(3), 88-95. doi: 10.11648/j.ijics.20180303.12


    AMA Style

    Yuejin Zhang, Jiajia Zhang, Dongmei Zhao. Text Clustering Incremental Algorithm in Sensitive Topic Detection. Int J Inf Commun Sci. 2018;3(3):88-95. doi: 10.11648/j.ijics.20180303.12


  • BibTeX

    @article{10.11648/j.ijics.20180303.12,
      author = {Yuejin Zhang and Jiajia Zhang and Dongmei Zhao},
      title = {Text Clustering Incremental Algorithm in Sensitive Topic Detection},
      journal = {International Journal of Information and Communication Sciences},
      volume = {3},
      number = {3},
      pages = {88-95},
      doi = {10.11648/j.ijics.20180303.12},
      url = {https://doi.org/10.11648/j.ijics.20180303.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijics.20180303.12},
     year = {2018}
    }
    


  • RIS

    TY  - JOUR
    T1  - Text Clustering Incremental Algorithm in Sensitive Topic Detection
    AU  - Yuejin Zhang
    AU  - Jiajia Zhang
    AU  - Dongmei Zhao
    Y1  - 2018/10/30
    PY  - 2018
    N1  - https://doi.org/10.11648/j.ijics.20180303.12
    DO  - 10.11648/j.ijics.20180303.12
    T2  - International Journal of Information and Communication Sciences
    JF  - International Journal of Information and Communication Sciences
    JO  - International Journal of Information and Communication Sciences
    SP  - 88
    EP  - 95
    PB  - Science Publishing Group
    SN  - 2575-1719
    UR  - https://doi.org/10.11648/j.ijics.20180303.12
    VL  - 3
    IS  - 3
    ER  - 


Author Information
  • Cyber Security Department, Municipal Cyberspace Administration, Beijing, China

  • Eliot K-8 Innovation School, Boston, America

  • Department of Electronic Commerce, China Agricultural University, Beijing, China
