Decomposition of term-document matrix representation for clustering analysis

Jianxiong Yang, Junzo Watada

    Research output: Conference contribution

    4 Citations (Scopus)

    Abstract

    Latent Semantic Indexing (LSI) is an information retrieval technique that uses a low-rank singular value decomposition (SVD) of the term-document matrix. The aim of the method is to reduce the dimension of the matrix by finding patterns of co-occurring terms in a document collection. The method is applied to calculate term-document weights in the vector space model (VSM) for document clustering with a fuzzy clustering algorithm. LSI attempts to exploit the underlying semantic structure of word usage in documents. During the query-matching phase of LSI, a user's query is first projected into the term-document space and then compared with all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query-matching method requires computing the similarity measure between the query and every term and document in the vector space. In this paper, the Maximal Tree Algorithm is used within a recent LSI implementation to reduce the computational time and computational complexity of query matching. The Maximal Tree data structure stores the term and document vectors in such a way that only those terms and documents most likely to qualify as the nearest neighbor to the query are examined and retrieved. In short, this novel algorithm is suitable for improving the accuracy of data miners.
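The baseline query-matching step described in the abstract can be illustrated with a minimal sketch. It assumes a toy term-document matrix with made-up counts, a rank-2 truncation, the common query fold-in formula q_hat = S_k^{-1} U_k^T q, and cosine similarity as the similarity measure; the function names are hypothetical and none of this is taken from the paper. It performs the exhaustive comparison against every document that the paper aims to avoid, and it does not implement the Maximal Tree Algorithm.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
# The counts are invented for illustration only.
terms = ["fuzzy", "cluster", "matrix", "query"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])


def lsi_fit(A, k):
    """Return the rank-k factors U_k, s_k, Vt_k of the truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def project_query(q, U_k, s_k):
    """Fold a term-space query vector into the k-dimensional latent space."""
    return (U_k.T @ q) / s_k


def cosine_rank(q_hat, doc_vectors):
    """Compare the query with every document (exhaustive scan) and rank them."""
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_hat)
    sims = (doc_vectors @ q_hat) / np.where(norms == 0.0, 1.0, norms)
    order = np.argsort(-sims)  # document indices, most similar first
    return order, sims


U_k, s_k, Vt_k = lsi_fit(A, k=2)
doc_vectors = Vt_k.T  # one row per document in the latent space

# Query containing only the term "fuzzy".
q = np.array([1.0, 0.0, 0.0, 0.0])
order, sims = cosine_rank(project_query(q, U_k, s_k), doc_vectors)

for j in order:
    print(f"document {j}: similarity {sims[j]:.3f}")
```

In this toy example the first two documents share the query's vocabulary and the last two do not, so the former rank first. The loop over all documents inside `cosine_rank` is the cost that grows with collection size, which is where an index structure over the term and document vectors would come in.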

    Original language: English
    Title of host publication: IEEE International Conference on Fuzzy Systems
    Pages: 976-983
    Number of pages: 8
    DOI
    Publication status: Published - 2011
    Event: 2011 IEEE International Conference on Fuzzy Systems, FUZZ 2011 - Taipei
    Duration: 27 Jun 2011 to 30 Jun 2011

    Other

    Other: 2011 IEEE International Conference on Fuzzy Systems, FUZZ 2011
    City: Taipei
    Period: 11/6/27 to 11/6/30

    ASJC Scopus subject areas

    • Software
    • Artificial Intelligence
    • Applied Mathematics
    • Theoretical Computer Science
