Mining subtopics from text fragments for a web query

Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, Qinghua Zheng

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to use surrounding text of query terms in top retrieved documents to mine subtopics and rank them. We first extract text fragments containing query terms from different parts of documents. Then we group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and the language model trained from a query log, we calculate three features and combine them into a relevance score for each subtopic. Subtopics are finally ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query log based method proposed by Radlinski et al. (2010) and a search result clustering based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
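The pipeline described in the abstract (extract fragments, cluster them, label each cluster, then rank by relevance and novelty) can be illustrated with a minimal sketch. This is not the authors' implementation: whitespace tokenization, a single-token query, Jaccard similarity, the greedy one-pass clustering, the cluster-share relevance score (standing in for the paper's three features and query-log language model), and all function names and parameters (`window`, `threshold`, `trade_off`) are assumptions chosen only for illustration.

```python
from collections import Counter


def extract_fragments(docs, query, window=3):
    """Collect token windows around each occurrence of a single-token query."""
    fragments = []
    for doc in docs:
        tokens = doc.lower().split()  # assumption: whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok == query:
                lo, hi = max(0, i - window), i + window + 1
                fragments.append(tuple(tokens[lo:hi]))
    return fragments


def jaccard(a, b):
    """Jaccard similarity between two token sequences, treated as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_fragments(fragments, threshold=0.5):
    """Greedy one-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for frag in fragments:
        for cluster in clusters:
            if jaccard(frag, cluster[0]) >= threshold:
                cluster.append(frag)
                break
        else:
            clusters.append([frag])
    return clusters


def label(cluster):
    """Use the most frequent fragment in the cluster as its readable subtopic."""
    return " ".join(Counter(cluster).most_common(1)[0][0])


def rank_subtopics(clusters, trade_off=0.7):
    """Greedy ranking: reward relevance (cluster share), penalize overlap with picks so far."""
    total = sum(len(c) for c in clusters)
    candidates = [(label(c), len(c) / total) for c in clusters]
    ranked = []
    while candidates:
        def score(cand):
            text, relevance = cand
            redundancy = max(
                (jaccard(text.split(), r.split()) for r in ranked), default=0.0
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        candidates.remove(best)
        ranked.append(best[0])
    return ranked


# Hypothetical toy documents for the ambiguous query "jaguar".
docs = [
    "jaguar car price list",
    "jaguar car price list",
    "jaguar animal habitat facts",
]
clusters = cluster_fragments(extract_fragments(docs, "jaguar", window=2))
print(rank_subtopics(clusters))
# ['jaguar car price', 'jaguar animal habitat']
```

The greedy loop is one common way to trade relevance against novelty; the abstract does not specify the exact formula, so the weighting here should be read as a placeholder.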

Original language: English
Pages (from-to): 484-503
Number of pages: 20
Journal: Information Retrieval
Volume: 16
Issue number: 4
DOI: https://doi.org/10.1007/s10791-013-9221-8
Publication status: Published - 2013
Externally published: Yes

Keywords

  • Intent mining
  • Intents ranking
  • Query intent

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Wang, Q., Qian, Y., Song, R., Dou, Z., Zhang, F., Sakai, T., & Zheng, Q. (2013). Mining subtopics from text fragments for a web query. Information Retrieval, 16(4), 484-503. https://doi.org/10.1007/s10791-013-9221-8

Mining subtopics from text fragments for a web query. / Wang, Qinglei; Qian, Yanan; Song, Ruihua; Dou, Zhicheng; Zhang, Fan; Sakai, Tetsuya; Zheng, Qinghua.

In: Information Retrieval, Vol. 16, No. 4, 2013, p. 484-503.

Wang, Q, Qian, Y, Song, R, Dou, Z, Zhang, F, Sakai, T & Zheng, Q 2013, 'Mining subtopics from text fragments for a web query', Information Retrieval, vol. 16, no. 4, pp. 484-503. https://doi.org/10.1007/s10791-013-9221-8
Wang, Qinglei ; Qian, Yanan ; Song, Ruihua ; Dou, Zhicheng ; Zhang, Fan ; Sakai, Tetsuya ; Zheng, Qinghua. / Mining subtopics from text fragments for a web query. In: Information Retrieval. 2013 ; Vol. 16, No. 4. pp. 484-503.
@article{ed24e84ebd3044ecb091c52f821ebdd7,
title = "Mining subtopics from text fragments for a web query",
abstract = "Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to use surrounding text of query terms in top retrieved documents to mine subtopics and rank them. We first extract text fragments containing query terms from different parts of documents. Then we group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and the language model trained from a query log, we calculate three features and combine them into a relevance score for each subtopic. Subtopics are finally ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query log based method proposed by Radlinski et al. (2010) and a search result clustering based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.",
keywords = "Intent mining, Intents ranking, Query intent",
author = "Qinglei Wang and Yanan Qian and Ruihua Song and Zhicheng Dou and Fan Zhang and Tetsuya Sakai and Qinghua Zheng",
year = "2013",
doi = "10.1007/s10791-013-9221-8",
language = "English",
volume = "16",
pages = "484--503",
journal = "Information Retrieval",
issn = "1386-4564",
publisher = "Springer Netherlands",
number = "4",

}

TY - JOUR

T1 - Mining subtopics from text fragments for a web query

AU - Wang, Qinglei

AU - Qian, Yanan

AU - Song, Ruihua

AU - Dou, Zhicheng

AU - Zhang, Fan

AU - Sakai, Tetsuya

AU - Zheng, Qinghua

PY - 2013

Y1 - 2013

AB - Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to use surrounding text of query terms in top retrieved documents to mine subtopics and rank them. We first extract text fragments containing query terms from different parts of documents. Then we group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and the language model trained from a query log, we calculate three features and combine them into a relevance score for each subtopic. Subtopics are finally ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query log based method proposed by Radlinski et al. (2010) and a search result clustering based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.

KW - Intent mining

KW - Intents ranking

KW - Query intent

UR - http://www.scopus.com/inward/record.url?scp=84880796141&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880796141&partnerID=8YFLogxK

U2 - 10.1007/s10791-013-9221-8

DO - 10.1007/s10791-013-9221-8

M3 - Article

AN - SCOPUS:84880796141

VL - 16

SP - 484

EP - 503

JO - Information Retrieval

JF - Information Retrieval

SN - 1386-4564

IS - 4

ER -