TY - JOUR
T1 - Mining subtopics from text fragments for a web query
AU - Wang, Qinglei
AU - Qian, Yanan
AU - Song, Ruihua
AU - Dou, Zhicheng
AU - Zhang, Fan
AU - Sakai, Tetsuya
AU - Zheng, Qinghua
N1 - Funding Information:
The research was supported in part by the National Science Foundation of China under Grant Nos. 61173112, 61070072, 61103160, 61103239; the National High Technology Research and Development Program 863 of China under Grant No. 2012AA011003; the Cheung Kong Scholars Program; and the Ministry of Education of China Humanities and Social Sciences Project under Grant No. 12YJC880117. The fifth author is supported by NSFC of China (60903028, 61070014) and Key Projects in the Tianjin Science & Technology Pillar Program (11ZCKFGX01100).
PY - 2013/8
Y1 - 2013/8
AB - Web search queries are often ambiguous or faceted, and the task of identifying the major underlying senses and facets of queries has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose to use surrounding text of query terms in top retrieved documents to mine subtopics and rank them. We first extract text fragments containing query terms from different parts of documents. Then we group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and the language model trained from a query log, we calculate three features and combine them into a relevance score for each subtopic. Subtopics are finally ranked by balancing relevance and novelty. Our evaluation experiments with the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query log based method proposed by Radlinski et al. (2010) and a search result clustering based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics used at the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
KW - Intent mining
KW - Intents ranking
KW - Query intent
UR - http://www.scopus.com/inward/record.url?scp=84880796141&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880796141&partnerID=8YFLogxK
U2 - 10.1007/s10791-013-9221-8
DO - 10.1007/s10791-013-9221-8
M3 - Article
AN - SCOPUS:84880796141
VL - 16
SP - 484
EP - 503
JO - Information Retrieval
JF - Information Retrieval
SN - 1386-4564
IS - 4
ER -