Flexible Pseudo-Relevance Feedback via Selective Sampling

Tetsuya Sakai, Toshihiko Manabe, Makoto Koyama

Research output: Article

52 Citations (Scopus)

Abstract

Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
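The skipping idea described in the abstract can be contrasted with Traditional PRF in a few lines of code. The sketch below is illustrative only: it assumes a cosine-similarity novelty test over sparse term-weight vectors, and the names traditional_prf, selective_sampling, and max_sim are hypothetical; it does not reproduce the paper's actual skipping rule or sampling parameters.

# Illustrative sketch (not the authors' algorithm): Traditional PRF takes the top P
# documents of the initial ranking as pseudo-relevant, whereas a Selective-Sampling-style
# selection walks down the ranking and skips documents that add little novelty.
from typing import Dict, List

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def traditional_prf(ranked_docs: List[Dict[str, float]], p: int) -> List[Dict[str, float]]:
    """Traditional PRF: treat the top P documents of the initial ranking as relevant."""
    return ranked_docs[:p]

def selective_sampling(ranked_docs: List[Dict[str, float]], p: int,
                       max_sim: float = 0.7) -> List[Dict[str, float]]:
    """Select up to P pseudo-relevant documents, skipping any document that is
    too similar to one already selected (an assumed novelty criterion)."""
    selected: List[Dict[str, float]] = []
    for doc in ranked_docs:
        if len(selected) == p:
            break
        if any(cosine(doc, s) > max_sim for s in selected):
            continue  # skip a redundant document and look further down the ranking
        selected.append(doc)
    return selected

# Toy usage: the second document is nearly identical to the first, so the
# Selective-Sampling-style selection skips it and picks the third instead,
# while Traditional PRF keeps the top two as-is.
ranked = [{"prf": 0.9, "ir": 0.5}, {"prf": 0.88, "ir": 0.49}, {"query": 0.7, "expansion": 0.6}]
print(traditional_prf(ranked, 2))
print(selective_sampling(ranked, 2))

In either case, expansion terms would then be extracted from the selected pseudo-relevant set and added to the query, as in ordinary PRF.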

Original language: English
Pages (from-to): 111-135
Number of pages: 25
Journal: ACM Transactions on Asian Language Information Processing
Volume: 4
Issue number: 2
DOI: 10.1145/1105696.1105699
Publication status: Published - Jun 1 2005
Externally published: Yes

Fingerprint

Sampling
Feedback

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Flexible Pseudo-Relevance Feedback via Selective Sampling. / Sakai, Tetsuya; Manabe, Toshihiko; Koyama, Makoto.

In: ACM Transactions on Asian Language Information Processing, Vol. 4, No. 2, 01.06.2005, p. 111-135.

Research output: Article

Sakai, Tetsuya ; Manabe, Toshihiko ; Koyama, Makoto. / Flexible Pseudo-Relevance Feedback via Selective Sampling. In: ACM Transactions on Asian Language Information Processing. 2005 ; Vol. 4, No. 2. pp. 111-135.
@article{feb6c70c7ade42aea754c85bd583fcbc,
title = "Flexible Pseudo-Relevance Feedback via Selective Sampling",
abstract = "Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.",
keywords = "Experimentation, flexible pseudo-relevance feedback, Performance, Pseudo-relevance feedback, selective sampling",
author = "Tetsuya Sakai and Toshihiko Manabe and Makoto Koyama",
year = "2005",
month = "6",
day = "1",
doi = "10.1145/1105696.1105699",
language = "English",
volume = "4",
pages = "111--135",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

TY - JOUR

T1 - Flexible Pseudo-Relevance Feedback via Selective Sampling

AU - Sakai, Tetsuya

AU - Manabe, Toshihiko

AU - Koyama, Makoto

PY - 2005/6/1

Y1 - 2005/6/1

N2 - Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.

AB - Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.

KW - Experimentation

KW - flexible pseudo-relevance feedback

KW - Performance

KW - Pseudo-relevance feedback

KW - selective sampling

UR - http://www.scopus.com/inward/record.url?scp=33750320351&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750320351&partnerID=8YFLogxK

U2 - 10.1145/1105696.1105699

DO - 10.1145/1105696.1105699

M3 - Article

AN - SCOPUS:33750320351

VL - 4

SP - 111

EP - 135

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 2

ER -