TY - GEN
T1 - Evaluating mobile search with height-biased gain
AU - Luo, Cheng
AU - Liu, Yiqun
AU - Sakai, Tetsuya
AU - Zhang, Fan
AU - Zhang, Min
AU - Ma, Shaoping
N1 - Funding Information:
This work was supported by Natural Science Foundation (61622208, 61532011, 61472206) of China and National Key Basic Research Program (2015CB358700).
PY - 2017/8/7
Y1 - 2017/8/7
N2 - Mobile search engine result pages (SERPs) are becoming highly visual and heterogenous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights with each other. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by user's reading behavior and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of user's browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain, which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.
AB - Mobile search engine result pages (SERPs) are becoming highly visual and heterogenous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights with each other. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by user's reading behavior and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of user's browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain, which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.
KW - Evaluation Metric
KW - Mobile Search
KW - User Behavior
UR - http://www.scopus.com/inward/record.url?scp=85029354113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029354113&partnerID=8YFLogxK
U2 - 10.1145/3077136.3080795
DO - 10.1145/3077136.3080795
M3 - Conference contribution
AN - SCOPUS:85029354113
T3 - SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 435
EP - 444
BT - SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Y2 - 7 August 2017 through 11 August 2017
ER -