Evaluating mobile search with height-biased gain

Cheng Luo, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    13 Citations (Scopus)

    Abstract

    Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs of desktop search, mobile SERPs contain verticals and cards that occupy different amounts of space on the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights. Therefore, the traditional rank-based decay functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), users cannot obtain the total gain of the result if they read only part of its contents. The benefit of such a result therefore depends on the user's reading behavior, and the internal gain distribution (over the result's height) should be modeled to obtain a more accurate estimate. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. From the results, we find that the geometric height of users' browsing trails can serve as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain (HBG), which is calculated by summing the products of a gain distribution and discount factors, both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we measure how often each evaluation metric agrees with side-by-side user preferences on a test collection composed of results from four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, outperforming all of the existing metrics evaluated.
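
    The abstract describes HBG only at a high level: the score sums, over the results on a SERP, the product of a per-result gain distribution and a discount factor, both expressed in terms of vertical position (height) rather than rank. The Python sketch below illustrates that idea under assumptions that are NOT taken from the paper: each result's gain is spread uniformly over its rendered height, and the discount at vertical offset y is exp(-lam * y) with lam = ln 2 / half_life, a half-life decay in the style of time-biased gain. Under those assumptions each result contributes the integral of its gain density times the discount over its own height span, which has a closed form. The names Result, height_biased_gain and preference_agreement, the pixel unit, the half-life value and the tie rule are all hypothetical. The second function shows one straightforward way to compute the fraction of side-by-side preferences a metric agrees with; it is not presented as the authors' evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    """One item on a mobile SERP (hypothetical representation)."""
    height: float  # rendered height, assumed to be in pixels
    gain: float    # total gain of the item, e.g. a graded relevance label


def height_biased_gain(serp, half_life):
    """Height-biased score of a SERP (illustrative sketch, not the paper's fitted model).

    Assumptions:
      * each item's gain is spread uniformly over its height;
      * the discount at vertical offset y is exp(-lam * y) with lam = ln 2 / half_life,
        i.e. the discount halves every `half_life` pixels of scrolling.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    lam = math.log(2) / half_life
    total = 0.0
    top = 0.0  # vertical offset of the current item's top edge
    for r in serp:
        if r.height <= 0:
            raise ValueError("result height must be positive")
        density = r.gain / r.height  # gain per pixel under the uniform assumption
        # Closed form of the integral of density * exp(-lam * y) over [top, top + height]:
        # density * exp(-lam * top) * (1 - exp(-lam * height)) / lam.
        # expm1 keeps (1 - exp(-x)) accurate when x is tiny.
        total += density * math.exp(-lam * top) * -math.expm1(-lam * r.height) / lam
        top += r.height
    return total


def preference_agreement(pairs, metric):
    """Fraction of side-by-side judgments on which `metric` prefers the same SERP as the user.

    `pairs` holds (serp_a, serp_b, preferred) tuples with preferred in {"a", "b"}.
    A metric tie is counted as a disagreement (an assumption).
    """
    if not pairs:
        raise ValueError("need at least one preference pair")
    agree = 0
    for serp_a, serp_b, preferred in pairs:
        score_a, score_b = metric(serp_a), metric(serp_b)
        if (score_a > score_b and preferred == "a") or (score_b > score_a and preferred == "b"):
            agree += 1
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy SERPs: the same two items in opposite orders (made-up heights and gains).
    compact_first = [Result(height=300, gain=2.0), Result(height=1200, gain=1.0)]
    tall_first = [Result(height=1200, gain=1.0), Result(height=300, gain=2.0)]

    def hbg(serp):
        # Arbitrary placeholder half-life, not a value fitted in the paper.
        return height_biased_gain(serp, half_life=2000.0)

    print(hbg(compact_first), hbg(tall_first))
    print(preference_agreement([(compact_first, tall_first, "a")], hbg))
```

    With these toy inputs the compact-first ordering scores higher, because its high-gain item is reached after less scrolling. Unlike a rank-based discount, the sketch's score also changes when an item's height changes while the result order stays the same.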

    Original language: English
    Title of host publication: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    Publisher: Association for Computing Machinery, Inc
    Pages: 435-444
    Number of pages: 10
    ISBN (Electronic): 9781450350228
    DOI: 10.1145/3077136.3080795
    Publication status: Published - 2017 Aug 7
    Event: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, Tokyo, Shinjuku, Japan
    Duration: 2017 Aug 7 to 2017 Aug 11

    Keywords

    • Evaluation Metric
    • Mobile Search
    • User Behavior

    ASJC Scopus subject areas

    • Information Systems
    • Software
    • Computer Graphics and Computer-Aided Design

    Cite this

    Luo, C., Liu, Y., Sakai, T., Zhang, F., Zhang, M., & Ma, S. (2017). Evaluating mobile search with height-biased gain. In SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 435-444). Association for Computing Machinery, Inc. https://doi.org/10.1145/3077136.3080795

    @inproceedings{fe8bf59e10d0417f8cd1629eec1ea727,
    title = "Evaluating mobile search with height-biased gain",
    keywords = "Evaluation Metric, Mobile Search, User Behavior",
    author = "Cheng Luo and Yiqun Liu and Tetsuya Sakai and Fan Zhang and Min Zhang and Shaoping Ma",
    year = "2017",
    month = "8",
    day = "7",
    doi = "10.1145/3077136.3080795",
    language = "English",
    pages = "435--444",
    booktitle = "SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval",
    publisher = "Association for Computing Machinery, Inc",

    }
