TY - GEN

T1 - Retrieved Image Refinement by Bootstrap Outlier Test

AU - Watanabe, Hayato

AU - Hino, Hideitsu

AU - Akaho, Shotaro

AU - Murata, Noboru

N1 - Funding Information:
Partly supported by JST CREST JPMJCR1761, JSPS KAKENHI 17H01748, 17H02953 and 19H04113.

PY - 2019

Y1 - 2019

N2 - Outlier detection is used to identify data points or a small number of subsets of data that are significantly different from most other data in a given dataset. It is challenging to detect outliers using an objective and quantitative approach. Methods that use the framework of statistical hypothesis testing are widely used by assuming a specific parametric distribution as a data generation model, but there is no guarantee that the distribution of data can be adequately approximated by a parametric distribution in practical problems. In this paper, a simple method is proposed to objectively detect outliers by hypothesis testing without assuming a specific distribution of outlier scores. By using an arbitrary outlier score function, hypothesis testing is used to determine whether each given sample is an outlier. The distribution of the test statistics is needed for the hypothesis test, and is estimated based on the given data using the bootstrap method. The effectiveness of the proposed outlier test was verified by applying it to outlier detection for text-based image retrieval, where it improved the quality of image searches by removing irrelevant images.

KW - Hypothesis testing

KW - Image retrieval

KW - Outlier removal

UR - http://www.scopus.com/inward/record.url?scp=85072871968&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072871968&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-29888-3_41

DO - 10.1007/978-3-030-29888-3_41

M3 - Conference contribution

AN - SCOPUS:85072871968

SN - 9783030298876

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 505

EP - 517

BT - Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Proceedings

A2 - Vento, Mario

A2 - Percannella, Gennaro

PB - Springer Verlag

T2 - 18th International Conference on Computer Analysis of Images and Patterns, CAIP 2019

Y2 - 3 September 2019 through 5 September 2019

ER -