### Abstract

Outlier detection identifies data points, or small subsets of data, that differ significantly from most other data in a given dataset. Detecting outliers objectively and quantitatively is challenging. Methods based on statistical hypothesis testing are widely used, but they assume a specific parametric distribution as the data-generating model, and in practical problems there is no guarantee that the data can be adequately approximated by a parametric distribution. This paper proposes a simple method that detects outliers objectively by hypothesis testing without assuming a specific distribution of outlier scores. Given an arbitrary outlier score function, a hypothesis test determines whether each sample is an outlier. The distribution of the test statistic, which the test requires, is estimated from the given data using the bootstrap method. The effectiveness of the proposed outlier test was verified by applying it to outlier detection for text-based image retrieval, where it improved the quality of image searches by removing irrelevant images.
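The abstract does not specify the test statistic or the resampling scheme, so the following is only a minimal sketch of how a bootstrap-based outlier test of this kind might look. It is not the authors' procedure. The score function `mean_distance_score`, the hold-out resampling scheme, the parameters `n_boot` and `alpha`, and the +1 correction in the p-value are all illustrative assumptions.

```python
import numpy as np


def mean_distance_score(x, reference):
    """Hypothetical outlier score: Euclidean distance from x to the reference mean."""
    return float(np.linalg.norm(x - reference.mean(axis=0)))


def bootstrap_outlier_test(data, score_fn=mean_distance_score,
                           n_boot=500, alpha=0.05, seed=0):
    """Return (p_values, is_outlier) for every row of `data`.

    Assumed scheme (not taken from the paper): the null distribution of the
    score is estimated by resampling the data with replacement.
    """
    rng = np.random.default_rng(seed)
    n = len(data)

    # Null scores: resample the data with replacement, hold out the first
    # resampled point, and score it against the rest of the resample.
    null_scores = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]
        null_scores[b] = score_fn(resample[0], resample[1:])

    # Observed scores: each sample is scored against all the other samples.
    observed = np.array([
        score_fn(data[i], np.delete(data, i, axis=0)) for i in range(n)
    ])

    # One-sided empirical p-value with a +1 correction so it is never zero.
    exceed = (null_scores[None, :] >= observed[:, None]).sum(axis=1)
    p_values = (1 + exceed) / (n_boot + 1)
    return p_values, p_values < alpha


# Toy data: 100 Gaussian points plus one planted outlier in the last row.
rng = np.random.default_rng(42)
inliers = rng.normal(0.0, 1.0, size=(100, 2))
data = np.vstack([inliers, [[8.0, 8.0]]])
p_values, is_outlier = bootstrap_outlier_test(data)
```

Because the score function is passed in as an argument, any other score (for example a nearest-neighbour distance on image features) could be swapped in without changing the test itself, which mirrors the abstract's "arbitrary outlier score function". Since the null distribution is estimated from the same data, outliers in the sample can contaminate it; the sketch does nothing to correct for that.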

Original language | English |
---|---|
Title of host publication | Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Proceedings |
Editors | Mario Vento, Gennaro Percannella |
Publisher | Springer-Verlag |
Pages | 505-517 |
Number of pages | 13 |
ISBN (Print) | 9783030298876 |
DOIs | https://doi.org/10.1007/978-3-030-29888-3_41 |
Publication status | Published - 2019 Jan 1 |
Event | 18th International Conference on Computer Analysis of Images and Patterns, CAIP 2019 - Salerno, Italy. Duration: 2019 Sep 3 → 2019 Sep 5 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11678 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Conference

Conference | 18th International Conference on Computer Analysis of Images and Patterns, CAIP 2019 |
---|---|
Country | Italy |
City | Salerno |
Period | 2019 Sep 3 → 2019 Sep 5 |

### Keywords

- Hypothesis testing
- Image retrieval
- Outlier removal

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computer Science (all)

### Cite this

**Retrieved Image Refinement by Bootstrap Outlier Test.** / Watanabe, Hayato; Hino, Hideitsu; Akaho, Shotaro; Murata, Noboru.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Watanabe, H., Hino, H., Akaho, S., & Murata, N. (2019). Retrieved Image Refinement by Bootstrap Outlier Test. In M. Vento, & G. Percannella (Eds.), *Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Proceedings* (pp. 505-517). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11678 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-030-29888-3_41

TY - GEN

T1 - Retrieved Image Refinement by Bootstrap Outlier Test

AU - Watanabe, Hayato

AU - Hino, Hideitsu

AU - Akaho, Shotaro

AU - Murata, Noboru

PY - 2019/1/1

Y1 - 2019/1/1

KW - Hypothesis testing

KW - Image retrieval

KW - Outlier removal

UR - http://www.scopus.com/inward/record.url?scp=85072871968&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072871968&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-29888-3_41

DO - 10.1007/978-3-030-29888-3_41

M3 - Conference contribution

AN - SCOPUS:85072871968

SN - 9783030298876

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 505

EP - 517

BT - Computer Analysis of Images and Patterns - 18th International Conference, CAIP 2019, Proceedings

A2 - Vento, Mario

A2 - Percannella, Gennaro

PB - Springer-Verlag

ER -