Multi-modal Embedding for Main Product Detection in Fashion

Long Long Yu, Edgar Simo Serra, Francesc Moreno-Noguer, Antonio Rubio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.
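The abstract describes a joint embedding of CNN-based object-proposal features and textual metadata, trained with complementary classification and overlap losses. Below is a minimal PyTorch-style sketch of that idea, assuming precomputed proposal features, a pooled text vector, and particular layer sizes and loss forms (softmax over proposals, per-proposal category cross-entropy, IoU regression); these choices are illustrative assumptions, not the authors' implementation.

# Minimal PyTorch sketch of a joint proposal/text embedding with auxiliary
# classification and overlap losses. Feature extractors, layer sizes, loss
# forms, and weights are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointEmbedding(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_feat_dim=300,
                 embed_dim=128, num_classes=20):
        super().__init__()
        # Project CNN proposal features and text-metadata features
        # into a shared embedding space.
        self.img_proj = nn.Sequential(
            nn.Linear(img_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim))
        # Auxiliary heads (assumed forms): product-category classification
        # and overlap (IoU with the main-product box) regression.
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.iou_head = nn.Linear(embed_dim, 1)

    def forward(self, proposal_feats, text_feats):
        # proposal_feats: (P, img_feat_dim) pooled CNN features, one per proposal
        # text_feats:     (txt_feat_dim,) vector for the image's metadata
        img_emb = F.normalize(self.img_proj(proposal_feats), dim=-1)   # (P, D)
        txt_emb = F.normalize(self.txt_proj(text_feats), dim=-1)       # (D,)
        scores = img_emb @ txt_emb                                     # (P,) image-text similarity
        cls_logits = self.cls_head(img_emb)                            # (P, C)
        iou_pred = torch.sigmoid(self.iou_head(img_emb)).squeeze(-1)   # (P,)
        return scores, cls_logits, iou_pred


def training_losses(scores, cls_logits, iou_pred, main_idx, labels, ious,
                    w_cls=1.0, w_iou=1.0):
    # Main objective: the proposal covering the main product should score
    # highest against the text embedding (softmax over proposals).
    main_loss = F.cross_entropy(scores.unsqueeze(0), main_idx.unsqueeze(0))
    # Complementary losses assumed here: per-proposal category classification
    # and regression of each proposal's overlap with the main-product box.
    cls_loss = F.cross_entropy(cls_logits, labels)
    iou_loss = F.mse_loss(iou_pred, ious)
    return main_loss + w_cls * cls_loss + w_iou * iou_loss


if __name__ == "__main__":
    model = JointEmbedding()
    proposals = torch.randn(8, 2048)   # e.g. RoI-pooled features for 8 proposals
    text = torch.randn(300)            # e.g. aggregated word embeddings of the metadata
    scores, cls_logits, iou_pred = model(proposals, text)
    loss = training_losses(scores, cls_logits, iou_pred,
                           main_idx=torch.tensor(2),
                           labels=torch.randint(0, 20, (8,)),
                           ious=torch.rand(8))
    loss.backward()
    print("predicted main-product proposal:", scores.argmax().item())

At test time, the proposal with the highest similarity to the text embedding would be selected as the main product.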

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2236-2242
Number of pages: 7
ISBN (Electronic): 9781538610343
DOIs: https://doi.org/10.1109/ICCVW.2017.261
Publication status: Published - 2018 Jan 19
Externally published: Yes
Event: 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017 - Venice, Italy
Duration: 2017 Oct 22 - 2017 Oct 29

Publication series

Name: Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Volume: 2018-January

Other

Other: 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Country: Italy
City: Venice
Period: 17/10/22 - 17/10/29

Fingerprint

  • Metadata
  • Neural networks
ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Yu, L. L., Simo Serra, E., Moreno-Noguer, F., & Rubio, A. (2018). Multi-modal Embedding for Main Product Detection in Fashion. In Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017 (pp. 2236-2242). (Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW.2017.261

@inproceedings{35a9d959d8d24c2daa1ed5b6a21c078d,
  title = "Multi-modal Embedding for Main Product Detection in Fashion",
  abstract = "We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.",
  author = "Yu, {Long Long} and {Simo Serra}, Edgar and Francesc Moreno-Noguer and Antonio Rubio",
  year = "2018",
  month = jan,
  day = "19",
  doi = "10.1109/ICCVW.2017.261",
  language = "English",
  series = "Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017",
  volume = "2018-January",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "2236--2242",
  booktitle = "Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017",
}
