TY - JOUR
T1 - STED-Net
T2 - Self-taught encoder-decoder network for unsupervised feature representation
AU - Du, Songlin
AU - Ikenaga, Takeshi
N1 - Funding Information:
This work was jointly supported by the Waseda University Grant for Special Research Projects under grants 2020C-657 and 2020R-040, the National Natural Science Foundation of China under grant 62001110, the Natural Science Foundation of Jiangsu Province under grant SBK2020041044, and the Fundamental Research Funds for the Central Universities under grant 2242020R10054.
Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/1
Y1 - 2021/1
N2 - Compared with the great successes achieved by supervised learning, e.g., convolutional neural networks (CNNs), unsupervised feature learning remains a highly challenging task because no training labels are available. Without labels for reference, reducing the gap between features and image semantics is the most challenging problem. This paper proposes a Self-Taught Encoder-Decoder Network (STED-Net), which consists of a representation sub-network and a classification sub-network, for unsupervised feature learning. On one hand, the representation sub-network maps images to feature representations. On the other hand, using the features generated by the representation sub-network, the classification sub-network simultaneously maps the feature representations to class representations and estimates pseudo labels by clustering the feature representations. By minimizing the distance between the class representations and the estimated pseudo labels, STED-Net teaches the features to represent class information. Through this self-taught feature representation, the gap between features and image semantics is reduced, and the features become increasingly “class-aware”. The whole learning process of STED-Net does not refer to any ground-truth class labels. Experimental results on widely used image classification datasets show that STED-Net achieves state-of-the-art classification performance compared with existing supervised and unsupervised feature learning models.
AB - Compared with the great successes achieved by supervised learning, e.g., convolutional neural networks (CNNs), unsupervised feature learning remains a highly challenging task because no training labels are available. Without labels for reference, reducing the gap between features and image semantics is the most challenging problem. This paper proposes a Self-Taught Encoder-Decoder Network (STED-Net), which consists of a representation sub-network and a classification sub-network, for unsupervised feature learning. On one hand, the representation sub-network maps images to feature representations. On the other hand, using the features generated by the representation sub-network, the classification sub-network simultaneously maps the feature representations to class representations and estimates pseudo labels by clustering the feature representations. By minimizing the distance between the class representations and the estimated pseudo labels, STED-Net teaches the features to represent class information. Through this self-taught feature representation, the gap between features and image semantics is reduced, and the features become increasingly “class-aware”. The whole learning process of STED-Net does not refer to any ground-truth class labels. Experimental results on widely used image classification datasets show that STED-Net achieves state-of-the-art classification performance compared with existing supervised and unsupervised feature learning models.
KW - Autoencoder
KW - Feature representation
KW - Self-taught learning
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85091734483&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091734483&partnerID=8YFLogxK
U2 - 10.1007/s11042-020-09734-4
DO - 10.1007/s11042-020-09734-4
M3 - Article
AN - SCOPUS:85091734483
SN - 1380-7501
VL - 80
SP - 4673
EP - 4691
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 3
ER -