Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification

Zhewei Xu*, Mizuho Iwaihara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

To address the scarcity of labeled data in document classification tasks, semi-supervised learning, in which unlabeled samples are also used for training, has been studied. Self-training is one of the iconic strategies for semi-supervised learning, in which a classifier trains itself on its own predictions. However, self-training has mostly been applied to multi-class classification and rarely to the multi-label scenario. In this paper, we propose a self-training-based approach for semi-supervised multi-label document classification, in which semantic-space finetuning is introduced and integrated into the self-training process. Newly discovered credible predictions are used not only for classifier finetuning but also for semantic-space finetuning, which in turn benefits label propagation in discovering more credible predictions. Experimental results confirm the effectiveness of the proposed approach and show a satisfactory improvement over the baseline methods.
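The abstract does not state how "credible" predictions are chosen, so the following is only a minimal sketch of one common self-training selection rule for the multi-label case, not the authors' method. The function name `select_credible` and the thresholds `hi` and `lo` are hypothetical. A sample is kept only if every per-label score is confidently positive or confidently negative; semantic-space finetuning and label propagation are not covered here.

```python
def select_credible(probs, hi=0.9, lo=0.1):
    """Pick unlabeled samples whose per-label scores are all confident.

    probs: list of rows, one row of per-label probabilities per sample.
    hi, lo: assumed confidence thresholds (illustrative values only).
    Returns (indices, pseudo_labels) for the selected samples.
    """
    indices, pseudo_labels = [], []
    for i, row in enumerate(probs):
        # A sample is credible only if no label score falls in the
        # uncertain band strictly between lo and hi.
        if all(p >= hi or p <= lo for p in row):
            indices.append(i)
            # Scores at or above hi become positive pseudo-labels.
            pseudo_labels.append([1 if p >= hi else 0 for p in row])
    return indices, pseudo_labels


# Three unlabeled documents scored against three labels.
scores = [
    [0.95, 0.03, 0.92],  # confident on every label
    [0.60, 0.05, 0.97],  # first label is uncertain, so the sample is skipped
    [0.02, 0.08, 0.01],  # confidently negative on every label
]
idx, pseudo = select_credible(scores)
print(idx)     # [0, 2]
print(pseudo)  # [[1, 0, 1], [0, 0, 0]]
```

In a self-training loop, the selected samples and their pseudo-labels would be added to the training set before the next round of finetuning, and the remaining samples would be re-scored afterwards.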

Original language: English
Title of host publication: Towards Open and Trustworthy Digital Societies - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors: Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 249-263
Number of pages: 15
ISBN (Print): 9783030916688
Publication status: Published - 2021
Event: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online
Duration: 2021 Dec 1 to 2021 Dec 3

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13133 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
City: Virtual, Online
Period: 21/12/1 to 21/12/3

Keywords

  • Label propagation
  • Multi-label classification
  • Self-training
  • Semantic-space finetuning
  • Semi-supervised learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
