TY - JOUR
T1 - Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models
AU - Tawara, Naohiro
AU - Ogawa, Atsunori
AU - Iwata, Tomoharu
AU - Ashikawa, Hiroto
AU - Kobayashi, Tetsunori
AU - Ogawa, Tetsuji
N1 - Publisher Copyright:
Copyright © 2022 The Institute of Electronics, Information and Communication Engineers.
PY - 2022
Y1 - 2022
AB - Most conventional multi-source domain adaptation techniques for recurrent neural network language models (RNNLMs) are domain-centric. In these approaches, each domain is considered independently, which makes it difficult to apply the models to completely unseen target domains that are unobservable during training. Instead, our study exploits domain attributes, which represent knowledge shared among different domains such as dialects, types of wording, styles, and topics, to achieve domain generalization that can robustly represent unseen target domains by combining the domain attributes. To achieve attribute-based domain generalization in language modeling, we introduce domain attribute-based experts, instead of domain-based experts, into a multi-stream RNNLM called the recurrent adaptive mixture model (RADMM). In the proposed system, a long short-term memory (LSTM) network is independently trained on each domain attribute as an expert model. Then, by integrating the outputs from all the experts with context-dependent weights over the domain attributes of the current input, we predict the subsequent words in the unseen target domain while exploiting the specific knowledge of each domain attribute. To demonstrate the effectiveness of our proposed domain attribute-centric language model, we experimentally compared it with a conventional domain-centric language model using texts taken from multiple domains with different writing styles, topics, dialects, and types of wording. The experimental results demonstrated that lower perplexity can be achieved by using domain attributes.
KW - Domain attribute
KW - Domain generalization
KW - Language model
KW - Mixture-of-experts
KW - Recurrent adaptive mixture model
UR - http://www.scopus.com/inward/record.url?scp=85123433341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123433341&partnerID=8YFLogxK
U2 - 10.1587/transinf.2021EDP7081
DO - 10.1587/transinf.2021EDP7081
M3 - Article
AN - SCOPUS:85123433341
SN - 0916-8532
VL - E105D
SP - 150
EP - 160
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 1
ER -