Production of Large Analogical Clusters from Smaller Example Seed Clusters Using Word Embeddings

Yuzhong Hong, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We introduce a method to automatically produce large analogical clusters from smaller seed clusters of representative examples. The method is based on techniques for processing and solving analogical equations in word vector space models, i.e., word embeddings. In our experiments, we use standard English data sets that cover relations ranging from derivational morphology (like adjective–adverb, positive–comparative forms of adjectives) and inflectional morphology (like present–past forms) to encyclopedic semantics (like country–capital relations). The analogical clusters produced by our method are of reasonably good quality, as shown by comparing human judgment against automatic NDCG@n scores. In total, they contain 8.5 times as many relevant word pairs as the seed clusters.
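
As an illustration of what solving an analogical equation in a word vector space can look like, here is a minimal sketch. It uses the widely known vector-offset heuristic (x ≈ b − a + c, ranked by cosine similarity), which is an assumption for illustration and not necessarily the procedure used in the paper. The toy 2-dimensional vectors in `TOY` and the function names are hypothetical; real embeddings have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def solve_analogy(a, b, c, embeddings):
    """Solve a : b :: c : x with the vector-offset heuristic x ~ b - a + c.

    Returns the vocabulary word (excluding a, b, c) whose vector is
    closest, by cosine similarity, to the offset target.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# Hypothetical toy embeddings: axis 0 separates countries from each other,
# axis 1 encodes the country -> capital offset.
TOY = {
    "france": [1.0, 0.0],
    "paris": [1.0, 1.0],
    "japan": [3.0, 0.0],
    "tokyo": [3.0, 1.0],
    "sweden": [5.0, 0.2],
    "stockholm": [5.0, 1.2],
}

if __name__ == "__main__":
    # france : paris :: japan : ?
    print(solve_analogy("france", "paris", "japan", TOY))
```

Under these assumptions, a seed pair such as (france, paris) could be used to propose a capital for each remaining country, which is the kind of step that would let a small seed cluster grow into a larger one.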

Original language: English
Title of host publication: Case-Based Reasoning Research and Development - 26th International Conference, ICCBR 2018, Proceedings
Editors: Michael T. Cox, Peter Funk, Shahina Begum
Publisher: Springer-Verlag
Pages: 548-562
Number of pages: 15
ISBN (Print): 9783030010805
DOIs: https://doi.org/10.1007/978-3-030-01081-2_36
Publication status: Published - 2018 Jan 1
Event: 26th International Conference on Case-Based Reasoning, ICCBR 2018 - Stockholm, Sweden
Duration: 2018 Jul 9 to 2018 Jul 12

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11156 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 26th International Conference on Case-Based Reasoning, ICCBR 2018
Country: Sweden
City: Stockholm
Period: 18/7/9 to 18/7/12

Keywords

  • Analogical clusters
  • Analogy
  • Word embeddings

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Hong, Y., & Lepage, Y. (2018). Production of Large Analogical Clusters from Smaller Example Seed Clusters Using Word Embeddings. In M. T. Cox, P. Funk, & S. Begum (Eds.), Case-Based Reasoning Research and Development - 26th International Conference, ICCBR 2018, Proceedings (pp. 548-562). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11156 LNAI). Springer-Verlag. https://doi.org/10.1007/978-3-030-01081-2_36

@inproceedings{9d8dd722b4fb4e7eb3dd594ae6a10616,
title = "Production of Large Analogical Clusters from Smaller Example Seed Clusters Using Word Embeddings",
abstract = "We introduce a method to automatically produce large analogical clusters from smaller seed clusters of representative examples. The method is based on techniques of processing and solving analogical equations in word vector space models, i.e., word embeddings. In our experiments, we use standard data sets in English which cover different relations extending from derivational morphology (like adjective–adverb, positive–comparative forms of adjectives) or inflectional morphology (like present–past forms) to encyclopedic semantics (like country–capital relations). The analogical clusters produced by our method are shown to be of reasonably good quality, as shown by comparing human judgment against automatic NDCG@n scores. In total, they contain 8.5 times as many relevant word pairs as the seed clusters.",
keywords = "Analogical clusters, Analogy, Word embeddings",
author = "Yuzhong Hong and Yves Lepage",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-01081-2_36",
language = "English",
isbn = "9783030010805",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "548--562",
editor = "Cox, {Michael T.} and Peter Funk and Shahina Begum",
booktitle = "Case-Based Reasoning Research and Development - 26th International Conference, ICCBR 2018, Proceedings",
}

TY - GEN

T1 - Production of Large Analogical Clusters from Smaller Example Seed Clusters Using Word Embeddings

AU - Hong, Yuzhong

AU - Lepage, Yves

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We introduce a method to automatically produce large analogical clusters from smaller seed clusters of representative examples. The method is based on techniques of processing and solving analogical equations in word vector space models, i.e., word embeddings. In our experiments, we use standard data sets in English which cover different relations extending from derivational morphology (like adjective–adverb, positive–comparative forms of adjectives) or inflectional morphology (like present–past forms) to encyclopedic semantics (like country–capital relations). The analogical clusters produced by our method are shown to be of reasonably good quality, as shown by comparing human judgment against automatic NDCG@n scores. In total, they contain 8.5 times as many relevant word pairs as the seed clusters.

KW - Analogical clusters

KW - Analogy

KW - Word embeddings

UR - http://www.scopus.com/inward/record.url?scp=85055683619&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055683619&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-01081-2_36

DO - 10.1007/978-3-030-01081-2_36

M3 - Conference contribution

SN - 9783030010805

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 548

EP - 562

BT - Case-Based Reasoning Research and Development - 26th International Conference, ICCBR 2018, Proceedings

A2 - Cox, Michael T.

A2 - Funk, Peter

A2 - Begum, Shahina

PB - Springer-Verlag

ER -