Newsmap: A semi-supervised approach to geographical news classification

Kohei Watanabe

Research output: Contribution to journal › Article

Abstract

This paper presents the results of an evaluation of three types of geographical news classification methods: (1) simple keyword matching, a popular method in media and communications research; (2) geographical information extraction systems equipped with named-entity recognition and place-name disambiguation mechanisms (Open Calais and Geoparser.io); and (3) a semi-supervised machine learning classifier developed by the author (Newsmap). Newsmap replaces manual coding of news stories with dictionary-based labelling to create large training sets, which lets it extract large numbers of geographical words without human involvement; it also identifies multi-word names to reduce the ambiguity of geographical traits, fully automatically. An evaluation of the classification accuracy of the three types of methods against 5000 human-coded news summaries shows that Newsmap outperforms the geographical information extraction systems in overall accuracy, while simple keyword matching performs poorly for countries whose place names are ambiguous.
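To make the contrast between method (1) and method (3) concrete, here is a minimal Python sketch under stated assumptions. It labels unlabelled texts with a seed dictionary instead of manual coding, then learns a score for every word and country, so that words absent from the dictionary can still point to a country. The seed dictionary `SEED_DICT`, the toy corpus, the helper names, and the smoothed log-ratio scoring are all hypothetical illustrations. They are not Newsmap's actual dictionary or model, and the sketch omits multi-word name detection. The author's own implementation is distributed separately as the R package `newsmap`.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical seed dictionary (NOT Newsmap's real dictionary): country -> seed words.
SEED_DICT = {
    "fr": {"france", "french", "paris"},
    "gb": {"britain", "british", "london"},
    "jp": {"japan", "japanese", "tokyo"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_match(tokens, seed_dict=SEED_DICT):
    """Method (1): every country whose seed words occur in the tokens."""
    present = set(tokens)
    return {country for country, words in seed_dict.items() if words & present}


def train(texts, seed_dict=SEED_DICT, smooth=1.0):
    """Label texts with the dictionary, then score every word for every country.

    The score is a smoothed log ratio of the word's relative frequency in texts
    labelled with the country versus all other texts (an illustrative choice).
    """
    docs = [tokenize(t) for t in texts]
    labels = [keyword_match(d, seed_dict) for d in docs]  # no manual coding
    vocab = {w for d in docs for w in d}
    freq_in = defaultdict(Counter)   # counts in texts labelled with the country
    freq_out = defaultdict(Counter)  # counts in all other texts
    for doc, labs in zip(docs, labels):
        counts = Counter(doc)
        for country in seed_dict:
            target = freq_in if country in labs else freq_out
            target[country].update(counts)
    scores = {}
    for country in seed_dict:
        total_in = sum(freq_in[country].values()) + smooth * len(vocab)
        total_out = sum(freq_out[country].values()) + smooth * len(vocab)
        scores[country] = {
            w: math.log((freq_in[country][w] + smooth) / total_in)
            - math.log((freq_out[country][w] + smooth) / total_out)
            for w in vocab
        }
    return scores


def predict(text, scores):
    """Assign the country with the highest mean score over known words."""
    vocab = next(iter(scores.values()))
    known = [t for t in tokenize(text) if t in vocab]
    if not known:
        return None
    means = {c: sum(s[t] for t in known) / len(known) for c, s in scores.items()}
    return max(means, key=means.get)


# Toy unlabelled corpus (hypothetical); labels come from SEED_DICT alone.
TRAIN_TEXTS = [
    "Macron visits Paris as France debates pension reform",
    "French president Macron meets unions in Paris",
    "Brexit talks resume in London as Britain seeks deal",
    "British parliament votes on Brexit in London",
    "Tokyo markets rise as Japan exports grow",
    "Japanese exports to Europe grow",
]
scores = train(TRAIN_TEXTS)

headline = "Macron speaks about pension reform"
print(keyword_match(tokenize(headline)))  # set(): no seed word present
print(predict(headline, scores))          # fr
```

The word "macron" never appears in the seed dictionary, so keyword matching returns nothing for the headline. The trained scores still assign it to France because "macron" co-occurred with French seed words in the dictionary-labelled training texts. This is the sense in which dictionary-based labelling can stand in for manual coding when building a training set.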

Original language: English
Pages (from-to): 294-309
Number of pages: 16
Journal: Digital Journalism
Volume: 6
Issue number: 3
DOIs: 10.1080/21670811.2017.1293487
Publication status: Published - 2018 Mar 16
Externally published: Yes

Keywords

  • content analysis
  • digital methods
  • geographical classification
  • international news
  • machine learning
  • news flow

ASJC Scopus subject areas

  • Communication

Cite this

Newsmap: A semi-supervised approach to geographical news classification. / Watanabe, Kohei.

In: Digital Journalism, Vol. 6, No. 3, 16.03.2018, p. 294-309.

Watanabe, Kohei. / Newsmap: A semi-supervised approach to geographical news classification. In: Digital Journalism. 2018; Vol. 6, No. 3. pp. 294-309.

@article{7a651139a43c4fbb8165dea8b7a87b27,
title = "Newsmap: A semi-supervised approach to geographical news classification",
abstract = "This paper presents the results of an evaluation of three different types of geographical news classification methods: (1) simple keyword matching, a popular method in media and communications research; (2) geographical information extraction systems equipped with named-entity recognition and place name disambiguation mechanisms (Open Calais and Geoparser.io); and (3) a semi-supervised machine learning classifier developed by the author (Newsmap). Newsmap substitutes manual coding of news stories with dictionary-based labelling in the creation of large training sets to extract large numbers of geographical words without human involvement and it also identifies multi-word names to reduce the ambiguity of the geographical traits fully automatically. The evaluation of classification accuracy of the three types of methods against 5000 human-coded news summaries reveals that Newsmap outperforms the geographical information extraction systems in overall accuracy, while the simple keyword matching suffers from ambiguity of place names in countries with ambiguous place names.",
keywords = "content analysis, digital methods, geographical classification, international news, machine learning, news flow",
author = "Kohei Watanabe",
year = "2018",
month = "3",
day = "16",
doi = "10.1080/21670811.2017.1293487",
language = "English",
volume = "6",
pages = "294--309",
journal = "Digital Journalism",
issn = "2167-0811",
publisher = "Taylor and Francis Ltd.",
number = "3",

}

TY  - JOUR
T1  - Newsmap: A semi-supervised approach to geographical news classification
AU  - Watanabe, Kohei
PY  - 2018/3/16
Y1  - 2018/3/16
AB  - This paper presents the results of an evaluation of three different types of geographical news classification methods: (1) simple keyword matching, a popular method in media and communications research; (2) geographical information extraction systems equipped with named-entity recognition and place name disambiguation mechanisms (Open Calais and Geoparser.io); and (3) a semi-supervised machine learning classifier developed by the author (Newsmap). Newsmap substitutes manual coding of news stories with dictionary-based labelling in the creation of large training sets to extract large numbers of geographical words without human involvement and it also identifies multi-word names to reduce the ambiguity of the geographical traits fully automatically. The evaluation of classification accuracy of the three types of methods against 5000 human-coded news summaries reveals that Newsmap outperforms the geographical information extraction systems in overall accuracy, while the simple keyword matching suffers from ambiguity of place names in countries with ambiguous place names.
KW  - content analysis
KW  - digital methods
KW  - geographical classification
KW  - international news
KW  - machine learning
KW  - news flow
UR  - http://www.scopus.com/inward/record.url?scp=85014450025&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85014450025&partnerID=8YFLogxK
U2  - 10.1080/21670811.2017.1293487
DO  - 10.1080/21670811.2017.1293487
M3  - Article
AN  - SCOPUS:85014450025
VL  - 6
SP  - 294
EP  - 309
JO  - Digital Journalism
JF  - Digital Journalism
SN  - 2167-0811
IS  - 3
ER  - 