HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols

Shotaro Sano, Takuma Otsuka, Katsutoshi Itoyama, Hiroshi G. Okuno

Research output: Contribution to journal (Article)

Abstract

CAPTCHAs distinguish humans from automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual characters or audio utterances. State-of-the-art research suggests that the security of visual and audio CAPTCHAs lies mainly in anti-segmentation techniques, because once a question has been segmented, the individual symbols can be recognized with a high success rate by certain machine learning algorithms. Thus, most recent commercial CAPTCHAs present continuous symbols to prevent automated segmentation. We propose a novel framework that can automatically decode continuous CAPTCHAs and assess its effectiveness with actual CAPTCHA questions from Google’s reCAPTCHA. Our framework is built on a sequence recognition method based on hidden Markov models (HMMs), which can be implemented concisely with an off-the-shelf HMM toolkit library. This method concatenates several HMMs, each of which recognizes a single symbol, into a larger HMM that recognizes a whole question. Our experimental results reveal vulnerabilities in continuous CAPTCHAs: the solver cracks the visual and audio reCAPTCHA systems with 31.75% and 58.75% accuracy, respectively. We further propose guidelines for preventing attacks by HMM-based CAPTCHA solvers, based on synthetic experiments with simulated continuous CAPTCHAs.
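The core idea of concatenating per-symbol HMMs into a question-level HMM can be illustrated with a minimal, self-contained sketch. It assumes toy discrete observations (the characters `a` and `b`), two-state left-to-right symbol models, a uniform loop grammar linking the exit of every symbol to the entry of every symbol, and plain Viterbi decoding; the names `SymbolHMM` and `decode` are hypothetical. This is not the authors' implementation, which the abstract says was built on an off-the-shelf HMM toolkit and operates on real visual and audio features.

```python
import math
from dataclasses import dataclass


@dataclass
class SymbolHMM:
    """Hypothetical left-to-right HMM for one CAPTCHA symbol.

    Each state either loops on itself (probability `self_loop`) or advances to
    the next state; advancing out of the last state ends the symbol.
    """
    label: str
    emissions: list[dict[str, float]]  # discrete output distribution per state
    self_loop: float = 0.6


def decode(models: list[SymbolHMM], obs: str) -> list[str]:
    """Viterbi decoding over a loop network built from concatenated symbol HMMs.

    Assumes every model has at least two states so that re-entering the same
    symbol can be told apart from a self-loop on its first state.
    """
    # Flatten all symbol models into one composite state space.
    states = [(m, s) for m, hmm in enumerate(models) for s in range(len(hmm.emissions))]
    index = {st: i for i, st in enumerate(states)}
    last = {m: len(hmm.emissions) - 1 for m, hmm in enumerate(models)}
    grammar = math.log(1.0 / len(models))  # uniform symbol-to-symbol transition

    # preds[i] lists (predecessor state, log transition probability) arcs into i.
    preds = {i: [] for i in range(len(states))}
    for (m, s), i in index.items():
        stay = math.log(models[m].self_loop)
        advance = math.log(1.0 - models[m].self_loop)
        preds[i].append((i, stay))
        if s < last[m]:
            preds[index[(m, s + 1)]].append((i, advance))
        else:
            # The exit of symbol m links to the entry of every symbol model.
            for n in range(len(models)):
                preds[index[(n, 0)]].append((i, advance + grammar))

    def emit(i: int, o: str) -> float:
        m, s = states[i]
        return math.log(models[m].emissions[s][o])

    # A question may begin with any symbol, so only first states start active.
    score = [grammar + emit(i, obs[0]) if s == 0 else float("-inf")
             for i, (_, s) in enumerate(states)]
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for i in range(len(states)):
            j, best = max(((j, score[j] + lp) for j, lp in preds[i]), key=lambda x: x[1])
            new_score.append(best + emit(i, o))
            ptr.append(j)
        score = new_score
        backptrs.append(ptr)

    # The final frame must complete a symbol, i.e. sit in some model's last state.
    finals = [i for i, (m, s) in enumerate(states) if s == last[m]]
    i = max(finals, key=lambda k: score[k])
    path = [i]
    for ptr in reversed(backptrs):
        i = ptr[i]
        path.append(i)
    path.reverse()

    # A symbol boundary is a move from a model's last state into a first state.
    labels = [models[states[path[0]][0]].label]
    for prev, cur in zip(path, path[1:]):
        (pm, ps), (cm, cs) = states[prev], states[cur]
        if cs == 0 and ps == last[pm] and prev != cur:
            labels.append(models[cm].label)
    return labels


if __name__ == "__main__":
    a = SymbolHMM("A", [{"a": 0.9, "b": 0.1}, {"a": 0.9, "b": 0.1}])
    b = SymbolHMM("B", [{"a": 0.1, "b": 0.9}, {"a": 0.1, "b": 0.9}])
    print(decode([a, b], "aaabbbaaa"))  # ['A', 'B', 'A']
```

Because no symbol boundaries are fixed in advance, the search places them wherever the combined score of the composite HMM is highest, so segmentation and recognition happen jointly rather than as separate steps.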

Original language: English
Pages (from-to): 814-826
Number of pages: 13
Journal: Journal of Information Processing
Volume: 23
Issue number: 6
DOI: 10.2197/ipsjjip.23.814
Publication status: Published - 2015 Nov 15
Externally published: Yes

Keywords

  • CAPTCHA
  • Continuous character/speech recognition
  • Hidden Markov model
  • Human interaction proof

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols. / Sano, Shotaro; Otsuka, Takuma; Itoyama, Katsutoshi; Okuno, Hiroshi G.

In: Journal of Information Processing, Vol. 23, No. 6, 15.11.2015, p. 814-826.

Sano, Shotaro ; Otsuka, Takuma ; Itoyama, Katsutoshi ; Okuno, Hiroshi G. / HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols. In: Journal of Information Processing. 2015 ; Vol. 23, No. 6. pp. 814-826.
@article{b719af31814047dcb06b97c88ebf1634,
title = "HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols",
keywords = "CAPTCHA, Continuous character/speech recognition, Hidden Markov model, Human interaction proof",
author = "Shotaro Sano and Takuma Otsuka and Katsutoshi Itoyama and Okuno, {Hiroshi G.}",
year = "2015",
month = "11",
day = "15",
doi = "10.2197/ipsjjip.23.814",
language = "English",
volume = "23",
pages = "814--826",
journal = "Journal of Information Processing",
issn = "0387-5806",
publisher = "Information Processing Society of Japan",
number = "6",

}

TY - JOUR

T1 - HMM-based attacks on Google’s ReCAPTCHA with continuous visual and audio symbols

AU - Sano, Shotaro

AU - Otsuka, Takuma

AU - Itoyama, Katsutoshi

AU - Okuno, Hiroshi G.

PY - 2015/11/15

Y1 - 2015/11/15

KW - CAPTCHA

KW - Continuous character/speech recognition

KW - Hidden Markov model

KW - Human interaction proof

UR - http://www.scopus.com/inward/record.url?scp=84947276087&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947276087&partnerID=8YFLogxK

U2 - 10.2197/ipsjjip.23.814

DO - 10.2197/ipsjjip.23.814

M3 - Article

VL - 23

SP - 814

EP - 826

JO - Journal of Information Processing

JF - Journal of Information Processing

SN - 0387-5806

IS - 6

ER -