### Abstract

We consider the problem of automatically assigning a category to a given question posted to a Community Question Answering (CQA) site, where the question contains not only text but also an image. For example, CQA users may post a photograph of a dress and ask the community "Is this appropriate for a wedding?", where the appropriate category for this question might be "Manners, Ceremonial occasions." We tackle this problem using Convolutional Neural Networks with a DualNet architecture for combining the image and text representations. Our experiments with real data from Yahoo Chiebukuro and crowdsourced gold-standard categories show that the DualNet approach outperforms a text-only baseline (p = .0000), a sum-and-product baseline (p = .0000), Multimodal Compact Bilinear (MCB) pooling (p = .0000), and a combination of sum-and-product and MCB (p = .0000); the p-values are based on a randomised Tukey Honestly Significant Difference (HSD) test with B = 5000 trials.
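The significance test named above can be illustrated with a minimal sketch of a randomised Tukey HSD test in the commonly described form: under the null hypothesis the system labels are exchangeable, so each trial shuffles the scores independently within every test item and records the range (max minus min) of the resulting system means; the p-value for a pair of systems is the fraction of the B trials whose range reaches the observed difference between their means. This is not the authors' code. The input layout (one row per test question, one column per system, e.g. 1 if the predicted category is correct and 0 otherwise), the function name `randomised_tukey_hsd`, and the use of NumPy (1.20 or later, for `Generator.permuted`) are all assumptions made for illustration. Under this procedure, a reported p = .0000 with B = 5000 means that none of the 5000 trials produced a range as large as the observed difference, i.e. the estimated p-value is below 1/5000 = .0002.

```python
import numpy as np


def randomised_tukey_hsd(scores, B=5000, seed=0):
    """Randomised Tukey HSD test over a score matrix (illustrative sketch).

    scores: array of shape (n_items, n_systems); row k holds every system's
            score on test item k (hypothetical input layout).
    B:      number of randomisation trials.
    Returns (mean score per system, matrix of pairwise p-values).
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_systems = scores.shape[1]

    means = scores.mean(axis=0)
    # Observed absolute difference between every pair of system means.
    observed = np.abs(means[:, None] - means[None, :])

    counts = np.zeros((n_systems, n_systems), dtype=int)
    for _ in range(B):
        # Null hypothesis: system labels are exchangeable, so shuffle the
        # scores independently within each item (each row).
        permuted = rng.permuted(scores, axis=1)
        perm_means = permuted.mean(axis=0)
        # Every pair is compared against the largest range of means,
        # which controls the familywise error rate across all pairs.
        max_range = perm_means.max() - perm_means.min()
        counts += max_range >= observed

    # p-value = fraction of trials whose maximum range reached the observed gap.
    return means, counts / B


if __name__ == "__main__":
    # Hypothetical data: 4 systems scored 0/1 on 300 questions.
    data_rng = np.random.default_rng(42)
    demo = (data_rng.random((300, 4)) < np.array([0.50, 0.55, 0.60, 0.75])).astype(float)
    means, p = randomised_tukey_hsd(demo, B=1000)
    print(np.round(means, 3))
    print(np.round(p, 4))
```

Comparing every pair against the maximum range, rather than running separate pairwise tests, is what makes the procedure a multiple-comparison test: the p-values stay valid when several systems (here, DualNet and four baselines) are compared at once.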

Original language | English |
---|---|
Title of host publication | ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval |
Publisher | Association for Computing Machinery, Inc |
Pages | 219-222 |
Number of pages | 4 |
ISBN (Electronic) | 9781450356565 |
DOIs | https://doi.org/10.1145/3234944.3234948 |
Publication status | Published - 2018 Sep 10 |
Event | 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018, Tianjin, China, 2018 Sep 14 → 2018 Sep 17 |

### Publication series

Name | ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval |
---|---|

### Conference

Conference | 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018 |
---|---|
Country | China |
City | Tianjin |
Period | 2018 Sep 14 → 2018 Sep 17 |

### Keywords

- community question answering
- convolutional neural networks
- question categorisation

### ASJC Scopus subject areas

- Information Systems
- Computer Science (miscellaneous)

### Cite this

Tamaki, K., Togashi, R., Kato, S., Fujita, S., Maeda, H., & Sakai, T. (2018). Classifying community QA questions that contain an image. In *ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval* (pp. 219-222). Association for Computing Machinery, Inc. https://doi.org/10.1145/3234944.3234948

**Classifying community QA questions that contain an image.** / Tamaki, Kenta; Togashi, Riku; Kato, Sosuke; Fujita, Sumio; Maeda, Hideyuki; Sakai, Tetsuya. *ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval.* Association for Computing Machinery, Inc, 2018. pp. 219-222. 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018, Tianjin, China, 2018 Sep 14. https://doi.org/10.1145/3234944.3234948

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)


TY - GEN

T1 - Classifying community QA questions that contain an image

AU - Tamaki, Kenta

AU - Togashi, Riku

AU - Kato, Sosuke

AU - Fujita, Sumio

AU - Maeda, Hideyuki

AU - Sakai, Tetsuya

PY - 2018/9/10

Y1 - 2018/9/10

KW - community question answering

KW - convolutional neural networks

KW - question categorisation

UR - http://www.scopus.com/inward/record.url?scp=85063515836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063515836&partnerID=8YFLogxK

U2 - 10.1145/3234944.3234948

DO - 10.1145/3234944.3234948

M3 - Conference contribution

AN - SCOPUS:85063515836

T3 - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval

SP - 219

EP - 222

BT - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval

PB - Association for Computing Machinery, Inc

ER -