EPCI: Extracting potentially copyright infringement texts from the web

Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu Hirate, Hayato Yamana

Research output: Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we propose EPCI, a new system for extracting texts that potentially infringe copyright from the Web. EPCI works in four steps: (1) it generates a set of queries from a given copyright-reserved seed-text; (2) it submits every query to a search engine API; (3) it gathers the search result Web pages in rank order, starting from the top, until the similarity between the seed-text and the result pages falls below a given threshold; and (4) it merges all gathered pages and re-ranks them in order of their similarity. Our experiment with 40 seed-texts shows that EPCI extracts, on average, 132 potentially infringing Web pages per copyright-reserved seed-text with 94% precision.
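The four steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how queries are generated or which similarity measure is used, so the sketch substitutes fixed-length word windows for query generation and Jaccard similarity over word trigrams for the similarity score. The `search` parameter is a hypothetical callable standing in for the search engine API; it is assumed to return `(url, page_text)` pairs in rank order. The default threshold of 0.3 is likewise arbitrary.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for a search engine API: query -> ranked (url, page_text) pairs.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in the text (assumed text representation)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over word trigrams (an assumed measure, not from the paper)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def generate_queries(seed: str, words_per_query: int = 8) -> List[str]:
    """Step 1: split the seed-text into consecutive word windows (assumed scheme)."""
    words = seed.split()
    return [" ".join(words[i:i + words_per_query])
            for i in range(0, len(words), words_per_query)]


def extract_candidates(seed: str, search: SearchFn,
                       threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Steps 2 to 4: query, gather until below threshold, merge, re-rank."""
    best: Dict[str, float] = {}
    for query in generate_queries(seed):
        # Step 2: submit the query; results are assumed to arrive highest rank first.
        for url, page_text in search(query):
            score = similarity(seed, page_text)
            # Step 3: stop reading this result list once similarity drops too low.
            if score < threshold:
                break
            # Step 4a: merge results across queries, keeping one entry per URL.
            best[url] = max(score, best.get(url, 0.0))
    # Step 4b: re-rank all gathered pages by similarity to the seed-text.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Passing the search backend in as a function keeps the sketch independent of any particular search API and makes it easy to exercise with canned results.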

Original language: English
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 1151-1152
Number of pages: 2
DOI
Publication status: Published - Oct 22, 2007
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB, Canada
Duration: May 8, 2007 to May 12, 2007

Publication series

Name: 16th International World Wide Web Conference, WWW2007

Conference

Conference: 16th International World Wide Web Conference, WWW2007
Country/Territory: Canada
City: Banff, AB
Period: May 8, 2007 to May 12, 2007

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
