EPCI: Extracting potentially copyright infringement texts from the web

Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu Hirate, Hayato Yamana

Research output: Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we propose EPCI, a new system for extracting potentially copyright infringement texts from the Web. EPCI works in four steps: (1) generating a set of queries based on a given copyright-reserved seed-text, (2) submitting every query to a search engine API, (3) gathering the search result Web pages in rank order from the top until the similarity between the given seed-text and the search result pages falls below a given threshold value, and (4) merging all the gathered pages, then re-ranking them in order of their similarity. Our experimental results using 40 seed-texts show that EPCI is able to extract 132 potentially copyright infringement Web pages per copyright-reserved seed-text with 94% precision on average.
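
The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how queries are generated or how similarity is measured, so the fixed-length word windows in `generate_queries` and the Jaccard similarity over word trigrams in `similarity` are assumptions. The `search` callable is a hypothetical stand-in for a search engine API and is expected to yield `(url, page_text)` pairs in rank order.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (assumed unit of comparison)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over word n-grams (an assumption, not the paper's measure)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def generate_queries(seed_text: str, query_len: int = 5, stride: int = 5) -> List[str]:
    """Step 1: cut the seed text into word windows (hypothetical strategy)."""
    words = seed_text.split()
    if not words:
        return []
    last_start = max(len(words) - query_len + 1, 1)
    return [" ".join(words[i:i + query_len]) for i in range(0, last_start, stride)]


def epci_sketch(
    seed_text: str,
    search: Callable[[str], Iterable[Tuple[str, str]]],
    threshold: float = 0.2,
) -> List[Tuple[str, float]]:
    """Return (url, similarity) pairs, most similar first."""
    best: Dict[str, float] = {}
    for query in generate_queries(seed_text):
        # Step 2: submit the query; results arrive in rank order.
        for url, page_text in search(query):
            score = similarity(seed_text, page_text)
            # Step 3: stop walking down this result list once a page
            # falls below the similarity threshold.
            if score < threshold:
                break
            # Step 4 (merge): keep one entry per URL across all queries.
            best[url] = max(score, best.get(url, 0.0))
    # Step 4 (re-rank): order the merged pages by similarity.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine API, and the threshold value of 0.2 is an arbitrary placeholder rather than a figure from the paper.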

Original language: English
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 1151-1152
Number of pages: 2
DOI: https://doi.org/10.1145/1242572.1242740
Publication status: Published - 2007-10-22
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB, Canada
Duration: 2007-05-08 to 2007-05-12

Publication series

Name: 16th International World Wide Web Conference, WWW2007

Conference

Conference: 16th International World Wide Web Conference, WWW2007
Canada
Banff, AB
Period: 2007-05-08 to 2007-05-12

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software


  • Cite this

    Tashiro, T., Ueda, T., Hori, T., Hirate, Y., & Yamana, H. (2007). EPCI: Extracting potentially copyright infringement texts from the web. In 16th International World Wide Web Conference, WWW2007 (pp. 1151-1152). (16th International World Wide Web Conference, WWW2007). https://doi.org/10.1145/1242572.1242740