Privacy-preserving search for chemical compound databases

Kana Shimizu, Koji Nuida, Hiromi Arai, Shigeo Mitsunari, Nuttapong Attrapadung, Michiaki Hamada, Koji Tsuda, Takatsugu Hirokawa, Jun Sakuma, Goichiro Hanaoka, Kiyoshi Asai

    Research output: Contribution to journal › Article

    3 Citations (Scopus)

    Abstract

    Background: Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.

    Results: In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation.

    Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.
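
To make "allows only addition performed on encrypted values" concrete, the following is a minimal sketch using a toy Paillier cryptosystem. Paillier is swapped in here only because it is a well-known additively homomorphic scheme; the abstract does not say which cryptosystem the authors used, and this is not their protocol. The function names (`keygen`, `encrypt`, `decrypt`, `add`, `encrypted_overlap`), the tiny key size, and the 8-bit fingerprints are all assumptions made for illustration. The sketch shows a server counting the bits shared between its own plaintext fingerprint and an encrypted query fingerprint without ever decrypting the query.

```python
import math
import secrets


def keygen():
    # Two known Mersenne primes. Far too small for real security;
    # chosen only so that the demonstration runs instantly.
    p, q = 2**31 - 1, 2**61 - 1
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    # Enc(m) = (1 + n)^m * r^n mod n^2, with (1 + n)^m = 1 + m*n mod n^2.
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n


def add(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % (n * n)


def encrypted_overlap(n, enc_query_bits, db_bits):
    # Server side: sum the encrypted query bits at positions where the
    # database fingerprint has a 1. The result encrypts the overlap count.
    acc = encrypt(n, 0)
    for c, bit in zip(enc_query_bits, db_bits):
        if bit:
            acc = add(n, acc, c)
    return acc


# Query holder encrypts a (hypothetical) fingerprint bit by bit.
n, sk = keygen()
query_bits = [1, 0, 1, 1, 0, 1, 0, 0]
db_bits = [1, 1, 1, 0, 0, 1, 0, 1]
enc_query = [encrypt(n, b) for b in query_bits]

# Database holder computes on ciphertexts only; query holder decrypts.
enc_count = encrypted_overlap(n, enc_query, db_bits)
print(decrypt(n, sk, enc_count))
```

A shared-bit count of this kind is relevant to similarity search because a threshold test on the Tversky index, c / (c + α(|p| − c) + β(|q| − c)) ≥ θ, rearranges to c(1 − θ + θα + θβ) − θα|p| − θβ|q| ≥ 0, which is linear in the overlap c and the bit counts |p| and |q|. Note that this sketch reveals the exact count to the query holder, so it does not by itself protect the database holder's privacy in the way the abstract describes.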

    Original language: English
    Article number: S6
    Journal: BMC Bioinformatics
    Volume: 16
    Issue number: 18
    DOI: https://doi.org/10.1186/1471-2105-16-S18-S6
    Publication status: Published - 2015 Dec 9

    Keywords

    • Additive homomorphic cryptosystem
    • Chemical compound
    • Privacy preserving data mining
    • Similarity search
    • Tversky index

    ASJC Scopus subject areas

    • Applied Mathematics
    • Structural Biology
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications

    Cite this

    Shimizu, K., Nuida, K., Arai, H., Mitsunari, S., Attrapadung, N., Hamada, M., ... Asai, K. (2015). Privacy-preserving search for chemical compound databases. BMC Bioinformatics, 16(18), [S6]. https://doi.org/10.1186/1471-2105-16-S18-S6

    @article{2cd879331d124537ab705af309988db8,
    title = "Privacy-preserving search for chemical compound databases",
    keywords = "Additive homomorphic cryptosystem, Chemical compound, Privacy preserving data mining, Similarity search, Tversky index",
    author = "Kana Shimizu and Koji Nuida and Hiromi Arai and Shigeo Mitsunari and Nuttapong Attrapadung and Michiaki Hamada and Koji Tsuda and Takatsugu Hirokawa and Jun Sakuma and Goichiro Hanaoka and Kiyoshi Asai",
    year = "2015",
    month = "12",
    day = "9",
    doi = "10.1186/1471-2105-16-S18-S6",
    language = "English",
    volume = "16",
    journal = "BMC Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central",
    number = "18",

    }

    TY - JOUR
    T1 - Privacy-preserving search for chemical compound databases
    AU - Shimizu, Kana
    AU - Nuida, Koji
    AU - Arai, Hiromi
    AU - Mitsunari, Shigeo
    AU - Attrapadung, Nuttapong
    AU - Hamada, Michiaki
    AU - Tsuda, Koji
    AU - Hirokawa, Takatsugu
    AU - Sakuma, Jun
    AU - Hanaoka, Goichiro
    AU - Asai, Kiyoshi
    PY - 2015/12/9
    Y1 - 2015/12/9
    KW - Additive homomorphic cryptosystem
    KW - Chemical compound
    KW - Privacy preserving data mining
    KW - Similarity search
    KW - Tversky index
    UR - http://www.scopus.com/inward/record.url?scp=84961594119&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84961594119&partnerID=8YFLogxK
    U2 - 10.1186/1471-2105-16-S18-S6
    DO - 10.1186/1471-2105-16-S18-S6
    M3 - Article
    VL - 16
    JO - BMC Bioinformatics
    JF - BMC Bioinformatics
    SN - 1471-2105
    IS - 18
    M1 - S6
    ER -