Secure Wavelet Matrix

Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics

Hiroki Sudo, Masanobu Jimbo, Koji Nuida, Kana Shimizu

Research output: Contribution to journal › Article

Abstract

Biomedical data often include personal information, so there is demand for technology that can search such sensitive data while protecting privacy. We consider a case in which a server holds a text database and a user searches the database to find substring matches. The user wants to conceal his or her query, and the server wants to conceal the database except for the search results. The previous approach to this problem is based on an algorithm whose running time is linear in the alphabet size $|\Sigma|$, so it cannot search databases with large alphabets, such as biomedical documents. We present a novel algorithm that can search for a string in time logarithmic in $|\Sigma|$. In our algorithm, named secure wavelet matrix (sWM), we use additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranged from 4 to 1024, sWM ran up to around two orders of magnitude faster than the previous method. sWM enables efficient search of a private database and will thus facilitate the use of sensitive biomedical information.
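
To illustrate where the logarithmic dependence on $|\Sigma|$ comes from, the following is a minimal sketch of a plaintext wavelet matrix with a rank query, written in Python. It is not the sWM protocol: the abstract does not spell out how the structure is combined with additively homomorphic encryption, so no encryption appears here. The class name `WaveletMatrix`, its method `rank`, and the integer encoding of symbols are assumptions made for illustration only.

```python
from typing import List


class WaveletMatrix:
    """Plaintext wavelet matrix over integer symbols in [0, sigma).

    Illustrative sketch only; names and layout are hypothetical and
    no encryption is involved.
    """

    def __init__(self, seq: List[int], sigma: int) -> None:
        self.n = len(seq)
        # Number of bit levels: ceil(log2(sigma)), at least 1.
        self.nbits = max(1, (sigma - 1).bit_length())
        self.ones: List[List[int]] = []  # per level: prefix counts of 1-bits
        self.zeros: List[int] = []       # per level: total number of 0-bits
        cur = list(seq)
        for level in range(self.nbits):
            shift = self.nbits - 1 - level  # most significant bit first
            bits = [(x >> shift) & 1 for x in cur]
            prefix = [0]
            for b in bits:
                prefix.append(prefix[-1] + b)
            self.ones.append(prefix)
            self.zeros.append(self.n - prefix[-1])
            # Stable partition: symbols with bit 0 first, then those with bit 1.
            cur = ([x for x, b in zip(cur, bits) if b == 0]
                   + [x for x, b in zip(cur, bits) if b == 1])

    def rank(self, c: int, i: int) -> int:
        """Count occurrences of symbol c in seq[0:i] in nbits steps."""
        lo, hi = 0, i
        for level in range(self.nbits):
            bit = (c >> (self.nbits - 1 - level)) & 1
            ones = self.ones[level]
            if bit:
                # Symbols with a 1-bit are moved after all 0-bit symbols.
                lo = self.zeros[level] + ones[lo]
                hi = self.zeros[level] + ones[hi]
            else:
                # Number of 0-bits before position k is k - ones[k].
                lo = lo - ones[lo]
                hi = hi - ones[hi]
        return hi - lo


if __name__ == "__main__":
    text = [3, 0, 2, 3, 1, 3, 0, 2]
    wm = WaveletMatrix(text, sigma=4)
    print(wm.nbits, wm.rank(3, 6))
```

Each rank query touches one bit vector per level, so its cost grows with the number of bits needed to encode a symbol rather than with the number of distinct symbols. Rank queries of this kind are the basic operation of FM-index style backward search, which is listed among the keywords of this record.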

Original language: English
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOIs: 10.1109/TCBB.2018.2814039
Publication status: Accepted/In press - 2018 Mar 8
Externally published: Yes

Fingerprint

Privacy
Privacy Preserving
Bioinformatics
Computational Biology
Wavelets
Strings
Databases
Servers
Homomorphic Encryption
Cryptography
Data structures
Linear-time Algorithm
Logarithmic
Technology
Query
Range of data
Experiments

Keywords

  • Complexity theory
  • Data structures
  • FM-index
  • Homomorphic Encryption
  • Indexes
  • Privacy
  • Protocols
  • Search problems
  • Servers
  • String Search
  • Wavelet Matrix

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

Secure Wavelet Matrix: Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics. / Sudo, Hiroki; Jimbo, Masanobu; Nuida, Koji; Shimizu, Kana.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 08.03.2018.

@article{eebac7bd4f9747b1a758c04c9c9a3010,
title = "Secure Wavelet Matrix: Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics",
abstract = "Biomedical data often include personal information, so there is demand for technology that can search such sensitive data while protecting privacy. We consider a case in which a server holds a text database and a user searches the database to find substring matches. The user wants to conceal his or her query, and the server wants to conceal the database except for the search results. The previous approach to this problem is based on an algorithm whose running time is linear in the alphabet size $|\Sigma|$, so it cannot search databases with large alphabets, such as biomedical documents. We present a novel algorithm that can search for a string in time logarithmic in $|\Sigma|$. In our algorithm, named secure wavelet matrix (sWM), we use additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranged from 4 to 1024, sWM ran up to around two orders of magnitude faster than the previous method. sWM enables efficient search of a private database and will thus facilitate the use of sensitive biomedical information.",
keywords = "Complexity theory, Data structures, FM-index, Homomorphic Encryption, Indexes, Privacy, Protocols, Search problems, Servers, String Search, Wavelet Matrix",
author = "Hiroki Sudo and Masanobu Jimbo and Koji Nuida and Kana Shimizu",
year = "2018",
month = "3",
day = "8",
doi = "10.1109/TCBB.2018.2814039",
language = "English",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Secure Wavelet Matrix

T2 - Alphabet-Friendly Privacy-Preserving String Search for Bioinformatics

AU - Sudo, Hiroki

AU - Jimbo, Masanobu

AU - Nuida, Koji

AU - Shimizu, Kana

PY - 2018/3/8

Y1 - 2018/3/8

N2 - Biomedical data often include personal information, so there is demand for technology that can search such sensitive data while protecting privacy. We consider a case in which a server holds a text database and a user searches the database to find substring matches. The user wants to conceal his or her query, and the server wants to conceal the database except for the search results. The previous approach to this problem is based on an algorithm whose running time is linear in the alphabet size $|\Sigma|$, so it cannot search databases with large alphabets, such as biomedical documents. We present a novel algorithm that can search for a string in time logarithmic in $|\Sigma|$. In our algorithm, named secure wavelet matrix (sWM), we use additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranged from 4 to 1024, sWM ran up to around two orders of magnitude faster than the previous method. sWM enables efficient search of a private database and will thus facilitate the use of sensitive biomedical information.

AB - Biomedical data often include personal information, so there is demand for technology that can search such sensitive data while protecting privacy. We consider a case in which a server holds a text database and a user searches the database to find substring matches. The user wants to conceal his or her query, and the server wants to conceal the database except for the search results. The previous approach to this problem is based on an algorithm whose running time is linear in the alphabet size $|\Sigma|$, so it cannot search databases with large alphabets, such as biomedical documents. We present a novel algorithm that can search for a string in time logarithmic in $|\Sigma|$. In our algorithm, named secure wavelet matrix (sWM), we use additively homomorphic encryption to build an efficient data structure called a wavelet matrix. In an experiment using a simulated string of length 10,000 whose alphabet size ranged from 4 to 1024, sWM ran up to around two orders of magnitude faster than the previous method. sWM enables efficient search of a private database and will thus facilitate the use of sensitive biomedical information.

KW - Complexity theory

KW - Data structures

KW - FM-index

KW - Homomorphic Encryption

KW - Indexes

KW - Privacy

KW - Protocols

KW - Search problems

KW - Servers

KW - String Search

KW - Wavelet Matrix

UR - http://www.scopus.com/inward/record.url?scp=85043457399&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85043457399&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2018.2814039

DO - 10.1109/TCBB.2018.2814039

M3 - Article

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

ER -