TY - GEN
T1 - Efficient privacy-preserving variable-length substring match for genome sequence
AU - Nakagawa, Yoshiki
AU - Ohata, Satsuya
AU - Shimizu, Kana
N1 - Funding Information:
Funding: This work is partially supported by JST CREST Grant Number JPMJCR19F6. Kana Shimizu: Supported in part by MEXT/JSPS KAKENHI Grant Numbers 19K12209 and 21H04871.
Publisher Copyright:
© Yoshiki Nakagawa, Satsuya Ohata, and Kana Shimizu; licensed under Creative Commons License CC-BY 4.0. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021).
PY - 2021/7/1
Y1 - 2021/7/1
AB - Finding a similar substring that commonly appears in query and database sequences is an essential task for genome data analysis. This study proposes a secure two-party variable-length string search protocol based on secret sharing. A unique feature of our protocol is that, once the query has been input, its time, communication, and round complexities do not depend on the database length N. This property brings dramatic improvements in search time, since N is usually quite large in an actual genome database and the same database is used repeatedly for many queries. Our approach hinges on a technique that efficiently applies the compressed full-text index (FOCS 2000) to a secret-sharing scheme. We conducted an experiment using a human genomic sequence of length 10 million as the database and a query of length 100, and found that the query response time of our protocol was at least three orders of magnitude faster than that of a well-designed baseline protocol in a realistic computation/network environment.
KW - FM-index
KW - Maximal exact match
KW - Private genome sequence search
KW - Secret sharing
KW - Secure multiparty computation
KW - Suffix tree
UR - http://www.scopus.com/inward/record.url?scp=85115293453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115293453&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.WABI.2021.2
DO - 10.4230/LIPIcs.WABI.2021.2
M3 - Conference contribution
AN - SCOPUS:85115293453
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 21st International Workshop on Algorithms in Bioinformatics, WABI 2021
A2 - Carbone, Alessandra
A2 - El-Kebir, Mohammed
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 21st International Workshop on Algorithms in Bioinformatics, WABI 2021
Y2 - 2 August 2021 through 4 August 2021
ER -