Efficient privacy-preserving string search and an application in genomics

Kana Shimizu, Koji Nuida, Gunnar Rätsch

Research output: Contribution to journalArticle

19 Citations (Scopus)

Abstract

Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.

Original languageEnglish
Pages (from-to)1652-1661
Number of pages10
JournalBioinformatics
Volume32
Issue number11
DOIs
Publication statusPublished - 2016 Jun 1
Externally publishedYes

Fingerprint

Privacy
Privacy Preserving
Genomics
Genome
Genes
Strings
Query
Servers
Server
Burrows-Wheeler Transform
Homomorphic Encryption
Databases
Cryptography
Single Nucleotide Polymorphism
Alleles
Oblivious Transfer
Laptop computers
Haploidy
Bioinformatics
Glossaries

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability

Cite this

Efficient privacy-preserving string search and an application in genomics. / Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar.

In: Bioinformatics, Vol. 32, No. 11, 01.06.2016, p. 1652-1661.

Research output: Contribution to journalArticle

Shimizu, Kana ; Nuida, Koji ; Rätsch, Gunnar. / Efficient privacy-preserving string search and an application in genomics. In: Bioinformatics. 2016 ; Vol. 32, No. 11. pp. 1652-1661.
@article{c5f8e26871584670963ad4827e6a475f,
title = "Efficient privacy-preserving string search and an application in genomics",
abstract = "Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Kana Shimizu and Koji Nuida and Gunnar R{\"a}tsch",
year = "2016",
month = "6",
day = "1",
doi = "10.1093/bioinformatics/btw050",
language = "English",
volume = "32",
pages = "1652--1661",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR

T1 - Efficient privacy-preserving string search and an application in genomics

AU - Shimizu, Kana

AU - Nuida, Koji

AU - Rätsch, Gunnar

PY - 2016/6/1

Y1 - 2016/6/1

N2 - Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.

AB - Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=84973376088&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973376088&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw050

DO - 10.1093/bioinformatics/btw050

M3 - Article

C2 - 27153731

AN - SCOPUS:84973376088

VL - 32

SP - 1652

EP - 1661

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 11

ER -