CoDP: Predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein

Hiroko Terui, Kiwamu Akagi, Hiroshi Kawame, Kei Yura

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Background: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and among the MMR genes the frequency of missense variants is highest in MSH6. Because of this high frequency, genetic testing has so far not been used effectively for MSH6. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6.

Methods: We integrated the prediction results of three methods (MAPP, PolyPhen-2 and SIFT) and explicitly combined them with two structural properties of the MSH6 protein: solvent accessibility and the change in the number of heavy atoms of the amino acid. MSH6 germline missense variants, classified by their associated clinical and molecular data, were used to fit the parameters of a logistic regression model and to assess the prediction. The performance of CoDP was compared with that of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR.

Results: A total of 294 germline missense variants were collected from variant databases and the literature. Of these, 34 variants were available for parameter training and for the prediction performance test. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. CoDP had a PPV of 93.3% (14/15), an NPV of 94.7% (18/19), a specificity of 94.7% (18/19), a sensitivity of 93.3% (14/15) and an accuracy of 94.1% (32/34). The area under the curve (AUC) was 0.954 for CoDP, 0.919 for MAPP for MSH6, 0.864 for SIFT and 0.819 for PolyPhen-2 HumVar. The power of each method to distinguish pathogenic from non-pathogenic variants was tested with the Wilcoxon rank sum test (p < 8.9 × 10⁻⁶ for CoDP, p < 3.3 × 10⁻⁵ for MAPP, p < 3.1 × 10⁻⁴ for SIFT and p < 1.2 × 10⁻³ for PolyPhen-2 HumVar), and CoDP was shown to outperform the other conventional methods.

Conclusion: In this paper, we provide a human-curated data set of MSH6 missense variants and CoDP, a prediction tool that achieved better accuracy in predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at.
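
The abstract states that CoDP combines five inputs (MAPP, PolyPhen-2 and SIFT scores, solvent accessibility, and the change in heavy-atom count) in a logistic regression model, but it does not give the fitted coefficients or the exact feature encoding. The Python sketch below only illustrates the general shape of such a model under stated assumptions: the heavy-atom feature is taken as a signed difference (mutant minus wild type), solvent accessibility is assumed to be a relative value between 0 and 1, and `VariantFeatures`, `heavy_atom_change`, `logistic_score` and every coefficient value are hypothetical placeholders, not the published CoDP parameters.

```python
import math
from dataclasses import asdict, dataclass

# Heavy (non-hydrogen) atoms in each standard amino-acid side chain.
# The backbone (N, CA, C, O) is the same for every residue, so the change
# caused by a substitution is identical whether whole residues or only side
# chains are counted.
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "L": 4, "I": 4, "N": 4, "D": 4, "M": 4,
    "Q": 5, "E": 5, "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}


def heavy_atom_change(wild_type: str, mutant: str) -> int:
    """Signed change in heavy-atom count (mutant minus wild type); an assumed encoding."""
    return SIDE_CHAIN_HEAVY_ATOMS[mutant] - SIDE_CHAIN_HEAVY_ATOMS[wild_type]


@dataclass
class VariantFeatures:
    """Hypothetical per-variant feature vector for a CoDP-style model."""
    mapp: float                   # MAPP impact score (higher = more deleterious)
    polyphen2: float              # PolyPhen-2 probability of being damaging, 0..1
    sift: float                   # SIFT score, 0..1 (lower = more deleterious)
    solvent_accessibility: float  # assumed relative accessibility of the wild-type residue, 0..1
    heavy_atom_delta: int         # output of heavy_atom_change()


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_score(x: VariantFeatures, intercept: float, weights: dict) -> float:
    """P(pathogenic) = sigmoid(b0 + sum_i b_i * x_i) for one variant."""
    z = intercept + sum(weights[name] * value for name, value in asdict(x).items())
    return sigmoid(z)


# Usage with made-up inputs and placeholder coefficients (NOT the fitted CoDP values).
features = VariantFeatures(
    mapp=12.0, polyphen2=0.98, sift=0.01,
    solvent_accessibility=0.05,
    heavy_atom_delta=heavy_atom_change("G", "R"),  # hypothetical Gly -> Arg substitution
)
placeholder_weights = {
    "mapp": 0.2, "polyphen2": 1.0, "sift": -2.0,
    "solvent_accessibility": -1.5, "heavy_atom_delta": 0.1,
}
print(f"P(pathogenic) = {logistic_score(features, -2.0, placeholder_weights):.3f}")
```

Fitting the intercept and weights to classified variants, for example by maximum-likelihood logistic regression, corresponds to the parameter-fitting step the abstract describes; the sketch omits that step and only shows how a fitted model would score one variant.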

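The reported CoDP figures are consistent with a single 2 × 2 confusion matrix: a PPV of 14/15 and an NPV of 18/19 imply 14 true positives, 1 false positive, 18 true negatives and 1 false negative, which also yields sensitivity 14/15, specificity 18/19 and accuracy 32/34. The Python sketch below recomputes these metrics from those counts and illustrates the standard relationship between the AUC and the Wilcoxon rank-sum (Mann-Whitney U) statistic, AUC = U / (n_pos × n_neg). The function names are hypothetical, and the score lists used for the AUC part are invented for illustration, not the study's data. For a tool where lower scores mean more damaging (such as SIFT), the scores would need to be negated first.

```python
from fractions import Fraction


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics as exact fractions."""
    return {
        "PPV": Fraction(tp, tp + fp),
        "NPV": Fraction(tn, tn + fn),
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "accuracy": Fraction(tp + tn, tp + fp + tn + fn),
    }


def auc_pairwise(pos: list, neg: list) -> float:
    """AUC as the fraction of (pathogenic, benign) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def auc_from_rank_sum(pos: list, neg: list) -> float:
    """AUC via the Wilcoxon rank-sum statistic: U = R_pos - n_pos(n_pos + 1)/2."""
    ranks = average_ranks(list(pos) + list(neg))
    r_pos = sum(ranks[: len(pos)])
    u = r_pos - len(pos) * (len(pos) + 1) / 2
    return u / (len(pos) * len(neg))


# CoDP counts implied by the fractions reported in the abstract.
for name, value in confusion_metrics(tp=14, fp=1, tn=18, fn=1).items():
    print(f"{name}: {float(value):.1%}")
# Prints 93.3%, 94.7%, 93.3%, 94.7% and 94.1%, matching the abstract.

# Invented scores (higher = more likely pathogenic), for illustration only.
pathogenic = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.4, 0.3, 0.2, 0.1]
print(auc_pairwise(pathogenic, benign), auc_from_rank_sum(pathogenic, benign))  # 0.925 0.925
```

SciPy's `scipy.stats.mannwhitneyu` performs the rank-sum test itself (statistic and p-value); the pure-Python version here only makes the AUC relationship explicit.
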
Original language: English
Article number: 25
Journal: Journal of Biomedical Science
Volume: 20
Issue number: 1
DOI: 10.1186/1423-0127-20-25
Publication status: Published - 2013
Externally published: Yes

Fingerprint

DNA Mismatch Repair
Logistic Models
Hereditary Nonpolyposis Colorectal Neoplasms
Nonparametric Statistics
Proteins
Repair
Hereditary Neoplastic Syndromes
Amino Acids
Mutation
Logistics
Structural properties
Genetic Testing
Genes
Computational Biology
Area Under Curve
Atoms
Databases
Bioinformatics
Sensitivity and Specificity
Substitution reactions

Keywords

  • HNPCC
  • In silico
  • Lynch syndrome
  • Mismatch repair
  • MSH6
  • Unclassified variants

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Molecular Biology
  • Cell Biology
  • Biochemistry, medical
  • Endocrinology, Diabetes and Metabolism
  • Pharmacology (medical)
  • Medicine(all)

Cite this

CoDP: Predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein. / Terui, Hiroko; Akagi, Kiwamu; Kawame, Hiroshi; Yura, Kei.

In: Journal of Biomedical Science, Vol. 20, No. 1, 25, 2013.


@article{d2f17fa21d334370bcdf43dbd2e4e4a3,
title = "CoDP: Predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein",
abstract = "Background: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24{\%} of the mutations identified in Lynch syndrome are missense substitutions, and among the MMR genes the frequency of missense variants is highest in MSH6. Because of this high frequency, genetic testing has so far not been used effectively for MSH6. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6. Methods: We integrated the prediction results of three methods (MAPP, PolyPhen-2 and SIFT) and explicitly combined them with two structural properties of the MSH6 protein: solvent accessibility and the change in the number of heavy atoms of the amino acid. MSH6 germline missense variants, classified by their associated clinical and molecular data, were used to fit the parameters of a logistic regression model and to assess the prediction. The performance of CoDP was compared with that of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR. Results: A total of 294 germline missense variants were collected from variant databases and the literature. Of these, 34 variants were available for parameter training and for the prediction performance test. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. CoDP had a PPV of 93.3{\%} (14/15), an NPV of 94.7{\%} (18/19), a specificity of 94.7{\%} (18/19), a sensitivity of 93.3{\%} (14/15) and an accuracy of 94.1{\%} (32/34). The area under the curve (AUC) was 0.954 for CoDP, 0.919 for MAPP for MSH6, 0.864 for SIFT and 0.819 for PolyPhen-2 HumVar. The power of each method to distinguish pathogenic from non-pathogenic variants was tested with the Wilcoxon rank sum test ($p < 8.9 \times 10^{-6}$ for CoDP, $p < 3.3 \times 10^{-5}$ for MAPP, $p < 3.1 \times 10^{-4}$ for SIFT and $p < 1.2 \times 10^{-3}$ for PolyPhen-2 HumVar), and CoDP was shown to outperform the other conventional methods. Conclusion: In this paper, we provide a human-curated data set of MSH6 missense variants and CoDP, a prediction tool that achieved better accuracy in predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at.",
keywords = "HNPCC, In silico, Lynch syndrome, Mismatch repair, MSH6, Unclassified variants",
author = "Hiroko Terui and Kiwamu Akagi and Hiroshi Kawame and Kei Yura",
year = "2013",
doi = "10.1186/1423-0127-20-25",
language = "English",
volume = "20",
journal = "Journal of Biomedical Science",
issn = "1021-7770",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - CoDP

T2 - Predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein

AU - Terui, Hiroko

AU - Akagi, Kiwamu

AU - Kawame, Hiroshi

AU - Yura, Kei

PY - 2013

Y1 - 2013

AB - Background: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and among the MMR genes the frequency of missense variants is highest in MSH6. Because of this high frequency, genetic testing has so far not been used effectively for MSH6. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6. Methods: We integrated the prediction results of three methods (MAPP, PolyPhen-2 and SIFT) and explicitly combined them with two structural properties of the MSH6 protein: solvent accessibility and the change in the number of heavy atoms of the amino acid. MSH6 germline missense variants, classified by their associated clinical and molecular data, were used to fit the parameters of a logistic regression model and to assess the prediction. The performance of CoDP was compared with that of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR. Results: A total of 294 germline missense variants were collected from variant databases and the literature. Of these, 34 variants were available for parameter training and for the prediction performance test. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. CoDP had a PPV of 93.3% (14/15), an NPV of 94.7% (18/19), a specificity of 94.7% (18/19), a sensitivity of 93.3% (14/15) and an accuracy of 94.1% (32/34). The area under the curve (AUC) was 0.954 for CoDP, 0.919 for MAPP for MSH6, 0.864 for SIFT and 0.819 for PolyPhen-2 HumVar. The power of each method to distinguish pathogenic from non-pathogenic variants was tested with the Wilcoxon rank sum test (p < 8.9 × 10⁻⁶ for CoDP, p < 3.3 × 10⁻⁵ for MAPP, p < 3.1 × 10⁻⁴ for SIFT and p < 1.2 × 10⁻³ for PolyPhen-2 HumVar), and CoDP was shown to outperform the other conventional methods. Conclusion: In this paper, we provide a human-curated data set of MSH6 missense variants and CoDP, a prediction tool that achieved better accuracy in predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at.

KW - HNPCC

KW - In silico

KW - Lynch syndrome

KW - Mismatch repair

KW - MSH6

KW - Unclassified variants

UR - http://www.scopus.com/inward/record.url?scp=84876674516&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876674516&partnerID=8YFLogxK

U2 - 10.1186/1423-0127-20-25

DO - 10.1186/1423-0127-20-25

M3 - Article

C2 - 23621914

AN - SCOPUS:84876674516

VL - 20

JO - Journal of Biomedical Science

JF - Journal of Biomedical Science

SN - 1021-7770

IS - 1

M1 - 25

ER -