Statistical haplotype inference is an indispensable technique in medical science. The method usually has two steps: inferring haplotype frequencies and inferring the diplotype of each subject. The first step can be performed with the expectation-maximization (EM) algorithm, but the computational cost becomes prohibitive when the number of single-nucleotide polymorphism (SNP) loci under consideration is large. In this article, we describe an approximate probabilistic model of haplotype frequencies. The model is constructed from several distributions of nearby local SNPs. This approximation is reasonable because SNPs are generally more strongly correlated when they are close to one another on a chromosome. To implement this approach, we use a log-linear model, the Walsh-Hadamard transform, and a combinatorial optimization method. Results on artificial data suggest that the overall haplotype inference of our method is good when there are nine or more consecutive local SNPs. Some minor problems remain to be resolved before this method can be applied to real data.
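
To illustrate why the first step becomes expensive, the following is a minimal sketch of the standard EM estimate of haplotype frequencies from unphased genotypes. It is the exact baseline, not the approximate model described in this article. The function names (`compatible_pairs`, `em_haplotype_frequencies`), the genotype encoding (0, 1, or 2 copies of allele 1 per biallelic SNP), the uniform initialization, and the fixed iteration count are all assumptions made for illustration. The table of candidate haplotypes has 2^n entries for n SNPs, and a genotype with k heterozygous sites has 2^k compatible ordered haplotype pairs.

```python
from itertools import product


def compatible_pairs(genotype):
    """Yield ordered haplotype pairs (h1, h2) consistent with a genotype.

    genotype[i] is the count (0, 1, or 2) of allele 1 at SNP i
    (an assumed encoding for this sketch).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed
    # Each heterozygous site can be phased two ways: 2**len(het) ordered pairs.
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies by EM, assuming random mating."""
    n_snps = len(genotypes[0])
    haplotypes = list(product((0, 1), repeat=n_snps))  # 2**n_snps entries
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = dict.fromkeys(haplotypes, 0.0)
        # E-step: share each subject's two chromosomes among compatible pairs.
        for g in genotypes:
            pairs = list(compatible_pairs(g))
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected counts divided by the number of chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Example with three SNPs and four subjects (made-up genotypes).
example = [(0, 0, 0), (2, 2, 2), (1, 1, 1), (1, 1, 0)]
for hap, f in em_haplotype_frequencies(example).items():
    print(hap, round(f, 4))
```

Both the frequency table and the per-subject enumeration grow exponentially with the number of SNPs, which is the cost that motivates building the model from distributions over short runs of nearby SNPs instead.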