The α-divergence is utilized to derive a generalized expectation-maximization (EM) algorithm. This algorithm has a wide range of applications. In this paper, the focus is on neural network learning for mixture probabilities. The α-EM algorithm includes the existing EM algorithm as a special case, since that algorithm corresponds to α = -1. The parameter α specifies a probability weight for the learning, and this number affects learning speed and local optimality. In the discussion of the update equations for neural networks, extensions of basic statistics such as Fisher's efficient score, the Fisher information measure, and the Cramér-Rao inequality are also given. In addition, this paper unveils another new idea: the cyclic EM structure can be used as a building block to generate a learning systolic array. Attaching monitors to this systolic array makes it possible to create a functionally distributed learning system.
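As a point of orientation for the α = -1 special case, the sketch below assumes one common parameterization of the α-logarithm, L_α(x) = (2/(1+α))(x^((1+α)/2) - 1), which the α-EM family uses in place of the ordinary logarithm of the likelihood ratio; the function name `alpha_log` is hypothetical and not from the paper. The sketch shows numerically that L_α(x) converges to log x as α approaches -1, which is why the classical log-based EM algorithm appears as that special case.

```python
import math

def alpha_log(x, alpha):
    """alpha-logarithm: (2 / (1 + alpha)) * (x**((1 + alpha) / 2) - 1).

    As alpha -> -1 this converges to the natural logarithm, so an
    EM-style update built on it reduces to the classical log-EM
    update at alpha = -1.
    """
    if abs(alpha + 1.0) < 1e-12:  # limiting case: plain logarithm
        return math.log(x)
    return (2.0 / (1.0 + alpha)) * (x ** ((1.0 + alpha) / 2.0) - 1.0)

# The alpha-logarithm approaches log(x) as alpha approaches -1:
for alpha in (1.0, 0.0, -0.9, -0.999):
    print(f"alpha = {alpha:>7}: L_alpha(2) = {alpha_log(2.0, alpha):.6f}")
print(f"log(2)          = {math.log(2.0):.6f}")
```

Choosing α away from -1 reweights the likelihood ratio inside this surrogate, which is consistent with the abstract's statement that α acts as a probability weight affecting learning speed and local optimality.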