For many problems in machine learning, the data are nonlinearly distributed. One popular way to handle such data is to train a local kernel machine or a mixture of several locally linear models. However, both approaches rely heavily on local information, such as the neighbor relations of each data sample, to capture the underlying data distribution. In this paper, we show that non-local information is more effective for data representation. Using a winner-take-all autoencoder, several non-local templates are trained to trace the data distribution and to represent each sample in different subspaces with suitable weights. By training a linear model for each subspace in a divide-and-conquer manner, a single support vector machine can be formulated to solve nonlinear classification problems. Experimental results demonstrate that a mixture of multiple linear classifiers built from non-local information performs better than, or is at least competitive with, state-of-the-art mixtures of locally linear models.
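To make the winner-take-all idea concrete, the following is a minimal NumPy sketch of a WTA autoencoder trained by reconstruction, not the paper's implementation: the toy data set, the tied-weight architecture, and all hyperparameters (8 hidden templates, top-k = 2 sparsity, learning rate) are illustrative assumptions. Each sample is encoded, all but its k largest hidden activations are zeroed, and the surviving template weights reconstruct it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: samples on a noisy 1-D curve embedded in 2-D,
# i.e. a nonlinearly distributed data set.
t = np.linspace(-1, 1, 200).reshape(-1, 1)
X = np.hstack([t, np.sin(3 * t)]) + 0.05 * rng.normal(size=(200, 2))

n_in, n_hid, k = 2, 8, 2      # 8 hidden units act as templates; keep top-k
W = 0.1 * rng.normal(size=(n_in, n_hid))   # tied weights: decoder is W.T
lr = 0.02
losses = []

for epoch in range(500):
    H = np.maximum(X @ W, 0.0)                     # ReLU encoding
    # Winner-take-all step: zero all but the k largest activations
    # of each sample, so only a few templates represent it.
    thresh = np.sort(H, axis=1)[:, -k][:, None]
    H_wta = np.where(H >= thresh, H, 0.0)
    X_hat = H_wta @ W.T                            # linear reconstruction
    err = X_hat - X
    losses.append(np.mean(err ** 2))
    # Gradient of the squared reconstruction error w.r.t. W (tied weights;
    # the ReLU/WTA masks are treated as constants within each step).
    grad = X.T @ ((err @ W) * (H_wta > 0)) + err.T @ H_wta
    W -= lr * grad / len(X)
```

After training, the surviving activations in `H_wta` give each sample a sparse weighting over the learned templates; a linear classifier per active template would then play the role of the mixture of linear SVMs described above.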