The sparse coding method is formulated as an information theoretic optimization problem. The rate distortion theory leads to an objective functional which can be interpreted as an information theoretic formulation of the sparse coding. Viewing as an entropy minimization problem, the rate distortion theory and consequently the sparse coding are extended to discriminative variants. As a concrete example of this information theoretic sparse coding, a discriminative non-linear sparse coding algorithm with neural networks is proposed. Experimental results of gender classification by face images show that the discriminative sparse coding is more robust to noise, compared to the conventional method which directly uses images as inputs to a linear support vector machine.