With the development of deep learning, zero-shot learning (ZSL) deserves increased attention. To address the problems of projection domain shift and discriminative feature extraction, we propose an end-to-end framework that differs from traditional ZSL methods in two respects: (1) we use a cascaded network to automatically locate discriminative regions, which extracts latent features more effectively and contributes to the representation of key semantic attributes; (2) our framework implements both the mapping into the visual-semantic embedding space and the dot-product computation within the deep learning framework. In addition, we design a joint loss function that regularizes the whole method and supervises training, which improves generalization on the test set. We conduct experiments on the Animals with Attributes 2 (AwA2), Caltech-UCSD Birds 200-2011 (CUB), and SUN datasets, on which the proposed framework achieves better results than state-of-the-art methods.
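The dot-product step in the visual-semantic embedding space can be illustrated with a minimal sketch. It is not the framework proposed here: it assumes a single linear projection in place of the learned network, and the names `project`, `predict`, `W`, and the toy attribute vectors are hypothetical, chosen only for illustration. The cascaded region localization and the joint loss are not shown.

```python
# Minimal sketch of dot-product compatibility scoring for zero-shot
# classification. Assumption: a fixed linear map W stands in for the
# learned visual-to-semantic mapping; all names and values are hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(W, x):
    """Map a visual feature x into the semantic (attribute) space."""
    return [dot(row, x) for row in W]

def predict(W, x, class_attributes):
    """Score each class by the dot product between the embedded feature
    and that class's attribute vector, then return the best class."""
    embedded = project(W, x)
    scores = {name: dot(embedded, attrs)
              for name, attrs in class_attributes.items()}
    return max(scores, key=scores.get), scores

# Toy example: 3-dimensional visual feature, 2 semantic attributes.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = [0.9, 0.1, 0.5]
unseen_classes = {"zebra": [1.0, 0.0], "whale": [0.0, 1.0]}

label, scores = predict(W, x, unseen_classes)
print(label, scores)
```

Because classes are described only by their attribute vectors, classes that were unseen during training can be scored at test time by adding their attribute vectors to `unseen_classes`, with no change to `W`.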