Computer vision applications are rapidly gaining popularity in embedded systems, which typically face a difficult trade-off between vision performance and energy consumption under real-time throughput constraints. Recently, hardware (FPGA- and ASIC-based) implementations have emerged that significantly improve the energy efficiency of vision computation. These implementations, however, often involve intensive memory traffic that accounts for a significant portion of system-level energy consumption. To address this issue, we present a lossy embedded compression framework that exploits the trade-off between vision performance and memory traffic for input images. We develop differential pulse-code modulation (DPCM)-based gradient-oriented quantization as the lossy compression algorithm, and we present a hardware design that supports real-time processing of up to 12 scales at 1080p@60fps. For histogram of oriented gradients (HOG)-based deformable part models on VOC2007, the proposed framework reduces memory traffic by 49.6%-60.5% with a detection-rate degradation of only 0.05%-0.34%. For AlexNet on ImageNet, it reduces memory traffic by up to 60.8% with less than 0.61% classification-rate degradation.
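To make the compression idea concrete, the following is a minimal sketch of DPCM with a gradient-adaptive quantization step, the general technique named in the abstract. The specific step sizes, the edge threshold, and the single-row, left-neighbor predictor are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def dpcm_gq_encode_row(row, base_step=4, edge_step=8, edge_thresh=16):
    """Illustrative DPCM encoding of one image row.

    Each pixel is predicted from the *reconstructed* left neighbour
    (closed-loop prediction, so quantization error does not accumulate),
    and the prediction residual is quantized with a step size chosen
    from the local gradient magnitude: a coarser step on strong edges,
    where feature extractors such as HOG tolerate larger errors.
    All numeric parameters here are made-up illustrative values.
    """
    recon = np.empty(len(row), dtype=np.int32)
    codes = np.empty(len(row), dtype=np.int32)
    pred = 128  # fixed predictor for the first pixel of the row
    for i, px in enumerate(row.astype(np.int32)):
        resid = px - pred
        # Large residual ~ strong local gradient -> coarser quantization.
        step = edge_step if abs(resid) > edge_thresh else base_step
        q = int(np.round(resid / step))
        codes[i] = q
        recon[i] = np.clip(pred + q * step, 0, 255)
        pred = recon[i]
    return codes, recon
```

The quantized residual codes are small integers that an entropy coder would then pack; the reconstruction error per pixel is bounded by half the chosen step size.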