This paper presents an interactive face retrieval framework for clarifying an image representation envisioned by a user. Our system is designed for a situation in which the user wishes to find a person but has only visual memory of the person. We address a critical challenge of image retrieval across the user's inputs. Instead of target-specific information, the user can select several images (or a single image) that are similar to an impression of the target person the user wishes to search for. Based on the user's selection, our proposed system automatically updates a deep convolutional neural network. By interactively repeating these process (human-in-the-loop optimization), the system can reduce the gap between human-based similarities and computer-based similarities and estimate the target image representation. We ran user studies with 10 subjects on a public database and confirmed that the proposed framework is effective for clarifying the image representation envisioned by the user easily and quickly.