Product image search must deal with large target image datasets that are frequently updated, so it is not always practical to maintain exhaustive and up-to-date relevance assessments for tuning and evaluating the search engine. Moreover, in similar product image search, where the query is also an image, it is difficult to identify the possible search intents behind it and thereby verbalise the relevance criteria for the assessors, especially if graded relevance assessments are required. In this study, we focus on similar product image search within a given product category (e.g., shoes), wherein each image is iconic (i.e., the image clearly shows what the product looks like and basically nothing else), and propose an initial approach to evaluating the task without relying on manual relevance assessments. More specifically, we build a simple probabilistic model that assumes that an image is generated from latent intents representing shape, texture, and colour, which enables us to estimate the relevance score of each image and thereby compute graded relevance measures for any image search engine result page (SERP). Through large-scale crowdsourcing experiments, we demonstrate that our proposed measures, InDCG (which is based on per-intent binary relevance) and D-InDCG (which is based on per-intent graded relevance), align reasonably well with human SERP preferences and with human image preferences. Hence, our automatic measures may be useful at least for rough tuning and evaluation of similar product image search.
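The exact definitions of InDCG and D-InDCG are given in the body of the paper; as a rough illustration of the intent-aware idea, the following sketch computes a generic intent-aware nDCG in the style of Agrawal et al.'s diversity metrics, where each latent intent (here assumed to be shape, texture, and colour) contributes a per-intent nDCG weighted by its probability. The function names, the dictionary-based relevance table, and the equal intent weights are all illustrative assumptions, not the paper's actual formulation.

```python
import math

def dcg(gains):
    """Standard DCG with log2(rank + 2) discount (rank is 0-based)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def intent_aware_ndcg(serp, intent_probs, rel):
    """Weighted sum of per-intent nDCG scores (illustrative sketch).

    serp         -- ranked list of image ids on the result page
    intent_probs -- mapping intent -> probability, e.g. the three
                    latent intents {"shape": p1, "texture": p2, "colour": p3}
    rel          -- mapping (image_id, intent) -> gain; binary gains give an
                    InDCG-like score, graded gains a D-InDCG-like score
    """
    score = 0.0
    for intent, p in intent_probs.items():
        gains = [rel.get((img, intent), 0) for img in serp]
        ideal = dcg(sorted(gains, reverse=True))
        if ideal > 0:
            score += p * dcg(gains) / ideal
    return score
```

With binary gains, a SERP that places the relevant images first under every intent scores 1.0, and any demotion of a relevant image lowers the score, which is the behaviour the proposed measures exploit when comparing SERPs without manual assessments.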