While batch evaluation plays a central part in Information Retrieval (IR) research, most evaluation metrics are based on user models which mainly focus on browsing and clicking behaviors. As users' perceived satisfaction may also be impacted by their search intent, constructing different user models across various search intent may help design better evaluation metrics. However, user intents are usually unobservable in practice. As query reformulating behaviors may reflect their search intents to a certain extent and highly correlate with users' perceived satisfaction for a specific query, these observable factors may be beneficial for the design of evaluation metrics. How to incorporate the search intent behind query reformulation into user behavior and satisfaction models remains under-investigated. To investigate the relationships among query reformulations, search intent, and user satisfaction, we explore a publicly available web search dataset and find that query reformulations can be a good proxy for inferring user intent, and therefore, reformulating actions may be beneficial for designing better web search effectiveness metrics. A group of Reformulation-Aware Metrics (RAMs) is then proposed to improve existing click model-based metrics. Experimental results on two public session datasets have shown that RAMs have significantly higher correlations with user satisfaction than existing evaluation metrics. In the robustness test, we have found that RAMs can achieve good performance when only a small proportion of satisfaction training labels are available. We further show that RAMs can be directly applied in a new dataset for offline evaluation once trained. This work shows the possibility of designing better evaluation metrics by incorporating fine-grained search context factors.