This paper addresses the rhythmic reference in physical human-robot interaction. When turning a rope synchronously with another person, a human refers to rhythms perceived through multiple sensing modalities. This study tests the hypothesis that some humans combine the rhythms from several modalities into a single rhythm (the rhythmic reference). Six participants, four males and two females aged 21 to 23, took part in eight experiments designed to examine the hypothesis. In each experiment, we masked the participant's perception using one of the eight combinations of three kinds of masks: an eye mask, headphones, and a force mask. Each participant interacted with an operator that turned a rope at a constant frequency. In the experiments, the control error of a participant increased as the number of masks increased, regardless of which modalities were masked. The result strongly supported our hypothesis.