TY - JOUR
T1 - PSARE
T2 - A RL-Based Online Participant Selection Scheme Incorporating Area Coverage Ratio and Degree in Mobile Crowdsensing
AU - Xu, Ying
AU - Wang, Yufeng
AU - Ma, Jianhua
AU - Jin, Qun
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
AB - Mobile crowdsensing (MCS) is a cost-effective paradigm for gathering real-time and location-related urban sensing data. To complete MCS tasks, the MCS platform needs to exploit participants' trajectories (of vehicles, individuals, etc.) to choose participants effectively. On one hand, existing works usually assume that the platform already possesses abundant historical movement trajectories for participant selection, or that it can accurately predict participants' movements before selection; this assumption is impractical for many MCS applications, since some candidates have just arrived without sufficient mobility profiles, the so-called trajectory-from-scratch, or cold-trajectory, issue. On the other hand, most works consider only the coverage ratio of the sensing area, whereas some hotspots should be sensed frequently, i.e., the coverage degree of hotspots. To address these issues, this paper proposes PSARE, a reinforcement learning (RL) based, specifically an improved Q-learning based, online participant selection scheme that incorporates both coverage ratio and coverage degree. First, to avoid the explosion of the state-value table in traditional tabular Q-learning, an improved two-level Q-learning method is proposed to select participants online so as to achieve a high long-term return. Specifically, in each selection round, PSARE dynamically compresses all real participants (RPs) into several virtual participants (VPs) using the available historical trajectories of the RPs, and a VP-based state-value table is constructed and continually updated (the first level). Then, after selecting a VP by looking up the table, PSARE chooses the RP with the largest expected reward within this VP in an epsilon-greedy way to balance exploration and exploitation (the second level). Moreover, the reward function is designed to measure MCS coverage quality, including both the coverage degree of hotspots and the coverage ratio of the target area. Thorough experiments on a real-world mobility dataset demonstrate that PSARE outperforms other RL-based online participant selection schemes (including a deep Q-network) and traditional offline selection methods.
KW - Coverage ratio and degree
KW - mobile crowdsensing
KW - online participant selection
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85132767406&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132767406&partnerID=8YFLogxK
U2 - 10.1109/TVT.2022.3183607
DO - 10.1109/TVT.2022.3183607
M3 - Article
AN - SCOPUS:85132767406
VL - 71
SP - 10923
EP - 10933
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
SN - 0018-9545
IS - 10
ER -