A P2P-Grid system provides a framework that combines Grid and peer-to-peer networking to deploy large-scale distributed applications. However, worker nodes with heterogeneous properties can freely join and leave in the middle of a computation. Because nodes participate and depart at arbitrary times according to their users' decisions, the network topology keeps changing and execution failures are more common than in other systems. For this reason, failure detection and fault tolerance mechanisms, which are typically an integral part of a P2P-Grid system, have been well studied. Our research aims to address the highly dynamic nature of P2P-Grid systems by drawing on node lifetime statistics reported in previous research. We propose a checkpointing-and-recovery architecture that lets applications restart as soon as possible on P2P-Grid systems. Because failure detection is a necessary prerequisite for fault tolerance and fault recovery in a P2P-Grid system, we also investigate how the design of various failure detection algorithms affects their performance in terms of average node failure detection time. The evaluation shows that our checkpoint-and-restart paradigm and failure detection algorithm enable high reliability and performance even with a high rate of node departure.