Fault tolerance in P2P-grid environments

Huan Wang*, Nakazato Hidenori

*Corresponding author for this work

Research output: Conference contribution

1 Citation (Scopus)

Abstract

A P2P-Grid system provides a framework that converges Grid and peer-to-peer networks to deploy large-scale distributed applications. However, working nodes with heterogeneous properties can freely join and leave in the middle of their computation. Because nodes participate dynamically and may depart at any time at the user's discretion, the network topology keeps changing and execution failures are more common than in other systems. To this end, failure detection mechanisms and fault tolerance functions, typically an integral part of a P2P-Grid system, have been well studied. Our research aims to address the highly dynamic nature of P2P-Grid systems by drawing on node lifetime statistics from previous research. We propose a checkpointing-and-recovery architecture that restarts applications as soon as possible on P2P-Grid systems. Because a failure detection mechanism is a necessary prerequisite to fault tolerance and fault recovery in a P2P-Grid system, we also investigate how the design of various failure detection algorithms affects their performance in terms of average node failure detection time. The evaluation shows that our checkpointing-and-restart paradigm and failure detection algorithm enable high reliability and performance even with high node departure.
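
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture or algorithm, which the abstract does not specify. It assumes a plain timeout-based heartbeat detector and a task that saves its progress at a fixed step interval. The names `HeartbeatDetector`, `CheckpointedTask`, `timeout`, and `interval` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Timeout-based failure detector (illustrative assumption, not the paper's algorithm)."""
    timeout: float  # seconds of silence after which a node is suspected
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `node`.
        self.last_seen[node] = now

    def suspected(self, now: float) -> list:
        # A node is suspected once its silence exceeds the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


@dataclass
class CheckpointedTask:
    """Counts up to `total`, saving progress every `interval` steps (hypothetical task)."""
    total: int
    interval: int
    checkpoint: int = 0  # last saved step

    def run(self, start: int, fail_at=None) -> int:
        step = start
        while step < self.total:
            if fail_at is not None and step == fail_at:
                return step  # node departed; work since the last checkpoint is lost
            step += 1
            if step % self.interval == 0:
                self.checkpoint = step  # persist progress
        return step


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=2.0)
failed = detector.suspected(now=4.0)  # only "b" has been silent longer than 3 s

task = CheckpointedTask(total=10, interval=3)
stopped = task.run(start=0, fail_at=7)   # simulated departure at step 7
resumed = task.run(start=task.checkpoint)  # restart from the last checkpoint
```

In this sketch a shorter `timeout` lowers the detection time but risks suspecting slow nodes, and a shorter `interval` reduces the work lost on departure at the cost of more frequent saves. These are the kinds of trade-offs the abstract refers to, under the assumptions stated above.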

Original language: English
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Pages: 2482-2485
Number of pages: 4
DOI
Publication status: Published - 18 Oct 2012
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai, China
Duration: 21 May 2012 to 25 May 2012

Publication series

Name: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

Conference

Conference: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Country/Territory: China
City: Shanghai
Period: 12/5/21 to 12/5/25

ASJC Scopus subject areas

  • Software
