Fault tolerance in P2P-grid environments

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    A P2P-Grid system provides a framework that converges Grid and peer-to-peer networks to deploy large-scale distributed applications. However, working nodes with heterogeneous properties can freely join and leave in the middle of their computation. Because nodes participate dynamically, joining or leaving at any time at the user's discretion, the network topology keeps changing and execution failures are more common than in other systems. For this reason, failure detection mechanisms and fault tolerance, typically an integral part of a P2P-Grid system, have been well studied. Our research aims to address the highly dynamic nature of P2P-Grid systems by drawing on node lifetime statistics from previous research. We propose a checkpointing-and-recovery architecture that restarts applications as soon as possible on P2P-Grid systems. A failure-detection mechanism is a necessary prerequisite for fault tolerance and fault recovery in a P2P-Grid system, so we also investigate how the design of various failure detection algorithms affects their average node failure detection time. The evaluation shows that our checkpointing-and-restart paradigm and failure detection algorithm enable high reliability and performance even under frequent node departures.
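To make the two ideas in the abstract concrete, the following is a minimal sketch of a timeout-based heartbeat failure detector and a checkpoint-and-restart loop. It is not the authors' architecture or algorithm, which the abstract does not specify. The names `HeartbeatDetector`, `run_with_checkpoints`, `store`, and `fail_at`, the fixed timeout, and the in-memory checkpoint store are all hypothetical choices made for illustration.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Timeout-based failure detector (illustrative, not the paper's algorithm)."""
    timeout: float                                   # silence (seconds) before suspicion
    last_seen: dict = field(default_factory=dict)    # node id -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `node`.
        self.last_seen[node] = now

    def suspected(self, now: float) -> set:
        # A node is suspected once it has been silent for longer than `timeout`.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


def run_with_checkpoints(total_steps, checkpoint_every, store, fail_at=None):
    """Sum 0..total_steps-1, saving state to `store` every `checkpoint_every` steps.

    `store` is a plain dict standing in for stable storage (an assumption).
    `fail_at` simulates the worker node departing at that step.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    state = pickle.loads(store["ckpt"]) if "ckpt" in store else {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("node departed")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            store["ckpt"] = pickle.dumps(state)      # snapshot the current progress
    return state["acc"]
```

In this sketch a shorter timeout lowers detection time at the cost of more false suspicions, and a shorter checkpoint interval reduces the work lost on departure at the cost of more checkpoint overhead; the abstract's evaluation concerns trade-offs of this general kind, though its actual parameters are not given here.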

    Original language: English
    Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
    Pages: 2482-2485
    Number of pages: 4
    DOIs: https://doi.org/10.1109/IPDPSW.2012.308
    Publication status: Published - 2012
    Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai
    Duration: 2012 May 21 to 2012 May 25

    Other

    Other: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
    City: Shanghai
    Period: 12/5/21 to 12/5/25

    Fingerprint

    Fault tolerance
    Recovery
    Peer to peer networks
    Topology
    Statistics

    Keywords

    • failure detection
    • failure recovery
    • fault tolerance
    • P2P-Grid

    ASJC Scopus subject areas

    • Software

    Cite this

    Wang, H., & Nakazato, H. (2012). Fault tolerance in P2P-grid environments. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 (pp. 2482-2485). [6270874] https://doi.org/10.1109/IPDPSW.2012.308

    @inproceedings{984d6e9e1f294cbb85ae4e9914313a31,
    title = "Fault tolerance in P2P-grid environments",
    abstract = "A P2P-Grid system provides a framework that converges Grid and peer-to-peer networks to deploy large-scale distributed applications. However, working nodes with heterogeneous properties can freely join and leave in the middle of their computation. Because nodes participate dynamically, joining or leaving at any time at the user's discretion, the network topology keeps changing and execution failures are more common than in other systems. For this reason, failure detection mechanisms and fault tolerance, typically an integral part of a P2P-Grid system, have been well studied. Our research aims to address the highly dynamic nature of P2P-Grid systems by drawing on node lifetime statistics from previous research. We propose a checkpointing-and-recovery architecture that restarts applications as soon as possible on P2P-Grid systems. A failure-detection mechanism is a necessary prerequisite for fault tolerance and fault recovery in a P2P-Grid system, so we also investigate how the design of various failure detection algorithms affects their average node failure detection time. The evaluation shows that our checkpointing-and-restart paradigm and failure detection algorithm enable high reliability and performance even under frequent node departures.",
    keywords = "failure detection, failure recovery, fault tolerance, P2P-Grid",
    author = "Huan Wang and Hidenori Nakazato",
    year = "2012",
    doi = "10.1109/IPDPSW.2012.308",
    language = "English",
    isbn = "9780769546766",
    pages = "2482--2485",
    booktitle = "Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012",

    }

    TY - GEN

    T1 - Fault tolerance in P2P-grid environments

    AU - Wang, Huan

    AU - Nakazato, Hidenori

    PY - 2012

    Y1 - 2012

    N2 - A P2P-Grid system provides a framework that converges Grid and peer-to-peer networks to deploy large-scale distributed applications. However, working nodes with heterogeneous properties can freely join and leave in the middle of their computation. Because nodes participate dynamically, joining or leaving at any time at the user's discretion, the network topology keeps changing and execution failures are more common than in other systems. For this reason, failure detection mechanisms and fault tolerance, typically an integral part of a P2P-Grid system, have been well studied. Our research aims to address the highly dynamic nature of P2P-Grid systems by drawing on node lifetime statistics from previous research. We propose a checkpointing-and-recovery architecture that restarts applications as soon as possible on P2P-Grid systems. A failure-detection mechanism is a necessary prerequisite for fault tolerance and fault recovery in a P2P-Grid system, so we also investigate how the design of various failure detection algorithms affects their average node failure detection time. The evaluation shows that our checkpointing-and-restart paradigm and failure detection algorithm enable high reliability and performance even under frequent node departures.

    KW - failure detection

    KW - failure recovery

    KW - fault tolerance

    KW - P2P-Grid

    UR - http://www.scopus.com/inward/record.url?scp=84867407578&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84867407578&partnerID=8YFLogxK

    U2 - 10.1109/IPDPSW.2012.308

    DO - 10.1109/IPDPSW.2012.308

    M3 - Conference contribution

    AN - SCOPUS:84867407578

    SN - 9780769546766

    SP - 2482

    EP - 2485

    BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

    ER -