Failure detection in P2P-grid environments

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    A P2P-Grid system provides a framework that converges Grid and peer-to-peer networks to deploy large-scale distributed applications. However, nodes participate dynamically and arbitrarily, which makes failures more common than in other systems. As the most common technique for fault tolerance, checkpointing-and-recovery saves the application's execution state during normal execution and restores the saved state after a failure to reduce the amount of lost work. In this paper, we propose a checkpointing-and-recovery architecture that restarts applications as soon as possible on P2P-Grid systems. A failure-detection mechanism is a necessary prerequisite for fault tolerance and fault recovery in a P2P-Grid system; accordingly, failure-detection mechanisms, as an integral part of P2P-Grid systems, have been well studied. We investigate how the design of various failure-detection algorithms affects their performance in terms of average node failure-detection time. We also provide numerical results based on both theoretical analysis and simulations. The evaluation results show a performance improvement on the basis of the WP failure-detection algorithm.
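
The checkpointing-and-recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the function name `run_with_checkpoints`, its parameters, and the toy job (summing step numbers) are assumptions made purely for illustration. The sketch saves state every few steps, injects a single failure, restores the last saved state, and reports how many steps had to be redone.

```python
import copy


def run_with_checkpoints(total_steps, checkpoint_every, fail_at=None):
    """Run a toy job with periodic checkpoints and one optional failure.

    Hypothetical illustration only. Returns (final_state, lost_steps),
    where lost_steps is the work done since the last checkpoint that
    must be repeated after the failure.
    """
    state = {"step": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)  # initial checkpoint before any work
    lost_steps = 0
    failed = False

    while state["step"] < total_steps:
        # Inject a single failure once the job has completed `fail_at` steps.
        if fail_at is not None and not failed and state["step"] == fail_at:
            failed = True
            lost_steps = state["step"] - checkpoint["step"]
            state = copy.deepcopy(checkpoint)  # recovery: restore saved state
            continue

        # Normal execution: one unit of work.
        state["step"] += 1
        state["acc"] += state["step"]

        # Save the execution state at regular intervals.
        if state["step"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)

    return state, lost_steps


if __name__ == "__main__":
    final_state, lost = run_with_checkpoints(10, 4, fail_at=7)
    print(final_state, lost)
```

With 10 steps, a checkpoint every 4 steps, and a failure after step 7, the job restarts from the checkpoint taken at step 4, so 3 steps are repeated rather than 7. A shorter checkpoint interval reduces lost work at the cost of more frequent state saves.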

    Original language: English
    Title of host publication: Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012
    Pages: 369-374
    Number of pages: 6
    DOI: https://doi.org/10.1109/ICDCSW.2012.18
    Publication status: Published - 2012
    Event: 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012 - Macau
    Duration: 2012 Jun 18 to 2012 Jun 21

    Other

    Other: 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012
    City: Macau
    Period: 12/6/18 to 12/6/21

    Fingerprint

    Fault tolerance
    Recovery
    Peer to peer networks

    Keywords

    • failure detection
    • failure recovery
    • fault tolerance
    • P2P-Grid systems

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Control and Systems Engineering

    Cite this

    Huan, W., & Nakazato, H. (2012). Failure detection in P2P-grid environments. In Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012 (pp. 369-374). [6258182] https://doi.org/10.1109/ICDCSW.2012.18

    Failure detection in P2P-grid environments. / Huan, Wang; Nakazato, Hidenori.

    Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012. 2012. p. 369-374 6258182.

    Huan, W & Nakazato, H 2012, Failure detection in P2P-grid environments. in Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012., 6258182, pp. 369-374, 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012, Macau, 12/6/18. https://doi.org/10.1109/ICDCSW.2012.18
    Huan W, Nakazato H. Failure detection in P2P-grid environments. In Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012. 2012. p. 369-374. 6258182 https://doi.org/10.1109/ICDCSW.2012.18
    Huan, Wang ; Nakazato, Hidenori. / Failure detection in P2P-grid environments. Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012. 2012. pp. 369-374
    @inproceedings{24d1c37e8a0b4bcfac22e7d72ed652b1,
    title = "Failure detection in P2P-grid environments",
    keywords = "failure detection, failure recovery, fault tolerance, P2P-Grid systems",
    author = "Wang Huan and Hidenori Nakazato",
    year = "2012",
    doi = "10.1109/ICDCSW.2012.18",
    language = "English",
    pages = "369--374",
    booktitle = "Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012",

    }

    TY - GEN

    T1 - Failure detection in P2P-grid environments

    AU - Huan, Wang

    AU - Nakazato, Hidenori

    PY - 2012

    Y1 - 2012

    KW - failure detection

    KW - failure recovery

    KW - fault tolerance

    KW - P2P-Grid systems

    UR - http://www.scopus.com/inward/record.url?scp=84866364905&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84866364905&partnerID=8YFLogxK

    U2 - 10.1109/ICDCSW.2012.18

    DO - 10.1109/ICDCSW.2012.18

    M3 - Conference contribution

    AN - SCOPUS:84866364905

    SP - 369

    EP - 374

    BT - Proceedings - 32nd IEEE International Conference on Distributed Computing Systems Workshops, ICDCSW 2012

    ER -