Modeling and evaluating the time overhead induced by BER in COMA multiprocessors

Mohsen Sharifi*, Behrouz Zolfaghari

*Corresponding author for this work

Research output: Article, peer-reviewed

2 Citations (Scopus)

Abstract

Designing multiprocessors around a distributed shared memory (DSM) architecture considerably increases their scalability. However, as the number of nodes in a multiprocessor grows, the probability that one or more nodes will fail becomes a serious problem. Every large-scale multiprocessor should therefore be equipped with mechanisms that tolerate node failures. Backward error recovery (BER) is one of the most feasible strategies for building fault-tolerant multiprocessors, and it can be shown that, among the various DSM-based architectures, cache-only memory architecture (COMA) is the most suitable for implementing BER. The main reason is that the COMA memory system has built-in mechanisms for data replication. BER can be applied to COMA multiprocessors with minor hardware redundancy, but it introduces other kinds of overhead. The most important of these is the time required to produce and store recovery data. This paper introduces an analytical model for predicting this time overhead and verifies the model by comparing its predictions with previously published simulation results. Both the analytical model and the simulation results show that the overhead is nearly independent of the number of nodes. It follows that BER is a cost-effective strategy for tolerating node failures in large-scale COMA multiprocessors.

Original language: English
Pages (from-to): 377-385
Number of pages: 9
Journal: Journal of Systems Architecture
Volume: 48
Issue number: 13-15
DOI
Publication status: Published - May 2003
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
