Modeling and evaluating the time overhead induced by BER in COMA multiprocessors

Mohsen Sharifi*, Behrouz Zolfaghari

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Designing multiprocessors based on a distributed shared memory (DSM) architecture considerably increases their scalability. However, as the number of nodes in a multiprocessor grows, so does the probability that one or more nodes will fail, and this becomes a serious problem. Every large-scale multiprocessor should therefore be equipped with mechanisms that tolerate node failures. Backward error recovery (BER) is one of the most feasible strategies for building fault-tolerant multiprocessors, and it can be shown that, among the various DSM-based architectures, the cache-only memory architecture (COMA) is the most suitable for implementing BER. The main reason is that the COMA memory system has built-in mechanisms for data replication. BER can be applied to COMA multiprocessors with only minor hardware redundancy, but it necessarily introduces other kinds of overhead. The most important of these is the time required to produce and store recovery data. This paper introduces an analytical model for predicting this time overhead and then verifies the model by comparing its predictions with previously published simulation results. Both the analytical model and the simulation results show that the overhead is nearly independent of the number of nodes. The immediate consequence is that BER is a cost-effective strategy for tolerating node failures in COMA multiprocessors with large numbers of nodes.

Original language: English
Pages (from-to): 377-385
Number of pages: 9
Journal: Journal of Systems Architecture
Issue number: 13-15
Publication status: Published - 2003 May
Externally published: Yes

Keywords

  • BER strategy
  • COMA
  • Distributed shared memory
  • Fault tolerance

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
