MVAPICH2

Related Articles from SNS

PartRePer-MPI: Combining Fault Tolerance and Performance for MPI Applications

arXiv:2310.16370v2. Abstract: As we have entered Exascale computing, faults in high-performance systems are expected to increase considerably. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to create checkpoints at a much higher frequency, resulting in an excessive amount of overhead that would not be sustainable for many scientific applications. Replication allows for fast recovery from failures by simply dropping the...

arXiv CS 7d ago