Computer Architecture Today

Informing the broad computing community about current activities, advances and future directions in computer architecture.
Archive of posts tagged: fault tolerance
SDCs: A B C

Over the last four years, data center hyperscalers (Meta, Google, Alibaba) have disclosed that an unexpectedly high fraction of CPUs (~1 in 1000) produce Silent Data Corruptions (SDCs), i.e., program executions that yield wrong results without any observable indication....
