by Dimitris Gizopoulos on Sep 16, 2024 | Tags: fault tolerance, Reliability, silent data corruptions
Over the last four years, data center hyperscalers (Meta, Google, Alibaba) have disclosed an unexpectedly high number of CPUs (~1 in 1000) that produce Silent Data Corruptions (SDCs), i.e., program executions that produce wrong results without any observable indication....
by Caroline Trippel on Aug 15, 2022 | Tags: Datacenters, Errors, Reliability, Testing
Hyperscalers are reporting frequent silent data corruptions (SDCs), also known as silent errors or corrupt execution errors (CEEs), in their cloud fleets caused by silicon manufacturing defects. Notably, SDCs at scale exhibit error occurrence rates on the order of one fault...
by Yunong Shi, Chris Chamberland, Andrew Cross, and Fred Chong on Sep 23, 2019 | Tags: Quantum Computing, Reliability
Continuing with our thread on looking past abstractions in quantum computing, guest bloggers Yunong Shi from EPiQC and Christopher Chamberland and Andrew Cross from IBM examine how to make qubits fault tolerant by exploiting more of the physical state space available...