eAID: Elastic Asynchronous Information Dispersal with Post-Dissemination Pruning
arXiv:2603.24761v2 Announce Type: replace
Abstract: Spreading and storing erasure-coded data effectively in distributed systems is challenging in practical settings. The dissemination of erasure-coded information is typically designed to complete only after receiving messages from $(N-F)$ nodes, thereby preparing for the worst-case, but rare, scenario of $F$ failures. In steady state, the remaining $F$ nodes may in fact be healthy, but their resources are not counted. This leads to over-provisioning of storage for encoded data.
This paper introduces eAID, a novel elastic information dispersal algorithm that addresses this conundrum through a two-stage approach.
First, the core protocol estimates the actual number $f$ of faulty nodes, rather than assuming the worst-case bound $F$. Dissemination completes quickly when messages are received from $(N-f)$ nodes, and more gradually when fewer nodes respond. Second, after initial dissemination completes, eAID continues monitoring for additional responses. As responses arrive from up to $N$ nodes, the system prunes the information stored at responding nodes accordingly.
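To make the pruning stage concrete, here is a minimal sketch under an assumed rule that the abstract itself does not state: each of the $r$ nodes that has responded keeps $\lceil (F+1)/(r-F) \rceil$ equal-size fragments, so that any $F$ crashes among the responders still leave $(F+1)$ fragments available. The function names, the crash-fault setting, and the rule itself are illustrative assumptions, not eAID's actual policy.

```python
from math import ceil


def fragments_per_node(responders: int, F: int) -> int:
    """Fragments each responder keeps (ASSUMED rule, not taken from the paper).

    If F of the responders crash, (responders - F) survive, and together
    they must still hold at least F + 1 fragments.
    """
    if responders <= F:
        raise ValueError("need more than F responders to tolerate F crashes")
    threshold = F + 1  # recovery threshold, fixed as in the abstract
    return ceil(threshold / (responders - F))


def pruning_schedule(N: int, F: int) -> dict[int, int]:
    """Per-node fragment count as responses grow from N - F up to N."""
    return {r: fragments_per_node(r, F) for r in range(N - F, N + 1)}


if __name__ == "__main__":
    # Hypothetical deployment: N = 7 nodes, at most F = 3 crashes.
    for r, m in pruning_schedule(7, 3).items():
        print(f"{r} responders -> keep {m} fragment(s) per node")
```

Under these assumptions, with $N = 7$ and $F = 3$, per-node storage falls from 4 fragments when only 4 nodes have responded to 1 fragment once all 7 have responded.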
A key technique enabling this seamless elasticity is an agile encoding scheme that varies the number of disseminated fragments while keeping both fragment size and the recovery threshold $(F+1)$ fixed. Not only does this enable varying the number of disseminated fragments on the fly, it also allows nodes to discard encoded fragments autonomously. Crucially, this is achieved without maintaining complex metadata, without requiring nodes to reconstruct or re-encode information, and without global coordination for storage decisions.
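The properties listed above can be illustrated with a generic polynomial-evaluation (Reed-Solomon-style) code. This is a sketch of one well-known scheme with those properties, not necessarily the encoding eAID uses; the modulus `P` and the names `fragment` and `recover` are assumptions made for the example. The data is split into $k = F+1$ symbols, every fragment is a single field element regardless of how many fragments are produced, and any $k$ fragments recover the data.

```python
P = 2**31 - 1  # prime modulus (assumed); symbols and evaluation points must be < P


def _interpolate_at(shares, x):
    """Evaluate at x the unique polynomial of degree < len(shares) through shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def fragment(data, x):
    """Fragment at evaluation point x >= 1; the data symbols sit at points 1..k."""
    shares = list(enumerate(data, start=1))
    return (x, _interpolate_at(shares, x))


def recover(fragments, k):
    """Rebuild the k data symbols from any k fragments with distinct points."""
    chosen = fragments[:k]
    if len(chosen) < k:
        raise ValueError("not enough fragments")
    return [_interpolate_at(chosen, x) for x in range(1, k + 1)]


if __name__ == "__main__":
    F = 2
    k = F + 1                       # recovery threshold
    data = [101, 202, 303]          # k data symbols
    # More fragments can be produced on demand without changing their size.
    frags = [fragment(data, x) for x in range(k + 1, k + 8)]
    # Nodes drop fragments independently; no re-encoding is involved.
    kept = [frags[0], frags[3], frags[6]]
    print(recover(kept, k) == data)
```

Because each fragment depends only on the data and its own evaluation point, a holder can delete a fragment without consulting other nodes, and the only metadata carried with a fragment in this sketch is the point `x`.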
We demonstrate the practicality of eAID by integrating it with a replicated key-value store, and evaluating it in network environments with unpredictable latencies. The results show that eAID improves overall performance while significantly reducing long-term storage consumption.