Environment
Assessing the Energy and Carbon Emissions of Neural Speaker Verification Model in Training and Inference
Key Points
arXiv:2606.08087v1 Announce Type: new
Abstract: Deep-learning speaker verification (SV) increasingly relies on deep neural network backbones, whose environmental impact remains largely undocumented. In this paper, we conduct an evaluation of ResNet architectures trained on VoxCeleb2, varying depth, channel width, and stage distribution, and measure energy consumption and carbon footprint using node-level sensors. Results show a clear point of diminishing returns: deeper or wider models bring only marginal accuracy gains while energy consumption grows steeply. In contrast, mid-sized networks such as ResNet-50 and stage-concentrated variants achieve favorable trade-offs between performance and environmental impact. These findings provide actionable guidelines for designing energy-efficient SV systems.
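As a rough illustration of how power readings from a node-level sensor could be turned into an energy and carbon figure, the sketch below integrates evenly spaced power samples into kWh and multiplies by a grid carbon intensity. The function names, the sampling interval, and the intensity value are hypothetical and chosen for illustration only; the abstract does not specify the paper's measurement pipeline or conversion factors.

```python
from typing import Sequence


def energy_kwh(power_watts: Sequence[float], interval_s: float) -> float:
    """Integrate evenly spaced power samples (W) into energy (kWh).

    Uses the trapezoidal rule; assumes a fixed sampling interval in seconds.
    """
    if len(power_watts) < 2:
        return 0.0
    joules = sum(
        (a + b) / 2.0 * interval_s
        for a, b in zip(power_watts, power_watts[1:])
    )
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def carbon_kg(energy: float, intensity_g_per_kwh: float) -> float:
    """Convert energy (kWh) to emissions (kg CO2e) for a given grid intensity."""
    return energy * intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical run: a node drawing a constant 1000 W, sampled every second for one hour.
    samples = [1000.0] * 3601
    e = energy_kwh(samples, interval_s=1.0)
    # 50 gCO2e/kWh is an assumed intensity, not a value from the paper.
    print(f"energy: {e:.3f} kWh, emissions: {carbon_kg(e, 50.0):.3f} kg CO2e")
```

Comparing this figure across backbones alongside their verification error is one simple way to see where extra depth or width stops paying for itself.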