GELU
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Activation Concentration: Characterizing Column-Level Output Sparsity Across Diffusion Model Architectures
Announce Type: replace Abstract: Recent diffusion accelerators exploit activation sparsity by skipping near-zero GELU outputs, reporting 52--85% element-level sparsity. However, systolic-array hardware processes activations at column granularity, where a single non-zero element forces the entire column to be computed. We present the first systematic column-level sparsity characterization across seven diffusion workloads spanning three workload groups and four modalities.
SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks
Announce Type: new Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.