Home Knowledge Base GELU

GELU

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Activation Concentration: Characterizing Column-Level Output Sparsity Across Diffusion Model Architectures

Announce Type: replace Abstract: Recent diffusion accelerators exploit activation sparsity by skipping near-zero GELU outputs, reporting 52--85% element-level sparsity. However, systolic-array hardware processes activations at column granularity, where a single non-zero element forces the entire column to be computed. We present the first systematic column-level sparsity characterization across seven diffusion workloads spanning three workload groups and four modalities.

arXiv CS 7d ago

SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks

Announce Type: new Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.

arXiv CS 7d ago