von Oswald et al.

Related Articles from SNS

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

arXiv:2506.05233v2. Abstract: Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require memory and compute that scale linearly with sequence length during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs, such as DeltaNet, Mamba, or xLSTM.

arXiv CS 6d ago
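
The abstract's claim that linearizing the softmax operation yields a recurrent model with constant memory can be illustrated with a minimal sketch. It assumes the simplest unnormalized form of linear attention, where the output at step t is the sum over s <= t of (q_t · k_s) v_s. This is not the MesaNet, DeltaNet, Mamba, or xLSTM update rule, and the function names are hypothetical. The recurrent version keeps only a d_k x d_v state matrix, while the attention-style version revisits the whole prefix at every step.

```python
def linear_attention_recurrent(queries, keys, values):
    """Unnormalized causal linear attention computed as a recurrence.

    The state is a d_k x d_v matrix, so its size does not grow with
    sequence length (illustrative only; not the MesaNet update rule).
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Rank-one state update: state += outer(k, v)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Readout: o = state^T q
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def linear_attention_quadratic(queries, keys, values):
    """The same quantity computed attention-style over the full prefix."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * d_v
        for s in range(t + 1):
            # Score without softmax: plain dot product q_t . k_s
            score = sum(qi * ki for qi, ki in zip(q, keys[s]))
            for j in range(d_v):
                out[j] += score * values[s][j]
        outputs.append(out)
    return outputs


# Toy inputs (made up for illustration): 3 steps, d_k = 2, d_v = 1.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]

recurrent_out = linear_attention_recurrent(queries, keys, values)
quadratic_out = linear_attention_quadratic(queries, keys, values)
```

Both functions return the same outputs on the toy inputs. The recurrent form does a fixed amount of work per step, whereas the prefix-based form does work proportional to the current position.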