Layer (L

Related Articles from SNS

L$^3$: Large Lookup Layers

arXiv:2601.21461v3 Announce Type: replace Abstract: Modern sparse language models typically achieve sparsity through Mixture-of-Experts (MoE) layers, which dynamically route tokens to dense MLP "experts." However, dynamic hard routing has a number of drawbacks, such as potentially poor hardware efficiency and needing auxiliary losses for stable training. In contrast, the tokenizer embedding table, which is natively sparse, largely avoids these issues by selecting a single embedding per token...

arXiv CS 6d ago

$p$-Robust Trace Liftings for Discrete Harmonic Extensions and Boundary-Preserving $hp$ Interpolation on Tetrahedral Meshes

arXiv:2606.02086v1 Announce Type: new Abstract: We construct $p$-robust polynomial trace liftings on three-dimensional tetrahedral meshes. The prescribed trace is a continuous piecewise polynomial function on a boundary face patch; the tetrahedra touching this patch have one common degree, while the interior degrees may be arbitrary. The lifting is degree-preserving, supported in the corresponding boundary layer, and satisfies both an $H^1$ estimate and a scaled boundary-layer $L^2$ estimate with...

arXiv CS 8d ago

Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

arXiv:2604.08304v3 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces security risks that existing work often conflates with inherent LLM flaws. We frame secure RAG as securing external knowledge access and organize the literature with SLOT, a taxonomy along four axes: the attack Surface (S) where an adversary acts, the defense Layer (L) that controls the same point, the...

arXiv CS 1d ago

Bayesian Inference with Shaped Deep Non-linear MLPs

arXiv:2605.30860v1 Announce Type: cross Abstract: A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of diverging number of model parameters and dataset size do not commute it is not clear a priori what limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training...

arXiv CS 9d ago

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models

arXiv:2602.03681v2 Announce Type: replace Abstract: The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction towards a more efficient sequential model. These linear attention models compress past KV values into a single hidden state, thereby efficiently reducing complexity during both training and inference.

arXiv CS 7d ago

Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference

arXiv:2606.02823v1 Announce Type: new Abstract: Two-bit weight quantization is attractive for memory-efficient LLM inference, but the standard W2 level set {-2,-1,0,+1} often collapses under aggressive W2A4/KV4 settings. We study the scalar level-set geometry of two-bit weights in a Hadamard-rotated quantization pipeline. Conventional asymmetric W2 substantially improves over the standard level set, indicating that W2A4 failure is not only a bit-width problem but also a reconstruction-level...

arXiv CS 7d ago

Efficient Approximation for Encoder--Decoder Neural Operators via Variation Spaces

Announce Type: cross Abstract: We study operator learning using encoder--decoder neural networks. Inspired by the function-space theory of neural networks, we introduce a variation space as an infinite-dimensional structural class for nonlinear operators. This space is defined through vector-valued measures directly on the input and output spaces.

arXiv CS 8d ago

Two-component exciton condensates in an electron–hole bilayer

Abstract Macroscopic quantum coherence emerges when bosons condense into a Bose–Einstein condensate (BEC)$^{1,2,3,4,5}$. Excitons are a long-sought solid-state route to high-temperature BECs with strong interactions, electrical tunability and potentially multicomponent spinor order, but conclusive evidence for equilibrium condensation has remained elusive. Here we report evidence for two-component exciton BECs in MoSe$_2$/hBN/WSe$_2$ electron–hole bilayers$^{6,7,8,9}$ by probing the spin–valley...

Nature 20h ago

A thalamus–brainstem attractor network drives history-biased decisions

Abstract Natural environments often change gradually, making it adaptive to bias decisions on the basis of the recent past, a phenomenon known as serial dependence$^{1,2,3}$. Large-scale recordings during behaviour have identified that serial dependence is a common motif for decision-making, with neural representations of past experiences found throughout the brain$^{4,5,6,7,8,9,10,11}$. However, it remains unclear whether this bias arises from dedicated neural circuits with history-specific...

Nature 20h ago

Neural Spectral Element Methods for stiff multiphysics PDEs with electrochemical transport benchmarks

arXiv:2606.02335v1 Announce Type: cross Abstract: The Neural Spectral Element Method (NSEM) evaluates each network only at fixed Legendre-Gauss-Lobatto quadrature nodes and replaces all derivative calls with precomputed spectral differentiation matrices. The resulting deterministic loss enables limited-memory BFGS (L-BFGS) to reach residuals of $10^{-9}$ to $10^{-10}$. A Kosloff-Tal-Ezer coordinate map resolves electrochemical boundary layers, while a mesh-free neural mortar framework couples...

arXiv Physics 8d ago