$d_{model}$
Related Articles from SNS
Modeling hepatitis D virus kinetics during bulevirtide monotherapy: challenges and solutions
The entry inhibitor Bulevirtide (BLV) was recently approved in Europe for treatment of chronic hepatitis D virus (HDV) infection, which is considered the most severe viral hepatitis infection. Theory indicates that models that account for free virus and infected cells, but do not include target cell dynamics (historically called the two-equation model) are limited to predicting a monophasic viral decline for antiviral agents that act only to block viral...
Central auditory decline precedes cochlear deficits in a D-galactose mimetic model of aging
Age-related hearing loss reflects a mixture of concurrent peripheral cochlear and central auditory pathway degeneration. Disentangling their relative contributions has remained challenging because both decline together with natural aging. Here, we used systemic D-galactose (D-gal) administration to selectively accelerate central auditory aging while preserving peripheral cochlear function.
Overclocking Electrostatic Generative Models
arXiv:2509.22454v2. Electrostatic generative models such as PFGM++ have recently emerged as a powerful framework, achieving competitive performance in image synthesis. PFGM++ operates in an extended data space with auxiliary dimensionality $D$, recovering the diffusion model framework as $D\to\infty$, while yielding superior empirical results for finite $D$. Like diffusion models, PFGM++ relies on expensive ODE simulations to generate samples, making it...
Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models
Model dimension ($d_{model}$) is a fundamental hyperparameter in transformer language models, yet its role in setting the geometric limits of feature representation remains under-explored. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features as near-orthogonal directions in latent space - we develop a framework for estimating how many such directions a model can support. We first establish the embedding...
Towards Persistent Case-Based Memory for Autonomous Data Science: A CBR-Augmented R&D-Agent with a Locally Deployable Small Language Model
Most top-performing autonomous data-science agents rely on frontier cloud models and lack persistent, cross-session memory. This paper addresses two open gaps: (1) the underexplored use of formally structured, quality-controlled Case-Based Reasoning (CBR) case bases coupling symbolic case records with executable code artefacts; and (2) the untested viability of Small Language Models (SLMs) as locally deployable agent backbones. We present CBR-augmented...
Effective Dimensionality as an Operator Invariant for Physics-Preserving Constraint Adaptation in Physics-Informed Neural Networks
Physics-Informed Neural Networks inherently suffer from task interference because they rely on a shared parameter space to satisfy both governing differential equations and boundary conditions. We analyze this structural conflict using the Fisher Information Matrix to quantify the effective degrees of freedom ($d_{eff}$) in a physics-constrained model. Unlike the classical $d_{eff}$ which measures how many parameter directions are informed by data against a...
Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs
Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just $d_\text{model}+1$ parameters suffices: trained adapters generate sparse autoencoder feature labels...
An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers
Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders -- spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark...
D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models
arXiv:2606.04446v1. Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward pass. Recent diffusion-based drafters generate an entire block of tokens in parallel but usually commit to a single draft sequence per verification: once the first mismatch occurs, all subsequent draft tokens are discarded, resulting in a limited acceptance rate. Naively batching more...