$d_{model}$

Related Articles from SNS

Modeling hepatitis D virus kinetics during bulevirtide monotherapy: challenges and solutions

The entry inhibitor Bulevirtide (BLV) was recently approved in Europe for treatment of chronic hepatitis D virus (HDV) infection, which is considered the most severe viral hepatitis infection. Theory indicates that models that account for free virus and infected cells, but do not include target cell dynamics (historically called the two-equation model) are limited to predicting a monophasic viral decline for antiviral agents that act only to block viral...

arXiv Physics 2d ago

Central auditory decline precedes cochlear deficits in a D-galactose mimetic model of aging

Age-related hearing loss reflects a mixture of concurrent peripheral cochlear and central auditory pathway degeneration. Disentangling their relative contributions has remained challenging because both decline together with natural aging. Here, we used systemic D-galactose (D-gal) administration to selectively accelerate central auditory aging while preserving peripheral cochlear function.

bioRxiv 8d ago

Overclocking Electrostatic Generative Models

arXiv:2509.22454v2. Electrostatic generative models such as PFGM++ have recently emerged as a powerful framework, achieving competitive performance in image synthesis. PFGM++ operates in an extended data space with auxiliary dimensionality $D$, recovering the diffusion model framework as $D\to\infty$, while yielding superior empirical results for finite $D$. Like diffusion models, PFGM++ relies on expensive ODE simulations to generate samples, making it...

arXiv CS 6d ago

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

Model dimension ($d_{model}$) is a fundamental hyperparameter in transformer language models, yet its role in setting the geometric limits of feature representation remains under-explored. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features as near-orthogonal directions in latent space - we develop a framework for estimating how many such directions a model can support. We first establish the embedding...

arXiv CS 7d ago

Towards Persistent Case-Based Memory for Autonomous Data Science: A CBR-Augmented R&D-Agent with a Locally Deployable Small Language Model

Most top-performing autonomous data-science agents rely on frontier cloud models and lack persistent, cross-session memory. This paper addresses two open gaps: (1) the underexplored use of formally structured, quality-controlled Case-Based Reasoning (CBR) case bases coupling symbolic case records with executable code artefacts; and (2) the untested viability of Small Language Models (SLMs) as locally deployable agent backbones. We present CBR-augmented...

arXiv CS 5d ago

Effective Dimensionality as an Operator Invariant for Physics-Preserving Constraint Adaptation in Physics-Informed Neural Networks

Physics-Informed Neural Networks inherently suffer from task interference because they rely on a shared parameter space to satisfy both governing differential equations and boundary conditions. We analyze this structural conflict using the Fisher Information Matrix to quantify the effective degrees of freedom ($d_{eff}$) in a physics-constrained model. Unlike the classical $d_{eff}$ which measures how many parameter directions are informed by data against a...

arXiv Physics, arXiv CS 5d ago

Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs

Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just $d_\text{model}+1$ parameters suffices: trained adapters generate sparse autoencoder feature labels...

arXiv CS 7d ago

An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers

Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We empirically audit eight input encoders -- spanning a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP stem, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark...

arXiv CS 6d ago

D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models

arXiv:2606.04446v1. Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward pass. Recent diffusion-based drafters generate an entire block of tokens in parallel but usually commit to a single draft sequence per verification: once the first mismatch occurs, all subsequent draft tokens are discarded, resulting in a limited acceptance rate. Naively batching more...

arXiv CS 6d ago