iteration~$k$
Related Articles from SNS
On Parallel and Batch-Cutting Strategies for Norm-Minimization-Based Convex Vector Optimization
arXiv:2606.05617v1 Announce Type: cross Abstract: We develop parallel and batch-cutting variants of the norm-minimization-based outer approximation algorithm for convex vector optimization. The standard algorithm solves $N_k$ independent subproblems at each iteration $k$ to evaluate all vertices of the current polyhedral approximation, but processes only the single best cut. We propose two improvements.
Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA
arXiv:2605.28641v2 Announce Type: replace Abstract: In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-$K$ pool from scratch. Under these settings, we point out that conventional iterative retrieval frameworks often suffer from Semantic Anchoring, where previously fetched evidence traps the...
CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
arXiv:2606.01495v2 Announce Type: replace Abstract: We present CART (Context-Anchored Recurrent Transformer), a parameter-efficient language model that reuses a single shared core block R times across depth. Unlike prior looped transformers that recompute key-value tensors at every iteration, CART computes K and V once from a multi-layer prelude and has the recurrent core cross-attend to those frozen tensors via multi-head latent attention. A learned Linear Time-Invariant (LTI) gate keeps...
Rossi-alpha Benchmark Validation of a Static Alpha Eigenvalue Capability in OpenMC
arXiv:2606.00907v1 Announce Type: new Abstract: A static alpha eigenvalue capability was implemented in a modified version of the open-source Monte Carlo radiation transport code OpenMC and validated against Rossi-alpha measurements from 21 delayed-critical benchmark experiments and 33 subcritical configurations spanning fast, intermediate, and thermal systems with U-233, HEU, IEU, LEU, and plutonium fuels. The effective delayed neutron fraction was calculated using the k-prompt method, and...
A Machine Learning-Based Framework for Discovering Huntington's Disease Stages: Integrating Graph Representation Learning and clustering to Uncover Progression Dynamics in Longitudinal Enroll-HD Dataset
arXiv:2606.06196v1 Announce Type: new Abstract: Huntington's disease (HD) is a progressive brain disorder that gradually affects movement, cognitive function, and behavior. Identifying the stage of the disease accurately and consistently is important for understanding its course, grouping patients, personalized care, and discovering treatment. Existing clinical staging frameworks rely primarily on predefined clinical measurement thresholds and clinical expert decisions, yet these discrete...
Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples
arXiv:2606.05967v2 Announce Type: replace-cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.
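The setting named in this abstract (TD(0), linear features, i.i.d. on-policy samples, a constant step, iterate averaging) can be illustrated with a minimal sketch. It is not the paper's algorithm or analysis: the two-state reward process, the one-hot features (the tabular special case of linear function approximation), the step size, and the function name `td0_averaged` are all assumptions chosen for illustration.

```python
import random


def td0_averaged(n_steps=20000, alpha=0.1, gamma=0.5, seed=0):
    """TD(0) with one-hot features, i.i.d. samples, and a running average.

    All numbers below (transition matrix, rewards, step size) are
    hypothetical and only serve to make the sketch runnable.
    """
    rng = random.Random(seed)
    P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities of the policy
    r = [1.0, 0.0]                 # deterministic reward per state
    pi = [2.0 / 3.0, 1.0 / 3.0]    # stationary distribution of P

    theta = [0.0, 0.0]       # current iterate (one weight per state)
    theta_bar = [0.0, 0.0]   # averaged iterate

    for t in range(1, n_steps + 1):
        # i.i.d. sampling: draw s from the stationary distribution,
        # then s_next from the transition row of s.
        s = 0 if rng.random() < pi[0] else 1
        s_next = 0 if rng.random() < P[s][0] else 1

        # With one-hot features, phi(s) . theta is simply theta[s].
        td_error = r[s] + gamma * theta[s_next] - theta[s]
        theta[s] += alpha * td_error   # constant learning step

        # Running mean of all iterates so far.
        for i in range(2):
            theta_bar[i] += (theta[i] - theta_bar[i]) / t

    return theta, theta_bar


theta, theta_bar = td0_averaged()
print("last iterate:    ", theta)
print("averaged iterate:", theta_bar)
```

For this toy process the exact value function solves $(I - \gamma P)V = r$, giving $V = (0.6/0.325,\ 0.1/0.325)$, so the averaged iterate can be compared against it directly.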
How Accurately Can a Gaussian Approximate Stochastic Approximation Iterates?
arXiv:2602.13906v2 Announce Type: replace-cross Abstract: Stochastic approximation (SA) is a method for finding the root of an operator perturbed by noise. The focus of this paper is studying the distribution of SA iterates in finite time. In general, it is not possible to characterize the exact distribution, and therefore our goal is to find an approximation which can yield useful tail bounds.
Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent
arXiv:2606.04031v1 Announce Type: new Abstract: Coupled gradient descent--where the update of one parameter block depends on another--underlies bilevel optimization, two-time-scale stochastic approximation, and adversarial training. When the coupled Jacobian is block-triangular, asymptotic stability is governed by the spectral radii of the diagonal blocks, yet transient amplification before convergence can be arbitrarily large due to non-normality. We develop a sharp pseudospectral theory...
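The phenomenon described in this abstract can be seen in a small numerical sketch. It does not reproduce the paper's pseudospectral bounds: the 2x2 block-triangular matrix `J`, its entries, and the helper `iterate_norms` are hypothetical choices. Both diagonal entries are 0.9, so the spectral radius is below 1 and the iteration converges, yet the large off-diagonal coupling makes the norm grow well above its starting value first.

```python
import math


def iterate_norms(J, z0, n_steps):
    """Return the Euclidean norms of z_k for z_{k+1} = J z_k (2x2 case)."""
    z = list(z0)
    norms = [math.hypot(z[0], z[1])]
    for _ in range(n_steps):
        z = [
            J[0][0] * z[0] + J[0][1] * z[1],
            J[1][0] * z[0] + J[1][1] * z[1],
        ]
        norms.append(math.hypot(z[0], z[1]))
    return norms


# Hypothetical coupled update: the second block contracts on its own,
# and the first block is driven by the second through the coupling 5.0.
J = [[0.9, 5.0], [0.0, 0.9]]
norms = iterate_norms(J, [0.0, 1.0], 200)

peak = max(norms)
print("initial norm:", norms[0])
print("peak norm:   ", peak, "at step", norms.index(peak))
print("final norm:  ", norms[-1])
```

Here the first component of $J^k z_0$ equals $5k \cdot 0.9^{k-1}$, which rises for roughly the first ten steps before the geometric decay takes over.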