LN
Related Articles from SNS
Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models
arXiv:2601.09719v3 Announce Type: replace Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic...
Silurus/ooxml: Pixel-faithful Office documents, rendered in the browser
This entire codebase — Rust parsers, TypeScript renderers, tests, and tooling — was implemented by Claude (Anthropic's AI assistant) through iterative prompting. No human-written application code exists in this repository. A browser-based viewer for Office Open XML documents that renders to an HTML Canvas element.
Proven Advantage of Multiobjective Evolutionary Algorithms for Problems with Different Degrees of Conflict
arXiv:2408.04207v3 Announce Type: replace Abstract: The field of multiobjective evolutionary algorithms (MOEAs) often emphasizes its popularity for optimization problems with conflicting objectives. However, it is still theoretically unknown how MOEAs perform compared with typical approaches outside this field. This paper conducts such a systematic theoretical comparison on problem classes with different degrees of conflict.
P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8
arXiv:2606.06521v1 Announce Type: new Abstract: FP8 (E4M3) acceleration for attention computation offers significant throughput gains, but the 3-bit mantissa introduces precision challenges when the softmax probability matrix P is cast to FP8 before the P*V matrix multiplication. We analyze two implementation choices that affect output precision under the Attention Sink phenomenon: (1) the KV block iteration order, and (2) the static scaling factor applied to P before casting.
Alpine Linux 3.24.0 Released
We are pleased to announce the release of Alpine Linux 3.24.0, the first release in the v3.24 stable series. Significant changes: py3-setuptools has been upgraded to 82.0.0, which removed the deprecated pkg_resources module. Projects that still depend on it will no longer work and should migrate to its successors.
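The announcement does not name the successors. As a minimal sketch, assuming the dependency on pkg_resources was only a version lookup, the standard-library `importlib.metadata` module is one commonly used replacement. The helper name `installed_version` is hypothetical and chosen for illustration only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Stands in for the older pkg_resources.get_distribution(name).version
    pattern (assumed use case, not taken from the announcement).
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version("setuptools"))
```

Other uses of pkg_resources, such as resource file access or requirement parsing, need different replacements and are not covered by this sketch.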
After Jeff Bezos' 'very rough day', Amazon satellite chief says satellites remain secure
Amazon’s satellite internet division has sought to calm employee concern after last week’s Blue Origin New Glenn rocket explosion, which founder Jeff Bezos described as a ‘very rough day’. The incident took place during a hot-fire test at Cape Canaveral, when the rocket erupted into a massive fireball ahead of a mission expected to carry Amazon’s operational internet satellites. While the company reported no injuries, the blast caused a significant amount of damage to Blue Origin’s launch...
Exact Unlearning in Reinforcement Learning
arXiv:2606.04182v1 Announce Type: new Abstract: We formulate the problem of \emph{exact unlearning} in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is \emph{indistinguishable} from what would have been produced had the deleted user never interacted with the learner. For any $\rho >0$, we show that there exists a reinforcement learning (RL)...
Online Span Minimization for Flexible Uniform Jobs
arXiv:2606.06681v1 Announce Type: new Abstract: Motivated by the critical need for energy-efficient scheduling in cloud computing, this paper investigates Span Minimization, a fundamental variant of the well-studied BusyTime problem. In the general BusyTime problem, $n$ jobs characterized by release times, deadlines, and processing times must be partitioned into bundles of capacity $B$, where the objective is to minimize the total active duration of the virtual machines. Span minimization...
MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models
arXiv:2606.07978v1 Announce Type: new Abstract: Understanding where LLMs store factual knowledge is critical for hallucination mitigation. We systematically quantify Late Crystallization: factual knowledge does not gradually emerge across layers but "crystallizes" abruptly at the final layers.
Optimality of quasi-Monte Carlo methods and suboptimality of the sparse-grid Gauss--Hermite rule in Gaussian Sobolev spaces
Announce Type: replace Abstract: Optimality of several quasi-Monte Carlo methods and suboptimality of the sparse-grid quadrature based on the univariate Gauss--Hermite rule is proved in the Sobolev spaces of mixed dominating smoothness of order $\alpha$, where the optimality is in the sense of worst-case convergence rate. For sparse-grid Gauss--Hermite quadrature, lower and upper bounds are established, with rates coinciding up to a logarithmic factor. The dominant rate is found to be only...