LN

Related Articles from SNS

Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models

arXiv:2601.09719v3 Announce Type: replace Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic...

arXiv CS 6d ago

Silurus/ooxml: Pixel-faithful Office documents, rendered in the browser

This entire codebase — Rust parsers, TypeScript renderers, tests, and tooling — was implemented by Claude (Anthropic's AI assistant) through iterative prompting. No human-written application code exists in this repository. A browser-based viewer for Office Open XML documents that renders to an HTML Canvas element.

Hacker News 3d ago

Proven Advantage of Multiobjective Evolutionary Algorithms for Problems with Different Degrees of Conflict

arXiv:2408.04207v3 Announce Type: replace Abstract: The field of multiobjective evolutionary algorithms (MOEAs) often emphasizes its popularity for optimization problems with conflicting objectives. However, it is still theoretically unknown how MOEAs perform compared with typical approaches outside this field. This paper conducts such a systematic theoretical comparison on problem classes with different degrees of conflict.

arXiv CS 1d ago

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

arXiv:2606.06521v1 Announce Type: new Abstract: FP8 (E4M3) acceleration for attention computation offers significant throughput gains, but the 3-bit mantissa introduces precision challenges when the softmax probability matrix P is cast to FP8 before the P*V matrix multiplication. We analyze two implementation choices that affect output precision under the Attention Sink phenomenon: (1) the KV block iteration order, and (2) the static scaling factor applied to P before casting.

arXiv CS 2d ago

Alpine Linux 3.24.0 Released

We are pleased to announce the release of Alpine Linux 3.24.0, the first release in the v3.24 stable series. Highlights and significant changes: Python setuptools 82.0.0 removed pkg_resources. py3-setuptools has been upgraded to 82.0.0, which removed the deprecated pkg_resources module. Projects that still depend on it will no longer work and should migrate to its successors.

Hacker News 23h ago

After Jeff Bezos' 'very rough day', Amazon satellite chief says satellites remain secure

Amazon’s satellite internet division has sought to calm employee concern after last week’s Blue Origin New Glenn rocket explosion, which founder Jeff Bezos described as a ‘very rough day’. The incident took place during a hot-fire test at Cape Canaveral, when the rocket erupted into a massive fireball ahead of a mission expected to carry Amazon’s operational internet satellites. While the company reported no injuries, the blast caused a significant amount of damage to Blue Origin’s launch...

Times of India 6d ago

Exact Unlearning in Reinforcement Learning

arXiv:2606.04182v1 Announce Type: new Abstract: We formulate the problem of exact unlearning in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is indistinguishable from what would have been produced had the deleted user never interacted with the learner. For any $\rho > 0$, we show that there exists a reinforcement learning (RL)...

arXiv CS 6d ago

Online Span Minimization for Flexible Uniform Jobs

arXiv:2606.06681v1 Announce Type: new Abstract: Motivated by the critical need for energy-efficient scheduling in cloud computing, this paper investigates Span Minimization, a fundamental variant of the well-studied BusyTime problem. In the general BusyTime problem, $n$ jobs characterized by release times, deadlines, and processing times must be partitioned into bundles of capacity $B$, where the objective is to minimize the total active duration of the virtual machines. Span minimization...

arXiv CS 2d ago

MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models

arXiv:2606.07978v1 Announce Type: new Abstract: Understanding where LLMs store factual knowledge is critical for hallucination mitigation. We systematically quantify Late Crystallization: factual knowledge does not gradually emerge across layers but "crystallizes" abruptly at the final layers.

arXiv CS 1d ago

Optimality of quasi-Monte Carlo methods and suboptimality of the sparse-grid Gauss--Hermite rule in Gaussian Sobolev spaces

Announce Type: replace Abstract: Optimality of several quasi-Monte Carlo methods and suboptimality of the sparse-grid quadrature based on the univariate Gauss--Hermite rule is proved in the Sobolev spaces of mixed dominating smoothness of order $\alpha$, where the optimality is in the sense of worst-case convergence rate. For sparse-grid Gauss--Hermite quadrature, lower and upper bounds are established, with rates coinciding up to a logarithmic factor. The dominant rate is found to be only...

arXiv CS 1d ago