
Rotary Position Embeddings


Related Articles from SNS

GridPE: A Grid Cell-Inspired Unified Position Embedding for Arbitrary-Dimensional Spaces

arXiv:2406.07049v3 Announce Type: replace Abstract: Understanding spatial relationships across all dimensions is fundamental for intelligent systems. However, existing positional embeddings, such as Rotary Positional Embedding (RoPE), lack theoretical guarantees for high-dimensional spatiotemporal tasks like video understanding and robotic navigation. Inspired by the hexagonal periodic coding of grid cells in mammalian spatial cognition, we propose GridPE -- a novel positional embedding...

arXiv CS 5d ago

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

Announce Type: replace Abstract: The attention mechanism in a Transformer architecture matches key to query based on both content -- the what -- and position in a sequence -- the where. We present an analysis indicating that what and where are entangled in the popular RoPE rotary position embedding. This entanglement can impair performance particularly when decisions require independent matches on these two factors.

arXiv CS 1d ago
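The abstract above is about how RoPE combines content and position in the query-key score. The following is a minimal sketch of plain RoPE, not the polar-coordinate method the paper proposes. It assumes the RoFormer-style pairing of adjacent dimensions (2i, 2i+1) and base 10000; many libraries instead pair dimension i with i + d/2. The helper names `rope` and `dot` are illustrative and do not come from any library.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * theta_i,
    where theta_i = base ** (-2i / d).

    Assumption: adjacent-pair layout and base 10000 (RoFormer convention).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)   # per-pair frequency
        angle = pos * theta            # rotation grows linearly with position
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s     # standard 2-D rotation
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]
# Same offset (n - m = 3) at two different absolute positions.
s1 = dot(rope(q, 2), rope(k, 5))
s2 = dot(rope(q, 10), rope(k, 13))
print(abs(s1 - s2) < 1e-9)  # True
```

Rotating both vectors makes the score depend only on the offset n - m. Each pair contributes |q_i||k_i| cos(phi_k - phi_q + (n - m) theta_i). The content-dependent phase difference and the positional offset therefore enter the same cosine, which is one way to see the coupling of "what" and "where".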

Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding

arXiv:2606.01258v1 Announce Type: new Abstract: Standard positional encodings for transformers, sinusoidal and rotary (RoPE), treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own...

arXiv CS 8d ago
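The abstract uses the standard sinusoidal encoding as its baseline. The following is a minimal reference sketch of that baseline, following the formula from "Attention Is All You Need". It is not MoPE, whose details are truncated above. The function name `sinusoidal_pe` is hypothetical. Each dimension pair is an unwindowed sinusoid at one fixed frequency. A Morlet wavelet, by contrast, multiplies a sinusoid by a Gaussian envelope.

```python
import math
from typing import List


def sinusoidal_pe(pos: int, d_model: int, base: float = 10000.0) -> List[float]:
    """Absolute sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / base ** (2i / d_model))
    PE[pos, 2i+1] = cos(pos / base ** (2i / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    pe: List[float] = []
    for i in range(d_model // 2):
        freq = base ** (-2 * i / d_model)  # one fixed frequency per pair
        pe.append(math.sin(pos * freq))
        pe.append(math.cos(pos * freq))
    return pe


print(sinusoidal_pe(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Unlike RoPE, this vector is added to the token embedding rather than used to rotate queries and keys.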

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Announce Type: replace Abstract: Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require...

arXiv CS 7d ago

PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design

arXiv:2605.26502v2 Announce Type: replace Abstract: The inverse problem of multilayer thin-film optical coatings design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary architectural...

arXiv CS 9d ago

PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design

arXiv:2605.26502v2 Announce Type: replace-cross Abstract: The inverse problem of multilayer thin-film optical coatings design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary...

arXiv Physics 9d ago

HyperDiT: Hyper-Connected Transformers for High-Fidelity Pixel-Space Diffusion

arXiv:2605.15741v2 Announce Type: replace Abstract: Pixel-space diffusion models bypass the reconstruction bottleneck of Variational Autoencoders (VAEs) but face a fundamental "granularity dilemma": capturing global semantics favors large patch scales, while generating high-fidelity details demands fine-grained inputs. To address this issue, we propose HyperDiT, a unified framework establishing Hyper-Connected Cross-Scale Interactions to bridge the semantic and pixel manifold. Diverging from...

arXiv CS 6d ago

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

arXiv:2606.02288v1 Announce Type: new Abstract: Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that these tokens converge to constant vectors after normalization that drive the attention sink and value-state drain mechanisms.

arXiv CS 8d ago

PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration

arXiv:2502.00527v2 Announce Type: replace Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently addresses the outlier challenge.

arXiv CS 2d ago
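The PolarQuant abstract is truncated before it describes the method. The sketch below is therefore only a generic illustration of representing key vectors in polar form, not PolarQuant's actual scheme. It groups the same dimension pairs that RoPE rotates and stores each pair as a (radius, angle) pair. A RoPE rotation changes only the angle and leaves the radius unchanged. All names (`to_polar_pairs`, `quantize`, `dequantize`, `polar_roundtrip`) are hypothetical. The uniform grids, the per-vector radius range and the bit widths are also illustrative assumptions.

```python
import math
from typing import List, Tuple


def to_polar_pairs(key: List[float]) -> List[Tuple[float, float]]:
    """Express each pair (k[2i], k[2i+1]) as (radius, angle in [-pi, pi])."""
    return [
        (math.hypot(key[2 * i], key[2 * i + 1]), math.atan2(key[2 * i + 1], key[2 * i]))
        for i in range(len(key) // 2)
    ]


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map value onto a uniform grid of 2**bits points spanning [lo, hi]."""
    levels = (1 << bits) - 1
    idx = round((value - lo) / ((hi - lo) / levels))
    return max(0, min(levels, idx))  # clamp to the valid code range


def dequantize(idx: int, lo: float, hi: float, bits: int) -> float:
    levels = (1 << bits) - 1
    return lo + idx * (hi - lo) / levels


def polar_roundtrip(key: List[float], r_bits: int = 4, a_bits: int = 4) -> List[float]:
    """Quantize radii and angles separately, then rebuild Cartesian pairs.

    Assumption (hypothetical): one radius range per vector, uniform grids.
    """
    pairs = to_polar_pairs(key)
    r_max = max(r for r, _ in pairs) or 1.0  # avoid a zero-width range
    out: List[float] = []
    for r, a in pairs:
        rq = dequantize(quantize(r, 0.0, r_max, r_bits), 0.0, r_max, r_bits)
        aq = dequantize(quantize(a, -math.pi, math.pi, a_bits), -math.pi, math.pi, a_bits)
        out.extend([rq * math.cos(aq), rq * math.sin(aq)])
    return out


key = [3.0, 4.0, -1.0, 0.0]
print(to_polar_pairs(key)[0])  # (5.0, 0.9272952180016122)
```

Raising `r_bits` and `a_bits` tightens the reconstruction. Any real method would choose ranges and bit allocation far more carefully than this sketch does.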

Ahoy, DECmate II the little PDP-8 that could

Now, that's a lot of word processing. But under the hood it's still at least PDP-8 adjacent, even considering its oddities and incompatibilities, and you can make it do many of the things a full-size Eight can. We'll take this basic unit, convert the floppy drives to solid state, tap the video output, and put it through its paces.

Hacker News 10d ago