Rotary Position Embeddings
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
GridPE: A Grid Cell-Inspired Unified Position Embedding for Arbitrary-Dimensional Spaces
arXiv:2406.07049v3 Announce Type: replace Abstract: Understanding spatial relationships across all dimensions is fundamental for intelligent systems. However, existing positional embeddings, such as Rotary Positional Embedding (RoPE), lack theoretical guarantees for high-dimensional spatiotemporal tasks like video understanding and robotic navigation. Inspired by the hexagonal periodic coding of grid cells in mammalian spatial cognition, we propose GridPE -- a novel positional embedding...
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Announce Type: replace Abstract: The attention mechanism in a Transformer architecture matches key to query based on both content -- the what -- and position in a sequence -- the where. We present an analysis indicating that what and where are entangled in the popular RoPE rotary position embedding. This entanglement can impair performance particularly when decisions require independent matches on these two factors.
Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding
arXiv:2606.01258v1 Announce Type: new Abstract: Standard positional encodings for transformers - sinusoidal and rotary (RoPE) - treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own...
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
Announce Type: replace Abstract: Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require...
PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design
arXiv:2605.26502v2 Announce Type: replace Abstract: The inverse problem of multilayer thin-film optical coatings design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary architectural...
PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design
arXiv:2605.26502v2 Announce Type: replace-cross Abstract: The inverse problem of multilayer thin-film optical coatings design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary...
HyperDiT: Hyper-Connected Transformers for High-Fidelity Pixel-Space Diffusion
arXiv:2605.15741v2 Announce Type: replace Abstract: Pixel-space diffusion models bypass the reconstruction bottleneck of Variational Autoencoders (VAEs) but face a fundamental "granularity dilemma": capturing global semantics favors large patch scales, while generating high-fidelity details demands fine-grained inputs. To address this issue, we propose HyperDiT, a unified framework establishing Hyper-Connected Cross-Scale Interactions to bridge the semantic and pixel manifold. Diverging from...
Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization
arXiv:2606.02288v1 Announce Type: new Abstract: Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that these tokens converge to constant vectors after normalization that drive the attention sink and value-state drain mechanisms.
PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration
arXiv:2502.00527v2 Announce Type: replace Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently addresses the outlier challenge.
Ahoy, DECmate II the little PDP-8 that could
Now, that's a lot of word processing. But under the hood it's still at least PDP-8 adjacent, even considering its oddities and incompatibilities, and you can make it do many of the things a full-size Eight can. We'll take this basic unit, convert the floppy drives to solid state, tap the video output, and put it through its paces.