Home Knowledge Base the Theoretical Limitations of Embedding

the Theoretical Limitations of Embedding

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

On the Theoretical Limitations of Embedding-based Link Prediction

Announce Type: replace Abstract: Neural networks often map low-dimensional embeddings to high-dimensional output spaces. Usually, the output layer is linear, which can create a "rank bottleneck" that limits the functions a model can represent.

arXiv CS 8d ago

$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval

Announce Type: replace Abstract: This paper studies the Minimal Embeddable Dimension (MED): the least dimension in which there exists a configuration of $m$ object vectors so that every subset of size at most $k$ is exactly retrieved by score comparison. Our result shows MED is $\Theta(k)$, independent of $m$, for inner product, Euclidean distance, and cosine similarity. We then consider Robust MED (RMED), where all vectors are unit normed and an $\epsilon$ gap of scores is required.

arXiv CS 7d ago

Generative Quantum Data Embeddings for Supervised Learning

Announce Type: cross Abstract: Many practically relevant applications of quantum machine learning involve classical data, for which performance depends critically on how inputs are embedded into quantum states. Yet the use of a fixed embedding circuit ansatz remains standard practice.

arXiv CS 9d ago

ColBERTSaR: Sparsified ColBERT Index via Product Quantization

Announce Type: new Abstract: While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and decompressing document token embeddings, and applying the MaxSim operation. Indexes in PLAID and similar ColBERT implementations require five to ten times the disk storage of the original raw text, which limits their scalability. Furthermore, prior work has identified that the...

arXiv CS 5d ago

GJDNet: Robust Graph Neural Networks via Joint Disentangled Learning Against Adversarial Attacks

arXiv:2606.01560v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) are vulnerable to adversarial attacks, which inherently invert connectivity patterns by introducing disassortative edges in assortative graphs and assortative edges in disassortative graphs. This structural inversion creates structure-feature mismatches that disrupt neighborhood aggregation across different graph types.

arXiv CS 8d ago

Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding

arXiv:2606.01258v1 Announce Type: new Abstract: Standard positional encodings for transformers - sinusoidal and rotary (RoPE) - treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own...

arXiv CS 8d ago

SpikeWFM: Spiking-Aided Wireless Foundation Model for Robust Channel Prediction

Announce Type: cross Abstract: This paper proposes SpikeWFM, a novel hybrid architecture that integrates spiking neural networks (SNNs) with conventional artificial neural network (ANN)-based transformers for wireless foundation models (WFMs). Inspired by the noise-robust and energy-efficient information processing in the human brain, SpikeWFM aims to enhance the resilience of WFMs against noise and interference while maintaining strong generalization capabilities across diverse wireless...

arXiv CS 8d ago

Beyond Semantic Understanding: Preserving Collaborative Frequency Components in LLM-based Recommendation

arXiv:2508.10312v2 Announce Type: replace Abstract: Recommender systems in concert with Large Language Models (LLMs) present promising avenues for generating semantically-informed recommendations. However, LLM-based recommenders exhibit a tendency to overemphasize semantic correlations within users' interaction history. When taking pretrained collaborative ID embeddings as input, LLM-based recommenders progressively weaken the inherent collaborative signals as the embeddings propagate...

arXiv CS 8d ago

InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization

arXiv:2606.05561v1 Announce Type: new Abstract: Speech-based mental health screening offers scalable depression detection, yet clinical deployment faces a significant barrier: users' privacy concerns about demographic information exposure. Current techniques struggle to resolve this conflict. Adversarial training often fails against unseen threats, whereas Differential Privacy tends to compromise diagnostic performance by injecting noise across all features.

arXiv CS 5d ago

SteerFace: Debiasing Synthetic Face Generation via Adaptive Residue Perturbation

arXiv:2605.30894v1 Announce Type: new Abstract: The shortage of legally compliant data for face recognition training has sparked growing interest in using synthetic data as an alternative. While recent diffusion-based methods enable the generation of photorealistic face images with strong identity adherence and data diversity, their downstream recognition performance still exhibits a significant synthetic-real gap. This paper identifies visual tendency as a previously underexplored...

arXiv CS 9d ago