Embeddings
Related Articles from SNS
jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers
Abstract: In this work, we introduce GELATO (Geometry-preserving Embeddings via Locked Aligned TOwers), a novel approach to multimodal embedding models. We build on the VLM-style architecture, in which non-text encoders are adapted to produce input for a language model, which in turn generates embeddings for all varieties of input. We present the result: the jina-embeddings-v5-omni suite, a pair of models that encode text, image, audio, and video input into a single...
SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia
Abstract: Text embeddings are fundamental to many downstream applications, making robustness important for real-world NLP. However, most recent state-of-the-art embedding models are not reproducible because they rely on closed or undisclosed training data, and they remain insufficiently robust for Southeast Asian languages. We present SEA-Embedding, a fully open and reproducible text-embedding pipeline for Southeast Asian languages trained only on publicly available data,...
Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding
arXiv:2606.09331v1 Abstract: Omni-modal retrieval promises a single embedding space for text, image, video, document, and audio inputs, but building such a unified retriever is difficult since these modalities differ in data distribution, architecture, and optimization dynamics. In this work, we present Conan-embedding-v3, a decouple-fuse-recover framework for omni-modal retrieval. Conan-embedding-v3 first trains modality specialists independently and fuses their task...
Don't Forget Your Embeddings: Robust Knowledge Erasure via Precise Editing of Embeddings
arXiv:2606.03695v1 Abstract: As language models are increasingly deployed in real-world applications, the ability to erase specific knowledge from them becomes critical for safety and compliance. Prominent methods seek persistent removal by updating the model's parameters, yet the target knowledge often can be recovered through adversarial prompting or relearning. In this work, we hypothesize this limitation stems in part from existing methods overlooking the embedding layer.
Embedding linear codes over Z4 into self-orthogonal codes
Abstract: The purpose of this paper is to investigate the self-orthogonal embedding problem for linear codes over Z4. We propose several tight bounds on the length of the shortest self-orthogonal embedding over Z4, and determine the exact shortest self-orthogonal embedding length under specific conditions. As an example satisfying these conditions, we establish the exact length of the shortest self-orthogonal embedding for the quaternary Preparata codes.
Private Embedding Lookup with Encrypted Compact Queries under Fully Homomorphic Encryption
arXiv:2606.03191v2 Abstract: Many NLP or recommendation models begin by mapping discrete client inputs to embedding vectors. Since inputs can reveal sensitive information, the embedding step must be protected in privacy-preserving inference. Fully Homomorphic Encryption (FHE) enables inference over encrypted client data, but turns embedding lookup from simple table access into homomorphic computation.
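Illustrative note: one way to see why lookup becomes computation is that a server cannot index a table with an index it cannot read, so the lookup can instead be written as a weighted sum of all rows using a one-hot selector. The sketch below shows only that plaintext identity in plain Python. It uses no FHE library and is not the compact-query method of the paper; the names `one_hot`, `lookup_by_selection`, and `table` are hypothetical.

```python
def one_hot(index, size):
    """Return a selector vector with a 1 at `index` and 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]


def lookup_by_selection(table, selector):
    """Compute sum_i selector[i] * table[i].

    Every row is touched, so the access pattern does not depend on the
    index. Under FHE the selector would be encrypted and the products
    and sums would be homomorphic operations (assumption: not shown here).
    """
    dim = len(table[0])
    out = [0.0] * dim
    for weight, row in zip(selector, table):
        for j in range(dim):
            out[j] += weight * row[j]
    return out


# Toy embedding table: 3 tokens, 2 dimensions (made-up values).
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
selected = lookup_by_selection(table, one_hot(1, len(table)))
```

The cost of this naive form grows with the table size, since a selector of full vocabulary length must be sent and multiplied against every row.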
Polynomial Trajectory Compression for Protein Language Model Embeddings
Protein language models (PLMs) generate rich, layer-wise embeddings that capture diverse biological information but are expensive in terms of storage and computation at scale. In this work, we propose a compact surrogate representation for PLM embeddings across transformer layers using low-dimensional PCA projections and cubic polynomial trajectories. This approach enables efficient storage and on-demand reconstruction of these protein-level embeddings at any layer without rerunning the PLM.
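Illustrative note: the idea of PCA projection followed by cubic trajectories can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it fits PCA on the layers of a single protein (the paper may fit it differently), parameterizes layer depth as t in [0, 1], and uses hypothetical names `compress` and `reconstruct`.

```python
import numpy as np


def compress(layer_embeddings, n_components=2, degree=3):
    """Compress an (n_layers, dim) array into mean, PCA basis, and
    per-component polynomial coefficients over layer depth."""
    n_layers = layer_embeddings.shape[0]
    mean = layer_embeddings.mean(axis=0)
    centered = layer_embeddings - mean
    # PCA via SVD: rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                 # (k, dim)
    coords = centered @ basis.T               # (n_layers, k)
    t = np.linspace(0.0, 1.0, n_layers)       # normalized layer depth
    coeffs = np.polyfit(t, coords, degree)    # (degree + 1, k)
    return mean, basis, coeffs


def reconstruct(mean, basis, coeffs, layer, n_layers):
    """Approximate the embedding at `layer` without rerunning the model."""
    t = layer / (n_layers - 1)
    # np.polyfit returns the highest power first.
    powers = t ** np.arange(coeffs.shape[0] - 1, -1, -1)
    coords = powers @ coeffs                  # (k,)
    return mean + coords @ basis


# Synthetic data that lies exactly on a cubic path in a 2-D subspace.
n_layers, dim = 8, 5
t = np.linspace(0.0, 1.0, n_layers)
path = np.stack([t**3, t**2 - t], axis=1)     # (n_layers, 2)
directions = np.eye(dim)[:2]                  # two orthonormal directions
embeddings = 1.0 + path @ directions          # (n_layers, dim)

mean, basis, coeffs = compress(embeddings)
approx = reconstruct(mean, basis, coeffs, layer=3, n_layers=n_layers)
```

For this synthetic input the stored representation is the mean, a 2 x 5 basis, and a 4 x 2 coefficient array, from which any layer can be rebuilt on demand. Real embeddings would be reconstructed only approximately.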
SAILRec: Steering LLM Attention to Dual-Side Semantically Aligned Collaborative Embeddings for Recommendation
arXiv:2606.04514v1 Abstract: Recent LLM-based recommenders enhance language models with collaborative embeddings from user-item interactions, but making such embeddings available does not ensure their proper use during inference. Through a diagnostic attention analysis, we find that the utilization of collaborative embeddings is depth-dependent and alignment-sensitive, suggesting that LLMs need to balance their internal semantic knowledge with external collaborative...