Home Knowledge Base HNSW

HNSW

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Self-Conditioned Positional HNSW for Overlap-Aware Retrieval in Chunked-Document RAG Systems: Method and Industrial Evidence-Quality Audit

Announce Type: new Abstract: Chunked-document retrieval is a common component of retrieval-augmented generation (RAG) systems. Documents are split into overlapping chunks, embedded, and indexed with approximate nearest-neighbor search such as hierarchical navigable small world graphs (HNSW). Overlap improves boundary coverage but induces a practical failure mode: top-k retrieval often returns near-adjacent chunks that repeat evidence and waste prompt budget.

arXiv CS 8d ago

HNSW-MS: Hierarchical Graph Indexing Enables Accurate Real-Time Mass Spectral Similarity Search at Repository Scale

Spectral similarity search is the basis of mass spectrometry-based metabolomics, underpinning library matching, molecular networks construction, and repository searches such as MASST. Until recently, dataset sizes were limited, making exhaustive pairwise comparison tractable. This is no longer true.

bioRxiv 2d ago

Aperon Technical Report: Hierarchical No-Pointer Tangent-Local Search for High-Dimensional Approximate Nearest Neighbors

arXiv:2606.08813v1 Announce Type: new Abstract: We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximity graphs (e.g., HNSW) incur a heavy pointer tax in memory overhead and induce irregular memory accesses that stall CPU pipelines. HNTL resolves this by partitioning the high-dimensional space into local, coherent grains, representing vectors as low-dimensional coordinates on local...

arXiv CS 1d ago

FOLD: Fuzzy Online Deduplication for Very Large Evolving Datasets via Approximate Nearest Neighbor Search

Announce Type: new Abstract: Fuzzy deduplication is key to constructing large language model training corpora. However, classic Locality-Sensitive Hashing pipelines scale poorly as corpora grow and are ill-suited to continuous ingestion. We present FOLD (Fuzzy Online Deduplication), an online fuzzy deduplication system that delivers high recall and throughput for evolving datasets.

arXiv CS 7d ago

Auditing Privacy in Multi-Tenant RAG under Account Collusion

arXiv:2605.19847v2 Announce Type: replace Abstract: Multi-tenant RAG services often treat the account as the privacy boundary: each account receives an $(\varepsilon_{\text{acc}},\delta_{\text{acc}})$-DP retrieval guarantee against the tenant index. We show that this framing understates leakage under same-index account collusion. For Gaussian noise-then-select retrieval, $k$ coordinated same-tenant accounts compose to joint leakage $\Theta(\sqrt{k}\,\varepsilon_{\text{acc}})$, not...

arXiv CS 8d ago

ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases

arXiv:2606.03151v1 Announce Type: new Abstract: Vector database search with frequent updates is increasingly critical in applications such as retrieval augmented generation, recommendation systems, and large-scale embedding retrieval. Existing solutions, such as graph-based and partition-based approximate nearest neighbor search (ANNS), suffer from frequent index rebuilding due to data distribution-dependent indexing that impacts continuous deployment and causes long rebuilding latency.

arXiv CS 7d ago

Don't Stir the Pot! Authorized Vector Data Retrieval via Access-Aware Indexing

arXiv:2605.01342v3 Announce Type: replace Abstract: Vector databases increasingly enforce role-based access control, where each top-k approximate nearest neighbor query must return only vectors the querying role is authorized to access. Two extremes bracket the design space. A single global index built over all vectors avoids duplication but wastes search effort on unauthorized vectors and degrades recall, while an oracle index, built with all authorized vectors to the query roles, searches...

arXiv CS 6d ago