Business & Finance
Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era
Key Points
arXiv:2510.04127v2 Announce Type: replace Abstract: Approximate nearest neighbour (ANN) search underpins large-scale retrieval, increasingly within the retrieval-augmented generation pipelines that ground large language models, yet the methods that address it have multiplied across communities until they are seldom read as a single field. We argue they form one field with three design choices, and develop the projection-quantisation-organisation (PQO) lens, under which locality-sensitive...
arXiv:2510.04127v2 Announce Type: replace
Abstract: Approximate nearest neighbour (ANN) search underpins large-scale retrieval, increasingly within the retrieval-augmented
generation pipelines that ground large language models, yet the methods that address it have multiplied across communities
until they are seldom read as a single field. We argue they form one field with three design choices, and develop the
projection-quantisation-organisation (PQO) lens, under which locality-sensitive hashing, learned binary hashing, deep
end-to-end hashing, product quantisation, graph-based indexes, and the binary embeddings of modern vector databases are all
settings of three coupled questions: where to place the projections, where to place the quantisation thresholds, and how to
organise the resulting codes. The projection-then-quantisation reading is established; our contribution is the third,
co-equal organisation stage, a demonstration that the three run unbroken from the field's origins to the deep,
product-quantisation, graph, and retrieval-augmented eras, and a reproducible measurement that turns the lens from
classifying methods to predicting them. The measurement yields three findings. First, memory is won on the quantisation axis:
a one-bit code is a thirty-second the size of the float, and a single full-precision re-ranking pass over a short candidate
list recovers uncompressed quality in full. Second, the trade-off orderings the lens anticipates recur unchanged as the
embedding grows. Third, where supervision is available, an eight-byte code more than doubles the quality of the two-kilobyte
float it replaces. We release these measurements as BitBudget, an extensible benchmark with a live leaderboard, recast
generative retrieval's "semantic identifiers" as quantisation codes, and identify the open problems that follow as compact
codes return to the centre of large-scale retrieval.