Home › Business & Finance › Projection and Quantisation: A Unifying View of Learning...

Business & Finance

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Sean Moran 1 min read

Key Points

arXiv:2510.04127v2 Announce Type: replace Abstract: Approximate nearest neighbour (ANN) search underpins large-scale retrieval, increasingly within the retrieval-augmented generation pipelines that ground large language models, yet the methods that address it have multiplied across communities until they are seldom read as a single field. We argue they form one field with three design choices, and develop the projection-quantisation-organisation (PQO) lens, under which locality-sensitive hashing, learned binary hashing, deep end-to-end hashing, product quantisation, graph-based indexes, and the binary embeddings of modern vector databases are all settings of three coupled questions: where to place the projections, where to place the quantisation thresholds, and how to organise the resulting codes. The projection-then-quantisation reading is established; our contribution is the third, co-equal organisation stage, a demonstration that the three run unbroken from the field's origins to the deep, product-quantisation, graph, and retrieval-augmented eras, and a reproducible measurement that turns the lens from classifying methods to predicting them. The measurement yields three findings. First, memory is won on the quantisation axis: a one-bit code is a thirty-second the size of the float, and a single full-precision re-ranking pass over a short candidate list recovers uncompressed quality in full. Second, the trade-off orderings the lens anticipates recur unchanged as the embedding grows. Third, where supervision is available, an eight-byte code more than doubles the quality of the two-kilobyte float it replaces. We release these measurements as BitBudget, an extensible benchmark with a live leaderboard, recast generative retrieval's "semantic identifiers" as quantisation codes, and identify the open problems that follow as compact codes return to the centre of large-scale retrieval.

Random Projections (ORG) ANN (PERSON) BitBudget (ORG) quantisation (ORG)

Originally published by arXiv CS Read original →

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

Related Stories

Unique SpaceX IPO is hedging challenge for Wall Street: 'What are you going to do, short NASA?'

Less props for prediction markets under proposal

Musk Stock Fans Say ‘The More, The Better’ in SpaceX IPO Frenzy

Investment Committee's Message: Don't Overfocus on SpaceX IPO