quantisation

Related Articles from SNS

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

arXiv:2510.04127v2 Announce Type: replace Abstract: Approximate nearest neighbour (ANN) search underpins large-scale retrieval, increasingly within the retrieval-augmented generation pipelines that ground large language models, yet the methods that address it have multiplied across communities until they are seldom read as a single field. We argue they form one field with three design choices, and develop the projection-quantisation-organisation (PQO) lens, under which locality-sensitive...

arXiv CS 1d ago
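The abstract's three design choices (projection, quantisation, organisation) can be illustrated, loosely, with the oldest baseline the title mentions. That baseline projects vectors onto random directions, quantises each projection to one bit (its sign), and organises the resulting codes in a hash table. This scheme is sign-random-projection LSH, often called SimHash. What follows is a minimal textbook sketch under our own assumptions, not the paper's PQO formulation; all function names and parameters are hypothetical.

```python
import random
from collections import defaultdict

def random_hyperplanes(dim, n_bits, seed=0):
    """Projection step: n_bits random Gaussian directions in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vec, planes):
    """Project onto each direction, then quantise each projection to one bit (its sign)."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of differing bits; a cheap proxy for angular distance."""
    return sum(x != y for x, y in zip(a, b))

def build_buckets(vectors, planes):
    """Organisation step: a hash table from code to the ids that share it."""
    buckets = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[hash_code(v, planes)].append(i)
    return buckets

def candidates(query, planes, buckets):
    """Ids in the query's bucket only. A practical index would also probe
    nearby codes or use several independent tables."""
    return buckets.get(hash_code(query, planes), [])

if __name__ == "__main__":
    planes = random_hyperplanes(dim=4, n_bits=8, seed=42)
    data = [[1.0, 0.2, 0.0, 0.5], [2.0, 0.4, 0.0, 1.0], [-1.0, -0.2, 0.0, -0.5]]
    buckets = build_buckets(data, planes)
    print(candidates([1.0, 0.2, 0.0, 0.5], planes, buckets))
```

Because only the sign of each projection is kept, a positively scaled copy of a vector always receives the same code. Its negation receives the complementary code. The number of bits is the main knob: more bits make buckets more selective at the cost of lower recall per table.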

Tiny Collaborative Inference for Occlusion-Robust Object Detection

arXiv:2606.02894v2 Announce Type: replace Abstract: Edge AI nodes for search and rescue are increasingly expected to run computer vision locally, yet ultra-low-end hardware imposes hard constraints on memory, compute, and inter-device communication. This work addresses occlusion-robust object detection on devices with less than 1 MB SRAM by combining an MCUNet backbone, a YOLOv2 detection head, and Lite quantisation. Two collaborative inference strategies are evaluated: feature-level fusion,...

arXiv CS 6d ago

Data Compression with Stochastic Codes

arXiv:2602.07635v2 Announce Type: replace Abstract: Machine learning has had a major impact on data compression over the last decade and opened up many new theoretical and applied fields of inquiry. This paper describes one such direction -- relative entropy coding -- which focuses on constructing stochastic codes, mainly as an alternative to quantisation and entropy coding in lossy source coding. Our primary aim is to provide a broad overview of the topic, with an emphasis on the...

arXiv CS 1d ago

Deep Psychovisual Image Representations

Announce Type: replace Abstract: Psychovisual models suggest human vision decouples low-level feature extraction from higher cognition by first forming intermediate abstractions. In contrast, deep learning-based vision models routinely extract and aggregate features using homogeneous stacks of spatial layers, rendering their decision-making processes opaque. In this paper, we propose Deep Visual Coding, a learned frequency-domain representation inspired by 1990s image codes that quantised...

arXiv CS 8d ago

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

arXiv:2603.23640v2 Announce Type: replace Abstract: Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we...

arXiv CS 1d ago

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

arXiv:2605.30571v1 Announce Type: new Abstract: Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive decode, where one robot, camera feed or user session waits on the next token. This workload is usually described as memory-bandwidth-bound. Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth.

arXiv CS 9d ago
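The conventional estimate the abstract describes can be written as a one-line bound. Step latency is at least (weight bytes + active KV-cache bytes) divided by peak memory bandwidth. The sketch below illustrates only that naive model, and shows why weight quantisation appears to help under it. The model size, bandwidth and KV-cache figures are hypothetical placeholders, and the paper's title indicates that measured batch-1 decode does not simply track this bound.

```python
# Naive bandwidth-bound estimate for batch-1 autoregressive decode:
# every step is assumed to stream all weights plus the active KV cache once.
# All sizes and bandwidth figures below are hypothetical placeholders.

def bytes_per_step(n_params: float, bits_per_weight: float, kv_cache_bytes: float) -> float:
    """Bytes read from memory in one decode step under the naive model."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes + kv_cache_bytes

def decode_step_floor_s(n_params: float, bits_per_weight: float,
                        kv_cache_bytes: float, peak_bw_bytes_per_s: float) -> float:
    """Lower bound on step latency if memory bandwidth were the only limit."""
    return bytes_per_step(n_params, bits_per_weight, kv_cache_bytes) / peak_bw_bytes_per_s

if __name__ == "__main__":
    PEAK_BW = 1e12   # hypothetical 1 TB/s peak memory bandwidth
    N_PARAMS = 7e9   # hypothetical 7B-parameter model
    KV_BYTES = 1e9   # hypothetical 1 GB active KV cache
    for bits in (16, 8, 4):
        t = decode_step_floor_s(N_PARAMS, bits, KV_BYTES, PEAK_BW)
        print(f"{bits:>2}-bit weights: >= {t * 1e3:.1f} ms/step, <= {1 / t:.0f} tokens/s")
```

With these placeholder numbers, the floor falls from 15 ms to 8 ms to 4.5 ms per step as weights go from 16 to 8 to 4 bits. The KV-cache term does not shrink with weight quantisation, so it takes a growing share of the traffic.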

Solving Inverse Problems with Flow-based Models via Model Predictive Control

arXiv:2601.23231v2 Announce Type: replace-cross Abstract: Flow-based generative models provide strong unconditional priors for inverse problems, but guiding their dynamics for conditional generation remains challenging. Recent work casts training-free conditional generation in flow models as an optimal control problem; however, solving the resulting trajectory optimisation is computationally and memory intensive, requiring differentiation through the flow dynamics or adjoint solves. We...

arXiv CS 1d ago

Hierarchical Certified Semantic Commitment for Byzantine-Resilient LLM-Agent Collaboration

arXiv:2606.07316v1 Announce Type: new Abstract: Byzantine collaboration among large-language-model agents requires a finality-control primitive: given delivered stochastic, structured natural-language proposals, the protocol must decide whether the round supports a commit, what kind of commit, or a typed safe abort. Naive aggregation hides this choice behind a single verdict; classical Byzantine fault tolerance hides it behind byte-identity that LLM proposals do not satisfy. We introduce...

arXiv CS 2d ago

Bringing Up DeepSeek-V4-Flash on AMD MI300X

At Doubleword we are building an inference cloud designed for volume. To do that we have to reckon with the enveloping compute shortage. AMD’s MI300X launched in December 2023 (at AMD’s “Advancing AI” event, 6 December 2023).

Hacker News 8d ago

Tiny Collaborative Inference for Occlusion-Robust Object Detection

Announce Type: new Abstract: Small edge devices such as IoT surveillance nodes and search-and-rescue (SAR) platforms are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is limited by available memory and compute, by communication costs when several devices cooperate, and by the loss of accuracy caused by occlusion. The work evaluates occlusion-robust object detection on devices with less than 1 MB SRAM by combining an MCUNet...

arXiv CS 7d ago