Quantized Tensor Trains

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Stable full-field simulation of a multiscale elliptic equation by means of Quantized Tensor Trains

Abstract: In this article, we design an original solver based on Quantized Tensor Trains (QTT) for linear elliptic equations with a heterogeneous coefficient field, which allows for extremely fine meshes. It can achieve full-field simulations in dimensions $d=2$ and $d=3$ with a number of Degrees of Freedom (DoFs) up to $20$ orders of magnitude beyond the classical solvers, accurately recovering the solution as well as its gradient in the $L^2$ norm. For treating such...

arXiv CS 2d ago

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Since releasing Gemma 4 two months ago, we've been continuously working to expand its capabilities. First, we introduced Multi-Token Prediction (MTP) to accelerate inference, and just a couple of days ago, we released a 12B model to bridge the gap between our E4B and 26B MOE models. Today, we are releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so...

Hacker News 5d ago

ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

arXiv:2605.24011v2. Abstract: Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute makes deployment on edge platforms impractical. Aggressive, sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime. To address this, we introduce ActQuant, an action-guided mixed-precision PTQ framework that...

arXiv CS 2d ago

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

arXiv:2606.02288v1. Abstract: Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching dynamic ranges. While prior hypotheses characterize these as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that, after normalization, these tokens converge to constant vectors that drive the attention sink and value-state drain mechanisms.

arXiv CS 8d ago

Learning Fine-grained Parameter Sharing via Sparse Tensor Decomposition

Abstract: Large neural networks achieve state-of-the-art performance on many tasks, yet their sheer size hinders deployment on resource-constrained devices. Among existing compression approaches, cross-layer parameter sharing remains relatively unexplored for transformer models.

arXiv CS 8d ago

AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games

Abstract: Modern game engines spend significant compute animating NPCs with learned motion models. This paper proposes AI Level of Detail (AI LOD), a framework in which machine learning inference precision is adapted based on the distance between each NPC and the player camera. The core idea mirrors classical geometry LOD: substitute a cheaper approximation where the difference is imperceptible.

arXiv CS 2d ago

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

arXiv:2605.30409v1. Abstract: Real-time streaming video-to-video editing (V2V) is critical for interactive applications such as live broadcasting and gaming, yet it remains a formidable challenge due to the stringent requirements for temporal consistency and inference throughput. In this paper, we present SANA-Streaming, a system-algorithm co-designed framework for high-resolution, real-time streaming video editing on consumer GPUs, with the following three core designs:...

arXiv CS 9d ago

Nvidia Cosmos 3

Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what’s happening in their world, predict what’s likely to happen next, and generate actions for specific environments, embodiments, and tasks. NVIDIA Cosmos 3 is a frontier foundation model for physical AI that combines physical reasoning, world generation, and action generation within a single open model.

Hacker News 9d ago