
VRAM

Related Articles from SNS

Use your Nvidia GPU's VRAM as swap space on Linux

Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

Hacker News 7d ago

Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM

Performance. Submitted on 27 May 2026. Abstract: Large language models have achieved remarkable capabilities through scaling, and this paper does not challenge that. It instead investigates a different question: once large models already exist, can they become more accessible to environments with substantially smaller hardware resources?

Hacker News 10d ago

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract: The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint.

arXiv CS 7d ago

I Put a Datacenter GPU in My Gaming PC for £200

I already had an RTX 4080. Good enough for gaming, not good enough for the models I wanted to run locally. The next step up in GPU land is either to spend a fortune on a card with more VRAM or to find another way.

Hacker News 10d ago

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

Abstract: Serial LLM inference backends -- such as Ollama -- process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches -- infeasible for memory-constrained edge and local...

arXiv CS 2d ago

PlayStation Architecture

Sony knew that 3D hardware could get very messy to develop for. Thus, their debuting console will keep its design simple and practical… Although this may come at a cost!

Hacker News 7d ago

Nvidia's entrance into the PC market gives investors another reason to own the stock

Nvidia has added another leg to its investment case, planted far away from the data center. It's on your desk at the office and at home. At the influential Computex conference in Taiwan, CEO Jensen Huang focused the first half of his keynote address on the data center and the wonders of Nvidia's Vera computing platform for agentic AI workloads.

CNBC 9d ago

Fixed-Point Masked Generative Modeling

Abstract: Masked Generative Models (MGMs) enable parallel decoding and achieve strong performance across modalities, but require full-sequence bidirectional transformers at every step, making training costly and degrading quality under low sampling budgets. Existing work improves efficiency via better samplers or cheaper fixed-depth denoisers, but they still allocate a fixed amount of denoiser computation to each refinement step. We introduce Fixed-Point Masked Generative Models...

arXiv CS 9d ago

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Since releasing Gemma 4 two months ago, we've been continuously working to expand its capabilities. First, we introduced Multi-Token Prediction (MTP) to accelerate inference, and just a couple of days ago, we released a 12B model to bridge the gap between our E4B and 26B MOE models. Today, we are releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so...

Hacker News 5d ago

AMD Radeon RX 9070 GRE review: A cheaper GPU for a wildly expensive era

It may be a lesser RX 9070, but it’s still a solid 1440p gaming performer. If you haven't noticed yet, it's a pretty bad time to buy hardware, PCs and anything that needs RAM. You can thank the AI companies for that.

Engadget 8d ago