Ollama

Related Articles from SNS

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

Serial LLM inference backends -- such as Ollama -- process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches -- infeasible for memory-constrained edge and local...

arXiv CS 2d ago

Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required. Most LLMs forget everything the moment a conversation ends.

Hacker News 6d ago

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

The previous post covered getting Gemma 4’s MTP drafters quantized and paired with a verifier. This one is about running the result on a machine that has no business running it. I have a recycled server.

Hacker News 9d ago

Karpathy LLM Wiki pattern integrated into Obsidian agentic workflow

An autonomous AI agent inside your Obsidian vault. You describe a task, it plans, searches, reads, writes, and reports back. Every action is visible.

Hacker News 9d ago

Odysseus – self-hosted AI workspace

Odysseus vers. 1.0. A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun.

Hacker News 10d ago

Do You Actually Need to Pay for Transcription Software?

I'm constantly seeing ads for Wispr Flow, an AI-powered transcription tool. The pitch—that you'll be able to write faster by talking out loud instead of typing—is compelling, especially if you're a slow typist. The marketing promises you'll be able to "write at the speed of thought, 4x faster than your keyboard."

Wired 11d ago

Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency

Since releasing Gemma 4 two months ago, we've been continuously working to expand its capabilities. First, we introduced Multi-Token Prediction (MTP) to accelerate inference, and just a couple of days ago, we released a 12B model to bridge the gap between our E4B and 26B MOE models. Today, we are releasing new checkpoints optimized with Quantization-Aware Training (QAT) to make Gemma 4 even more efficient, so...

Hacker News 5d ago

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

arXiv:2606.01338v1. Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence systems. Locally deployed large language models (LLMs) offer a privacy-preserving alternative, but their suitability for pharmaceutical manufacturing tasks remains underexplored. This study evaluates four open-source LLMs...

arXiv CS 8d ago

Show HN: Nightwatch, The open-source, read-only AI SRE

nightwatch is a local-first, read-only layer on top of your monitoring. It groups alert storms into incidents, flags noisy checks, and has an agent that can investigate live systems for you. You can, for example, jump from an incident directly into the agent. The reason for this weekend project is that we had a Kubernetes upgrade that went wrong, and at some point a rollback wasn't possible anymore, so it had to be fixed live during the night while several problems came together.

Hacker News 2d ago

Radxa Dragon Q8B: A Laptop Cosplaying as an SBC?

The Radxa Q6A was Radxa’s first Qualcomm-based SBC release last year, and even though we’re still deep in the midst of RAMageddon, Radxa are announcing the Dragon Q8B today, the very same day you’re reading this. Even I don’t know how I’ve managed that in 2026. At its core, a Qualcomm Snapdragon 8cx Gen 3 SoC running at 3GHz provides the horsepower, and provide it does.

Hacker News 9d ago