Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Local-first AI memory layer for any LLM: persistent knowledge graph, entity extraction, and semantic retrieval, with no cloud required.

Most LLMs forget everything the moment a conversation ends. mnemo fixes that. mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and automatically assembles relevant context for your future prompts in under 50 ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API, and ships as a single static binary with zero cloud dependency.

```text
your app
   │
   ▼
POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                       │
POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
   │
   ▼
context_prompt ──► inject into your LLM prompt
```

- You POST raw text to `/ingest` (a conversation turn, a document, a note).
- mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
- Entities are deduplicated by name + type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
- On `POST /retrieve`, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score + rank → assemble a `context_prompt` string.
- You inject `context_prompt` into your LLM's system prompt. Done.

Run with Docker Compose:

```bash
git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health
```

Or build and run the server from source:

```bash
cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api
```

Then, from Python:

```bash
pip install mnemo-sdk
```

```python
from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))
```

All endpoints accept and return `application/json`. Base URL: `http://localhost:8080`.

| Method | Path | Description | Request (body, query, or header) | Response |
|---|---|---|---|---|
| GET | `/health` | Server + DB + LLM status | none | `HealthResponse` |
| POST | `/ingest` | Store text, extract entities | `IngestRequest` | `IngestResponse` |
| POST | `/retrieve` | Retrieve ranked memory context | `RetrievalQuery` | `RetrievalResult` |
| GET | `/entities` | List entities (paginated) | `?limit&offset` | `Entity[]` |
| GET | `/entities/:id` | Get entity by UUID | none | `Entity` |
| DELETE | `/entities/:id` | Delete entity (cascades) | none | `{"deleted":true}` |
| GET | `/entities/:id/neighbors` | Knowledge graph neighbors | `?depth` (max 5) | `GraphNode[]` |
| GET | `/chunks` | List memory chunks (paginated) | `?limit&offset&session_id` | `MemoryChunk[]` |
| GET | `/chunks/:id` | Get chunk by UUID | none | `MemoryChunk` |
| DELETE | `/chunks/:id` | Delete chunk | none | `{"deleted":true}` |
| POST | `/search` | Full-text search over entities and chunks | `{"query","limit"}` | `{"entities","chunks"}` |
| DELETE | `/wipe` | Delete all memory (irreversible) | header `X-Confirm-Wipe: true` | `{"wiped":true}` |
| GET | `/stats` | Entity/chunk/graph counts + uptime | none | `StatsResponse` |

Key request types:

```jsonc
// IngestRequest
{
  "content": "string",          // required: text to store
  "source": "string",           // required: e.g. "chat", "email", "cli"
  "session_id": "string|null",  // optional: group related chunks
  "metadata": {}                // optional: arbitrary JSON
}

// RetrievalQuery
{
  "text": "string",             // required: query text
  "session_id": "string|null",  // optional: filter by session
  "max_chunks": 10,             // default 10
  "max_entities": 20,           // default 20
  "min_confidence": 0.5,        // default 0.5
  "include_graph": true,        // default true: expand via knowledge graph
  "graph_depth": 2              // default 2: BFS depth for graph expansion
}
```

Full endpoint documentation with curl examples: `docs/api.md`.

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `MNEMO_DB_PATH` | `mnemo.db` | SQLite database file path |
| `MNEMO_PORT` | `8080` | API server port |
| `MNEMO_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM base URL |
| `MNEMO_LLM_MODEL` | `llama3` | Model name for entity extraction |
| `MNEMO_LLM_API_KEY` | `ollama` | API key (any value works for Ollama) |
| `MNEMO_LLM_PROVIDER` | `ollama` | Provider type: `ollama`, `openai`, `anthropic`, `custom` |

To load settings from a TOML file, pass `--config path/to/config.toml` to `mnemo-api`. See `mnemo.example.toml`:

```toml
db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1
```

Environment variables take precedence over TOML values. The active config source is reported in the `config_source` field of the `GET /health` response.

Install the CLI:

```bash
cargo install --path crates/mnemo-cli
```

Usage:

```bash
# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity --neighbors

# List memory chunks
mnemo chunks

# Server health
mnemo health

# Memory statistics
mnemo stats

# Delete everything (prompts for confirmation)
mnemo wipe

# Skip confirmation prompt
mnemo wipe --yes

# Point at a non-default server
mnemo --server http://192.168.1.10:8080 stats
```

Install the Python SDK:

```bash
pip install mnemo-sdk
```

See `sdk/python/README.md` for the full API reference. Async example:

```python
import asyncio

from mnemo import AsyncMnemoClient


async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)


asyncio.run(main())
```

A working standalone example: `examples/basic_usage.py`.

Four Rust crates wired together:

| Crate | Type | Role |
|---|---|---|
| `mnemo-core` | lib | Entity extraction, graph ops, retrieval engine, DB layer |
| `mnemo-api` | bin | Axum REST API: thin handler layer over `mnemo-core` |
| `mnemo-cli` | bin | CLI tool using blocking reqwest against the API |
| `mnemo-bench` | bin | Performance benchmarks (12 suites) |

Full architecture documentation: `docs/architecture.md`.

Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. These are debug-build numbers; a release build (`--release`) is 3–5× faster.

| Operation | Avg latency | Throughput |
|---|---|---|
| Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s |
| Entity lookup by ID | ~0.08 ms | ~12,500 ops/s |
| Chunk insert | ~0.14 ms | ~7,100 ops/s |
| Full-text chunk search | ~0.28 ms | ~3,500 ops/s |
| Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s |
| Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s |
| Full retrieval pipeline | ~4.2 ms | ~238 ops/s |

Run `cargo run -p mnemo-bench` to benchmark on your hardware.
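To make the six-stage `/retrieve` pipeline concrete, here is a minimal, self-contained Python sketch of those stages over toy in-memory data. It is not mnemo-core's implementation: `Relation`, `terms`, and `retrieve` are hypothetical names, word overlap stands in for SQLite full-text search, a dict of adjacency sets stands in for petgraph, and the scoring rule `1 / (1 + hops)` plus the rule that the relation filter also drops entities reached only through low-confidence edges are assumptions made for illustration. The keyword defaults mirror the documented `RetrievalQuery` defaults.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical stand-in for one knowledge-graph edge."""
    src: str
    dst: str
    label: str
    confidence: float


def terms(text: str) -> set[str]:
    """Lower-cased words with surrounding punctuation stripped."""
    return {w.strip(".,?!'\"").lower() for w in text.split()} - {""}


def retrieve(query, chunks, entities, relations, *, max_chunks=10,
             max_entities=20, min_confidence=0.5, include_graph=True,
             graph_depth=2):
    q = terms(query)

    # 1. Full-text chunk search (word overlap stands in for real full-text search).
    scored = [(len(q & terms(c)), c) for c in chunks]
    top_chunks = [c for n, c in sorted(scored, key=lambda s: -s[0]) if n > 0]
    top_chunks = top_chunks[:max_chunks]

    # 2. Entity name search: seed with entities whose name occurs in the query.
    seeds = [name for name in entities if name.lower() in query.lower()]

    # 3. Graph expansion: undirected BFS up to graph_depth hops from the seeds.
    neighbors: dict[str, set[str]] = {}
    for r in relations:
        neighbors.setdefault(r.src, set()).add(r.dst)
        neighbors.setdefault(r.dst, set()).add(r.src)
    hops = {s: 0 for s in seeds}
    queue = deque(seeds)
    while include_graph and queue:
        node = queue.popleft()
        if hops[node] >= graph_depth:
            continue  # depth cap reached; do not expand this node further
        for other in sorted(neighbors.get(node, ())):
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)

    # 4. Relation filter: keep confident edges between reached entities, and
    #    (assumption) drop entities that only low-confidence edges reached.
    kept = [r for r in relations
            if r.confidence >= min_confidence and r.src in hops and r.dst in hops]
    reached = set(seeds) | {r.src for r in kept} | {r.dst for r in kept}

    # 5. Score + rank: hypothetical score 1 / (1 + hops); ties broken by name.
    score = {e: 1 / (1 + h) for e, h in hops.items() if e in reached}
    ranked = sorted(score, key=lambda e: (-score[e], e))[:max_entities]

    # 6. Assemble the context_prompt string the caller injects into its prompt.
    lines = ["Known facts:"]
    lines += [f"- {r.src} {r.label} {r.dst}" for r in kept]
    lines.append("Relevant entities: " + ", ".join(ranked))
    lines.append("Relevant notes:")
    lines += [f"- {c}" for c in top_chunks]
    return "\n".join(lines)


# Toy data: entity name -> type, as if already extracted and deduplicated.
chunks = ["Alice is a principal engineer at Stripe.",
          "Stripe runs payment infrastructure.",
          "I use Neovim and prefer dark mode."]
entities = {"Alice": "person", "Stripe": "organization",
            "payment infrastructure": "concept", "Neovim": "tool"}
relations = [Relation("Alice", "Stripe", "works at", 0.9),
             Relation("Stripe", "payment infrastructure", "builds", 0.8),
             Relation("Alice", "Neovim", "uses", 0.3)]
prompt = retrieve("what does Alice work on?", chunks, entities, relations)
print(prompt)
```

With this sample data, the 0.3-confidence "uses Neovim" edge falls below the default `min_confidence` of 0.5, so the prompt lists Alice, Stripe, and payment infrastructure (reached at depth 2) plus the single chunk that mentions Alice. The depth cap is the main cost lever: in the benchmark table above, a depth-2 neighbor lookup costs roughly four times a depth-1 lookup.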
Run the tests:

```bash
cargo test --workspace      # run all 122 tests
make coverage               # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary       # summary to stdout
cd sdk/python && pytest tests/ -v
```

Run the benchmarks:

```bash
cargo run -p mnemo-bench                      # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph    # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json   # save results to JSON
```

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks.

PRs welcome. Please run `make fmt && make lint` before submitting, and open an issue first for large changes. See `CONTRIBUTING.md` for full setup instructions, the code style guide, and how to add a new LLM provider.

MIT. See `LICENSE`.
