Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Key Points

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required. Most LLMs forget everything the moment a conversation ends.

mnemo fixes that. mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically, in under 50ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.

```
your app
   │
   ▼
POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                      │
POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
   │
   ▼
context_prompt ──► inject into your LLM prompt
```

- You POST raw text to `/ingest` (a conversation turn, a document, a note).
- mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
- Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
- On `POST /retrieve`, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a `context_prompt` string.
- You inject `context_prompt` into your LLM's system prompt. Done.

With Docker:

```bash
git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health
```

From source:

```bash
cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api
```

From Python:

```bash
pip install mnemo-sdk
```

```python
from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))
```

All endpoints accept and return `application/json`. Base URL: `http://localhost:8080`.

| Method | Path | Description | Request body | Response |
|---|---|---|---|---|
| GET | `/health` | Server + DB + LLM status | none | `HealthResponse` |
| POST | `/ingest` | Store text, extract entities | `IngestRequest` | `IngestResponse` |
| POST | `/retrieve` | Retrieve ranked memory context | `RetrievalQuery` | `RetrievalResult` |
| GET | `/entities` | List entities (paginated) | `?limit&offset` | `Entity[]` |
| GET | `/entities/:id` | Get entity by UUID | none | `Entity` |
| DELETE | `/entities/:id` | Delete entity (cascades) | none | `{"deleted":true}` |
| GET | `/entities/:id/neighbors` | Knowledge graph neighbors | `?depth` (max 5) | `GraphNode[]` |
| GET | `/chunks` | List memory chunks (paginated) | `?limit&offset&session_id` | `MemoryChunk[]` |
| GET | `/chunks/:id` | Get chunk by UUID | none | `MemoryChunk` |
| DELETE | `/chunks/:id` | Delete chunk | none | `{"deleted":true}` |
| POST | `/search` | Full-text search entities + chunks | `{"query","limit"}` | `{"entities","chunks"}` |
| DELETE | `/wipe` | Delete all memory (irreversible) | header: `X-Confirm-Wipe: true` | `{"wiped":true}` |
| GET | `/stats` | Entity/chunk/graph counts + uptime | none | `StatsResponse` |

Key request/response types:

```jsonc
// IngestRequest
{
  "content": "string",          // required: text to store
  "source": "string",           // required: e.g. "chat", "email", "cli"
  "session_id": "string|null",  // optional: group related chunks
  "metadata": {}                // optional: arbitrary JSON
}

// RetrievalQuery
{
  "text": "string",             // required: query text
  "session_id": "string|null",  // optional: filter by session
  "max_chunks": 10,             // default 10
  "max_entities": 20,           // default 20
  "min_confidence": 0.5,        // default 0.5
  "include_graph": true,        // default true: expand via knowledge graph
  "graph_depth": 2              // default 2: BFS depth for graph expansion
}
```

Full endpoint documentation with curl examples: docs/api.md

| Variable | Default | Description |
|---|---|---|
| `MNEMO_DB_PATH` | `mnemo.db` | SQLite database file path |
| `MNEMO_PORT` | `8080` | API server port |
| `MNEMO_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM base URL |
| `MNEMO_LLM_MODEL` | `llama3` | Model name for entity extraction |
| `MNEMO_LLM_API_KEY` | `ollama` | API key (any value works for Ollama) |
| `MNEMO_LLM_PROVIDER` | `ollama` | Provider type: `ollama`, `openai`, `anthropic`, `custom` |

Pass `--config path/to/config.toml` to `mnemo-api`. See mnemo.example.toml:

```toml
db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1
```

Environment variables take precedence over TOML values. The active config source is reported in `GET /health` → `config_source`.

Install:

```bash
cargo install --path crates/mnemo-cli
```

Usage:

```bash
# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity --neighbors

# List memory chunks
mnemo chunks

# Server health
mnemo health

# Memory statistics
mnemo stats

# Delete everything (prompts for confirmation)
mnemo wipe

# Skip confirmation prompt
mnemo wipe --yes

# Point at a non-default server
mnemo --server http://192.168.1.10:8080 stats
```

Install:

```bash
pip install mnemo-sdk
```

See sdk/python/README.md for the full API reference.

Async example:

```python
import asyncio

from mnemo import AsyncMnemoClient


async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)


asyncio.run(main())
```

A working standalone example: examples/basic_usage.py

Four Rust crates wired together:

| Crate | Type | Role |
|---|---|---|
| `mnemo-core` | lib | Entity extraction, graph ops, retrieval engine, DB layer |
| `mnemo-api` | bin | Axum REST API: thin handler layer over mnemo-core |
| `mnemo-cli` | bin | CLI tool using blocking reqwest against the API |
| `mnemo-bench` | bin | Performance benchmarks (12 suites) |

Full architecture documentation: docs/architecture.md

Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. These are debug-build numbers; a release build (`--release`) is 3–5× faster.

| Operation | Avg latency | Throughput |
|---|---|---|
| Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s |
| Entity lookup by ID | ~0.08 ms | ~12,500 ops/s |
| Chunk insert | ~0.14 ms | ~7,100 ops/s |
| Full-text chunk search | ~0.28 ms | ~3,500 ops/s |
| Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s |
| Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s |
| Full retrieval pipeline | ~4.2 ms | ~238 ops/s |

Run `cargo run -p mnemo-bench` to benchmark on your hardware.
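The throughput figures in the benchmark table look consistent with operations running back to back on a single thread, where ops/s is roughly 1000 divided by the average latency in milliseconds. The sketch below checks that relationship against the published latencies. It is an illustration only: the helper name `ops_per_second` is hypothetical and not part of mnemo-bench, and the sequential-execution model is an assumption rather than something the project states.

```python
# Sketch: relate average latency to sequential throughput.
# Assumption (not confirmed by the project): operations run one after
# another on a single thread, so throughput is the reciprocal of latency.


def ops_per_second(avg_latency_ms: float) -> float:
    """Return operations per second for an average latency in milliseconds."""
    if avg_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / avg_latency_ms


# Average latencies copied from the benchmark table (debug build, Apple M2).
table_latencies_ms = {
    "Entity insert (SQLite)": 0.12,
    "Entity lookup by ID": 0.08,
    "Chunk insert": 0.14,
    "Full-text chunk search": 0.28,
    "Graph neighbor (depth=1)": 0.21,
    "Graph neighbor (depth=2)": 0.89,
    "Full retrieval pipeline": 4.2,
}

if __name__ == "__main__":
    for operation, latency_ms in table_latencies_ms.items():
        print(f"{operation}: ~{ops_per_second(latency_ms):,.0f} ops/s")
```

Under this model the full retrieval pipeline at ~4.2 ms works out to about 238 requests per second per sequential caller, which matches the table. Any concurrency in a real deployment would change the picture, so measured numbers from `mnemo-bench` should take priority.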
```bash
cargo test --workspace              # run all 122 tests
make coverage                       # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary               # summary to stdout
cd sdk/python && pytest tests/ -v

cargo run -p mnemo-bench                      # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph    # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json   # save results to JSON
```

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks

PRs welcome. Please run `make fmt && make lint` before submitting. Open an issue first for large changes. See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.

MIT. See LICENSE.
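For callers that prefer not to depend on the SDK, the documented `IngestRequest` and `RetrievalQuery` shapes can be assembled with the Python standard library alone. The following is a minimal sketch: the helper names (`ingest_payload`, `retrieval_payload`, `json_post`) are hypothetical, the field names and defaults are taken from the request types documented in this README, and the request is only prepared, not sent.

```python
# Sketch: build mnemo request bodies with the standard library only.
# Helper names are illustrative; field names and defaults follow the
# IngestRequest / RetrievalQuery types documented in the README.
import json
from typing import Any, Dict, Optional
from urllib import request

BASE_URL = "http://localhost:8080"  # default base URL from the README


def ingest_payload(content: str, source: str,
                   session_id: Optional[str] = None,
                   metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an IngestRequest body; content and source are required."""
    if not content or not source:
        raise ValueError("content and source are required")
    return {
        "content": content,
        "source": source,
        "session_id": session_id,
        "metadata": metadata or {},
    }


def retrieval_payload(text: str, session_id: Optional[str] = None,
                      max_chunks: int = 10, max_entities: int = 20,
                      min_confidence: float = 0.5,
                      include_graph: bool = True,
                      graph_depth: int = 2) -> Dict[str, Any]:
    """Build a RetrievalQuery body using the documented defaults."""
    if not text:
        raise ValueError("text is required")
    return {
        "text": text,
        "session_id": session_id,
        "max_chunks": max_chunks,
        "max_entities": max_entities,
        "min_confidence": min_confidence,
        "include_graph": include_graph,
        "graph_depth": graph_depth,
    }


def json_post(path: str, payload: Dict[str, Any]) -> request.Request:
    """Prepare (but do not send) a JSON POST request for the given path."""
    return request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = json_post("/ingest", ingest_payload(
        "I use Neovim and prefer dark mode", source="cli"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a running server, the request could then be sent with:
    # with request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping payload construction separate from sending makes the bodies easy to inspect or unit-test without a server. The response shapes (`IngestResponse`, `RetrievalResult`) are not spelled out here, so the sketch does not try to parse them.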
Originally published by Hacker News.