Memory | Antoine Weill--Duflos

cortexmd: a long-term memory and code-navigation brain for AI agents

Wed, 03 Jun 2026 00:00:00 +0000

cortexmd is a long-term memory and code-navigation brain for AI agents, exposed over the Model Context Protocol. It started as a private project on my homelab called obsidian-mcp, a server that let Claude read, search, and write notes in my Obsidian vault. I built it for myself, then cleaned it up to share.

It does two things.

The first is memory. Agents forget everything between sessions. cortexmd gives them somewhere to put what they learn: memories auto-categorised into kinds like observation, decision, insight, and plan, with a heat lifecycle where reading a memory warms it and inactivity cools it down. Recall is hybrid, fusing full-text and semantic search, boosted by temperature and links. At the start of a session the agent does a wakeup that surfaces the hottest, most relevant memories, so it picks up where it left off.

The second is code navigation. A Rust indexer walks a repo, parses it with tree-sitter, and builds a SQLite symbol database recording each symbol’s name, kind, signature, docstring, file range, and call graph. That index is exposed as cheap MCP tools: symbol search, file outline, callers and callees, change-impact, call-chain, dead code, import cycles, and copy-paste duplicates. The design goal is that an agent navigates code by querying the index, at roughly 60 tokens per result, instead of reading whole files. There is an opt-in shell hook that rewrites things like grep and cat on an indexed repo into the equivalent code-nav call.
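To make the symbol database idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names, column names, and the `callers_of` helper are all assumptions for illustration; the real cortexmd schema and indexer (which is written in Rust) may look quite different.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE symbols (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    kind       TEXT NOT NULL,      -- function, class, method, ...
    signature  TEXT,
    docstring  TEXT,
    file       TEXT NOT NULL,
    start_line INTEGER NOT NULL,
    end_line   INTEGER NOT NULL
);
CREATE TABLE calls (
    caller_id INTEGER NOT NULL REFERENCES symbols(id),
    callee_id INTEGER NOT NULL REFERENCES symbols(id)
);
"""


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory index with three made-up symbols and two call edges."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO symbols VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "index_repo", "function", "index_repo(root)", "Walk a repo.", "indexer.py", 10, 40),
            (2, "parse_file", "function", "parse_file(path)", "Parse one file.", "parser.py", 5, 30),
            (3, "main", "function", "main()", None, "cli.py", 1, 8),
        ],
    )
    # main -> index_repo -> parse_file
    db.executemany("INSERT INTO calls VALUES (?, ?)", [(1, 2), (3, 1)])
    db.commit()
    return db


def callers_of(db: sqlite3.Connection, name: str) -> list[tuple]:
    """Return one compact row per direct caller instead of whole files."""
    return db.execute(
        """
        SELECT caller.name, caller.signature, caller.file, caller.start_line
        FROM calls
        JOIN symbols AS callee ON callee.id = calls.callee_id
        JOIN symbols AS caller ON caller.id = calls.caller_id
        WHERE callee.name = ?
        ORDER BY caller.name
        """,
        (name,),
    ).fetchall()
```

Each result is a single short row (name, signature, file, line), which is the property that keeps a lookup cheap compared with pulling the caller's whole file into context.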

The piece that made it shippable is the brain-vault model. cortexmd owns a separate brain vault that is the only thing it ever writes to. Your own vaults are attached as read-only sources, indexed for search and code-nav, never modified, with a default-deny allowlist so private subtrees stay out. Data flows one way, so there is no shared mutable file and no merge race.

  SOURCE_VAULTS[]  (read-only, opt-in, allowlisted)
  ┌───────────┐  ┌───────────┐  ┌───────────┐
  │  notes/   │  │  code/    │  │  docs/    │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │  index (one-way, read)      │
        └──────────────┼──────────────┘
                       ▼
              ┌──────────────────┐
              │     cortexmd     │   <- sole writer
              │   (MCP server)   │
              └────────┬─────────┘
                       │ writes
                       ▼
              ┌──────────────────┐
              │   BRAIN_VAULT    │   memories · journal · diaries
              │ (own dir, not    │   tasks · KG notes · code-repos.json
              │  your vault)     │
              └──────────────────┘
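The default-deny allowlist can be sketched as a small path check. This is an assumption about how such a filter might work, not cortexmd's actual code: the function name `is_indexable` and the idea of expressing the allowlist as subtree prefixes are both hypothetical.

```python
from pathlib import PurePosixPath


def is_indexable(rel_path: str, allowlist: list[str]) -> bool:
    """Default deny: a path is indexed only if it sits under an allowlisted subtree.

    `rel_path` is relative to the source vault root. Anything not explicitly
    allowed, including everything when the allowlist is empty, is skipped.
    """
    path = PurePosixPath(rel_path)
    # Refuse absolute paths and parent-directory escapes outright.
    if path.is_absolute() or ".." in path.parts:
        return False
    # is_relative_to compares whole path components, so "notes/projects-private"
    # does not match the allowlisted prefix "notes/projects".
    return any(path.is_relative_to(PurePosixPath(prefix)) for prefix in allowlist)
```

With an allowlist of `["notes/projects", "docs"]`, a file under `notes/projects/` is indexed while `notes/journal/` and `notes/projects-private/` stay out without needing an explicit deny rule.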

It runs in two modes: a local-stdio mode with no Docker, no auth, and no network, recommended for one person; and a self-hosted HTTP mode with auth for multi-client setups. The repo is a polyglot monorepo, a TypeScript MCP server and a single Rust binary, kept honest by a shared contract and a CI parity check.

cortexmd is pre-alpha and MIT licensed. APIs and config names are still in flux.

The full story is in a four-part blog series. Start with Giving an AI Agent a Second Brain.

The Memory Engine: Heat, Decay, and Dreams

Fri, 05 Jun 2026 00:00:00 +0000

In part one I described two problems that kept biting me while working with AI agents. The first was that they forget everything between sessions. The second was that they burn tokens re-reading code they have already seen. This post is about the first problem, and the part of cortexmd I am most attached to: the memory engine. The overall approach is inspired by mempalace, a memory-palace project for AI agents; what follows is how cortexmd builds its own version.

The naive fix for forgetting is to dump everything into context. Keep a big file of notes, paste it in at the start of every session, and hope the agent reads it. I tried versions of that, and it falls apart fast. The file grows without bound. Old, stale facts sit next to the one thing that actually matters today, with equal weight. You pay for the whole pile on every turn, and the signal you care about gets buried in noise you have long since stopped caring about. A human memory does not work like that, and it should not. So the design goal was simple to state and harder to build: the agent should remember the way a person does, where the things you use stay sharp and the things you stop touching fade.

Eight kinds of memory

When an agent stores something, cortexmd does not treat it as an undifferentiated blob of text. Each memory is auto-categorised into one of eight kinds: observation, decision, insight, conversation, fact, preference, plan, and reflection. The distinction matters because these things behave differently over time and want to be retrieved differently. A preference (I always want British spelling, I hate em dashes) is a long-lived fact about how I work, and it should keep surfacing. A conversation snippet is contextual and mostly useful soon after it happened. A decision is something you want to be able to find again months later when you ask yourself why on earth you did that. Tagging the kind up front gives the rest of the system something to reason with, instead of forcing every later step to guess from raw text.

Heat: hot, warm, cold

The core idea is that every memory has a temperature, and temperature decays. A fresh or recently used memory is hot. Leave it untouched and it cools to warm; after roughly a month of inactivity it drifts to cold, and cold memories are eventually archived rather than kept at the front of the agent’s mind.

The crucial detail is promote-on-access: reading a memory heats it back up. This is the whole trick. You do not have to manually curate what is important. Importance is revealed by use. The memories you and the agent keep reaching for stay hot precisely because you keep reaching for them, and the ones you never touch sink on their own. It is the same instinct as a least-recently-used cache, except the thing being cached is the agent’s sense of what currently matters, and the eviction is graceful: cold and archived, not deleted.

Why bother with all this instead of one flat store? Because temperature gives recall a prior. When the agent goes looking for something, it does not face a flat sea of equally plausible notes. It has a built-in sense of what has been live lately, and that signal costs nothing extra to maintain because it falls out of normal use.
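The lifecycle above can be sketched in a few lines of Python. Only the "roughly a month to cold" figure comes from the text; the warm and archive thresholds, the `Memory` class, and the function names are assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WARM_AFTER = timedelta(days=7)      # assumption: not stated in the post
COLD_AFTER = timedelta(days=30)     # "roughly a month of inactivity"
ARCHIVE_AFTER = timedelta(days=90)  # assumption: not stated in the post


@dataclass
class Memory:
    text: str
    last_access: datetime


def temperature(memory: Memory, now: datetime) -> str:
    """Derive the temperature from how long the memory has sat untouched."""
    idle = now - memory.last_access
    if idle >= ARCHIVE_AFTER:
        return "archived"
    if idle >= COLD_AFTER:
        return "cold"
    if idle >= WARM_AFTER:
        return "warm"
    return "hot"


def read(memory: Memory, now: datetime) -> str:
    """Promote-on-access: reading resets the idle clock, so the memory is hot again."""
    memory.last_access = now
    return memory.text
```

Because the temperature is computed from the last access time, there is nothing to curate by hand: use alone keeps a memory hot, and neglect alone cools it.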

Consolidation: tidying the cold drawer

Letting memories cool is only half the story. If you simply let cold memories pile up, you end up with a drawer full of near-duplicate scraps: five slightly different notes about the same long-finished task, each a little stale, none worth reading on its own. So cortexmd consolidates. Related cold memories get folded together into summaries, so the gist survives in one coherent place while the redundant fragments stop cluttering things. The detail is not thrown away carelessly; it is compressed into something you would actually want to read later. Cooling decides what is no longer urgent; consolidation decides what to do with it.
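As an illustration of the grouping half of consolidation, here is a sketch that clusters near-duplicate cold notes by word overlap (Jaccard similarity). This is a stand-in technique chosen for simplicity, not a description of how cortexmd decides which memories are related, and the summarising step itself is left out. The threshold and function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two notes have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_cold(notes: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily group notes whose word sets overlap with a cluster's first note."""
    clusters: list[tuple[set[str], list[str]]] = []
    for note in notes:
        words = set(note.lower().split())
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(note)
                break
        else:
            # No existing cluster is similar enough, so this note starts its own.
            clusters.append((words, [note]))
    return [members for _, members in clusters]
```

Each multi-note cluster is a candidate for one summary; a cluster of one is left alone.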

Hybrid recall

Storing memory well is pointless if you cannot get it back. Recall in cortexmd is hybrid. It runs a lexical full-text search (the keyword match, good at exact terms and names) and fuses it with a semantic search over embeddings (the meaning match, good when you remember the idea but not the words). Lexical alone misses anything phrased differently from your query. Semantic alone can drift toward things that are vaguely on-topic but not what you meant. Fusing the two covers for the weaknesses of each.
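The post does not say which fusion method cortexmd uses, so the sketch below substitutes a common one, reciprocal rank fusion (RRF), purely to show what fusing two rankings means. The function name and the constant `k=60` are assumptions, not project details.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of memory ids into one.

    Each list contributes 1 / (k + rank) for every id it contains, so an id
    that ranks well in both the lexical and the semantic list rises to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))
```

For example, with a lexical ranking of `["m1", "m2", "m3"]` and a semantic ranking of `["m3", "m1", "m4"]`, the memories found by both searches (`m1` and `m3`) end up ahead of the ones found by only one.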

On top of the fused score, the ranking is boosted by three things: temperature (hotter memories rank higher, because recency of use is a signal), importance (some memories are simply weightier), and links (a memory connected to other relevant memories is more likely to be the one you want). The result is a ranking that reflects not just textual similarity but how live and how connected a memory is. That is much closer to how you actually recall things than a plain similarity score.
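One way to picture the boosts is as multipliers on the fused score. The weights, the cap on links, and the function below are invented for illustration; the text only says that temperature, importance, and links raise the ranking, not how much each counts.

```python
# Assumed mapping from temperature to a 0..1 signal.
HEAT_SIGNAL = {"hot": 1.0, "warm": 0.5, "cold": 0.0}


def boosted_score(
    fused: float,
    temperature: str,
    importance: float,      # assumed to be in the range 0..1
    relevant_links: int,    # links to other memories that also matched
    w_heat: float = 0.3,
    w_importance: float = 0.2,
    w_links: float = 0.1,
) -> float:
    """Scale the fused text score by how live, weighty, and connected a memory is."""
    link_signal = min(relevant_links, 5) / 5  # cap so hub notes cannot dominate
    boost = (
        1.0
        + w_heat * HEAT_SIGNAL[temperature]
        + w_importance * importance
        + w_links * link_signal
    )
    return fused * boost
```

Under these made-up weights, a hot, linked memory with a slightly lower text score can outrank a cold, isolated one, which is the behaviour the paragraph above describes.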

Waking up

All of this comes together at the start of a session in what I call the wakeup. Instead of beginning every conversation as a blank slate, the agent does a memory wakeup that surfaces the hottest, most relevant memories. It is the difference between a colleague who walks in already knowing where you left off yesterday and one you have to brief from scratch every single morning. The wakeup leans on everything above: the heat model decides what is currently live, hybrid recall decides what is relevant, and the agent starts the session already oriented. This is the moment where the whole engine earns its keep, because it is the moment you feel the agent remembering you.

The smarter-brain round: links and dreams

The pieces above were the heart of the v2.0 memory system. A later round, which I think of as the smarter-brain work, added a few things that make the brain feel less like a database and more like something that thinks while you are away.

The Intelligence tab of the dashboard: vault health, dream insights, theme clusters, and entity and knowledge-graph counts. Demo data from the project’s seeded sample vault.

The first is automatic knowledge-graph links. As data is stored, cortexmd draws links between related notes on its own, instead of waiting for me to wire them up by hand. Manual linking is exactly the kind of bookkeeping that sounds nice and never actually happens, so having the connections form automatically as a side effect of storing things means the link signal in recall keeps getting richer without any effort from me.
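How the links are actually chosen is not described here, so the following is only one simple possibility: link a note to any other note whose title it mentions. The `auto_links` function and the title-mention rule are assumptions for illustration.

```python
def auto_links(notes: dict[str, str]) -> set[tuple[str, str]]:
    """Return (source_title, target_title) pairs for notes that mention another note's title.

    `notes` maps a note title to its body text. Matching is case-insensitive
    and deliberately naive: a plain substring check, no entity extraction.
    """
    links: set[tuple[str, str]] = set()
    for title, body in notes.items():
        lowered = body.lower()
        for other in notes:
            if other != title and other.lower() in lowered:
                links.add((title, other))
    return links
```

Run at store time, a rule like this grows the graph as a side effect of writing notes, which is what lets the link signal in recall improve without manual bookkeeping.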

The second is the dream. cortexmd runs a consolidation pass on a quiet schedule, which I named the dream because of what it does and when it does it. It reconciles similar notes, with particular attention to older, cooled-down ones, and folds them into project notes. It is the background gardener of the brain: while nothing is happening, it walks through the cooled corners, notices that these three half-finished thoughts are really one thing, and tidies them into a coherent project note. You wake the agent up the next day and the brain is a little more organised than you left it, without you having done anything.

The third is something I borrowed straight from Obsidian: a vault graph view, rendered on a canvas in the web dashboard. Because the knowledge graph is real, you can look at it. Seeing the brain as a constellation of linked notes, with the dense clusters and the lonely orphans laid out in front of you, makes the whole thing feel concrete in a way a list of rows never does.

The vault graph view in the dashboard. Each dot is a note, each line a link. This is the project’s self-contained demo vault, so the note names are seeded sample data, not my own notes.

Click a node and the note opens in the side panel with its links. Same seeded demo data.

Why a heat model wins

Pulling it together: the reason a heat model beats dumping everything into context is that attention is the scarce resource, for an agent exactly as for a person. A flat store treats a note from eight months ago and a decision from this morning as equals, makes you pay for both on every turn, and forces the agent to rediscover what matters each time. The heat model encodes what matters as a property of the data itself, keeps it current for free through ordinary use, compresses what has gone cold instead of hoarding it, and surfaces the live, relevant slice at wakeup. The agent carries less, and what it carries is the right stuff.

That handles forgetting. The other half of the original problem, the agent burning tokens re-reading code it has already seen, needs a completely different mechanism. That is a Rust indexer and a symbol database, and it is the subject of part three: The Token Killer.

cortexmd is pre-alpha and MIT licensed. The code, including the memory engine described here, lives on the project page and on GitHub at github.com/Leicas/cortexmd. Names and config are still in flux, so treat the specifics as a snapshot rather than a contract.

Series

This is part two of a four-part series on cortexmd:

Giving an AI Agent a Second Brain
The Memory Engine: Heat, Decay, and Dreams (you are here)
The Token Killer: Navigating Code Without Reading It
Open-Sourcing the Brain: the Brain-Vault Model

Giving an AI Agent a Second Brain

Thu, 04 Jun 2026 00:00:00 +0000

I work with a coding agent most days now. It is genuinely good. It reads my code, reasons about it, proposes changes, runs the tests, fixes what it broke. And every single time I open a fresh session, it has the memory of a goldfish.

It does not remember the decision we made last week about why a module is structured the way it is. It does not remember that I prefer commas to dashes, or that one corner of the codebase is load-bearing and fragile. It does not remember the conversation where we ruled out an approach for good reasons. All of that context lived in the previous session, and the previous session is gone. So I re-explain. Then I re-explain again the next day.

That is the first problem. The agent forgets.

And it is not only the code. The moment I ask it to help with anything human, the same hole opens up. Ask it to draft an email and it has no idea who the recipient is to me: a close friend, a colleague, or a partner I need to handle with care. So it cannot judge what tone to take, because the tone lived in past conversations it can no longer see. It does a poor job of linking one session to the next, so every thread starts cold. And the way I keep my life split makes it worse: personal in one account, work in another, the way most people do. The moment I cross from one to the other, whatever the agent had learned about me is simply gone. Poof. No memory.

Two problems, not one

The second problem is quieter but it shows up on every invoice. To do anything useful, the agent has to understand the code, and the way it understands code is by reading it. So it reads files. Whole files. To answer a small question about one function, it will pull an entire module into context, and often the modules that call that module too. Multiply that across a working session and you are paying, in tokens, to load the same source over and over, most of which is irrelevant to the question at hand.

Both problems come from the same place: the agent has no persistent store of what it has learned, and no cheap way to look things up. It only has the context window in front of it, and the context window is both forgetful and expensive to fill.

I decided to do something about both. Not because I had a product idea, but because it was annoying me on a daily basis and I had a homelab sitting there asking to be useful.

There was also a personal reason the shape of the solution felt obvious. A while ago, after reading a friend’s long write-up of his own personal-knowledge-management journey, I started keeping notes in Obsidian. Building that second brain for myself changed how I thought about the problem. If a vault of linked notes works as external memory for me, it should work as external memory for the agent too. I could let it read mine to get started, as a read-only source, and then let it build its own brain, one that I could actually open, navigate, and understand. Not a black box of embeddings somewhere, but notes, in a vault, that I own.

The homelab origin

For a while now I have run a small MCP server on my homelab. MCP, the Model Context Protocol, is the standard way to give an AI client tools and data it can reach out to. The server I built was called obsidian-mcp, and its first job was simple: give Claude the ability to read, search, and write notes in my Obsidian vault.

It ran in a Docker container behind a reverse proxy, my notes were already there, and suddenly the agent could reach into them. That alone was useful. But it also turned the vault into a natural place to put the answers to my two problems, because a vault is just structured text that an agent can read and write, and that is exactly what both a memory and a code index need to be backed by.

So the server grew two new capabilities, one for each problem.

The first capability is a memory system, inspired by mempalace, a memory-palace project for AI agents. Instead of letting everything evaporate at the end of a session, the agent can store what it learns: an observation, a decision, an insight, a preference I stated out loud. Those memories do not just pile up forever in a flat list. They have a lifecycle. The ones that get used stay warm and easy to surface, the ones nobody touches cool off and eventually get folded into summaries, and at the start of a new session the agent does a wakeup that brings the hottest, most relevant memories back to the surface. The point is continuity. The agent picks up roughly where it left off instead of from zero. That is the subject of part two.

The second capability is a code index. Rather than read whole files to understand a repository, the agent queries an index of it. A Rust indexer walks the repo, parses it, and records the things you actually want to look up: what symbols exist, their signatures, where they live, and crucially who calls whom. Then the agent asks targeted questions. What does this function look like? Who calls it? What breaks if I change it? Each answer is small and cheap, a lookup rather than a full read that drags the entire file into context. The design goal is blunt: a code-nav lookup should cost roughly sixty tokens per result and be many times cheaper than reading the file it came from. That is the subject of part three.
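The "what breaks if I change it?" question can be answered from the call graph alone. A minimal sketch, assuming the graph is available as a plain mapping from caller to callees; the `change_impact` name and the data shape are hypothetical, not the project's API.

```python
from collections import deque


def change_impact(calls: dict[str, set[str]], target: str) -> set[str]:
    """Return every symbol that directly or transitively calls `target`.

    `calls` maps each caller to the set of symbols it calls. The search walks
    the graph backwards, breadth first, and tolerates recursion and cycles.
    """
    # Invert the graph once: callee -> set of direct callers.
    reverse: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    impacted: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in reverse.get(current, ()):
            if caller != target and caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted
```

The answer is a short set of names rather than the source of every affected file, which is what keeps the question cheap to ask.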

From a private tool to cortexmd

For months this was a personal thing. It ran on my hardware, over my own private Obsidian vault, the one that holds both personal and work notes. I am not going to quote any of it here, and the tool itself is deliberately built so that the data stays mine. But the point stands: it was a tool I made for myself, and I used it every day.

Then I hit a different kind of wall, one that came precisely from how well it worked for me. I will tell it properly in part four, but the short version is that the whole thing was tuned to my own setup, my vault, mounted and synced my way, so it was a great personal tool and impossible for anyone else to run. Making it shareable meant a redesign, and that redesign is what finally turned it into something other people could use.

That redesign became cortexmd. It is open source, MIT licensed, and public at github.com/Leicas/cortexmd. It is honestly pre-alpha. The APIs and the config names are still in flux, and I would not bet a production workflow on it yet. The honest framing is the right one: I built this for myself, then cleaned it up to share. The cleanup is real work and it is most of part four.

What it became: the cortexmd control panel. This screenshot is from the project’s self-contained demo, so the data is seeded samples, not my own vault.

So that is the shape of the series. There were two problems, an agent that forgets and an agent that burns tokens re-reading code. There are two answers, a memory system and a code index, both born inside a homelab MCP server. And there is the redesign that turned a private tool into something you can run yourself.

What is coming

Part 2, the memory engine. Heat, decay, and dreams. The eight categories a memory can fall into, the hot to warm to cold lifecycle, promote-on-access, consolidation, hybrid recall that fuses full-text and semantic search, the session wakeup, and the auto-linking graph that wires notes together as they are stored.
Part 3, the token killer. The Rust and tree-sitter indexer, the SQLite symbol database, the code-nav tools, the roughly sixty tokens per result idea, the opt-in shell hook that rewrites things like grep and cat on an indexed repo into the cheap equivalent, and what it was like to dogfood all of it on the project’s own source.
Part 4, open-sourcing the brain. Why a tool that only worked for me had to be redesigned to share, the brain-vault model that generalises it, the two deployment modes, the polyglot monorepo held together by a shared contract, the rename, and why I care about owning my own data.

If you want to skip ahead to the code, it is on the project page and on GitHub at github.com/Leicas/cortexmd. Otherwise, part two is where the agent starts to remember.

Series

This is part one of a four-part series on cortexmd:

Part 1: Giving an AI Agent a Second Brain (this post)
Part 2: The Memory Engine: Heat, Decay, and Dreams
Part 3: The Token Killer: Navigating Code Without Reading It
Part 4: Open-Sourcing the Brain: the Brain-Vault Model

Project page: cortexmd. Source: github.com/Leicas/cortexmd.