Continuum
A multi-layer agent memory manager modeled after human memory architecture — Short-Term, Mid-Term, and Long-Term — giving AI agents the same kind of persistent, queryable, lifecycle-aware memory that humans naturally use to reason across time.
⬡ Inspired by cognitive science · Targeting human-like memory for AI agents
The Problem
LLMs Have No Native Memory
Every LLM call starts fresh. In long-running agentic workflows — research assistants, autonomous coding agents, multi-step task executors — the agent loses context between turns and sessions. Re-injecting the full conversation history hits token limits fast and becomes prohibitively expensive.
What Continuum Does
Continuum implements a three-tier memory hierarchy — Short-Term, Mid-Term, and Long-Term — with a routing layer that decides where to store and retrieve each memory based on recency, task relevance, and durability requirements. Agents get persistent, queryable context without hitting token or cost limits.
Why Not Just Use a DB?
A flat database can store facts, but it can't rank them by relevance to the current task, and retrieval without semantic understanding returns noise. Continuum combines vector embeddings for semantic search with structured lifecycle routing, so the agent gets the context most relevant to the task at hand rather than whatever happens to be stored.
Why Not Just Use a Bigger Context?
Larger context windows are expensive, slow, and still finite. More importantly, recall degrades across very long contexts: models tend to "lose" facts buried in the middle. Structured retrieval is designed to beat brute-force context stuffing on both cost and recall accuracy in production systems.
The Human Memory Parallel
Humans don't remember everything with equal fidelity — and that's a feature, not a bug. We have working memory for what we're focused on right now, short-term memory for recent context, and long-term memory for durable knowledge recalled semantically, not by exact address. Continuum is an attempt to give AI agents the same architecture — so they can reason the way humans do: with selective recall, recency awareness, and persistent knowledge that doesn't vanish between sessions.
Architecture
Memory Lifecycle Flow
Design Decisions
Pluggable Vector Store Interface
The LTM backend is abstracted behind a VectorStore protocol. Any vector database (Chroma, Pinecone, Weaviate, Qdrant) plugs in without changing routing logic. This also allows swapping embedding models (text-embedding-ada-002 → custom) without rewriting retrieval code.
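A minimal sketch of what such a protocol could look like. The method names (upsert, query), the InMemoryVectorStore class, and the recall helper are assumptions made for illustration, not Continuum's actual interface; a real backend such as Chroma or Qdrant would sit behind the same two methods.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical LTM backend interface (names are assumptions)."""

    def upsert(self, key: str, vector: Sequence[float], text: str) -> None: ...

    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]: ...


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy backend for local tests; satisfies VectorStore structurally."""

    def __init__(self) -> None:
        self._rows = {}

    def upsert(self, key, vector, text):
        # Same key overwrites the previous entry.
        self._rows[key] = (list(vector), text)

    def query(self, vector, top_k):
        scored = [
            (text, cosine_similarity(vector, stored))
            for stored, text in self._rows.values()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def recall(store: VectorStore, query_vector, top_k: int = 3) -> List[str]:
    # Routing code depends only on the protocol, never on a concrete backend.
    return [text for text, _ in store.query(query_vector, top_k)]
```

Because recall accepts anything with matching upsert and query methods, replacing the in-memory store with a hosted backend would not touch the calling code.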
Kafka for MTM → LTM Consolidation
LTM writes are expensive: generating an embedding and upserting the vector is slow, and doing it inline would block the agent loop. A Kafka topic decouples the consolidation step from the agent loop: MTM memories are published to a topic, consumed asynchronously, embedded, and written to the vector store without blocking the agent.
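The decoupling can be illustrated without a broker. The sketch below uses Python's in-process queue.Queue as a stand-in for the Kafka topic and a plain dict as a stand-in for the vector store; it is not Kafka client code, and embed, consolidation_worker, and run_demo are hypothetical names. It only shows the shape of the pattern: the publisher returns immediately, and a separate consumer does the slow work.

```python
import queue
import threading


def embed(text):
    # Placeholder embedding (assumption): a real system would call a model here.
    return [float(len(text)), float(text.count(" "))]


def consolidation_worker(topic, store):
    """Consume MTM memories, embed them, and write them to the LTM store."""
    while True:
        memory = topic.get()
        if memory is None:  # sentinel: no more messages
            topic.task_done()
            break
        store[memory["id"]] = embed(memory["text"])
        topic.task_done()


def run_demo():
    topic = queue.Queue()  # stand-in for the Kafka topic
    store = {}             # stand-in for the vector store
    worker = threading.Thread(
        target=consolidation_worker, args=(topic, store), daemon=True
    )
    worker.start()

    # Agent loop side: publishing does not wait for embedding or upsert.
    for i, text in enumerate(["user prefers Python", "deploy target is Fargate"]):
        topic.put({"id": f"m{i}", "text": text})

    topic.put(None)
    topic.join()  # the demo waits here only so it can return the result
    worker.join(timeout=1)
    return store
```

With a real broker the producer and consumer would run in separate processes, which is what lets consolidation lag or restart without stalling the agent.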
Scoring, Not Rules
Early versions used if/else rules (age > 1h → promote). This broke on varied workloads. The router now computes a composite score from recency decay, the cosine distance between a memory's embedding and the current task, and a durability flag set by the agent. Score thresholds determine tier placement, which holds up better across different agent types.
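One way such a score could be computed is sketched below. The weights, the one-hour half-life, the threshold values, and the direction of the mapping from score to tier are all assumptions chosen for illustration; the source does not specify them. The sketch uses cosine similarity (higher means closer to the task) rather than distance so that all three terms point the same way.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composite_score(age_seconds, memory_vec, task_vec, durable,
                    half_life_seconds=3600.0, weights=(0.4, 0.4, 0.2)):
    """Blend recency, task relevance, and durability into one score in [0, 1]."""
    # Exponential decay: the recency term halves every half_life_seconds.
    recency = 0.5 ** (age_seconds / half_life_seconds)
    # Clamp negative similarity to 0 so unrelated memories add nothing.
    relevance = max(0.0, cosine_similarity(memory_vec, task_vec))
    durability = 1.0 if durable else 0.0
    w_recency, w_relevance, w_durability = weights
    return w_recency * recency + w_relevance * relevance + w_durability * durability


def place_tier(score, ltm_threshold=0.7, mtm_threshold=0.4):
    # Hypothetical mapping: higher scores earn a more durable tier.
    if score >= ltm_threshold:
        return "LTM"
    if score >= mtm_threshold:
        return "MTM"
    return "STM"
```

Unlike a hard rule such as age > 1h, a stale memory that is still relevant to the task, or one the agent flagged as durable, can clear a threshold through the other terms.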
Tiered Retrieval, Not Union Search
On read, Continuum checks STM first (O(1) Redis lookup), then MTM (indexed), then LTM (vector similarity) — stopping at the first sufficient match. This avoids flooding the agent with every vaguely-related memory from LTM. Configurable relevance thresholds control when to fall through to the next tier.
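The fallthrough can be sketched as a loop over tiers ordered from cheapest to most expensive. Here each tier is represented by a plain search function returning (text, relevance) pairs; tiered_retrieve, make_demo_tiers, the sample memories, and the threshold values are hypothetical and stand in for the Redis, indexed, and vector lookups.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, float]               # (memory text, relevance score)
SearchFn = Callable[[str], List[Hit]]


def tiered_retrieve(query: str,
                    tiers: List[Tuple[str, SearchFn]],
                    thresholds: Dict[str, float]) -> Tuple[Optional[str], List[Hit]]:
    """Return hits from the first tier with a sufficiently relevant match."""
    for name, search in tiers:
        hits = [hit for hit in search(query) if hit[1] >= thresholds[name]]
        if hits:
            return name, hits  # stop here; later tiers are never queried
    return None, []


def make_demo_tiers(calls: List[str]) -> List[Tuple[str, SearchFn]]:
    """Toy tiers that record which ones were queried (illustration only)."""
    stm = {"current file": "router.py"}  # exact-key lookup, like a cache

    def search_stm(q):
        calls.append("STM")
        return [(stm[q], 1.0)] if q in stm else []

    def search_mtm(q):
        calls.append("MTM")
        return [("session goal: refactor the router", 0.8)] if "goal" in q else []

    def search_ltm(q):
        calls.append("LTM")
        return [("user prefers Python", 0.6)]

    return [("STM", search_stm), ("MTM", search_mtm), ("LTM", search_ltm)]
```

Raising a tier's threshold makes the search fall through to the next tier more often; lowering it keeps retrieval cheaper but accepts weaker matches.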
Docker + AWS from the Start
Each component (the router service, the Kafka consumer, the Redis cache) runs in its own container. This keeps local development close to production, avoiding "works on my machine" problems, and makes AWS deployment a docker-compose → ECS/Fargate swap rather than a rewrite.
OpenAI API for Embeddings (Swappable)
text-embedding-ada-002 provides strong baseline embedding quality with minimal setup. The embedding call is wrapped in a provider interface — the same abstraction as the vector store — so switching to a local model (e.g., sentence-transformers) or a different provider is a config change, not a code change.
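A sketch of how a config-selected provider might look. EmbeddingProvider, the PROVIDERS registry, the config keys, and both toy embedders are assumptions for illustration; neither embedder calls OpenAI or sentence-transformers, and neither produces semantically meaningful vectors. They exist only to show that callers stay unchanged when the provider name in the config changes.

```python
import hashlib
from typing import Dict, List, Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical provider interface (name and method are assumptions)."""

    def embed(self, text: str) -> List[float]: ...


class HashEmbedder:
    """Deterministic toy vectors from a SHA-256 digest; dim must be <= 32."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i] / 255.0 for i in range(self.dim)]


class LengthEmbedder:
    """Second toy provider, present only to demonstrate swapping."""

    def embed(self, text: str) -> List[float]:
        return [float(len(text)), float(len(text.split()))]


# Registry keyed by the name used in configuration.
PROVIDERS: Dict[str, type] = {"hash": HashEmbedder, "length": LengthEmbedder}


def provider_from_config(config: dict) -> EmbeddingProvider:
    cls = PROVIDERS[config["embedding_provider"]]
    return cls(**config.get("embedding_options", {}))
```

Adding a real provider would mean registering one more class that wraps the vendor client behind the same embed method.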
Roadmap
Core Memory Architecture
- ✓ STM / MTM / LTM three-tier hierarchy
- ✓ Lifecycle-aware memory router
- ✓ Pluggable vector store interface
- ✓ OpenAI embeddings + similarity search
- ✓ Recency, relevance, durability scoring
- ✓ Redis-backed STM with TTL expiry
- ✓ Basic retrieval with tier fallthrough
Pipeline & Agent Tooling
- ◐ Kafka MTM → LTM consolidation pipeline
- ◐ Docker + AWS (ECS/Fargate) deployment
- ◐ CRUD memory tools for agent use
- ◐ Graph-based memory navigation
- ◐ Memory pruning and compression
- ◐ Multi-agent shared memory namespace
- ◐ Observability dashboard (MTTD/recall metrics)
Interested in Continuum?
The repository is public. Issues, PRs, and questions are welcome, and you can reach out directly if you're working on similar problems in agentic AI infrastructure.