AI Infrastructure · 2025 – Present

Continuum

A multi-layer agent memory manager modeled after human memory architecture — Short-Term, Mid-Term, and Long-Term — giving AI agents the same kind of persistent, queryable, lifecycle-aware memory that humans naturally use to reason across time.

Inspired by cognitive science · Targeting human-like memory for AI agents

Python · OpenAI API · Vector DBs · Redis · Kafka · AWS · RAG · Embeddings

The Problem

LLMs Have No Native Memory

Every LLM call starts fresh. In long-running agentic workflows — research assistants, autonomous coding agents, multi-step task executors — the agent loses context between turns and sessions. Re-injecting the full conversation history hits token limits fast and becomes prohibitively expensive.

What Continuum Does

Continuum implements a three-tier memory hierarchy — Short-Term, Mid-Term, and Long-Term — with a routing layer that decides where to store and retrieve each memory based on recency, task relevance, and durability requirements. Agents get persistent, queryable context without hitting token or cost limits.

Why Not Just Use a DB?

A flat database can store facts but can't rank them by relevance to the current task. Retrieval without semantic understanding returns noise. Continuum combines vector embeddings for semantic search with structured lifecycle routing, so the agent gets the context most relevant to the current task rather than whatever happens to be stored.

Why Not Just Use a Bigger Context?

Larger context windows are expensive, slow, and still finite. More importantly, recall tends to degrade across very long contexts: models can "lose" facts buried in the middle. Structured retrieval aims to beat brute-force context stuffing on both cost and recall accuracy.

The Human Memory Parallel

Humans don't remember everything with equal fidelity — and that's a feature, not a bug. We have working memory for what we're focused on right now, short-term memory for recent context, and long-term memory for durable knowledge recalled semantically, not by exact address. Continuum is an attempt to give AI agents the same architecture — so they can reason the way humans do: with selective recall, recency awareness, and persistent knowledge that doesn't vanish between sessions.

STM · Human working memory
What you're focused on right now · Volatile · Limited capacity
MTM · Human short-term memory
Recent context · Fades without reinforcement
LTM · Human long-term memory
Durable knowledge · Retrieved by meaning, not address

Architecture

memory_router.route(memory_item)
Routes each memory write based on the following signals and thresholds:
recency_score · task_relevance · durability_flag · ttl_threshold · embedding_distance
STM · Short-Term Memory
Hot context · Active session
Backend: Redis
TTL: < 1 hour
Access: O(1) key lookup
Use for: Current task context
Promotion: On relevance score ↑

MTM · Mid-Term Memory
Session scope · Recent context
Backend: In-process store
TTL: Session lifetime
Access: Indexed retrieval
Use for: Multi-turn reasoning
Promotion: Via Kafka pipeline

LTM · Long-Term Memory
Persistent · Cross-session
Backend: Vector DB, pluggable (any VDB)
TTL: Indefinite
Access: Embedding similarity
Use for: Cross-session recall
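
The STM behavior listed above (key lookup plus expiry) can be illustrated without a running Redis server. The sketch below is an in-process stand-in, not Continuum's actual code: the class name TTLStore, its methods, and the injectable clock are all hypothetical. In a Redis-backed deployment the expiry would come from Redis key TTLs (for example, SET with the EX option) rather than from application code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Hypothetical in-process stand-in for a Redis-backed STM tier."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def put(self, key: str, value: str) -> None:
        """Store a value and stamp it with an expiry time."""
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        """Return the value if present and not expired, else None."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazily evict expired entries on read.
            del self._items[key]
            return None
        return value
```

Passing a fake clock makes the one-hour boundary easy to exercise in tests: an entry written at t=0 with ttl_seconds=3600 is readable at t=3599 and gone at t=3600.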

Memory Lifecycle Flow

step 1
Agent writes memory_item
step 2
Memory Router scores and routes
recency · relevance · durability
step 3 · STM
Stored in Redis · TTL < 1h · hot context
step 4 · MTM · on relevance score ↑
Promoted to in-process store · session scope
step 5 · LTM · async via Kafka
Consolidated to Vector DB · embedded · persistent
step 6
Agent retrieves via retrieve(query)
STM → MTM → LTM fallthrough · stops at first relevant match

Design Decisions

01 / storage

Pluggable Vector Store Interface

The LTM backend is abstracted behind a VectorStore protocol. Any vector database (Chroma, Pinecone, Weaviate, Qdrant) plugs in without changing routing logic. Keeping storage behind this interface also makes it easier to swap embedding models (text-embedding-ada-002 → custom) without rewriting retrieval code.
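
One way such a protocol could look is sketched below using typing.Protocol. Only the name VectorStore comes from the description above; the method names (upsert, query), their signatures, the InMemoryVectorStore class, and the recall helper are assumptions made for illustration, not the project's real interface.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Assumed shape of the LTM storage interface (method names are hypothetical)."""

    def upsert(self, item_id: str, vector: Sequence[float], text: str) -> None: ...

    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]: ...


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy backend that satisfies the protocol; a real one would wrap a vector DB client."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, item_id: str, vector: Sequence[float], text: str) -> None:
        self._rows[item_id] = (list(vector), text)

    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]:
        # Brute-force scan: score every stored row, return the best top_k.
        scored = [(text, cosine_similarity(vector, vec))
                  for vec, text in self._rows.values()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def recall(store: VectorStore, query_vector: Sequence[float], top_k: int = 3) -> List[str]:
    """Routing-side code depends only on the protocol, never on a concrete backend."""
    return [text for text, _score in store.query(query_vector, top_k)]
```

Because recall is typed against the protocol, replacing InMemoryVectorStore with an adapter around a hosted vector database would not touch the calling code.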

02 / messaging

Kafka for MTM → LTM Consolidation

LTM writes are expensive: generating an embedding and upserting a vector is slow, and doing both inline would block the agent loop. A Kafka topic decouples the consolidation step from the agent loop: MTM memories are published to a topic, consumed asynchronously, embedded, and written to the vector store without blocking the agent.
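
The decoupling idea can be shown in miniature with the standard library. The sketch below deliberately swaps Kafka for a queue.Queue and a worker thread so it runs anywhere; it illustrates the publish-then-consume shape only, and says nothing about Kafka's delivery guarantees, partitioning, or the project's actual consumer. The function name start_consolidator and the None shutdown sentinel are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Embedding = List[float]


def start_consolidator(
    topic: "queue.Queue[Optional[str]]",
    embed: Callable[[str], Embedding],
    sink: List[Tuple[str, Embedding]],
) -> threading.Thread:
    """Start a background worker that drains `topic`, embeds each item, and stores it.

    `topic` stands in for a Kafka topic and `sink` for the vector store.
    """

    def run() -> None:
        while True:
            text = topic.get()
            if text is None:  # sentinel: stop consuming
                break
            # The slow work (embedding + write) happens off the agent's thread.
            sink.append((text, embed(text)))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

From the agent's side, publishing is just topic.put(memory_text), which returns immediately; putting None on the queue and joining the worker shuts the consumer down cleanly.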

03 / routing

Scoring, Not Rules

Early versions used if/else rules (age > 1h → promote), which broke on varied workloads. The router now computes a composite score from recency decay, embedding cosine distance to the current task, and a durability flag set by the agent. Score thresholds determine tier placement, which holds up better across different agent types.
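
A composite score of this kind might be computed roughly as follows. Everything numeric here is an assumption: the weights, the half-life, the thresholds, and the choice to map higher scores to more durable tiers are placeholders for illustration, and RouterWeights, composite_score, and choose_tier are hypothetical names rather than the router's real API.

```python
from dataclasses import dataclass


@dataclass
class RouterWeights:
    """Hypothetical weights and thresholds; real values would be tuned per workload."""
    recency: float = 0.4
    relevance: float = 0.4
    durability: float = 0.2
    half_life_s: float = 1800.0   # recency halves every 30 minutes
    mtm_threshold: float = 0.5
    ltm_threshold: float = 0.75


def composite_score(age_s: float, cosine_distance: float, durable: bool,
                    w: RouterWeights) -> float:
    """Blend recency decay, task relevance, and the durability flag into one score."""
    recency = 0.5 ** (age_s / w.half_life_s)  # exponential decay in (0, 1]
    # Cosine distance can range up to 2; distances of 1 or more count as irrelevant.
    relevance = 1.0 - min(max(cosine_distance, 0.0), 1.0)
    flag = 1.0 if durable else 0.0
    return w.recency * recency + w.relevance * relevance + w.durability * flag


def choose_tier(score: float, w: RouterWeights) -> str:
    """Map a score to a tier using thresholds instead of hand-written rules."""
    if score >= w.ltm_threshold:
        return "LTM"
    if score >= w.mtm_threshold:
        return "MTM"
    return "STM"
```

With these placeholder values, a fresh, on-task memory flagged durable scores about 1.0 and lands in LTM, while a 30-minute-old memory at cosine distance 0.5 with no flag scores about 0.4 and stays in STM.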

04 / retrieval

Tiered Retrieval, Not Union Search

On read, Continuum checks STM first (O(1) Redis lookup), then MTM (indexed), then LTM (vector similarity) — stopping at the first sufficient match. This avoids flooding the agent with every vaguely-related memory from LTM. Configurable relevance thresholds control when to fall through to the next tier.
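
The fallthrough described above reduces to a short loop. This is a minimal sketch under assumed types: each tier is modeled as a plain function returning (text, relevance) pairs, and the TierLookup alias, the tuple layout, and the return shape of retrieve are illustrative choices, not the library's real signature.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical tier interface: query in, (memory_text, relevance in [0, 1]) pairs out.
TierLookup = Callable[[str], List[Tuple[str, float]]]


def retrieve(
    query: str,
    tiers: Sequence[Tuple[str, TierLookup, float]],
) -> Tuple[Optional[str], List[str]]:
    """Check tiers in order and stop at the first one with a sufficient match.

    `tiers` holds (name, lookup, relevance_threshold) in priority order,
    e.g. STM, then MTM, then LTM. Later tiers are never queried once an
    earlier tier produces a hit at or above its threshold.
    """
    for name, lookup, threshold in tiers:
        hits = [text for text, score in lookup(query) if score >= threshold]
        if hits:
            return name, hits
    return None, []
```

Because the loop returns early, an MTM hit means the vector search in LTM is skipped entirely, which is where the cost saving over a union search comes from.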

05 / deployment

Docker + AWS from the Start

Each component (the router service, the Kafka consumer, the Redis cache) runs in its own container. This keeps local development close to production (fewer "works on my machine" problems) and makes AWS deployment a docker-compose → ECS/Fargate swap rather than a rewrite.

06 / api

OpenAI API for Embeddings (Swappable)

text-embedding-ada-002 provides strong baseline embedding quality with minimal setup. The embedding call is wrapped in a provider interface, following the same pattern as the vector store abstraction, so switching to a local model (e.g., sentence-transformers) or a different provider is a config change, not a code change.
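
A config-driven provider switch could be wired up along these lines. The names EmbeddingProvider, HashEmbedder, PROVIDERS, provider_from_config, and the "embedding_provider" config key are all hypothetical. The only registered provider is a deterministic toy that hashes text, so the sketch runs offline; it carries no semantic meaning, and a real registry would add adapters for OpenAI or a local model.

```python
import hashlib
from typing import Callable, Dict, List, Protocol


class EmbeddingProvider(Protocol):
    """Assumed provider interface: text in, fixed-length vector out."""

    def embed(self, text: str) -> List[float]: ...


class HashEmbedder:
    """Deterministic toy embedder for tests only; NOT semantically meaningful."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim  # must be <= 32, the SHA-256 digest length in bytes

    def embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]


# Registry mapping config names to provider factories (hypothetical).
PROVIDERS: Dict[str, Callable[[], EmbeddingProvider]] = {
    "hash": HashEmbedder,
}


def provider_from_config(config: Dict[str, str]) -> EmbeddingProvider:
    """Pick the embedding provider named in config, failing loudly on unknown names."""
    name = config["embedding_provider"]
    factory = PROVIDERS.get(name)
    if factory is None:
        raise ValueError(f"unknown embedding provider: {name!r}")
    return factory()
```

Adding a provider then means registering one more factory under a new name; callers keep using provider_from_config(config).embed(text) unchanged.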

Roadmap

✓ v1 · Shipped

Core Memory Architecture

  • STM / MTM / LTM three-tier hierarchy
  • Lifecycle-aware memory router
  • Pluggable vector store interface
  • OpenAI embeddings + similarity search
  • Recency, relevance, durability scoring
  • Redis-backed STM with TTL expiry
  • Basic retrieval with tier fallthrough
⟳ v2 · In Progress

Pipeline & Agent Tooling

  • Kafka MTM → LTM consolidation pipeline
  • Docker + AWS (ECS/Fargate) deployment
  • CRUD memory tools for agent use
  • Graph-based memory navigation
  • Memory pruning and compression
  • Multi-agent shared memory namespace
  • Observability dashboard (MTTD/recall metrics)

Interested in Continuum?

The repository is public. Issues, PRs, and questions are welcome. Or reach out directly if you're working on similar problems in agentic AI infrastructure.