Module 06 · ~70 min

Memory & state

Right now Jarvis forgets you between turns. We fix that. Short-term memory (this conversation), long-term memory (facts about you and the org), and retrieval memory (the knowledge base) — three distinct mechanisms in LangGraph, each with its own job.

1. Two kinds of memory

	Short-term (thread)	Long-term (store)
Lives in	One conversation (a "thread").	Across all conversations.
Scope	Per thread_id.	Per user, per org, per "namespace".
Contains	The full message history of this chat.	Facts ("Priya prefers Slack DMs"), past summaries, learned procedures.
Backed by	Checkpointer (saves the graph state).	Store (a key-value, optionally with embeddings).

Mental model

Short-term memory = what's in scrollback right now. Long-term memory = your personal notes about the user that survive every restart. A great assistant needs both.

2. Short-term memory — checkpointers and threads

A checkpointer saves the graph state after every node runs, keyed by a thread_id. Resume a thread later — even days later, even after a server restart — and the graph picks up where it left off, with the full message history.

from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

# Persistent checkpointer (SQLite file). In production use Postgres.
async with AsyncSqliteSaver.from_conn_string("jarvis.sqlite") as checkpointer:
    agent = graph.compile(checkpointer=checkpointer)

    cfg = {"configurable": {"thread_id": "priya-2026-05-23"}}

    await agent.ainvoke({"messages": [("user", "Open a ticket: AC unit floor 2 is broken.")]}, config=cfg)
    # ... model files the ticket ...

    # Later in the day, same thread_id — Jarvis remembers:
    await agent.ainvoke({"messages": [("user", "What did I report this morning?")]}, config=cfg)
    # → "You reported the AC unit on floor 2; ticket TICK-4830 is in progress."

import { SqliteSaver } from "@langchain/langgraph-checkpoint-sqlite";

const checkpointer = SqliteSaver.fromConnString("jarvis.sqlite");
const agent = graph.compile({ checkpointer });

const cfg = { configurable: { thread_id: "priya-2026-05-23" } };

await agent.invoke(
  { messages: [{ role: "user", content: "Open a ticket: AC unit floor 2 is broken." }] },
  cfg,
);

// Later — same thread_id, Jarvis remembers the morning:
await agent.invoke(
  { messages: [{ role: "user", content: "What did I report this morning?" }] },
  cfg,
);

Production tip

SQLite is fine for dev and small deployments. For production, use the Postgres checkpointer (langgraph-checkpoint-postgres) — it's the same API, just with a connection string. The LangGraph Platform gives you a checkpointer for free; you don't even have to configure it.

3. Trimming and summarising — when threads get long

A long thread blows your context window and your token bill. Two strategies, often combined:

Trim — keep only the last N messages or the last K tokens. Simple, lossy.
Summarise — periodically run a node that replaces the older messages with a short LLM-written summary, kept in state.

from langchain_core.messages import trim_messages

trimmer = trim_messages(
    max_tokens=4000,
    strategy="last",
    token_counter=model,
    include_system=True,
)

def call_model(state):
    trimmed = trimmer.invoke([SYSTEM] + state["messages"])
    return {"messages": [model_with_tools.invoke(trimmed)]}

import { trimMessages } from "@langchain/core/messages";

const trimmer = trimMessages({
  maxTokens: 4000,
  strategy: "last",
  tokenCounter: model,
  includeSystem: true,
});

async function callModel(state: any) {
  const trimmed = await trimmer.invoke([SYSTEM, ...state.messages]);
  return { messages: [await modelWithTools.invoke(trimmed)] };
}

For Jarvis, threads are short (one work-day per thread), so trimming is enough. For long-running threads (a customer support session that runs for weeks), add a summariser node that triggers when the thread exceeds, say, 50 messages.

4. Long-term memory — the Store

The store is a separate, cross-thread persistence layer. Anything you put in it is keyed by a namespace tuple (e.g. ("users", "priya@acme.com", "prefs")) and a key. You can attach it to your compiled graph and tools can read/write it.

from langgraph.store.memory import InMemoryStore           # dev
# from langgraph.store.postgres import PostgresStore        # prod

store = InMemoryStore()

# Compile with both a checkpointer and a store:
agent = graph.compile(checkpointer=checkpointer, store=store)

import { InMemoryStore } from "@langchain/langgraph";
// import { PostgresStore } from "@langchain/langgraph-store-postgres"; // prod

const store = new InMemoryStore();
const agent = graph.compile({ checkpointer, store });

Writing memories from a tool or node

Tools and nodes receive the store via the same RunnableConfig trick:

from langgraph.config import get_store

@tool
async def remember_preference(key: str, value: str, *, config) -> str:
    """Save a long-term preference about the current user."""
    store = get_store()
    user = config["configurable"]["user_email"]
    await store.aput(("users", user, "prefs"), key, {"value": value})
    return f"Saved {key}={value}"

import { getStore } from "@langchain/langgraph";

export const rememberPreference = tool(
  async ({ key, value }, config) => {
    const store = getStore();
    const user = config?.configurable?.user_email as string;
    await store.put(["users", user, "prefs"], key, { value });
    return `Saved ${key}=${value}`;
  },
  { name: "remember_preference", description: "Save a long-term preference about the current user.",
    schema: z.object({ key: z.string(), value: z.string() }) },
);

Reading memories at the start of every turn

The simplest pattern is a "load memories" node that runs before model, fetches everything in the user's namespace, and injects it as a system message:

async def load_user_memories(state, config):
    store = get_store()
    user = config["configurable"]["user_email"]
    items = await store.asearch(("users", user, "prefs"))   # all prefs
    if not items:
        return {}
    memory_text = "\n".join(f"- {it.key}: {it.value['value']}" for it in items)
    sys = SystemMessage(content=f"Known facts about this user:\n{memory_text}")
    return {"messages": [sys]}

graph.add_node("memories", load_user_memories)
graph.add_edge(START, "memories")
graph.add_edge("memories", "model")

async function loadUserMemories(state: any, config: any) {
  const store = getStore();
  const user = config?.configurable?.user_email as string;
  const items = await store.search(["users", user, "prefs"]);
  if (!items.length) return {};
  const text = items.map(i => `- ${i.key}: ${i.value.value}`).join("\n");
  return { messages: [new SystemMessage(`Known facts about this user:\n${text}`)] };
}

graph.addNode("memories", loadUserMemories).addEdge(START, "memories").addEdge("memories", "model");

5. Three flavours of long-term memory

Borrowing from cognitive science, it's useful to distinguish:

Kind	What it stores	Example for Jarvis
Semantic	Facts about the world / user.	"Priya is in Engineering. Her manager is Anuj. She prefers Slack DMs over email."
Episodic	Past events / experiences.	"Last Friday Priya asked Jarvis to book Room 4B at 3pm; she preferred small rooms."
Procedural	How to do things.	"Standard playbook for printer issues at Acme: check toner → check network → file ticket → email facilities."

All three live in the same store; what varies is the namespace:

Semantic → ("users", user, "facts")
Episodic → ("users", user, "episodes")
Procedural → ("org", "playbooks")

When to write what

Procedural memories you mostly seed manually (or with an offline job). Semantic facts can be extracted automatically by a small post-turn node ("did the user reveal a new fact about themselves?"). Episodic memories grow naturally as summaries of recent sessions.

6. Semantic search over memories (embeddings)

Once memories pile up you can't dump them all in the prompt. Use a store with embeddings and retrieve the most relevant ones for the current turn:

from langgraph.store.memory import InMemoryStore
from langchain_openai import OpenAIEmbeddings

store = InMemoryStore(index={"dims": 1536, "embed": OpenAIEmbeddings()})

# Write
await store.aput(("users", "priya@acme.com", "facts"), "manager", {"text": "Priya's manager is Anuj."})

# Read by similarity to the user's current message:
hits = await store.asearch(("users", "priya@acme.com", "facts"), query="Who is Priya's boss?")
for h in hits: print(h.value["text"])

import { InMemoryStore } from "@langchain/langgraph";
import { OpenAIEmbeddings } from "@langchain/openai";

const store = new InMemoryStore({ index: { dims: 1536, embeddings: new OpenAIEmbeddings() } });

await store.put(["users", "priya@acme.com", "facts"], "manager", { text: "Priya's manager is Anuj." });

const hits = await store.search(["users", "priya@acme.com", "facts"], { query: "Who is Priya's boss?" });
for (const h of hits) console.log(h.value.text);

7. RAG — when memory is documents

When the "memory" is actually the company's documents (HR policy PDFs, IT runbooks, product wiki), that's retrieval-augmented generation (RAG). The pattern:

Chunk your documents (~500–1000 tokens each).
Embed each chunk and store in a vector DB (pgvector, Pinecone, Chroma, Qdrant, …).
Expose retrieval as a tool the agent calls when it needs to look something up.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

vectordb = Chroma(persist_directory="./kb", embedding_function=OpenAIEmbeddings())

@tool
def lookup_policy(question: str) -> str:
    """Search Acme's policy documents (HR + IT). Returns the most relevant 3 snippets."""
    docs = vectordb.similarity_search(question, k=3)
    return "\n\n".join(f"[{d.metadata['source']}] {d.page_content}" for d in docs)

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";

const vectordb = await Chroma.fromExistingCollection(new OpenAIEmbeddings(), { collectionName: "kb" });

export const lookupPolicy = tool(
  async ({ question }) => {
    const docs = await vectordb.similaritySearch(question, 3);
    return docs.map(d => `[${d.metadata.source}] ${d.pageContent}`).join("\n\n");
  },
  { name: "lookup_policy", description: "Search Acme's policy documents.", schema: z.object({ question: z.string() }) },
);

RAG is just a tool

In an agent world, you don't "build a RAG app". You build an agent and one of its tools happens to be retrieval. The model decides when to retrieve and when not to. This is much more flexible than the old "always-retrieve-first" RAG pipeline.

8. Jarvis with memory

★ Jarvis status

Jarvis now: (a) remembers any conversation by thread_id across crashes and days; (b) loads user facts at the start of each turn from a long-term store; (c) can look up Acme's policy docs on demand via a retrieval tool. Next: split this growing single-agent into a multi-agent supervisor + specialists.

1. Two kinds of memory

2. Short-term memory — checkpointers and threads

3. Trimming and summarising — when threads get long

4. Long-term memory — the Store

Writing memories from a tool or node

Reading memories at the start of every turn

5. Three flavours of long-term memory

6. Semantic search over memories (embeddings)

7. RAG — when memory is documents

8. Jarvis with memory

Quick check