Capstone: build Jarvis end-to-end
Everything you've learned, in one repo. By the end of this module you'll have a deployable, observed, hardened Jarvis you can demo. We'll lay out the project structure, build it module by module, run it locally, then deploy.
1. The architecture you're building
[Figure: Jarvis architecture, control + data plane]
2. Repository layout
jarvis/
├── langgraph.json          # deployment manifest
├── pyproject.toml          # deps
├── .env                    # secrets (gitignored)
├── src/
│   ├── jarvis.py           # exports `graph` for the platform
│   ├── state.py            # JarvisState typed dict + reducers
│   ├── supervisor.py       # supervisor node + routing model
│   ├── agents/
│   │   ├── it.py
│   │   ├── hr.py
│   │   ├── calendar.py
│   │   └── knowledge.py
│   ├── tools/
│   │   ├── helpdesk.py     # open_it_ticket, ticket_status
│   │   ├── hris.py         # leave_balance, policy_lookup
│   │   ├── calendar.py     # find_room, schedule_meeting
│   │   └── kb.py           # lookup_policy (retrieval over Chroma)
│   ├── memory/
│   │   ├── load.py         # load_user_memories node
│   │   └── extract.py      # post-turn fact extractor
│   ├── guards/
│   │   ├── injection.py    # input guardrail
│   │   └── rate_limit.py   # per-user quotas
│   └── auth.py             # JWT verification → config.configurable
├── tests/
│   ├── test_tools.py
│   ├── test_routing.py     # FakeListChatModel routing tests
│   └── eval/
│       └── run_dataset.py  # LangSmith eval runner used in CI
└── scripts/
    └── seed_kb.py          # one-time: ingest policy PDFs into Chroma
jarvis/
├── langgraph.json
├── package.json
├── tsconfig.json
├── .env
├── src/
│   ├── jarvis.ts           // exports `graph`
│   ├── state.ts            // JarvisState (Annotation.Root) + reducers
│   ├── supervisor.ts
│   ├── agents/
│   │   ├── it.ts
│   │   ├── hr.ts
│   │   ├── calendar.ts
│   │   └── knowledge.ts
│   ├── tools/
│   │   ├── helpdesk.ts
│   │   ├── hris.ts
│   │   ├── calendar.ts
│   │   └── kb.ts
│   ├── memory/
│   │   ├── load.ts
│   │   └── extract.ts
│   ├── guards/
│   │   ├── injection.ts
│   │   └── rateLimit.ts
│   └── auth.ts
├── tests/
│   ├── tools.test.ts
│   ├── routing.test.ts
│   └── eval/runDataset.ts
└── scripts/
    └── seedKb.ts
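To make the role of `langgraph.json` concrete, here is a minimal sketch that generates a manifest for the Python layout. It assumes the graph is registered under the name `jarvis` (the name the client passes in section 5) and that the `dependencies`, `graphs`, and `env` keys are all you need. The helper `build_manifest` and the exact entrypoint string are illustrative assumptions; check them against Module 9 before relying on them.

```python
import json


def build_manifest(graph_name="jarvis", entrypoint="./src/jarvis.py:graph"):
    """Return a dict shaped like a minimal langgraph.json (assumed form)."""
    return {
        "dependencies": ["."],              # install the repo itself
        "graphs": {graph_name: entrypoint}, # name -> "file:variable" (assumed path)
        "env": ".env",                      # secrets file from the layout above
    }


if __name__ == "__main__":
    # Print the manifest so it can be redirected into langgraph.json.
    print(json.dumps(build_manifest(), indent=2))
```

The graph name matters: it is the string clients use to address the deployed graph, so keep it in sync with whatever your client code passes.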
3. The build, in order
Build it in this sequence — each step gives you something you can run before moving on.
- State. Define `JarvisState` with `messages` (`add_messages` reducer), `next` (supervisor's pick), `user_email`, `org_id`. (Module 4.)
- Tools. Implement the four tool modules with real or mocked APIs. Add validation, error returns, identity from config. (Module 5.)
- Specialists. One ReAct agent per domain, each with its own tools, prompt, and a small fast model. (Modules 3, 7.)
- Supervisor. Routing model returning `{next: ...}`; supervisor node that calls it; conditional edges to specialists; edges back. (Module 7.)
- Memory. Add the `load_user_memories` node before the supervisor; add the `remember_preference` tool to relevant specialists; configure store with embeddings. (Module 6.)
- Human-in-the-loop. Wrap `send_email` and `grant_access` in `interrupt()`. Add a UI flow in your test client to approve/edit/decline. (Module 8.)
- Guardrails. Input guardrail node ahead of memory; per-user rate limit in tools; allow-list of tools per specialist. (Module 11.)
- Persistence. Compile with the platform's default checkpointer + Postgres store. Locally use SQLite. (Modules 6, 9.)
- Deployment. `langgraph.json` + `langgraph dev` locally; `langgraph deploy` to the Platform. (Module 9.)
- Observability. Tracing on. Build a small gold dataset (20 cases to start). Wire offline evals into your CI. Sample 10% of prod traces for online evals. (Module 10.)
- Client. Minimal web chat that calls the SDK: get/create thread, stream a run, render updates, prompt for approvals on interrupts. (Module 9.)
If you're stuck at any step, scroll back to that module — the code there is the source of truth.
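For the per-user rate limit in the Guardrails step, one possible shape is a sliding-window counter. The class below is a sketch under stated assumptions: the name `RateLimiter`, the injectable `clock`, and the in-memory storage are all illustrative, and a deployment with several replicas would need shared storage instead of a process-local dict.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window quota: at most `limit` calls per `window` seconds per user."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock                 # injectable so tests can control time
        self._calls = defaultdict(deque)   # user -> timestamps of recent calls

    def allow(self, user: str) -> bool:
        now = self.clock()
        calls = self._calls[user]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True
```

A tool would call `allow(user_email)` first and return an error result when it comes back `False`, so the quota is enforced where the side effect happens.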
4. The wired-up src/jarvis.py — what it looks like
from langgraph.graph import StateGraph, START, END

from .state import JarvisState
from .memory.load import load_user_memories
from .guards.injection import injection_check
from .supervisor import supervisor_node, route_after_supervisor, SPECIALISTS


def build():
    g = StateGraph(JarvisState)

    # Pipeline nodes
    g.add_node("inject_check", injection_check)   # guardrail
    g.add_node("memories", load_user_memories)    # load long-term facts
    g.add_node("supervisor", supervisor_node)     # picks next worker

    # Specialist sub-agents (each itself a compiled ReAct graph)
    for name, agent in SPECIALISTS.items():
        g.add_node(name, agent)

    # Wire it
    g.add_edge(START, "inject_check")
    g.add_edge("inject_check", "memories")
    g.add_edge("memories", "supervisor")
    g.add_conditional_edges(
        "supervisor",
        route_after_supervisor,
        {**{n: n for n in SPECIALISTS}, "FINISH": END},
    )
    for name in SPECIALISTS:
        g.add_edge(name, "supervisor")            # specialists report back

    return g


# Exported for langgraph deploy; the Platform compiles with its own checkpointer/store.
graph = build()
import { StateGraph, START, END } from "@langchain/langgraph";
import { JarvisState } from "./state";
import { loadUserMemories } from "./memory/load";
import { injectionCheck } from "./guards/injection";
import { supervisorNode, routeAfterSupervisor, SPECIALISTS } from "./supervisor";

function build() {
  const g = new StateGraph(JarvisState)
    .addNode("inject_check", injectionCheck)
    .addNode("memories", loadUserMemories)
    .addNode("supervisor", supervisorNode);

  for (const [name, agent] of Object.entries(SPECIALISTS)) g.addNode(name as any, agent);

  g.addEdge(START, "inject_check")
    .addEdge("inject_check", "memories")
    .addEdge("memories", "supervisor")
    .addConditionalEdges("supervisor", routeAfterSupervisor, {
      ...Object.fromEntries(Object.keys(SPECIALISTS).map(n => [n, n])),
      FINISH: END,
    });

  for (const name of Object.keys(SPECIALISTS)) g.addEdge(name as any, "supervisor");

  return g;
}

export const graph = build();
Every concept from the course shows up here in exactly one place. You should now be able to read this file front-to-back and explain every line to a coworker.
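The conditional edge in the file above depends on `route_after_supervisor` returning a key that exists in the edge map: a specialist name or `"FINISH"`. A minimal sketch of such a function follows. The specialist names are assumed to mirror the files in `src/agents/`, and the fallback to `"FINISH"` for unknown values is one possible design choice, not necessarily the one Module 7 uses.

```python
# Assumed node names, mirroring src/agents/ (the real keys live in SPECIALISTS).
SPECIALIST_NAMES = ("it", "hr", "calendar", "knowledge")


def route_after_supervisor(state: dict) -> str:
    """Return the key that the conditional-edge map resolves to a node or END."""
    choice = state.get("next")
    if choice in SPECIALIST_NAMES:
        return choice
    # "FINISH", a missing value, or an unknown name all end the turn, so the
    # router never returns a key that is absent from the edge map.
    return "FINISH"
```

Keeping the router a pure function of state makes it easy to cover in `tests/test_routing.py` without calling a model.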
5. Running it
# Local: serves the API at http://localhost:2024 and opens the Studio UI against it
langgraph dev

# Deploy
langgraph deploy --name jarvis-prod

# Hit it from your client (Python)
import asyncio
from langgraph_sdk import get_client


async def main():
    client = get_client(url="https://...langgraph.app")
    thread = await client.threads.create()
    # Stream one run on the new thread; identity travels in config, not in the message.
    async for chunk in client.runs.stream(
        thread["thread_id"], "jarvis",
        input={"messages": [{"role": "user", "content": "Printer floor 3 jammed; also lunch with Anuj tomorrow 1pm"}]},
        config={"configurable": {"user_email": "priya@acme.com", "org_id": "acme"}},
        stream_mode="updates",
    ):
        print(chunk.event, chunk.data)


asyncio.run(main())
# Local
npx @langchain/langgraph-cli dev

# Deploy
npx @langchain/langgraph-cli deploy --name jarvis-prod

// Hit it from your client (TypeScript)
import { Client } from "@langchain/langgraph-sdk";

const client = new Client({ apiUrl: "https://...langgraph.app" });
const thread = await client.threads.create();

for await (const chunk of client.runs.stream(thread.thread_id, "jarvis", {
  input: { messages: [{ role: "user", content: "..." }] },
  config: { configurable: { user_email: "priya@acme.com", org_id: "acme" } },
  streamMode: "updates",
})) console.log(chunk.event, chunk.data);
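Printing raw chunks is fine for a smoke test, but a chat client needs to turn them into something renderable. The helper below is a sketch of that step. It assumes that in `updates` mode each chunk's data is a dict keyed by node name, and that a paused run shows up under an `__interrupt__` key. Both the helper name `summarize_chunk` and those payload shapes are assumptions to verify against Modules 8 and 9.

```python
def summarize_chunk(event: str, data) -> list[str]:
    """Turn one streamed (event, data) pair into human-readable lines."""
    if event != "updates" or not isinstance(data, dict):
        return []  # e.g. metadata events; ignored in this sketch

    lines = []
    for node, update in data.items():
        if node == "__interrupt__":  # assumed key for a paused run
            lines.append("approval needed: run paused on an interrupt")
        else:
            # List which state keys the node wrote, if any.
            keys = sorted(update) if isinstance(update, dict) else []
            lines.append(f"{node} updated {', '.join(keys) or 'nothing'}")
    return lines
```

Inside the streaming loop you would call `summarize_chunk(chunk.event, chunk.data)` and render each returned line, switching to your approval UI when the interrupt line appears.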
6. Self-assessment — can you do all of this?
If you can answer "yes" to every item, you can ship multi-agent systems on your own.
- I can explain in one sentence what an LLM agent is, and pick agent vs. workflow correctly.
- I can place LangChain / LangGraph / LangSmith / LangGraph Platform on the right layer of the stack.
- I can build a working tool-calling agent from primitives (model, messages, prompt, tools) without a prebuilt.
- I can express the agent loop as a LangGraph `StateGraph` with conditional edges, and explain reducers.
- I can design good tools: names, descriptions, schemas, error returns, identity from config.
- I can add short-term memory (checkpointer/threads) and long-term memory (store + namespaces).
- I can pick the right multi-agent pattern (supervisor / network / hierarchical / swarm) for a given problem.
- I can add a human-in-the-loop approval gate with `interrupt()` and resume with `Command(resume=…)`.
- I can deploy a graph to LangGraph Platform and explain control plane vs. data plane.
- I can wire LangSmith tracing and run offline + online evals with a gold dataset.
- I can list the production hardening checklist from Module 11 and explain why each item is on it.
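If the "explain reducers" item gives you pause, a toy model can help. The sketch below is not LangGraph's implementation. It only illustrates the rule that a key with a reducer is combined as `reducer(old, new)`, while a key without one is overwritten. `apply_update` and `REDUCERS` are hypothetical names, and `operator.add` stands in for `add_messages`.

```python
import operator


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into state without mutating the input.

    Keys with a reducer are combined as reducer(old, new);
    keys without one are simply overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


# operator.add is a stand-in for add_messages: it only appends,
# whereas add_messages also reconciles messages by ID.
REDUCERS = {"messages": operator.add}
```

With this model, a specialist returning `{"messages": [reply]}` appends to the conversation, while the supervisor returning `{"next": "hr"}` replaces the previous pick.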
7. What to learn next
- Agent UX: building chat surfaces that surface streaming, interrupts, and approvals well. The Vercel AI SDK + LangChain's `ai-sdk` integration on the TS side is a good starting point.
- Voice agents: wrap your graph with a realtime voice layer (Vapi, Deepgram, ElevenLabs). The graph stays the same; only the I/O changes.
- Agent evals at scale: read up on trajectory evaluation (grading the path, not just the final answer), pairwise preference evals, and human-in-the-loop annotation queues.
- Self-improving agents: agents that update their own procedural memory ("next time, remember to check toner first") based on success/failure feedback. The store + a post-turn reflection node is enough to start.
- Specialised orchestration patterns: research the plan-and-execute, reflexion, tree-of-thoughts, and code-act agents. Most are 2–3 nodes added to a base ReAct graph.
- Cost-conscious frontier-model use: read each provider's prompt-caching, batch, and structured-output docs annually. The economics shift every six months.
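To make "grading the path" from the evals item concrete, here is a small sketch of one possible trajectory metric: the fraction of expected steps that appear in the actual run, in order, with extra steps tolerated. The function name and the scoring rule are illustrative assumptions, not a LangSmith API.

```python
def trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected steps found in `actual` in the same order."""
    if not expected:
        return 1.0
    matched = 0
    remaining = iter(actual)
    for step in expected:
        # Consume `actual` until this expected step is found.
        if any(step == seen for seen in remaining):
            matched += 1
        else:
            break  # `actual` is exhausted; later steps cannot match
    return matched / len(expected)
```

You might feed it the node names recorded in a trace, for example comparing `["supervisor", "it", "supervisor"]` against what a gold-dataset case expects, and track the score alongside final-answer accuracy.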
8. You're done
You built Jarvis. More importantly, you now understand the entire stack underneath it — from the ReAct loop to the control plane. Take this same architecture and apply it to whatever your real product is. Every other multi-agent system you'll build is a permutation of what you've already done.
Bookmark the glossary and come back to specific modules whenever you need a refresher.
Final check
1. What's the most important thing you should take away from this course?
2. When LangGraph's API names change in 6 months, you will: