The orchestration ecosystem
LangChain. LangGraph. LangSmith. LangGraph Platform. Same vendor, four different jobs. By the end of this module you'll know exactly what each one is for and never mix them up again.
1. The "layer cake" mental model
Most agent stacks have four jobs that need doing. The LangChain ecosystem ships one library per job:
LangChain = the Lego bricks (a model, a tool, a prompt). LangGraph = the instruction booklet (how the bricks plug together at runtime). LangSmith = the security camera (records every move so you can debug and evaluate). LangGraph Platform = the factory (hosts your built thing as a scalable service with a control plane).
2. LangChain — the building blocks
LangChain is a library of uniform interfaces over LLM-adjacent things:
- Chat models — one API over Anthropic, OpenAI, Gemini, local Ollama, etc. Swap providers with a one-line change.
- Messages —
HumanMessage,AIMessage,SystemMessage,ToolMessage. The vocabulary every chat model understands. - Prompts / prompt templates — string templates with variables, plus chat-prompt templates.
- Tools — typed function descriptors the LLM can call.
- Output parsers — structured-output helpers (give me a Pydantic / Zod object back, not a string).
- Retrievers / vector stores — for RAG.
LangChain alone is enough to build a single-shot "ask the model, get a reply" app or a simple chain. It is not the right tool when you have loops, branching, or multiple agents — that's LangGraph's job.
You'll see old tutorials use AgentExecutor from LangChain itself to run an agent loop. That works, but the LangChain team now recommends LangGraph for any agent more involved than the simplest single-turn case. We follow the modern recommendation throughout this course.
3. LangGraph — the orchestration layer
LangGraph is a framework for building stateful, multi-step, possibly cyclic applications around LLMs. You describe your program as a graph:
- State — a shared bag of data that flows through the graph (e.g. the message history, the user id, any intermediate results).
- Nodes — units of work. A node is just a function. Some nodes call the LLM, some call tools, some are pure Python/TS logic.
- Edges — wires that say "after node A, go to node B." Edges can be conditional: "after the LLM, go to the tool node if it asked for a tool, otherwise end."
Why a graph and not just code? Because graphs give you, basically for free:
- Persistence — pause and resume after every node. Survive crashes. Time-travel debugging.
- Streaming — token-by-token, node-by-node, state-update-by-state-update.
- Human-in-the-loop — interrupt mid-execution to wait for an approval, then resume.
- Concurrency — fan out to multiple branches in parallel.
- Visibility — auto-traced into LangSmith; visualisable as an actual diagram.
The reason the world settled on a graph abstraction for agents isn't aesthetic — it's that you need checkpoints between steps for everything that matters in production (resume after crash, human approval, time-travel debugging). The graph gives you a natural checkpoint boundary at every edge.
4. LangSmith — observability & evaluation
LangSmith is a hosted product (with a free tier) that captures a trace for every run of your LangGraph/LangChain code. A trace is a tree of every LLM call, every tool call, every node, with full inputs, outputs, latencies, and token counts.
You'll use LangSmith for:
- Debugging — "why did my agent loop 14 times?" Click the trace. See exactly what the model said at each step.
- Datasets — collect real production runs into a dataset of (input, expected-output) pairs.
- Evaluators — run your agent against the dataset and grade each output (with another LLM, with a hand-written check, or with a human).
- Online monitoring — alert when error rate or cost spikes in production.
Setting LANGSMITH_TRACING=true and a project name is enough — every LangChain/LangGraph call automatically becomes a trace. No code changes. You should do this from Module 3 onward; you'll thank yourself the first time something goes weird.
5. LangGraph Platform — the control plane
You wrote a LangGraph app. It runs on your laptop. Now what? Production needs:
- A web server exposing your graph as an API.
- A database to persist conversation threads and long-term memory across machines.
- Background workers for long-running runs.
- A queue so concurrent users don't trample each other's state.
- A way to deploy a new version without dropping in-flight runs.
- A UI to inspect, manage, and replay runs in production.
That's what LangGraph Platform provides. It is split — like every modern cloud service — into a control plane and a data plane:
UI + API to deploy, configure, monitor
where your agents actually execute
Concretely, the control plane lets you deploy a graph and gives you a managed runtime that exposes assistants, threads, and runs over HTTP. We go deep on this in Module 9; for now just know it exists and that "agent with a control plane" maps to "LangGraph app deployed on LangGraph Platform" (or self-hosted LangGraph Server).
You can build with LangGraph and never use the Platform — package it yourself with FastAPI/Express and host on whatever. But the Platform exists because everyone who built agents in 2023–24 ended up writing the same persistence + queueing + thread-management code. The Platform is that code, productised.
6. The picture, one more time
From here on, every module sits on one of these layers. Modules 3–8 are LangChain + LangGraph. Module 9 is the Platform. Module 10 is LangSmith. Module 11 layers in production concerns that touch all of them.
Quick check
1. You need to call a chat model, parse its output into a struct, and connect it to a vector store. Which library?
2. Your agent must loop until it has gathered enough info, then branch into one of three follow-ups. Which library?
3. Your agent looped 14 times in production and you have no idea why. What do you reach for?
4. The "control plane" your CTO mentioned maps to which layer?