Module 09 · ~80 min

The control plane & deployment

A LangGraph app on your laptop is not a service. To run agents for real users you need persistence, queueing, scaling, deploy-without-downtime, an API to call them from, and a UI to manage them at runtime. That is what a control plane gives you. Here is how it works.

1. Control plane vs. data plane

The same split every modern cloud system uses:

Control plane

A management surface (UI + API) where you operate:

  • Deploy a new graph version.
  • Create/configure "assistants".
  • Inspect threads, replay runs, fork from history.
  • Set environment variables, secrets, scaling policy.
  • Read traces and metrics.

Data plane

Where user traffic actually executes:

  • HTTP API the user-facing app calls.
  • Worker processes running graph nodes.
  • Task queue for background runs.
  • Postgres for checkpoints, threads, store.
  • Streaming (SSE/WebSocket) for live updates.

You manage things on the control plane. Users hit the data plane. They scale independently.

Why the split matters

Your CTO can deploy a new version, your dev can replay a buggy production run, your ops can scale workers — none of which blocks or affects users in flight. That separation is the whole reason "control plane" is a concept worth a separate word.

2. Three vocabulary words: assistants, threads, runs

The LangGraph runtime exposes your graph through three nouns:

| Noun | Is | Lives for | For Jarvis |
| --- | --- | --- | --- |
| Assistant | A graph + a specific config (prompt, model, feature flags). Multiple assistants can share one graph. | As long as you want it deployed. | "jarvis-prod" assistant; "jarvis-spanish" assistant pointing at the same graph but with a Spanish system prompt. |
| Thread | A persistent conversation: the checkpointed state for one user/session. | Forever (or until you delete it). | One per (user, chat session). Resumes across days. |
| Run | One invocation of an assistant on a thread. | Seconds to minutes. | One per user turn. Streams output back as it happens. |
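
To make the assistant/graph distinction concrete, here is a minimal sketch of creating a second assistant on the same graph. It assumes `client` is a `langgraph_sdk` client (for example from `get_client`) and that the graph is registered under the id "jarvis". The `system_prompt` key is hypothetical: it only has an effect if your graph's nodes read it from the config.

```python
async def create_spanish_assistant(client):
    """Create a second assistant that reuses the "jarvis" graph with its own config."""
    return await client.assistants.create(
        graph_id="jarvis",  # the graph id the deployment registers
        name="jarvis-spanish",
        # "system_prompt" is a hypothetical key; the graph's nodes must read it.
        config={"configurable": {"system_prompt": "Responde siempre en español."}},
    )
```

The returned assistant carries an `assistant_id`, which is what you would pass when starting runs against it.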

3. Deployment options — pick one

| Option | You manage | Best for |
| --- | --- | --- |
| LangGraph Cloud | Nothing; fully managed SaaS. | Starting fast, small teams. |
| Hybrid (managed control plane, your data plane) | The data plane, which runs in your VPC; the control plane stays managed. | Enterprises with data residency requirements. |
| Fully self-hosted (LangGraph Server) | Everything: containers, Postgres, scaling. | Air-gapped environments; strict control. |
| DIY (FastAPI/Express around your compiled graph) | Everything, plus persistence, queue, and threads API that you build yourself. | Tiny apps that don't need any of the above. You will outgrow this. |

4. Anatomy of a deployable LangGraph project

To deploy via the LangGraph CLI / Platform, a project needs two things: a langgraph.json manifest at its root and the graph module that manifest points to:

langgraph.json, the deployment manifest (Python project):

{
  "dependencies": ["."],
  "graphs": {
    "jarvis": "./src/jarvis.py:graph"
  },
  "env": ".env",
  "python_version": "3.11"
}

langgraph.json, the deployment manifest (TypeScript project):

{
  "dependencies": ["."],
  "graphs": {
    "jarvis": "./src/jarvis.ts:graph"
  },
  "env": ".env",
  "node_version": "20"
}
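
Each entry under "graphs" has the shape `path/to/file:variable`. As a small, unofficial sanity check (not part of the LangGraph CLI), the sketch below parses a manifest and splits every entry into its file path and exported variable. The helper name `check_manifest` is made up for this example.

```python
import json

def check_manifest(text: str) -> dict[str, tuple[str, str]]:
    """Parse langgraph.json text and split each graph entry into (file path, variable)."""
    manifest = json.loads(text)
    graphs = manifest.get("graphs") or {}
    if not graphs:
        raise ValueError('manifest needs at least one entry under "graphs"')
    parsed = {}
    for graph_id, spec in graphs.items():
        # Split on the last colon: everything before is the file, after is the variable.
        path, sep, variable = spec.rpartition(":")
        if not sep or not path or not variable:
            raise ValueError(f"{graph_id!r}: expected 'path/to/file:variable', got {spec!r}")
        parsed[graph_id] = (path, variable)
    return parsed

example = '{"dependencies": ["."], "graphs": {"jarvis": "./src/jarvis.py:graph"}, "env": ".env"}'
print(check_manifest(example))  # {'jarvis': ('./src/jarvis.py', 'graph')}
```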

And in code, you export the uncompiled StateGraph (the Platform adds its own checkpointer + store):

# src/jarvis.py
from langgraph.graph import StateGraph
# ... define nodes / edges as in earlier modules ...
graph = build_jarvis_graph()    # returns a StateGraph, NOT compiled
# The Platform calls graph.compile(checkpointer=..., store=...) for you.

// src/jarvis.ts
import { StateGraph } from "@langchain/langgraph";
// ... define nodes / edges as in earlier modules ...
export const graph = buildJarvisGraph();    // returns a StateGraph, NOT compiled

Local dev with the same runtime

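# In your project root (Python):
pip install -U "langgraph-cli[inmem]"   # the inmem extra provides the local dev server
langgraph dev          # starts the LangGraph Server locally on :2024
# The API listens on http://localhost:2024; the CLI opens LangGraph Studio (control-plane UI for dev) connected to it.

# In your project root (TypeScript):
npx @langchain/langgraph-cli dev
# Same local server on http://localhost:2024, with LangGraph Studio connected to it.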

Same CLI everywhere: langgraph dev locally, langgraph deploy to ship. The graph code does not change between dev and prod.

5. Calling a deployed Jarvis — the SDK

Once deployed, you hit the data plane over HTTP. The official SDK wraps it:

import asyncio
import os

from langgraph_sdk import get_client

async def main():
    client = get_client(url="https://your-deployment.langgraph.app", api_key=os.environ["LANGGRAPH_API_KEY"])

    # 1. Get or create a thread for this user/chat session:
    thread = await client.threads.create()

    # 2. Start a run on the "jarvis" assistant and stream its output:
    async for chunk in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="jarvis",
        input={"messages": [{"role": "user", "content": "Printer floor 3 jammed."}]},
        config={"configurable": {"user_email": "priya@acme.com"}},
        stream_mode="updates",
    ):
        print(chunk.event, chunk.data)

    # 3. Same thread later: Jarvis remembers, because state is checkpointed per thread.
    await client.runs.wait(thread["thread_id"], "jarvis",
        input={"messages": [{"role": "user", "content": "What's the status?"}]})

# The SDK client is async, so the calls need an event loop.
asyncio.run(main())

import { Client } from "@langchain/langgraph-sdk";

const client = new Client({
  apiUrl: "https://your-deployment.langgraph.app",
  apiKey: process.env.LANGGRAPH_API_KEY,
});

const thread = await client.threads.create();

for await (const chunk of client.runs.stream(thread.thread_id, "jarvis", {
  input: { messages: [{ role: "user", content: "Printer floor 3 jammed." }] },
  config: { configurable: { user_email: "priya@acme.com" } },
  streamMode: "updates",
})) {
  console.log(chunk.event, chunk.data);
}

That's the entire user-facing integration. Your chat UI's only code is roughly: get/create thread → stream a run → render chunks.
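
The "get/create thread" step can be made idempotent by deriving the thread id from the (user, session) pair. The following is a sketch under assumptions: `client` is a `langgraph_sdk` client, the namespace UUID is an arbitrary constant invented for this example, and `if_exists="do_nothing"` is relied on to leave an existing thread untouched.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
JARVIS_NAMESPACE = uuid.UUID("6f1c2b1e-5d0a-4b7e-9a3c-2f4e8d7c1a90")

def thread_id_for(user_email: str, session: str) -> str:
    """Derive a stable thread id so the same (user, session) always maps to one thread."""
    return str(uuid.uuid5(JARVIS_NAMESPACE, f"{user_email}|{session}"))

async def get_or_create_thread(client, user_email: str, session: str) -> str:
    thread_id = thread_id_for(user_email, session)
    # Assumption: if_exists="do_nothing" makes this a no-op when the thread already exists.
    await client.threads.create(thread_id=thread_id, if_exists="do_nothing")
    return thread_id
```

With this shape the chat UI does not need its own table mapping users to threads; the trade-off is that the id scheme is fixed once threads exist.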

6. Scaling and the data plane in practice

  • Horizontal scale. Add more worker pods. Threads are sticky-but-not-pinned: any worker can pick up any thread's next run because state lives in Postgres.
  • Queueing. Long-running runs go on a background queue. Your API returns a run id; client polls or streams.
  • Streaming. SSE between data plane and client. Choose stream_mode="updates" for per-node updates, "messages" for token-by-token, "values" for full state snapshots.
  • Cron jobs. The control plane lets you schedule runs (e.g. "every Monday at 9am, run the weekly-digest assistant"). No external scheduler needed.
  • Webhooks. Configure a webhook URL per run; the Platform pings you when the run finishes.
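
As an illustration of the queueing, webhook, and cron points above, here is a hedged sketch using the SDK's `runs.create` and `crons.create` calls. It assumes `client` is a `langgraph_sdk` client; the webhook URL is whatever endpoint you control, the "weekly-digest" assistant name comes from the example in the list, and the cron schedule is assumed to be evaluated in UTC.

```python
async def start_background_run(client, thread_id: str, text: str, webhook_url: str) -> str:
    """Queue a run and return its id right away; the webhook is called when the run ends."""
    run = await client.runs.create(
        thread_id,
        "jarvis",
        input={"messages": [{"role": "user", "content": text}]},
        webhook=webhook_url,
    )
    return run["run_id"]

async def schedule_weekly_digest(client):
    """Schedule a run every Monday at 09:00 (assumed UTC)."""
    return await client.crons.create(
        "weekly-digest",       # assistant name taken from the example above
        schedule="0 9 * * 1",  # minute hour day-of-month month day-of-week
        input={"messages": [{"role": "user", "content": "Build this week's digest."}]},
    )
```

The run id returned by `start_background_run` is what a client would later poll or join on if it does not rely on the webhook alone.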

7. Double-texting — handling concurrent user input

What happens if Priya sends "and book Anuj too" while Jarvis is still mid-run on her previous message? Four built-in strategies, picked per-run with multitask_strategy:

| Strategy | Behaviour | When |
| --- | --- | --- |
| reject | Refuse the new message until current run finishes. | Strict ordering matters (financial actions). |
| enqueue | Queue the new message; run it right after current finishes. | Default for ordered workflows. |
| interrupt | Stop the current run, append the new message, restart from the new state. | Conversational chat: the user changed their mind. |
| rollback | Discard the current run entirely and start fresh with the new message. | The new message replaces the old. |

For Jarvis

Use interrupt for chat. That's how users expect chat to feel — say something new, the assistant adapts.
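
A minimal sketch of passing the strategy per run, assuming `client` is a `langgraph_sdk` client and the assistant is called "jarvis". The wrapper function and its validation are illustrative; only the `multitask_strategy` argument comes from the SDK.

```python
async def send_chat_message(client, thread_id: str, text: str, strategy: str = "interrupt"):
    """Stream a run; if another run is active on the thread, the strategy decides what happens."""
    allowed = {"reject", "enqueue", "interrupt", "rollback"}
    if strategy not in allowed:
        raise ValueError(f"unknown multitask_strategy: {strategy!r}")
    async for chunk in client.runs.stream(
        thread_id,
        "jarvis",
        input={"messages": [{"role": "user", "content": text}]},
        stream_mode="updates",
        multitask_strategy=strategy,
    ):
        yield chunk
```

A chat UI would iterate with `async for chunk in send_chat_message(client, thread_id, "and book Anuj too")` and render each chunk as it arrives.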

8. Versioning and rollouts

  • Each deploy creates a new graph version. Existing assistants keep pointing at their pinned version; new ones get the latest.
  • Roll out a prompt change by creating a new assistant on the same graph and routing 10% of users to it (A/B test). Promote when LangSmith evals look good.
  • Rollback = re-point the assistant at the previous version. State is preserved.
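
The 10% routing above has to happen in your own application code before a run is started. One common way to do it, sketched here with a hypothetical "jarvis-candidate" assistant name, is a deterministic hash bucket so each user always lands on the same side of the test.

```python
import hashlib

def pick_assistant(user_id: str, rollout_percent: int = 10,
                   stable: str = "jarvis-prod", candidate: str = "jarvis-candidate") -> str:
    """Deterministically route a fixed share of users to the candidate assistant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return candidate if bucket < rollout_percent else stable
```

The result is passed as the assistant id when starting a run. Promoting the candidate means raising `rollout_percent` to 100; rolling back means setting it to 0.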

9. Auth, multi-tenancy, secrets

  • Auth: the Platform supports custom auth handlers — typically you'd verify a JWT from your own auth provider and stamp user_id/org_id into config.configurable.
  • Multi-tenancy: scope thread_id and store namespaces by tenant. Two orgs' Jarvises never see each other's data.
  • Secrets: env vars are set per-deployment in the control plane. Don't bake them into the graph code.
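
The tenant-scoping idea can be sketched without any Platform API. The helper below assumes the JWT has already been verified elsewhere and that its claims include `sub` (the user) and a hypothetical `org_id` claim. The key names it returns are illustrative, not a Platform contract.

```python
def tenant_scope(claims: dict) -> dict:
    """Turn already-verified JWT claims into per-tenant config and a store namespace."""
    org_id, user_id = claims["org_id"], claims["sub"]
    return {
        # Stamped into config.configurable so nodes can read who is calling.
        "configurable": {"org_id": org_id, "user_id": user_id},
        # Store namespaces are tuples; leading with org_id keeps tenants apart.
        "namespace": (org_id, user_id, "memories"),
        # Thread metadata to filter on when listing one tenant's threads.
        "thread_metadata": {"org_id": org_id, "user_id": user_id},
    }

scope = tenant_scope({"sub": "priya", "org_id": "acme"})
print(scope["namespace"])  # ('acme', 'priya', 'memories')
```

Because every namespace and metadata filter starts from `org_id`, a lookup for one organisation cannot match another organisation's entries as long as all reads go through the same helper.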

10. Jarvis, deployed

★ Jarvis status

Jarvis is now an HTTP API. Acme's Slack bot, web app, and mobile app all hit the same endpoint with a thread_id per (user, channel). Threads persist; runs stream; concurrent messages are handled gracefully. The control plane lets ops deploy new versions, replay buggy runs, and scale workers without dropping in-flight conversations. Next: actually seeing what it's doing.

Quick check

1. What's the control plane?

2. An "assistant" in LangGraph Platform terms is:

3. How does Jarvis pick up yesterday's conversation in a deployed setup?

4. Priya messages again while the previous run is still going. For a chat UX you most likely want…