From dev containers to agent fleets

In Tooltips v2.0.0 I wrote about Dev Containers and GitHub Codespaces, and how I think they’re useful for getting projects up and running quickly.

The short version was: consistent development environments are good.

That’s still true! But I think I was only describing the first part of the problem.

Dev Containers are useful because they give you a repeatable environment. Same dependencies, same tooling, same setup, same basic expectations. That’s helpful for humans switching machines, or joining a project, or trying to avoid spending a day figuring out why something works for one developer and not another.
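To make "same dependencies, same tooling, same setup" concrete, here's a minimal devcontainer.json of the kind I mean. The image, feature, and commands are placeholders, not anything from a specific project:

```json
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/typescript-node:22",
  "features": {
    "ghcr.io/devcontainers/features/github-cli:1": {}
  },
  "postCreateCommand": "npm ci",
  "forwardPorts": [3000]
}
```

Everything a new machine, a new teammate, or a fresh Codespace needs is pinned in one file in the repo.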

They’re also useful for agents. I mentioned that in the post: I feel more comfortable letting coding agents do more when they’re operating inside a container, because the scope of what they can affect is at least somewhat limited. It’s not magic, and it doesn’t solve every security problem, but it moves the work into a more bounded environment.

Then Codespaces takes the same general idea and moves it to the cloud. You get a devcontainer, but with its own compute resources. A VM, basically. Your repo is cloned, your environment is built, and you can get to work without depending quite so much on whatever strange state your laptop happens to be in.

So the next step feels pretty obvious:

What if we spin up those environments specifically for agents?

The task as the unit

The more I’ve worked on Simple Agent Manager, the more convinced I am that the mental model changes here.

With a human developer, the unit is usually the workspace. You open the repo, start the services, run the tests, maybe keep the environment around for days or weeks.

With agents, the unit is often the task.

Or maybe the task-shaped conversation.

Sometimes you want a bounded thing: fix this bug, write this migration, update this component, investigate this failing test. Sometimes you want something more open-ended: look through this repo with me, help me think through an architecture, monitor what another agent is doing, or keep working through a thread of ideas until we hit something useful.

Either way, the environment starts to follow the intent.

You don’t necessarily want one giant shared workspace where every agent piles in and starts changing things. You want a fresh environment for the unit of work. One task, one branch, one container, one set of ports.
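A sketch of what "one task, one branch, one container, one set of ports" could look like as an allocation scheme. All the names here are hypothetical, invented for illustration rather than taken from SAM:

```python
from dataclasses import dataclass
from itertools import count

_next_slot = count()  # monotonically increasing slot per task


@dataclass(frozen=True)
class TaskEnvironment:
    task_id: str
    branch: str
    container: str
    ports: range  # host ports reserved for this task


def allocate(task_id: str, base_port: int = 40000, width: int = 10) -> TaskEnvironment:
    """Give each task its own branch, container name, and port window."""
    slot = next(_next_slot)
    start = base_port + slot * width
    return TaskEnvironment(
        task_id=task_id,
        branch=f"agent/{task_id}",
        container=f"sam-{task_id}",
        ports=range(start, start + width),
    )
```

The point isn't the specific scheme; it's that isolation has to be designed in, so two tasks can never claim the same branch or the same port.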

Nobody stepping on anyone else’s feet.

Or ports.

Ha.

That’s the part that feels like the logical continuation of Dev Containers and Codespaces. If cloud devcontainers are useful for humans, they’re probably even more useful for fleets of agents. You can run a lot of them in parallel, each with its own resources, each with its own branch, each with its own little world to mess up.

That sounds simple. It is not.

The hard part is not launching them

I don’t want to overstate how easy the infrastructure side is, because it has its own annoying details. But the part that surprised me most is that launching agents is not really the hardest problem.

The harder problem is orchestration.

Once you can start ten agents, you have ten agents doing things. Great. Now what?

You need to know which ones are still running, which ones are stuck, which ones need input, which ones produced useful work, which ones changed the wrong files, which ones should be stopped, which ones should be resumed, and which ones should probably never have been started in the first place.

You also need to know what they actually did.

That means preserving tool calls, terminal commands, file diffs, test output, permission requests, transcripts, model usage, and enough context that a human can take over without feeling like they walked into the middle of a weird dream.
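The shape of that record matters more than the storage. Here's a rough sketch of what "enough context" might look like as data; this is a hypothetical schema, not ACP's or SAM's actual one:

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str    # e.g. a terminal command or file edit tool
    input: str   # what the agent asked for
    output: str  # what came back


@dataclass
class SessionRecord:
    """Everything a human needs to take over a session cold."""
    task_id: str
    transcript: list[str] = field(default_factory=list)  # agent/user messages
    tool_calls: list[ToolCall] = field(default_factory=list)
    diffs: list[str] = field(default_factory=list)       # file diffs produced
    permission_requests: list[str] = field(default_factory=list)
    tokens_used: int = 0

    def summary(self) -> str:
        return (f"{self.task_id}: {len(self.tool_calls)} tool calls, "
                f"{len(self.diffs)} diffs, {self.tokens_used} tokens")
```

If you can't reconstruct what the agent saw and did from a record like this, review turns into archaeology.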

That last bit is important. A coding agent saying “done” is not the same thing as the work being done.

I’ve seen agents finish successfully while missing the important part of the task. I’ve seen them make a good local change but fail to understand the wider system. I’ve seen them get blocked on something they could have worked around. I’ve seen them do something useful and then fail to explain it in a way that makes review easy.

None of that means the tools are bad. It means we’re working with systems that are not deterministic.

Nondeterminism all the way up

This is the part I keep coming back to.

If you have a normal job runner, you can define a state machine. Pending, running, succeeded, failed. You can retry. You can time out. You can record logs. It may still be annoying, but the shape of the work is fairly predictable.
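The job-runner version really is that easy to sketch. States, legal transitions, and nothing else (hypothetical, but this is roughly what every conventional runner encodes):

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Legal transitions for a conventional job runner.
TRANSITIONS = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.FAILED: {JobState.PENDING},  # retry re-queues the job
    JobState.SUCCEEDED: set(),            # terminal
}


def advance(current: JobState, target: JobState) -> JobState:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Notice what's missing: there's no state for "asked a question", "produced a useful partial result", or "finished, but for the wrong reason". Agents need all of those.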

Agents make that fuzzier.

They can ask questions. They can misunderstand. They can recover. They can decide to inspect something you didn’t expect. They can produce a useful partial result. They can do the right thing for the wrong reason. They can do the wrong thing very confidently.

So then you start thinking: ok, maybe the answer is to use agents to manage the orchestration.

And honestly... I think that’s probably part of the answer.

But it doesn’t remove the problem. It refines it and moves it up a layer.

Now you have an orchestrator agent deciding when to create work, how to split it up, which environment to use, when to ask for human input, when to stop a child session, when to merge knowledge back into the project, and when to say “this is done.”

That’s powerful, but it means the orchestrator needs even better context than the workers. It needs to understand the project, the task lifecycle, the conversation history, the policies, the current state of the repo, the cost of running more work, and the cleanup obligations when it’s finished.

It’s agents all the way up, but hopefully with each layer getting a narrower and more explicit job.

That is the optimistic version, anyway.

The standards are still young

One of the things I’ve been learning is how much of this still needs better shared language.

SAM uses ACP where it can, and I’m glad it exists. Having a protocol for agent sessions, messages, and tool calls is much better than scraping terminal output and hoping for the best.

But it’s also not enough for everything I want to do.

It can represent a lot of the interaction with an agent, but orchestration needs more than “here is a message” and “here is a tool call.” It needs concepts like task lifecycle, delegation, handoffs, ownership, cleanup, progress, permission boundaries, model usage, and what counts as a terminal state.
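To make "task lifecycle" and "terminal state" concrete, here's the kind of vocabulary I mean. These states are invented for illustration, not taken from ACP or any other spec:

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    NEEDS_INPUT = "needs_input"    # agent asked a question
    NEEDS_REVIEW = "needs_review"  # agent says done; no one has agreed yet
    HANDED_OFF = "handed_off"      # another agent or a human took over
    DONE = "done"
    ABANDONED = "abandoned"


TERMINAL = {TaskState.DONE, TaskState.ABANDONED}


def is_terminal(state: TaskState) -> bool:
    """'Agent said done' (NEEDS_REVIEW) is deliberately not terminal."""
    return state in TERMINAL
```

The interesting design choice is that the agent's own claim of completion doesn't end the lifecycle. Something outside the agent has to agree, which is exactly the kind of concept a message-and-tool-call protocol doesn't give you for free.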

Some of that can be built around ACP. Some of it probably belongs in higher-level platform conventions. Some of it is still just... not mature yet.

That’s not a criticism exactly. It’s just where the industry is.

We’re still early enough that every agent tool has slightly different assumptions about how it should run, how it should authenticate, what it reports, how it handles tools, what kind of transcript it keeps, and what it means to resume a session.

That makes orchestration harder than it looks from the outside.

What changed in my head

The way I think about this has shifted.

A few months ago I would have said: make sure your project has a good devcontainer.

I still think that!

But now I’d add: think about what an agent manager would need to understand your project.

Can an agent find the right files quickly? Can it run the tests? Can it tell whether the tests passed for the right reason? Can it work on a branch without bothering other agents? Can a human review what happened? Can the environment be thrown away cleanly? Can another agent pick up where the first one left off?

That’s a different bar.

The devcontainer is the base layer. It gives the work somewhere consistent to happen. But when you start running agents concurrently, the environment is just one piece of the system.

You also need the manager.

You need the thing that knows what should run where, what happened, what it cost, what needs attention, what can be cleaned up, and what should happen next.

That’s what I’ve been trying to build with SAM. It started from the obvious question: why not run coding agents inside cloud devcontainers?

The answer is: yes, absolutely.

And then immediately after that:

Oh no, now we have to manage them.

That’s the interesting part.