Three Models of Agentic Development, and Why the IDE Still Wins

The team behind Roo Code, a Visual Studio Code extension I’ve been using for a while, recently made a decision that got me thinking about where agentic software development is actually headed.

Roo Code is an AI-powered coding extension, and like similar tools in the space it lets you connect to an LLM and work with an AI assistant right inside your development environment. I’ve found it genuinely useful for agentic coding workflows, so it was a bit of a surprise when the team announced they were stopping active development on the extension itself. (Update: it appears the team has since reversed this decision, and active development continues. Either way, I still find it excellent, and use it with Kimi K2.6 via Ollama Cloud as one of my eight agentic coding assistants.)

Instead, they’re doubling down on something called Roo Remote.

It’s not available yet. The team is in a pre-prototype phase, collecting email addresses and gauging demand. The concept is intriguing though: rather than requiring a separate coding environment, the agents come to you, wherever you already work. Their bet, at least initially, is that the place is Slack.

The premise is straightforward enough. You’re already in Slack with a mix of humans in those channels, and now you add agents to the mix. You assign them tasks, they go off and do things, and they come back and report. It’s a vision of the agentic future where autonomous AI workers operate alongside humans in the same collaborative spaces we already inhabit.

I think about this kind of thing a lot, and while I see the appeal, I’m not sure I fully buy the premise, at least not as a replacement for what we have now.

The Observability Problem

Here’s my issue with the Slack model as a primary development environment: visibility.

As a developer, as a product manager, or as anyone playing an orchestration role, you need to be able to see what’s happening. When an agent disappears into the workflow and comes back with a result, you’ve lost the thread. You have no window into what it changed, what it touched, or what decisions it made along the way. In some contexts that’s fine and you just want the outcome, but in software development that opacity is often a problem.

This is precisely where the IDE earns its keep. It’s not “beautiful” in the way an iPhone is beautiful, but it’s beautiful in the way a well-designed instrument is beautiful: it does exactly what it’s supposed to do, which is communicate state.

When you’re working in an IDE like VS Code or Cursor, everything is visible. The files you’re working on, the status of your changes, the git graph with its entire version history, branching, diffing, all of it. When an agent makes a change inside the IDE, you can choose to trust it and move on, or you can drill down, inspect exactly what changed, and verify it did what you expected. That level of observability is hard to replicate in a chat-based interface like Slack. And this is why I think the Integrated Productivity Environment (IPE), which is essentially using an IDE to do non-coding things, is an idea with legs.
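To make the "drill down and verify" step concrete, here is a minimal sketch of the kind of check an IDE's diff view performs after an agent edits a file. It uses Python's standard difflib module. The file name, the file contents, and the summarize_change helper are all hypothetical, and in practice you would lean on the IDE's built-in git diff rather than a script like this.

```python
import difflib


def summarize_change(path: str, before: str, after: str) -> dict:
    """Return a unified diff plus simple added/removed line counts."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            lineterm="",
        )
    )
    # Skip the "---" / "+++" file headers when counting changed lines.
    added = sum(
        1 for line in diff if line.startswith("+") and not line.startswith("+++")
    )
    removed = sum(
        1 for line in diff if line.startswith("-") and not line.startswith("---")
    )
    return {"path": path, "added": added, "removed": removed, "diff": "\n".join(diff)}


# Hypothetical file contents before and after an agent's edit.
BEFORE = "def total(items):\n    return sum(items)\n"
AFTER = (
    "def total(items):\n"
    "    # Ignore missing values.\n"
    "    return sum(i for i in items if i is not None)\n"
)

report = summarize_change("cart.py", BEFORE, AFTER)
print(f"{report['path']}: +{report['added']} -{report['removed']}")
print(report["diff"])
```

The point is less the script than the habit: every agent edit leaves a small, reviewable artifact, which is what a chat message saying "done" does not give you.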

Cursor, by the way, is a living demonstration of this. Its success speaks to how much developers still value having a rich visual environment around their code.

Three Models, All Valid

So where does this leave us? I think we’re heading toward three distinct but complementary models for agentic development, and all three will (and do) coexist.

Model 1: The IDE.

The traditional development environment, enhanced with AI. Agents operate inside your workspace, and you maintain full observability over what they’re doing. VS Code, Cursor, Google’s new Antigravity, Windsurf and similar tools belong here. This model isn’t going away, and if anything, it’s getting stronger.

Team Stark's 2026 AI Coding Agent Lineup

StarkMind, our human-AI collaboration laboratory, currently runs eight AI coding agents across three IDEs for around $276 per month. They handle implementation, code review, architecture, and second-opinion analysis across Stark Insider, StarkMind, and Atelier Stark.

Team Lead (spans all three IDEs)

Claude Code — Anthropic Claude Opus, native extension running in VS Code, Cursor, and Antigravity. Primary engineer and orchestrator for the rest of the team. It is also the one I mostly rely on for updating our project management tickets (using our in-house Switchback system), which track dev and non-dev projects alike.

Visual Studio Code (5 agents, counting Claude Code)

Codex — OpenAI 5.3-Codex, native Codex extension
Minimax — Minimax M2.7, Kilo Code extension via Ollama Cloud
Kimi — Moonshot Kimi K2.6, Roo Code extension via Ollama Cloud
GLM — Zhipu GLM-5.1, Cline extension via Ollama Cloud

Cursor (2 agents)

Composer 2 — Cursor-native coding agent
GPT-5.5 — OpenAI, Cursor-native

Antigravity (1 agent)

Gemini Pro — Google Gemini 3.1 Pro, Antigravity-native

Each model brings different strengths to agentic coding: Claude Code leads on planning and long-form implementation, Codex on security and edge-case review, the Ollama Cloud trio (Minimax, Kimi, and GLM) on alternative perspectives at low cost, and Gemini Pro on Google-stack reasoning. The mix also means no single vendor becomes a single point of failure.

Model 2: The Slack Model (Human-Agent Collaboration).

Agents working alongside humans in shared communication platforms like Slack, Discord, and Telegram. The agents participate in the flow of work, take on tasks, and report back. This is a valid and genuinely useful pattern, especially for workflows that are less about code inspection and more about task execution and communication. Think of it as the “meeting room” for humans and agents. For StarkMind, Loni and I are running OpenClaw with three agents via Telegram.

Model 3: Orchestrated Agentic Workflows (LangGraph, CrewAI, et al.).

This is where things get more sophisticated. Tools like LangGraph and CrewAI allow you to build hybrid workflows that are part deterministic and part probabilistic. You define the structure of how work should flow, but LLMs handle the reasoning and generation within that structure. It’s the combination of the reliability of traditional programming with the flexibility of language models.
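As a rough illustration of "part deterministic, part probabilistic," here is a minimal sketch in plain Python. It does not use LangGraph's or CrewAI's actual APIs. The node names, the fake_llm stub, and the run helper are all hypothetical stand-ins. The structure (which node runs next) is ordinary code, while one node delegates its content to a model call.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; a real workflow would call an LLM here."""
    return f"draft answer for: {prompt}"


def generate(state: State) -> State:
    # Probabilistic step: with a real model, this output varies between runs.
    return {**state, "draft": fake_llm(state["question"])}


def validate(state: State) -> State:
    # Deterministic step: plain code decides whether the draft is acceptable.
    draft = state["draft"]
    return {**state, "ok": bool(draft) and "answer" in draft}


def publish(state: State) -> State:
    return {**state, "result": state["draft"]}


def reject(state: State) -> State:
    return {**state, "result": None}


NODES: Dict[str, Callable[[State], State]] = {
    "generate": generate,
    "validate": validate,
    "publish": publish,
    "reject": reject,
}


def next_node(current: str, state: State) -> Optional[str]:
    """Fixed edges, plus one conditional edge after validation."""
    if current == "generate":
        return "validate"
    if current == "validate":
        return "publish" if state["ok"] else "reject"
    return None  # "publish" and "reject" are terminal nodes.


def run(question: str) -> State:
    state: State = {"question": question}
    current: Optional[str] = "generate"
    while current is not None:
        state = NODES[current](state)
        current = next_node(current, state)
    return state


final = run("What changed in this release?")
print(final["ok"], final["result"])
```

Frameworks in this space add persistence, retries, and richer routing on top, but the division of labor is the same: code owns the flow, and the model owns the reasoning inside each step.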

[Figure: Langfuse trace UI showing a multi-step LangGraph research workflow, with nodes for hypothesis generation, search-extract, and evidence evaluation, traced across 3 minutes 38 seconds at a cost of 5.5 cents.]

Model 3 in action: a LangGraph-orchestrated research workflow traced in Langfuse, evaluating AI memory models against the four Meaning Memory dimensions. The graph structure makes every step inspectable.

These systems offer observability too, just in a different form than the IDE. You can trace the workflow, see where agents were in the pipeline, and inspect inputs and outputs at each node. For research workflows, hypothesis testing, and multi-step data processing, this model is exceptionally well suited. A thesis research pipeline, for instance, maps naturally onto this kind of graph-based architecture.
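Inspecting inputs and outputs at each node can be sketched in a few lines. The example below is a hand-rolled illustration, not the Langfuse or LangGraph API. The traced wrapper, the TRACE list, and the placeholder node logic are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
TRACE: List[Dict[str, Any]] = []


def traced(name: str, fn: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so each call records its input, output, and duration."""

    def wrapper(state: State) -> State:
        start = time.perf_counter()
        output = fn(state)
        TRACE.append(
            {
                "node": name,
                "input": dict(state),
                "output": dict(output),
                "seconds": time.perf_counter() - start,
            }
        )
        return output

    return wrapper


# Placeholder node logic; a real pipeline would call models and search tools.
def hypothesize(state: State) -> State:
    return {**state, "hypothesis": f"{state['topic']} improves recall"}


def search_extract(state: State) -> State:
    return {**state, "evidence": ["source A", "source B"]}


def evaluate(state: State) -> State:
    return {**state, "supported": len(state["evidence"]) >= 2}


PIPELINE = [
    traced("hypothesis_generation", hypothesize),
    traced("search_extract", search_extract),
    traced("evidence_evaluation", evaluate),
]

state: State = {"topic": "episodic memory"}
for node in PIPELINE:
    state = node(state)

for entry in TRACE:
    new_keys = sorted(set(entry["output"]) - set(entry["input"]))
    print(f"{entry['node']}: added {new_keys} in {entry['seconds']:.6f}s")
```

A dedicated tracing tool adds token counts, cost, and a UI on top of records like these, but the underlying idea is the same per-node log.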

What This Means

The framing that agents will simply replace the IDE, that we’ll all just work in Slack with our AI teammates, misses something important about how developers, and knowledge workers more broadly, actually need to relate to the work.

It’s not just about getting a result. It’s about understanding what happened, being able to verify it, learn from it, and course-correct. That relationship between human and agent, where the human remains an informed orchestrator rather than a passive recipient, requires surfaces that support observation, not just communication. Loni frames this as the Symbiotic Studio: the idea that agents aren’t merely autonomous tools, but are fully integrated into human-AI collaboration workflows where each is an equal participant.

Roo Remote may find its niche, and the Slack/Telegram/Discord model is real and worth taking seriously. But the IDE will remain central to how serious development work gets done, and the graph-based orchestration layer will grow in importance as workflows become more complex.

Three models, all emerging, all valid. The interesting question isn’t which one wins; it’s when to reach for which one.

Clinton Stark is co-founder of Stark Insider and StarkMind, a human-AI collaboration laboratory. He covers technology, film, and the arts from Silicon Valley.