Since we last wrote about software engineering agents six months ago, the industry still lacks a shared definition of the term "agent." However, a major development has emerged, not in fully autonomous coding agents (which remain unconvincing) but in supervised agentic modes within the IDE. These modes let developers drive implementation via chat, with the tools not only modifying code across multiple files but also executing commands, running tests and responding to IDE feedback such as linting or compile errors.
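To make the shape of these supervised modes concrete, here is a minimal sketch of such a loop in Python. Everything in it (propose_edits, apply_and_check, the confirmation gate) is our own hypothetical placeholder, not the internals or API of any of the tools discussed here:

```python
# Minimal sketch of a supervised agentic loop, assuming Python 3.9+.
# Conceptual only: all names are placeholders we invented, not the
# architecture of Cursor, Cline, Windsurf or GitHub Copilot.
from dataclasses import dataclass

@dataclass
class StepResult:
    ok: bool          # did tests, linting and compilation pass?
    feedback: str     # diagnostics to feed back to the model

def propose_edits(task: str, feedback: str = "") -> list[str]:
    """Stub: ask the model for a set of multi-file edits."""
    raise NotImplementedError

def apply_and_check(edits: list[str]) -> StepResult:
    """Stub: apply edits, run commands and tests, collect IDE diagnostics."""
    raise NotImplementedError

def confirm(prompt: str) -> bool:
    # The developer stays in control: nothing is applied without approval.
    return input(f"{prompt} [y/N] ").strip().lower() == "y"

def agentic_step(task: str, max_rounds: int = 3) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        edits = propose_edits(task, feedback)
        if not confirm(f"Apply {len(edits)} proposed edit(s)?"):
            return False
        result = apply_and_check(edits)
        if result.ok:
            return True
        feedback = result.feedback  # model reacts to lint, test and compile errors
    return False
```

The essential property is the feedback loop: the tool doesn't just generate code, it observes the result and iterates, with the developer approving each round.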
This approach, sometimes called chat-oriented programming (CHOP) or prompt-to-code, keeps developers in control while shifting more responsibility to the AI than traditional coding assistance features such as auto-suggestions do. Leading tools in this space include Cursor, Cline and Windsurf, with GitHub Copilot slightly behind but catching up. The usefulness of these agentic modes depends both on the model used (Claude's Sonnet series is the current state of the art) and on how well the tool integrates with the IDE to provide a good developer experience.
We've found these workflows intriguing and promising, with a notable increase in coding speed. However, developers can only review AI-generated changes effectively when problem scopes stay small, which works best with low-abstraction prompts (see the example below) and AI-friendly codebases that are well structured and well tested. As these modes improve, they'll also heighten the risk of complacency with AI-generated code. To mitigate this, use pair programming and other disciplined review practices, especially for production code.
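To illustrate what we mean by a low-abstraction prompt, compare these two hypothetical requests (the file, class and field names are invented for the example):

```
High abstraction: "Add multi-currency support to the billing module."

Low abstraction:  "In invoice.py, add a currency field to Invoice
                   (ISO 4217 code, default 'USD'), validate it on
                   construction, and add passing and failing cases
                   to test_invoice.py."
```

The second prompt bounds the change to a diff a reviewer can actually verify.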
One of the hottest topics right now in the GenAI space is the concept of software engineering agents. These coding assistance tools do more than just help the engineer with code snippets here and there; they broaden the scope of the problems they can solve, ideally autonomously and with minimal intervention from a human. The idea is that these tools can take a GitHub issue or a Jira ticket, propose a plan and code changes to implement it, or even create a pull request for a human to review. While this is the next logical step to increase the impact of AI coding assistance, the often-advertised goal of generic agents that can cover a broad range of coding tasks is very ambitious, and the current state of the tooling does not yet demonstrate this convincingly. However, we can see this working sooner rather than later for a more limited scope of straightforward tasks, freeing up developer time to work on more complex problems. Tools that have been released with beta versions of agents include GitHub Copilot Workspace, qodo flow, Tabnine's agents for Jira and Amazon Q Developer. The SWE-bench benchmark lists more tools in this space, but we caution you to take benchmarks in the AI space with a grain of salt.
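To illustrate the ticket-to-PR workflow these tools aim for, here is a hypothetical sketch; none of the function names below correspond to the actual APIs of the products just mentioned:

```python
# Hypothetical sketch of the ticket-to-PR flow these agents aim for,
# assuming Python 3.9+. Every name below is our own placeholder, not
# the API of any real product.
from dataclasses import dataclass

@dataclass
class Issue:
    id: str
    title: str
    description: str

def fetch_issue(url: str) -> Issue: ...                   # stub: read from GitHub or Jira
def draft_plan(description: str) -> list[str]: ...        # stub: model proposes steps
def generate_changes(step: str) -> dict[str, str]: ...    # stub: file path -> new content
def create_branch(name: str) -> str: ...                  # stub: branch off the mainline
def commit(branch: str, changes: dict[str, str]) -> None: ...
def open_pull_request(branch: str, title: str, body: str) -> str: ...

def issue_to_pull_request(url: str) -> str:
    """Issue in, reviewable pull request out; a human reviews, the agent never merges."""
    issue = fetch_issue(url)
    plan = draft_plan(issue.description)
    branch = create_branch(f"agent/{issue.id}")
    for step in plan:
        commit(branch, generate_changes(step))
    return open_pull_request(branch, issue.title, body="\n".join(plan))
```

Note how this differs from the supervised loop sketched earlier: the human is involved only at the end, reviewing the pull request, which is exactly why the scope of tasks this can handle reliably is still narrow.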
