Tools
Trial
-
54. Claude Sonnet
Claude Sonnet is an advanced language model that excels in coding, writing, analysis and visual processing. It's available in the browser, terminal, most major IDEs and even integrates with GitHub Copilot. As of writing, benchmarking shows it outperforms previous models with versions 3.5 and 3.7, including earlier Claude models. It's also adept at interpreting charts and extracting text from images, and it features a developer-focused experience, such as with the "Artifacts" feature in the browser UI for generating and interacting with dynamic content such as code snippets and HTML designs.
We’ve used version 3.5 of Claude Sonnet extensively in software development and found it significantly boosts productivity across various projects. It excels in greenfield projects, particularly in collaborative software design and architectural discussions. While it may be too early to call any AI model "stable" for coding assistance, Claude Sonnet is among the most reliable models we've worked with. At the time of writing, Claude 3.7 has also been released and is promising, though we’ve not yet fully tested it in production.
-
55. Cline
Cline is an open-source VSCode extension that is currently one of the strongest contenders in the space of supervised software engineering agents. It lets developers drive their implementation entirely from the Cline chat, integrating seamlessly with the IDE they already use. Key features like Plan & Act mode, transparent token usage and MCP integration help developers interact effectively with LLMs. Cline has demonstrated advanced capabilities in handling complex development tasks, especially with Claude 3.5 Sonnet. It supports large codebases, automates headless browser testing and proactively fixes bugs. Unlike cloud-based solutions, Cline enhances privacy by storing data locally. Its open-source nature not only ensures greater transparency but also enables community-driven improvements. However, developers should be mindful of token usage cost, as Cline's code context orchestration, while very effective, is resource-intensive. Another potential bottleneck is rate limiting, which can slow down workflows. Until this is resolved, using API providers like OpenRouter, which provide better rate limits, is advisable.
-
56. Cursor
We continue to be impressed by the AI-first code editor Cursor, which remains a leader in the competitive AI coding assistance space. Its code context orchestration is very effective, and it supports a wide range of models, including the option to use a custom API key. The Cursor team often comes up with innovative user experience features before the other vendors, and they include an extensive list of context providers in their chat, such as the referencing of git diffs, previous AI conversations, web search, library documentation and MCP integration. Alongside tools like Cline and Windsurf, Cursor also stands out for its strong agentic coding mode. This mode allows developers to guide their implementation directly from an AI chat interface, with the tool autonomously reading and modifying files, as well as executing commands. Additionally, we appreciate Cursor's ability to detect linting and compilation errors in generated code and proactively correct them.
-
57. D2
D2 is an open-source diagrams-as-code tool that helps users create and customize diagrams from text. It introduces the D2 diagram scripting language, which prioritizes readability over compactness with a simple, declarative syntax. D2 ships with a default theme and leverages the same layout engine as Mermaid. Our teams appreciate its lightweight syntax, which is specifically designed for software documentation and architecture diagrams.
-
58. Databricks Delta Live Tables
Delta Live Tables (DLT) continues to prove its value in simplifying and streamlining data pipeline management, supporting both real-time streaming and batch processing through a declarative approach. By automating complex data engineering tasks, such as manual checkpoint management, DLT reduces operational overhead while ensuring a robust end-to-end system. Its ability to orchestrate simple pipelines with minimal manual intervention enhances reliability and flexibility, while features like materialized views provide incremental updates and performance optimization for specific use cases.
However, teams must understand DLT’s nuances to fully leverage its benefits and avoid potential pitfalls. As an opinionated abstraction, DLT manages its own tables and restricts data insertion to a single pipeline at a time. Streaming tables are append-only, requiring careful design considerations. Additionally, deleting a DLT pipeline also deletes the underlying table and data, potentially creating operational issues.
-
59. JSON Crack
JSON Crack is a Visual Studio Code extension that renders interactive graphs from textual data. Despite its name it supports multiple formats, including YAML, TOML and XML. Unlike Mermaid and D2, where the textual form is a means to create a specific visual graph, JSON Crack is a tool to look at data that happens to be in a textual format. The layout algorithm works well and the tool allows selective hiding of branches and nodes, making it a great choice for exploring data sets. A companion web-based tool is also available, but our reservations about relying on online services for formatting or parsing code apply. JSON Crack does have a node limit, and directs users to a commercial sibling tool for handling files with more than a few hundred nodes.
-
60. MailSlurp
Testing workflows that involve email are often complex and time-consuming. Development teams must build custom email API clients for automation while also setting up temporary inboxes for manual testing scenarios, such as user testing or internal product training before major releases. These challenges become even more pronounced when developing customer onboarding products. We’ve had a positive experience with MailSlurp, a mail server and SMS API service. It provides REST APIs for creating inboxes and phone numbers as well as validating emails and messages directly in code, and its no-code dashboard is also useful for manual testing preparations. Additional features like custom domains, webhooks, auto-reply and forwarding are worth checking out for more complex scenarios.
-
61. Metabase
Metabase is an open-source analytics and business intelligence tool that allows users to visualize and analyze data from a variety of data sources, including relational and NoSQL databases. The tool helps users create visualizations and reports, organize them into dashboards and easily share insights. It also offers an SDK for embedding interactive dashboards in web applications, matching the theme and style of the application — making it developer-friendly. With both officially supported and community-backed data connectors, Metabase is versatile across data environments. As a lightweight BI tool, our teams find it useful for managing interactive dashboards and reports in their applications.
-
62. NeMo Guardrails
NeMo Guardrails is an easy-to-use open-source toolkit from NVIDIA that empowers developers to implement guardrails for LLMs used in conversational applications. Since we last mentioned it in the Radar, NeMo has seen significant adoption across our teams and continues to improve. Many of the latest enhancements to NeMo Guardrails focus on expanding integrations and strengthening security, data and control, aligning with the project’s core goal.
A major update to NeMo’s documentation has improved usability and new integrations have been added, including AutoAlign and Patronus Lynx, along with support for Colang 2.0. Key upgrades include enhancements to content safety and security as well as a recent release that supports streaming LLM content through output rails for improved performance. We've also seen added support for Prompt Security. Additionally, Nvidia released three new microservices: content safety NIM microservice, topic control NIM microservice and jailbreak detection, all of which have been integrated with NeMo Guardrails.
Based on its growing feature set and increased usage in production, we’re moving NeMo Guardrails to Trial. We recommend reviewing the latest release notes for a complete overview of the changes since our last blip.
-
63. Nyx
Nyx is a versatile semantic release tool that supports a wide range of software engineering projects. It’s language-agnostic and works with all major CI and SCM platforms, making it highly adaptable. While many teams use semantic versioning in trunk-based development, Nyx also supports workflows like Gitflow, OneFlow and GitHub Flow. One key advantage of Nyx in production is its automatic changelog generation, with built-in support for Conventional Commits.
As noted in previous Radar editions, we caution against development patterns that rely on long-lived branches (e.g., Gitflow, GitOps), as they introduce challenges that even powerful tools like Nyx cannot mitigate. We highly recommend trying Nyx in CI/CD workflows, especially for trunk-based development, where we’ve seen repeated success.
-
64. OpenRewrite
OpenRewrite continues to serve us well as a tool for large-scale refactorings that follow a set of rules such as moving to a new API version of a widely used library or applying updates to many services that were created from the same template. Support for languages beyond Java, notably JavaScript, has been introduced. With short LTS release cycles in frameworks like Angular, keeping projects updated to newer versions has become increasingly important. OpenRewrite supports this process effectively. Using an AI coding assistant is an alternative, but for rule-based changes, it’s usually slower, more expensive and less reliable. We like that OpenRewrite comes bundled with a catalog of recipes (rules), which describe the changes to be made. The refactoring engine, bundled recipes and build tool plugins are open-source software, which makes it easier for teams to reach for OpenRewrite when they need it.
-
65. Plerion
Plerion is an AWS-focused cloud security platform that integrates with hosting providers to uncover risks, misconfigurations and vulnerabilities across your cloud infrastructure, servers and applications. Similar to Wiz, Plerion uses risk-based prioritization for detected issues, promising to let you "focus on the 1% that matters." Our teams report positive experiences with Plerion, noting it has provided our clients with significant insights and reinforced the importance of proactive security monitoring for their organizations.
-
66. Software engineering agents
Since we last wrote about software engineering agents six months ago, the industry still lacks a shared definition of the term "agent." However, a major development has emerged — not in fully autonomous coding agents (which remain unconvincing) but in supervised agentic modes within the IDE. These modes allow developers to drive implementation via chat, with tools not only modifying code in multiple files but also executing commands, running tests and responding to IDE feedback like linting or compile errors.
This approach, sometimes called chat-oriented programming (CHOP) or prompt-to-code, keeps developers in control while shifting more responsibility to AI than traditional coding assistants like auto-suggestions. Leading tools in this space include Cursor, Cline and Windsurf, with GitHub Copilot slightly behind but catching up. The usefulness of these agentic modes depends on both the model used (with Claude's Sonnet series the current state of the art) and how well the tool integrates with the IDE to provide a good developer experience.
We've found these workflows intriguing and promising, with a notable increase in coding speed. However, keeping problem scopes small helps developers better review AI-generated changes. This works best with low-abstraction prompts and AI-friendly codebases that are well-structured and properly tested. As these modes improve, they’ll also heighten the risk of complacency with AI-generated code. To mitigate this, employ pair programming and other disciplined review practices, especially for production code.
-
67. Tuple
Tuple, a tool optimized for remote pair programming, was originally designed to fill the gap left by Slack’s Screenhero. Since we last mentioned it in the Radar, it has seen wider adoption, addressed previous quirks and constraints and now supports Windows. A key improvement is enhanced desktop sharing with a built-in privacy feature, allowing users to hide private app windows (such as text messages) while sharing tools like the browser window. Previously, UI limitations made Tuple feel like a pair programming tool rather than a general collaboration tool. With these updates, users can now collaborate on content beyond the IDE.
However, it’s important to note that the remote pair has access to the entire desktop. If not configured properly, this could be a security concern, especially if the pair is not trustworthy. We strongly recommend educating teams on Tuple’s privacy settings, best practices and etiquette before use.
We encourage teams to try the latest version of Tuple in your development workflow. It aligns with our pragmatic remote pairing recommendation, offering low-latency pairing, an intuitive UX and significant usability improvements.
-
68. Turborepo
Turborepo helps manage large JavaScript or TypeScript monorepos by analyzing, caching, parallelizing and optimizing build tasks to speed up the process. In large monorepos, projects often depend on each other; rebuilding all dependencies for every change is inefficient and time-consuming, but Turborepo makes this easier. Unlike Nx, Turborepo's default setup uses multiple package.json files — one per project — which allows having dependencies with different versions (multiple versions of React, for example) in a single monorepo, which Nx discourages. While this might be considered an anti-pattern, it does address certain use cases, like migrating from multi- to monorepo, where teams may temporarily require multiple versions of dependencies. In our experience, TurboRepo is quite simple to set up and performs well.
Assess
-
69. AnythingLLM
AnythingLLM is an open-source desktop application to chat with large documents or pieces of content, backed by out-of-the-box integration with LLMs and vector databases. It has a pluggable architecture for embedder models and can be used with most of the commercial LLMs as well as open-weight models that can be managed by Ollama. In addition to RAG, different skills can be created and organized as agents to perform custom tasks and workflows. It lets users organize the documents and interactions with them in different workspaces and they act as long lived threads with different contexts. Recently, it also became possible to deploy it as a multi-user web application with a simple Docker image. Some of our teams are using it as a local personal assistant and finding it a powerful and useful utility.
-
70. Gemma Scope
Mechanistic interpretability — understanding the inner workings of large language models — is becoming an increasingly important field. Tools like Gemma Scope and the open-source library Mishax provide insights into the Gemma2 family of open models. Interpretability tools play a crucial role in debugging unexpected behavior, identifying components responsible for hallucinations, biases or other failure cases, and ultimately building trust by offering deeper visibility into models. While this field may be of particular interest to researchers, it's worth noting that with the recent release of DeepSeek-R1, model training is becoming more feasible for companies beyond the established players. As GenAI continues to evolve, both interpretability and safety will only grow in importance.
-
71. Hurl
Hurl is a Swiss Army knife for making sequences of HTTP requests, defined in plain text files using Hurl-specific syntax. Beyond sending requests, Hurl can validate responses, ensuring a request returns a specific HTTP status code; assert conditions on response headers or content using XPATH, JSONPath or regular expressions; and extract response data into variables, which can then be used to chain requests.
With its feature set, Hurl is useful for simple API automations but also serves as an automated API testing tool. Its ability to generate detailed test reports in HTML or JSON enhances its utility for testing workflows. While dedicated tools like Bruno and Postman offer GUIs and additional features, we like Hurl for its simplicity. Like Bruno, which also uses plain text files, Hurl tests can be stored in the code repository.
-
72. Jujutsu
Git is the dominant distributed version control system (VCS), holding the vast majority of market share. Yet, despite over a decade of dominance, developers still struggle with its complex workflows for branching, merging, rebasing and conflict resolution. This ongoing frustration has fueled a wave of tools designed to ease the pain — some offering visualizations to clarify complexity, others providing their own graphical interfaces to abstract it away entirely.
Jujutsu takes this a step further, offering a full-fledged alternative to Git while maintaining compatibility by using Git repositories as a storage backend. This allows developers to utilise existing Git servers and services while benefiting from Jujutsu's streamlined workflows. Positioned as "both simple and powerful," Jujutsu emphasizes ease of use for developers of all experience levels. One standout feature is its first-class conflict resolution, which has the potential to significantly improve the developer experience.
-
73. kubenetmon
Monitoring and understanding the network traffic associated with Kubernetes can prove a challenge, particularly when your infrastructure spans multiple zones, regions or clouds. kubenetmon, built by ClickHouse and recently open sourced, hopes to solve this problem by offering detailed Kubernetes data transfer metering across the major cloud providers. If you're running Kubernetes and have been frustrated by opaque data transfer costs on your bill it may be worth exploring kubenetmon.
-
74. Mergiraf
Resolving merge conflicts is probably one of the least liked activities in software development. And while there are techniques that reduce the complexity of merges — for example, practicing continuous integration in the original sense of merging to a shared mainline at least daily — we're seeing too much effort spent on merges. Long-lived feature branches are one culprit, but AI-assisted coding also has a tendency to increase the size of change sets. Help may come in the form of Mergiraf, a new tool that resolves merge conflicts by looking at the syntax tree rather than treating code as lines of text. As a git merge driver, it can be set up so that git subcommands like
merge
andcherry-pick
automatically use Mergiraf instead of the default heuristics. -
75. ModernBERT
The successor to BERT (Bidirectional Encoder Representations from Transformers), ModernBERT is a next-generation family of encoder-only transformer models designed for a wide range of natural language processing (NLP) tasks. As a drop-in replacement, ModernBERT improves both performance and accuracy while addressing some of BERT's limitations — notably including support for dramatically longer context lengths thanks to Alternating Attention. Teams with NLP needs should consider ModernBERT before defaulting to a general-purpose generative model.
-
76. OpenRouter
OpenRouter is a unified API for accessing multiple large language models. It provides a single integration point for mainstream LLM providers, simplifies experimentation, reduces vendor lock-in, and optimizes costs by routing requests to the most appropriate model. Popular tools like Cline and Open WebUI use OpenRouter as their endpoint. During our Radar discussion, we questioned whether most projects truly need to switch between models, given that OpenRouter must add price markup as a profit model on top of this encapsulation layer. However, we also recognize that OpenRouter provides various load-balancing strategies to help optimize costs. One particularly useful feature is its ability to bypass API rate limits. If your application exceeds the rate limit of a single LLM provider, OpenRouter can help you break through this limitation and achieve better throughput.
-
77. Redactive
Redactive is an enterprise AI enablement platform designed to help regulated organizations securely prepare unstructured data for AI applications, such as AI-powered assistants and copilots. It integrates with content platforms like Confluence, creating secure text indices for retrieval-augmented generation (RAG) searches. By serving only live data and enforcing real-time user permissions from source systems, Redactive ensures AI models access accurate, authorized information without compromising security. Additionally, it provides engineering teams with tools to build AI use cases safely using any LLM. For organizations exploring AI-driven solutions, Redactive offers a streamlined approach to data preparation and compliance, balancing security and accessibility for teams experimenting with AI capabilities in a controlled environment.
-
78. System Initiative
We continue to be excited by System Initiative. This experimental tool represents a radical new direction for DevOps work. We really like the creative thinking that has gone into this tool and hope it will encourage others to break with the status quo of infrastructure-as-code approaches. System Initiative is now out of beta and available free and open source under an Apache 2.0 license. While the tool’s developers use it to manage production infrastructure, it still has a way to go before it can scale to meet the demands of large enterprises. However, we continue to think it's worth checking out to experience a completely different approach to DevOps tooling.
-
79. TabPFN
TabPFN is a transformer-based model designed for fast and accurate classification on small tabular data sets. It leverages in-context learning (ICL) to make predictions directly from labeled examples without hyperparameter tuning or additional training. Pretrained on millions of synthetic data sets, TabPFN generalizes well across diverse data distributions and handles missing values and outliers effectively. Its strengths include efficient processing of heterogeneous data and robustness to uninformative features.
TabPFN is particularly suitable for small-scale applications where speed and accuracy are crucial. However, it faces scalability challenges with larger data sets and has limitations in handling regression tasks. As a cutting-edge solution, TabPFN is worth evaluating for its potential to outperform traditional models in tabular classification, especially where transformers are less commonly applied.
-
80. v0
v0 by Vercel is an AI tool for generating front-end code from a screenshot, Figma design or simple prompt. It supports React, Vue, shadcn and Tailwind among other front-end frameworks. Beyond AI-generated code, v0 offers a great user experience, including the ability to preview the generated code and deploy it to Vercel in one step. While building real-world applications involves integrating multiple functionalities beyond a single screen, v0 provides a solid way to prototype and can be used to initialize a starting point for developing complex applications.
-
81. Windsurf
Windsurf is an AI coding assistant by Codeium that stands out for its agentic capabilities. Similar to Cursor and Cline, it lets developers drive their implementation from an AI chat that navigates and changes code and executes commands. It frequently releases interesting new features and integrations for the agentic mode. Recently, for instance, it released a browser preview that makes it easy for the agent to access DOM elements and the browser console, and a web research capability that lets Windsurf look for documentation and solutions on the internet when appropriate. Windsurf provides access to a range of popular models, and users can activate and reference web search, library documentation and MCP integration as additional context providers.
-
82. YOLO
The YOLO (You Only Look Once) series, developed by Ultralytics, continues to advance computer vision models. The latest release, YOLO11, delivers significant improvements in both precision and efficiency over previous versions. YOLO11 can perform image classification at high speed with minimum resources, making it suitable for real-time applications in edge devices. We also found that the ability to use the same framework to do pose estimation, object detection, image segmentation and other tasks is very powerful. This significant development also reminds us that using ‘traditional’ machine-learning models for specific tasks can be more powerful than general AI models, such as LLMs.
Hold
The information in our interactive Radar is not available in your preferred language.
Unable to find something you expected to see?
Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you are looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience, it is not based on a comprehensive market analysis.
