Volume 31 | October 2024

Techniques

Adopt

  • 1. 1% canary

    For many years, we've used the canary release approach to encourage early feedback on new software versions, while reducing risk through incremental rollout to selected users. The 1% canary is a useful technique where we roll out new features to a very small segment (say 1%) of users carefully chosen across various user categories. This enables teams to capture fast user feedback, observe the impact of new releases on performance and stability and respond as necessary. This technique becomes especially crucial when teams are rolling out software updates to mobile applications or a fleet of devices like edge computing devices or software-defined vehicles. With proper observability and early feedback, it gives the opportunity to contain the blast radius in the event of unexpected scenarios in production. While canary releases can be useful to get faster user feedback, we believe starting with a small percentage of users is mandatory to reduce and contain the risk of large-scale feature rollouts.
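
    To make the idea concrete, here is a minimal sketch of deterministic percentage bucketing in TypeScript. The hashing scheme and helper names are illustrative; real rollouts typically sit behind a feature-flag or release-management service.

    ```ts
    // A deterministic 1% bucket check (illustrative): hash a stable user ID into
    // 10,000 buckets and enable the new release only for the first 100 of them.
    import { createHash } from 'node:crypto';

    function inCanary(userId: string, rolloutPercent = 1): boolean {
      const hash = createHash('sha256').update(userId).digest();
      const bucket = hash.readUInt32BE(0) % 10_000; // stable bucket per user, 0..9999
      return bucket < rolloutPercent * 100;         // 1% => buckets 0..99
    }

    // Gate the new code path and keep observability on both sides of the flag
    const useNewCheckout = inCanary('user-8675309');
    console.log(useNewCheckout ? 'new release path' : 'current release path');
    ```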

  • 2. Component testing

    Automated testing remains a cornerstone of effective software development. For front-end tests, we can argue whether the distribution of different test types should be the classic test pyramid or whether it should be a trophy shape. In either case, though, teams should focus on component testing because test suites should be stable and run quickly. Instead, what we're seeing is that teams forgo mastering component testing in favor of end-to-end browser-based testing as well as very narrowly defined unit tests. Unit tests have a tendency to force components to expose what should be purely internal functionality, while browser-based tests are slow, flakier and harder to debug. Our recommendation is to have a significant amount of component tests and use a library like jsdom to run the component tests in memory. Browser tools like Playwright, of course, still have a place in end-to-end tests, but they shouldn't be used for component testing.
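
    As a rough illustration, the sketch below shows a component test running entirely in memory. It assumes a hypothetical React component called Counter, Jest configured with the jsdom test environment, and @testing-library/react plus @testing-library/jest-dom.

    ```tsx
    // Counter.test.tsx — runs entirely in memory under Jest's jsdom environment.
    // Assumes a hypothetical <Counter /> component, @testing-library/react and
    // @testing-library/jest-dom (for the toBeInTheDocument matcher).
    import { render, screen, fireEvent } from '@testing-library/react';
    import '@testing-library/jest-dom';
    import { Counter } from './Counter'; // hypothetical component under test

    test('increments the displayed count when the button is clicked', () => {
      render(<Counter />); // rendered into jsdom, no real browser involved
      fireEvent.click(screen.getByRole('button', { name: /increment/i }));
      expect(screen.getByText('Count: 1')).toBeInTheDocument();
    });
    ```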

  • 3. Continuous deployment

    We believe organizations should adopt continuous deployment practices whenever possible. Continuous deployment is the practice of automatically deploying every change that passes automated tests to production. This practice is a key enabler of fast feedback loops and allows organizations to deliver value to customers more quickly and efficiently. Continuous delivery differs from continuous deployment in that it only requires that code can be deployed at any time; it doesn't require that every change actually is deployed to production. We've hesitated to move continuous deployment into the Adopt ring in the past, as it’s a practice that requires a high level of maturity in other areas of software delivery and is therefore not appropriate for all teams. However, Thoughtworker Valentina Servile’s recent book Continuous Deployment provides a comprehensive guide to implementing the practice in an organization. It offers a roadmap for organizations to follow in order to achieve the level of maturity required to adopt continuous deployment practices.

  • 4. Retrieval-augmented generation (RAG)

    Retrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of responses generated by a large language model (LLM). We’ve successfully used it in many projects, including the Jugalbandi AI platform. With RAG, information about relevant and trustworthy documents is stored in a database. For a given prompt, the database is queried, relevant documents are retrieved and the prompt augmented with the content of the documents, thus providing richer context to the LLM. This results in higher quality output and greatly reduced hallucinations. The context window — which determines the maximum size of the LLM input — has grown significantly with newer models, but selecting the most relevant documents is still a crucial step. Our experience indicates that a carefully constructed smaller context can yield better results than a broad and large context. Using a large context is also slower and more expensive. We used to rely solely on embeddings stored in a vector database to identify additional context. Now, we're seeing reranking and hybrid search: search tools such as Elasticsearch Relevance Engine as well as approaches like GraphRAG that utilize knowledge graphs created with the help of an LLM. A graph-based approach has worked particularly well in our work on understanding legacy codebases with GenAI.
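
    A minimal sketch of the basic RAG flow, assuming the openai npm package, a small in-memory list of pre-embedded documents and cosine similarity for retrieval; the model names are just examples, and production setups would add a vector database, reranking or hybrid search as described above.

    ```ts
    // Minimal RAG flow: embed the question, retrieve the closest documents from an
    // in-memory store, then augment the prompt with their content.
    // Assumes the openai npm package and documents embedded with the same model.
    import OpenAI from 'openai';

    const client = new OpenAI();

    type Doc = { text: string; embedding: number[] };

    function cosine(a: number[], b: number[]): number {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    async function answer(question: string, docs: Doc[]): Promise<string | null> {
      // 1. Retrieve: find the documents most similar to the question
      const { data } = await client.embeddings.create({
        model: 'text-embedding-3-small',
        input: question,
      });
      const queryEmbedding = data[0].embedding;
      const topDocs = [...docs]
        .sort((a, b) => cosine(b.embedding, queryEmbedding) - cosine(a.embedding, queryEmbedding))
        .slice(0, 3);

      // 2. Augment: put the retrieved content into the prompt as context
      const context = topDocs.map((d) => d.text).join('\n---\n');
      const completion = await client.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [
          { role: 'system', content: 'Answer using only the provided context.' },
          { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
        ],
      });
      return completion.choices[0].message.content;
    }
    ```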

Trial

  • 5. Domain storytelling

    Domain-driven design (DDD) has become a foundational approach to the way we develop software. We use it to model events, to guide software designs, to establish context boundaries around microservices and to elaborate nuanced business requirements. DDD establishes a ubiquitous language that both nontechnical stakeholders and software developers can use to communicate effectively about the business. Once established, domain models evolve, but many teams find it hard to get started with DDD. There’s no one-size-fits-all approach to building an initial domain model. One promising technique we've encountered recently is domain storytelling. Domain storytelling is a facilitation technique where business experts are prompted to describe activities in the business. As the experts are guided through their narration, a facilitator uses a pictographic language to capture the relationships and actions between entities and actors. The process of making these stories visible helps to clarify and develop a shared understanding among participants. Since there is no single best approach to developing a domain model, domain storytelling offers a noteworthy alternative to Event Storming, another technique we often use to get started with DDD, or a companion to it when a more comprehensive approach is needed.

  • 6. Fine-tuning embedding models

    When building LLM applications based on retrieval-augmented generation (RAG), the quality of embeddings directly impacts both retrieval of the relevant documents and response quality. Fine-tuning embedding models can enhance the accuracy and relevance of embeddings for specific tasks or domains. Our teams fine-tuned embeddings when developing domain-specific LLM applications for which precise information extraction was crucial. However, consider the trade-offs of this approach before you rush to fine-tune your embedding model.

  • 7. Function calling with LLMs

    Function calling with LLMs refers to the ability to integrate LLMs with external functions, APIs or tools by determining and invoking the appropriate function based on a given query and associated documentation. This extends the utility of LLMs beyond text generation, allowing them to perform specific tasks such as information retrieval, code execution and API interaction. By triggering external functions or APIs, LLMs can perform actions that were previously outside their standalone capabilities. This technique enables LLMs to act on their outputs, effectively bridging the gap between thought and action — similar to how humans use tools to accomplish various tasks. By introducing function calling, LLMs add determinism and factuality to the generation process, striking a balance between creativity and logic. This method allows LLMs to connect to internal systems and databases or even perform internet searches via connected browsers. Models like OpenAI's GPT series support function calling and fine-tuned models like Gorilla are specifically designed to enhance the accuracy and consistency of generating executable API calls from natural language instructions. As a technique, function calling fits within retrieval-augmented generation (RAG) and agent architectures. It should be viewed as an abstract pattern of use, emphasizing its potential as a foundational tool in diverse implementations rather than a specific solution.
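
    The sketch below illustrates the pattern with the openai npm package; the getWeather tool is hypothetical, and the application remains responsible for executing whatever function the model selects.

    ```ts
    // Function calling sketch: the model decides whether to call the (hypothetical)
    // getWeather tool and returns the arguments; the application executes the call.
    // Assumes the openai npm package.
    import OpenAI from 'openai';

    const client = new OpenAI();

    const response = await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: 'Do I need an umbrella in Berlin today?' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'getWeather',
            description: 'Get the current weather for a city',
            parameters: {
              type: 'object',
              properties: { city: { type: 'string' } },
              required: ['city'],
            },
          },
        },
      ],
    });

    const toolCall = response.choices[0].message.tool_calls?.[0];
    if (toolCall) {
      // Arguments arrive as a JSON string matching the declared parameter schema
      const args = JSON.parse(toolCall.function.arguments);
      console.log(`Model requested ${toolCall.function.name} with`, args);
      // ...invoke the real weather API here and send the result back to the model
    }
    ```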

  • 8. LLM as a judge

    Many systems we build have two key characteristics: they can provide answers to questions about a large data set, and it's next to impossible to follow how they arrived at those answers. Despite this opacity, we still want to assess and improve the quality of the responses. With the LLM as a judge pattern, we use an LLM to evaluate the responses of another system, which in turn might be based on an LLM. We've seen this pattern used to evaluate the relevance of search results in a product catalog and to assess whether an LLM-based chatbot was guiding its users in a sensible direction. Naturally, the evaluator system must be set up and calibrated carefully. It can drive significant efficiency gains, which, in turn, translate to lower costs. This is an ongoing area of research, with the current state summarized in this article.
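
    A simplified sketch of the pattern, assuming the openai npm package; the rubric, scale and prompt wording are illustrative and would need careful calibration in practice.

    ```ts
    // LLM-as-a-judge sketch: one model scores the relevance of another system's
    // answer against a fixed rubric. Prompt wording and scale are illustrative.
    // Assumes the openai npm package.
    import OpenAI from 'openai';

    const client = new OpenAI();

    async function judgeRelevance(question: string, answer: string): Promise<number> {
      const completion = await client.chat.completions.create({
        model: 'gpt-4o',
        temperature: 0, // keep the judge as deterministic as possible
        messages: [
          {
            role: 'system',
            content:
              'You are an evaluator. Rate how well the answer addresses the question ' +
              'on a scale from 1 (irrelevant) to 5 (fully relevant). Reply with the number only.',
          },
          { role: 'user', content: `Question: ${question}\n\nAnswer: ${answer}` },
        ],
      });
      return Number(completion.choices[0].message.content);
    }

    // Usage: run the judge over a sample of logged question/answer pairs and
    // track the score distribution over time.
    ```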

  • 9. Passkeys

    Shepherded by the FIDO alliance and backed by Apple, Google and Microsoft, passkeys are nearing mainstream usability. Setting up a new login with passkeys generates a key pair: the website receives the public key and the user keeps the private key. Handling login uses asymmetric cryptography: the user proves they're in possession of the private key, which is stored on the user’s device and never sent to the website. Access to passkeys is protected using biometrics or a PIN. Passkeys can be stored and synced within the big tech ecosystems, using Apple's iCloud Keychain, Google Password Manager or Windows Hello. For multiplatform users, the Client to Authenticator Protocol (CTAP) makes it possible for passkeys to be kept on a device other than the one that creates the key or needs it for login. The most common objection to using passkeys is that they are a challenge for less tech-savvy users, which is, we believe, self-defeating: these are often the same users who have poor password discipline and would therefore benefit the most from alternative methods. In practice, systems that use passkeys can fall back to more traditional authentication methods if required.
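
    For illustration, the sketch below registers a passkey in the browser via the WebAuthn API; in a real system the challenge and user details come from the server, which also verifies and stores the resulting public key credential. The relying-party and user values are illustrative.

    ```ts
    // Registering a passkey with the WebAuthn API (simplified). In a real system
    // the challenge and user details are issued by the server, which later
    // verifies and stores the returned public key credential.
    const credential = await navigator.credentials.create({
      publicKey: {
        challenge: crypto.getRandomValues(new Uint8Array(32)), // server-issued in practice
        rp: { name: 'Example Corp', id: 'example.com' },
        user: {
          id: new TextEncoder().encode('user-123'),
          name: 'ada@example.com',
          displayName: 'Ada',
        },
        pubKeyCredParams: [{ type: 'public-key', alg: -7 }], // ES256
        authenticatorSelection: {
          residentKey: 'required',        // discoverable credential, i.e. a passkey
          userVerification: 'preferred',  // biometrics or PIN where available
        },
      },
    });

    // Only the public key leaves the device; the private key stays in the
    // platform authenticator (or synced keychain) and is never sent to the website.
    console.log(credential);
    ```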

  • 10. Small language models

    Large language models (LLMs) have proven useful in many application areas, but the fact that they are large can be a source of problems: responding to a prompt requires a lot of compute resources, making queries slow and expensive; the models are proprietary and so large that they must be hosted in a cloud by a third party, which can be problematic for sensitive data; and training a model is prohibitively expensive in most cases. The last issue can be addressed with the RAG pattern, which side-steps the need to train and fine-tune foundational models, but cost and privacy concerns often remain. In response, we’re now seeing growing interest in small language models (SLMs). In comparison to their more popular siblings, they have fewer weights and less precision, usually between 3.5 billion and 10 billion parameters. Recent research suggests that, in the right context, when set up correctly, SLMs can perform as well as or even outperform LLMs. And their size makes it possible to run them on edge devices. We've previously mentioned Google's Gemini Nano, but the landscape is evolving quickly, with Microsoft introducing its Phi-3 series, for example.

  • 11. Synthetic data for testing and training models

    Synthetic data set creation involves generating artificial data that can mimic real-world scenarios without relying on sensitive or limited-access data sources. While synthetic data for structured data sets has been explored extensively (e.g., for performance testing or privacy-safe environments), we're seeing renewed use of synthetic data for unstructured data. Enterprises often struggle with a lack of labeled domain-specific data, especially for use in training or fine-tuning LLMs. Tools like Bonito and Microsoft's AgentInstruct can generate synthetic instruction-tuning data from raw sources such as text documents and code files. This helps accelerate model training while reducing costs and dependency on manual data curation. Another important use case is generating synthetic data to address imbalanced or sparse data, which is common in tasks like fraud detection or customer segmentation. Techniques such as SMOTE help balance data sets by artificially creating minority class instances. Similarly, in industries like finance, generative adversarial networks (GANs) are used to simulate rare transactions, allowing models to be robust in detecting edge cases and improving overall performance.

  • 12. Using GenAI to understand legacy codebases

    Generative AI (GenAI) and large language models (LLMs) can help developers write and understand code. Help with understanding code is especially useful in the case of legacy codebases with poor, out-of-date or misleading documentation. Since we last wrote about this, techniques and products for using GenAI to understand legacy codebases have further evolved, and we've successfully used some of them in practice, notably to assist reverse engineering efforts for mainframe modernization. A particularly promising technique we've used is a retrieval-augmented generation (RAG) approach where the information retrieval is done on a knowledge graph of the codebase. The knowledge graph can preserve structural information about the codebase beyond what an LLM could derive from the textual code alone. This is particularly helpful in legacy codebases that are less self-descriptive and cohesive. An additional opportunity to improve code understanding is that the graph can be further enriched with existing and AI-generated documentation, external dependencies, business domain knowledge or whatever else is available that can make the AI's job easier.
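
    A very rough sketch of the kind of code knowledge graph this approach builds on; the node and edge shapes, identifiers and summaries are hypothetical, and real implementations would populate the graph from parsers and enrich it with documentation before using it as the retrieval step of a graph-based RAG.

    ```ts
    // Illustrative shape of a code knowledge graph: nodes for programs, copybooks
    // and tables, edges for calls and data access, plus summaries that retrieval
    // can hand to the LLM as context. Names and shapes are hypothetical.
    type CodeNode = { id: string; kind: 'program' | 'copybook' | 'table'; summary?: string };
    type CodeEdge = { from: string; to: string; relation: 'calls' | 'includes' | 'reads' | 'writes' };

    const nodes: CodeNode[] = [
      { id: 'BILLING01', kind: 'program', summary: 'Monthly invoice batch job' },
      { id: 'CUSTMAST', kind: 'table', summary: 'Customer master data' },
    ];
    const edges: CodeEdge[] = [{ from: 'BILLING01', to: 'CUSTMAST', relation: 'reads' }];

    // Retrieval step of a graph-based RAG: collect a node's neighborhood and
    // format it as context for the prompt, instead of (or alongside) embeddings.
    function neighborhoodContext(id: string): string {
      const related = edges.filter((e) => e.from === id || e.to === id);
      const facts = related.map((e) => `${e.from} ${e.relation} ${e.to}`);
      const summaries = nodes
        .filter((n) => n.id === id || related.some((e) => e.from === n.id || e.to === n.id))
        .map((n) => `${n.id}: ${n.summary ?? 'no summary yet'}`);
      return [...facts, ...summaries].join('\n');
    }

    console.log(neighborhoodContext('BILLING01'));
    ```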

Assess

  • 13. AI team assistants

    AI coding assistance tools are mostly talked about in the context of assisting and enhancing an individual contributor's work. However, software delivery is and will remain team work, so you should be looking for ways to create AI team assistants that help create the 10x team, as opposed to a bunch of siloed AI-assisted 10x engineers. Fortunately, recent developments in the tools market are moving us closer to making this a reality. Unblocked is a platform that pulls together all of a team's knowledge sources and integrates them intelligently into team members' tools. And Atlassian's Rovo brings AI into the most widely used team collaboration platform, giving teams new types of search and access to their documentation, in addition to unlocking new ways of automation and software practice support with Rovo agents. While we wait for the market to further evolve in this space, we've been exploring the potential of AI for knowledge amplification and team practice support ourselves: We open-sourced our Haiven team assistant and started gathering learnings with AI assistance for noncoding tasks like requirements analysis.

  • 14. Dynamic few-shot prompting

    Dynamic few-shot prompting builds upon few-shot prompting by dynamically including specific examples in the prompt to guide the model's responses. Adjusting the number and relevance of these examples optimizes context length and relevancy, thereby improving model efficiency and performance. Libraries like scikit-llm implement this technique using nearest neighbor search to fetch the most relevant examples aligned with the user query. This technique lets you make better use of the model’s limited context window and reduce token consumption. The open-source SQL generator vanna leverages dynamic few-shot prompting to enhance response accuracy.
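
    A minimal sketch of the idea, assuming the openai npm package and a set of examples embedded ahead of time with the same embedding model; libraries like scikit-llm implement this nearest neighbor selection for you.

    ```ts
    // Dynamic few-shot sketch: pick the k examples closest to the query by
    // embedding similarity and include only those in the prompt.
    // Assumes the openai npm package and examples embedded with the same model.
    import OpenAI from 'openai';

    const client = new OpenAI();

    type Example = { input: string; output: string; embedding: number[] };

    function cosine(a: number[], b: number[]): number {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    async function buildPrompt(query: string, examples: Example[], k = 3): Promise<string> {
      const { data } = await client.embeddings.create({
        model: 'text-embedding-3-small',
        input: query,
      });
      const queryEmbedding = data[0].embedding;

      const shots = [...examples]
        .sort((a, b) => cosine(b.embedding, queryEmbedding) - cosine(a.embedding, queryEmbedding))
        .slice(0, k) // only the most relevant examples make it into the context window
        .map((e) => `Input: ${e.input}\nOutput: ${e.output}`)
        .join('\n\n');

      return `${shots}\n\nInput: ${query}\nOutput:`;
    }
    ```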

  • 15. GraphQL for data products

    GraphQL for data products is the technique of using GraphQL as an output port through which clients consume a data product. We've talked about GraphQL as an API protocol and how it enables developers to create a unified API layer that abstracts away the underlying data complexity, providing a more cohesive and manageable interface for clients. GraphQL for data products makes it seamless for consumers to discover the data format and relationships with the GraphQL schema and use familiar client tools. Our teams are exploring this technique in specific use cases like talk-to-data, where users explore and discover big data insights with the help of large language models: the GraphQL queries are constructed by LLMs based on the user prompt, and the GraphQL schema is provided in the LLM prompts for reference.
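
    As a simple illustration, a client consuming a data product through its GraphQL output port might look like the sketch below; the endpoint, query and field names are illustrative.

    ```ts
    // Consuming a data product through its GraphQL output port (illustrative
    // endpoint, query and field names). Because the schema is discoverable via
    // introspection, the same query could also be generated by an LLM.
    const query = `
      query RecentOrders($since: String!) {
        orders(since: $since) {
          orderId
          customerSegment
          totalAmount
        }
      }
    `;

    const response = await fetch('https://data.example.com/products/orders/graphql', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, variables: { since: '2024-10-01' } }),
    });

    const { data } = await response.json();
    console.log(data.orders);
    ```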

  • 16. LLM-powered autonomous agents

    LLM-powered autonomous agents are evolving beyond single agents and static multi-agent systems with the emergence of frameworks like Autogen and CrewAI. This technique allows developers to break down a complex activity into several smaller tasks performed by agents where each agent is assigned a specific role. Developers can use preconfigured tools for performing the task, and the agents converse among themselves and orchestrate the flow. The technique is still in its early stages of development. In our experiments, our teams have encountered issues like agents going into continuous loops and uncontrolled behavior. Libraries like LangGraph offer greater control of agent interactions with the ability to define the flow as a graph. If you use this technique, we suggest implementing fail-safe mechanisms, including timeouts and human oversight.
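
    Independent of any particular framework, the fail-safes mentioned above can be sketched as a wrapper that caps the number of agent turns and enforces a hard timeout; runAgentTurn is a placeholder for whatever agent framework is in use.

    ```ts
    // Framework-agnostic fail-safes: cap the number of agent turns and enforce a
    // hard timeout around the whole run. runAgentTurn is a placeholder for
    // whatever agent framework (Autogen, CrewAI, LangGraph, ...) is in use.
    async function runWithFailSafes(
      runAgentTurn: (history: string[]) => Promise<string>,
      maxTurns = 10,
      timeoutMs = 60_000,
    ): Promise<string> {
      const history: string[] = [];
      const deadline = Date.now() + timeoutMs;

      for (let turn = 0; turn < maxTurns; turn++) {
        if (Date.now() > deadline) throw new Error('Agent run timed out');
        const message = await runAgentTurn(history);
        history.push(message);
        if (message.includes('FINAL ANSWER')) return message; // naive stop condition
      }
      // Agents stuck in a loop never reach a final answer; hand over to a human
      throw new Error('Agent did not converge within the turn budget');
    }
    ```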

  • 17. Observability 2.0

    Observability 2.0 represents a shift from traditional, disparate monitoring tools to a unified approach that leverages structured, high-cardinality event data in a single data store. This model captures rich, raw events with detailed metadata to provide a single source of truth for comprehensive analysis. Storing events in their raw form simplifies correlation, supports both real-time and forensic analysis and enables deeper insights into complex, distributed systems. This approach allows for high-resolution monitoring and dynamic investigation capabilities. Observability 2.0 prioritizes capturing high-cardinality and high-dimensional data, allowing detailed examination without performance bottlenecks. The unified data store reduces complexity, offering a coherent view of system behavior and aligning observability practices more closely with the software development lifecycle.
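
    To illustrate the wide-event style described above, the sketch below emits one structured, high-cardinality event per request instead of scattered logs and metrics; the field names and event sink are illustrative.

    ```ts
    // One wide, structured event per request instead of scattered logs and
    // metrics. Field names and the event sink are illustrative; a real setup
    // would ship these events to a columnar store built for high cardinality.
    type WideEvent = Record<string, string | number | boolean>;

    function emit(event: WideEvent): void {
      console.log(JSON.stringify(event)); // stand-in for the event pipeline
    }

    function handleCheckout(req: { userId: string; path: string; cartSize: number }): void {
      const start = Date.now();
      // ... handle the request ...
      emit({
        timestamp: new Date().toISOString(),
        service: 'checkout',
        route: req.path,
        user_id: req.userId,          // high-cardinality attributes are encouraged
        cart_size: req.cartSize,
        payment_provider: 'stripe',
        duration_ms: Date.now() - start,
        success: true,
      });
    }
    ```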

  • 18. On-device LLM inference

    Large language models (LLMs) can now run in web browsers and on edge devices like smartphones and laptops, enabling on-device AI applications. This allows for secure handling of sensitive data without cloud transfer, extremely low latency for tasks like edge computing and real-time image or video processing, reduced costs by performing computations locally and functionality even when internet connectivity is unreliable or unavailable. This is an active area of research and development. Previously, we highlighted MLX, an open-source framework for efficient machine learning on Apple silicon. Other emerging tools include Transformers.js and Chatty. Transformers.js lets you run transformers in the browser using ONNX Runtime, supporting models converted from PyTorch, TensorFlow and JAX. Chatty leverages WebGPU to run LLMs natively and privately in the browser, offering a feature-rich in-browser AI experience.
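
    A small sketch of in-browser inference with Transformers.js; the package name reflects the @xenova/transformers releases, and the model identifier is an assumption, so check the Transformers.js documentation for currently supported models.

    ```ts
    // In-browser text generation with Transformers.js (ONNX Runtime under the
    // hood). The model identifier is an assumption; pick any ONNX-converted
    // model supported by the library.
    import { pipeline } from '@xenova/transformers';

    // Downloads and caches the weights on first use, then runs fully on-device
    const generator = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');

    const output = await generator('Write a haiku about edge devices.', {
      max_new_tokens: 60,
    });
    console.log(output);
    ```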

  • 19. Structured output from LLMs

    Structured output from LLMs refers to the practice of constraining a language model's response into a defined schema. This can be achieved either through instructing a generalized model to respond in a particular format or by fine-tuning a model so it "natively" outputs, for example, JSON. OpenAI now supports structured output, allowing developers to supply a JSON Schema, pydantic or Zod object to constrain model responses. This capability is particularly valuable for enabling function calling, API interactions and external integrations, where accuracy and adherence to a format are critical. Structured output not only enhances the way LLMs can interface with code but also supports broader use cases like generating markup for rendering charts. Additionally, structured output has been shown to reduce the chance of hallucinations within model output.
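
    A sketch of constraining output with a Zod schema via the openai npm package; the zodResponseFormat helper and the beta parse method appear in recent SDK versions, so treat the exact API surface as something to verify against the SDK version you use. The Invoice schema is an example.

    ```ts
    // Constraining model output with a Zod schema. Assumes the openai npm package
    // at a version that ships the zodResponseFormat helper and the beta parse API.
    import OpenAI from 'openai';
    import { zodResponseFormat } from 'openai/helpers/zod';
    import { z } from 'zod';

    const client = new OpenAI();

    const Invoice = z.object({
      invoiceNumber: z.string(),
      totalAmount: z.number(),
      currency: z.string(),
    });

    const completion = await client.beta.chat.completions.parse({
      model: 'gpt-4o-2024-08-06',
      messages: [
        { role: 'system', content: 'Extract the invoice details.' },
        { role: 'user', content: 'Invoice INV-42 totals 199.90 EUR.' },
      ],
      response_format: zodResponseFormat(Invoice, 'invoice'),
    });

    // `parsed` is validated against the schema (null if the model refused)
    console.log(completion.choices[0].message.parsed);
    ```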

Hold

  • 20. Complacency with AI-generated code

    AI coding assistants like GitHub Copilot and Tabnine have become very popular. According to StackOverflow's 2024 developer survey, 72% of all respondents are favorable or very favorable toward AI tools for development. While we also see their benefits, we remain cautious about the medium- to long-term impact this will have on code quality and caution developers against complacency with AI-generated code. It's all too tempting to be less vigilant when reviewing AI suggestions after a few positive experiences with an assistant. Studies like this one from GitClear show a trend of codebases growing faster, which we suspect coincides with larger pull requests. And this study from GitHub makes us wonder whether the reported 15% increase in pull request merge rate is actually a good thing or whether people are merging larger pull requests faster because they trust the AI results too much. We still stand by the getting-started advice we shared over a year ago: beware of automation bias, sunk cost fallacy, anchoring bias and review fatigue. We also recommend that programmers develop a good mental framework for where and when not to use or trust AI.

  • 21. Enterprise-wide integration test environments

    Creating enterprise-wide integration test environments is a common, wasteful practice that slows everything down. These environments invariably become a precious resource that is hard to replicate and a bottleneck to development. They also give a false sense of security due to the inevitable discrepancies in data and configuration overhead between environments. Ironically, a common objection to the alternatives (such as ephemeral environments or multiple local test environments) is cost. However, this fails to account for the cost of delay caused by enterprise-wide integration test environments, such as development teams waiting for other teams to finish or for new versions of dependent systems to be deployed. Instead, teams should use ephemeral environments and, preferably, a suite of tests owned by the development team that can be spun up and discarded cheaply, using fake stubs for their systems rather than actual replicas. For other techniques that support this alternative, see contract testing, decoupling deployment from release, focus on mean time to recovery and testing in production.

  • 22. LLM bans

    Rather than instituting blanket LLM bans in the workplace, organizations should focus on providing access to an approved set of AI tools. A ban only pushes employees toward unapproved and potentially unsafe workarounds, creating unnecessary risk. Much like in the early days of personal computing, people will use whatever tools they find effective to get their work done, regardless of the barriers in place. By not providing a safe, endorsed alternative, companies risk employees using unapproved LLMs, which comes with risks around intellectual property, data leakage and legal liability. Instead, offering secure, company-approved LLMs or AI tools ensures both safety and productivity. A well-governed approach lets organizations manage concerns around data privacy, security, compliance and cost while still empowering employees with the capabilities LLMs offer. In the best case, well-managed access to AI tools can accelerate organizational learning about the best ways to use AI in the workplace.

  • 23. Replacing pair programming with AI

    When people talk about coding assistants, the topic of pair programming inevitably comes up. Our profession has a love-hate relationship with it: some swear by it, others can't stand it. Coding assistants now raise the question: can a human pair with an AI instead of another human and get the same results for the team? GitHub Copilot even calls itself "your AI pair programmer." While we do believe a coding assistant can bring some of the benefits of pair programming, we advise against fully replacing pair programming with AI. Framing coding assistants as pair programmers ignores one of the key benefits of pairing: making the team, not just the individual contributors, better. Coding assistants can help with getting unstuck, learning about a new technology, onboarding or making tactical work faster so that we can focus on strategic design. But they don't help with any of the team collaboration benefits, such as keeping work in progress low, reducing handoffs and relearning, making continuous integration possible or improving collective code ownership.

Unable to find something you expected to see?

 

Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you are looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.
