LLM Guardrails | Technology Radar | Thoughtworks Ecuador

Technology Radar Vol 31

Languages & Frameworks

Adopt

75. dbt

We continue to see dbt as a strong, sensible option for implementing data transformations in ELT pipelines. We like that it lends itself to engineering rigor and enables practices like modularity, testability and reusability of SQL-based transformations. dbt integrates well with many cloud data warehouses, lakehouses and databases (including Snowflake, BigQuery, Redshift, Databricks and Postgres) and has a healthy ecosystem of community packages surrounding it. Its recently introduced native support for unit testing (in dbt core 1.8+ and the dbt Cloud "versionless" experience) further strengthens its position in our toolbox. Our teams appreciate that the new unit testing feature allows them to easily define static test data, set up output expectations and test both incremental and full-refresh modes of their pipelines. In many cases, this has allowed them to retire homegrown scripts while maintaining the same level of quality.

Related blips
- Snowflake
  
  Trial
  
  Platforms
  April 2021
- BigQuery
  
  Trial
  
  Platforms
  May 2013
76. Testcontainers

In our experience, Testcontainers is a useful default option for creating a reliable environment for running tests. It's a library, ported to multiple languages, that Dockerizes common test dependencies (including various types of databases, queuing technologies, cloud services and UI testing dependencies like web browsers) with the ability to run custom Dockerfiles when needed. Recently, a desktop version was released that allows for the visual management of test sessions and the ability to manage more complex scenarios, which our teams have found very useful.


Trial

77. CAP

CAP is a .NET library that implements the Outbox pattern. When working with distributed messaging systems like RabbitMQ or Kafka, we frequently face the challenge of ensuring that database updates and event publications are performed atomically. CAP addresses this challenge by recording the intent to publish the event in the same database transaction that caused the event. We find CAP to be quite useful as it supports several databases and messaging platforms while guaranteeing at-least-once delivery.
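
The Outbox pattern described above can be illustrated with a minimal sketch. This is not CAP's API (CAP is a .NET library); it is a Python illustration using the standard-library `sqlite3` module, and the table names, the `order.placed` topic and the `publish` callback are all hypothetical. The business row and the "intent to publish" row are written in one transaction, and a separate relay step publishes pending rows.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a business table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT, payload TEXT, published INTEGER)"
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str, total: float) -> None:
    """Write the order and the event to publish in the same transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (topic, payload, published) VALUES (?, ?, 0)",
            ("order.placed", json.dumps({"id": order_id, "total": total})),
        )


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows and return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending for a later retry
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

Because a row is marked as published only after `publish` returns, a crash between the two steps causes the event to be sent again on the next relay run. That is the at-least-once guarantee, so consumers in such a design need to tolerate duplicates.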

78. CARLA

CARLA is an open-source simulator for autonomous driving research used to test autonomous driving systems before production deployment. It offers flexibility in creating and reusing 3D models of vehicles, terrain, humans, animals and more, making it possible to simulate scenarios like a pedestrian stepping onto the street or encountering an oncoming vehicle at a specific speed. The autonomous driving system under test must recognize these dynamic actors and take appropriate action — such as braking. Our teams use CARLA for the ongoing development and testing of autonomous driving systems.

79. Databricks Asset Bundles

Databricks Asset Bundles (DABs), which reached general availability in April 2024, is becoming the go-to tool for packaging and deploying Databricks assets, and it facilitates the adoption of software engineering practices in our data teams. DABs supports packaging the configuration of workflows and tasks, as well as the code to be executed in those tasks, as a bundle that can be deployed to multiple environments through CI/CD pipelines. It comes with templates for common types of assets and supports custom templates, which allows for the creation of tailored service templates for data engineering and ML projects. Our teams have increasingly adopted it as a key part of their engineering workflows. Even though DABs includes templates for notebooks and supports deploying them to production, we don't recommend productionizing notebooks and instead encourage intentionally writing production code with the engineering practices that support the maintainability, resiliency and scalability needs of such workloads.

Related blips
- Tailored service templates
  
  Adopt
  
  Techniques
  October 2020
- Productionizing notebooks
  
  Hold
  
  Techniques
  October 2020
80. Instructor

When we use large language model (LLM) chatbots as end users, they usually come back to us with an unstructured natural language answer. When building GenAI applications that are more than chatbots, it can be useful to ask the LLM for a structured answer in JSON, YAML or other formats, and then parse and use that response in the application. However, as LLMs are nondeterministic, they might not always do what we ask them to do. Instructor is a library that can be used to help us request structured output from LLMs. You can define the intended output structure and configure retries if the LLM doesn't return the structure you asked for. As the best end user experience for working with LLMs is often to stream the results to them instead of waiting for the full response, Instructor also takes care of parsing partial structures from a stream.
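
The define-validate-retry loop described above can be sketched as follows. This is not Instructor's API; it is a standard-library Python illustration of the underlying idea, and the `Ticket` structure, the `llm` callable and the wording of the retry message are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ticket:
    """The structure we want the model to return (hypothetical example)."""
    title: str
    priority: int


def parse_ticket(raw: str) -> Ticket:
    """Parse and validate a reply; raise ValueError if it doesn't fit the structure."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    title, priority = data.get("title"), data.get("priority")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError("priority must be an integer from 1 to 5")
    return Ticket(title=title, priority=priority)


def ask_structured(llm: Callable[[str], str], prompt: str, max_retries: int = 2) -> Ticket:
    """Call the model, validate the reply and re-ask with the error on failure."""
    message = prompt
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return parse_ticket(llm(message))
        except ValueError as err:
            last_error = err
            # Feed the validation error back so the next attempt can correct it.
            message = f"{prompt}\n\nYour previous reply was invalid ({err}). Reply with JSON only."
    raise RuntimeError(f"no valid reply after {max_retries + 1} attempts: {last_error}")
```

In this sketch `llm` is any function that takes a prompt string and returns the model's text, which keeps the retry logic testable without a network call.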

Related blips
- Structured output from LLMs
  
  Assess
  
  Techniques
  October 2024
81. Kedro

Kedro has significantly improved as a tool for MLOps and has maintained its focus on modularity and engineering practices, which we liked from the start. One step that highlights its modularity is the introduction of the standalone kedro-datasets package, which decouples code from data. Kedro has added enhancements in its CLI, starter project templates and telemetry capabilities. Additionally, the recent release of a VS Code extension is a good boost to the developer experience.

82. LiteLLM

LiteLLM is a library for seamless integration with various large language model (LLM) providers' APIs that standardizes interactions through an OpenAI API format. It supports an extensive array of providers and models and offers a unified interface for completion, embedding and image generation. LiteLLM simplifies integration by translating inputs to match each provider's specific endpoint requirements. It also provides a framework for implementing many of the operational features needed in a production application, such as caching, logging, rate limiting and load balancing. This ensures uniform operation across different LLMs. Our teams are using LiteLLM to make it easier to swap various models in and out, a necessary feature in today's landscape where models are evolving quickly. When doing this, it's crucial to acknowledge that models respond differently to identical prompts, so a consistent invocation method alone may not fully optimize completion performance. Also, each model implements add-on features uniquely and a single interface may not suffice for all. For example, one of our teams had difficulty taking advantage of function calling in an AWS Bedrock model while proxying through LiteLLM.
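
The idea of translating one OpenAI-style input into provider-specific requests can be sketched as below. This is not LiteLLM's code or API; the provider names `provider_a` and `provider_b`, the payload shapes and the `build_request` helper are hypothetical and simplified for illustration.

```python
from typing import Callable

Message = dict[str, str]


def to_openai_style(model: str, messages: list[Message]) -> dict:
    """Provider that accepts OpenAI-style chat messages unchanged."""
    return {"model": model, "messages": list(messages)}


def to_system_field_style(model: str, messages: list[Message]) -> dict:
    """Provider that expects the system prompt as a separate top-level field."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    chat = [m for m in messages if m["role"] != "system"]
    payload = {"model": model, "messages": chat}
    if system:
        payload["system"] = system
    return payload


# Hypothetical registry mapping a provider prefix to its translator.
TRANSLATORS: dict[str, Callable[[str, list[Message]], dict]] = {
    "provider_a": to_openai_style,
    "provider_b": to_system_field_style,
}


def build_request(model: str, messages: list[Message]) -> dict:
    """Route a 'provider/model' string to the matching translator."""
    provider, _, model_name = model.partition("/")
    if provider not in TRANSLATORS or not model_name:
        raise ValueError(f"unsupported model identifier: {model!r}")
    return TRANSLATORS[provider](model_name, messages)
```

The calling code stays the same when the model string changes, which is what makes swapping models cheap. The sketch also shows the limit noted above: anything a provider does differently beyond the message format still needs its own handling.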

83. LlamaIndex

LlamaIndex includes engines that enable you to design domain-specific, context-augmented LLM applications and support tasks like data ingestion, vector indexing and natural language question-answering on documents, to name a few. Our teams used LlamaIndex to build a retrieval-augmented generation (RAG) pipeline that automates document ingestion, indexes document embeddings and queries these embeddings based on user input. Using LlamaHub, you can extend or customize LlamaIndex modules to suit your needs and build, for example, LLM applications with your preferred LLMs, embeddings and vector store providers.

Related blips
- Retrieval-augmented generation (RAG)
  
  Adopt
  
  Techniques
  October 2024
84. LLM Guardrails

LLM Guardrails is a set of guidelines, policies or filters designed to prevent large language models (LLMs) from generating harmful, misleading or irrelevant content. The guardrails can also be used to safeguard LLM applications from malicious users attempting to misuse the system with techniques like input manipulation. They act as a safety net by setting boundaries on the content the model processes and generates. Some emerging frameworks in this space, like NeMo Guardrails, Guardrails AI and Aporia Guardrails, have proven useful to our teams. We recommend that every LLM application have guardrails in place and that its rules and policies be continuously improved. Guardrails are crucial for building responsible and trustworthy LLM chat apps.
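
As a rough illustration of guardrails acting on both sides of a model call, here is a minimal Python sketch. It does not use NeMo Guardrails, Guardrails AI or Aporia Guardrails; the patterns, the length limit and the `llm` callable are hypothetical and far simpler than a production rule set.

```python
import re
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
MAX_INPUT_CHARS = 2000


@dataclass
class GuardResult:
    allowed: bool
    text: str
    reason: str = ""


def check_input(user_text: str) -> GuardResult:
    """Input guardrail: reject oversized or suspicious requests before the model sees them."""
    if len(user_text) > MAX_INPUT_CHARS:
        return GuardResult(False, "", "input too long")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            return GuardResult(False, "", "possible prompt injection")
    return GuardResult(True, user_text)


def check_output(model_text: str) -> GuardResult:
    """Output guardrail: redact email addresses from the model's reply."""
    return GuardResult(True, EMAIL_PATTERN.sub("[redacted email]", model_text))


def guarded_call(llm, user_text: str) -> str:
    """Wrap a model call with the input and output checks."""
    verdict = check_input(user_text)
    if not verdict.allowed:
        return f"Request blocked: {verdict.reason}."
    return check_output(llm(verdict.text)).text
```

Keeping the rules in plain data structures like these makes it easier to review and extend them over time, in line with the recommendation to improve rules and policies continuously.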

Related blips
- NeMo Guardrails
  
  Assess
  
  Tools
  April 2024
85. Medusa

In our experience, most e-commerce solutions for building shopping websites fall into the 80/20 trap: we can easily build 80% of what we want but can't do anything about the remaining 20%. Medusa offers a good balance. It's a highly customizable open-source commerce platform that allows developers to create unique and tailored shopping experiences that can be self-hosted or run on Medusa’s platform. Built on Next.js and PostgreSQL, Medusa accelerates the development process with its comprehensive range of modules, from basic shopping cart and order management to advanced features like gift card modules and tax calculation for different regions. We've found Medusa to be a valuable framework and applied it to a few projects.

Related blips
- Next.js
  
  Trial
  
  Languages & Frameworks
  April 2021
86. Pkl

Pkl is an open-source configuration language and tooling initially created for use internally at Apple. Its key feature is its type and validation system, allowing configuration errors to be caught prior to deployment. Pkl has enabled our teams to reduce code duplication (for cases such as environment overrides) and perform validation before configuration changes are applied to live environments. It generates JSON, PLIST, YAML and .properties files and has extensive IDE and language integration, including code generation.

87. ROS 2

ROS 2 is an open-source framework designed for the development of robotic systems. It provides a set of libraries and tools that enable the modular implementation of applications, covering functions like inter-process communication, multithreaded execution and quality of service. ROS 2 builds on its predecessor by providing improved real-time capabilities, better modularity, increased support for diverse platforms and sensible defaults. ROS 2 is gaining traction in the automotive industry; its node-based architecture and topic-based communication model are especially attractive for manufacturers with complex, evolving in-vehicle applications, such as autonomous driving functionality.

Related blips
- Cross mobile platforms
  
  Assess
  
  Tools
  January 2011
88. seL4

In software-defined vehicles (SDV) or other safety-critical scenarios, the real-time stability of the operating system is crucial. A few companies monopolize this field due to its high entry barriers, so open-source solutions like seL4 are valuable. seL4 is a high-assurance, high-performance operating system microkernel. It uses formal verification methods to "mathematically" ensure the operating system's behavior complies with the specification. Its microkernel architecture also minimizes core responsibilities to ensure system stability. We've seen EV companies like NIO engage with the seL4 ecosystem, and there may be more development in this area in the future.

89. SetFit

Most of the current crop of AI-based tools are generative: they generate text and images and use generative pre-trained transformers (GPTs) to do so. For use cases that require working with existing text, such as classifying pieces of text or determining intent, sentence transformers are the tool of choice. In this field, SetFit is a framework for fine-tuning sentence transformers. We like SetFit because it uses contrastive learning to separate different intent classes from each other, often achieving clear separation with a very small number of examples, even 25 or fewer. Sentence transformers can also play a role in a generative AI system. We've successfully used SetFit for intent detection in a customer-facing chatbot system that uses an LLM. Even though we're aware of OpenAI's moderation API, we chose a SetFit-based classifier so that we could fine-tune it further and achieve stricter filtering.

90. vLLM

vLLM is a high-throughput, memory-efficient inference engine for LLMs that can run in the cloud or on-premises. It seamlessly supports multiple model architectures and popular open-source models. Our teams deploy dockerized vLLM workers on GPU platforms like NVIDIA DGX and Intel HPC, hosting models such as Llama 3.1 (8B and 70B), Mistral 7B and Llama-SQL for developer coding assistance, knowledge search and natural language database interactions. vLLM is compatible with the OpenAI SDK standard, facilitating consistent model serving. Azure's AI Model Catalog uses a custom inference container to enhance model serving performance, with vLLM as the default inference engine due to its high throughput and efficient memory management. The vLLM framework is emerging as a default for large-scale model deployments.


Assess

91. Apache XTable™

Among the available open table formats that enable lakehouses — such as Apache Iceberg, Delta and Hudi — no clear winner has emerged. Instead, we’re seeing tooling to enable interoperability between these formats. Delta UniForm, for example, enables single-directional interoperability by allowing Hudi and Iceberg clients to read Delta tables. Another new entrant to this space is Apache XTable™, an Apache incubator project that facilitates omnidirectional interoperability across Hudi, Delta and Iceberg. Like UniForm, it converts metadata among these formats without creating a copy of the underlying data. XTable could be useful for teams experimenting with multiple table formats. However, for long-term use, given the difference in the features of these formats, relying heavily on omnidirectional interoperability could result in teams only being able to use the "least common denominator" of features.

92. dbldatagen

Preparing test data for data engineering is a significant challenge. Transferring data from production to test environments can be risky, so teams often rely on fake or synthetic data instead. In this Radar, we explored novel approaches like synthetic data for testing and training models. But most of the time, lower-cost procedural generation is enough. dbldatagen (Databricks Labs Data Generator) is such a tool; it’s a Python library for generating synthetic data within the Databricks environment for testing, benchmarking, demoing and many other uses. dbldatagen can generate synthetic data at scale, up to billions of rows within minutes, supporting various scenarios such as multiple tables, change data capture and merge/join operations. It can handle Spark SQL primitive types well, generate ranges and discrete values and apply specified distributions. When creating synthetic data using the Databricks ecosystem, dbldatagen is an option worth evaluating.
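
To make "procedural generation" concrete, here is a small standard-library Python sketch that produces reproducible rows from ranges, discrete values and a weighted distribution. It is not dbldatagen's API and does not run on Spark; the `generate_orders` function, the column names and the value ranges are hypothetical.

```python
import random
from datetime import date, timedelta
from typing import Iterator


def generate_orders(rows: int, seed: int = 42) -> Iterator[dict]:
    """Yield synthetic order rows; the same seed always gives the same data."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    statuses = ["new", "shipped", "returned"]
    weights = [0.6, 0.35, 0.05]  # discrete values drawn with a specified distribution
    start = date(2024, 1, 1)
    for order_id in range(1, rows + 1):
        yield {
            "order_id": order_id,  # sequential key
            "customer_id": rng.randint(1000, 1999),  # integer range, inclusive
            "status": rng.choices(statuses, weights=weights, k=1)[0],
            "amount": round(rng.uniform(5.0, 500.0), 2),
            "order_date": start + timedelta(days=rng.randrange(365)),
        }
```

Seeding the generator means a failing test can be replayed with exactly the same data, which is one reason procedural generation is often preferred over copying production data into test environments.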

Related blips
- Synthetic data for testing and training models
  
  Trial
  
  Techniques
  October 2024
93. DeepEval

DeepEval is an open-source, Python-based framework for evaluating LLM performance. You can use it to evaluate retrieval-augmented generation (RAG) and other kinds of apps built with popular frameworks like LlamaIndex or LangChain, as well as to baseline and benchmark when you're comparing different models for your needs. DeepEval provides a comprehensive suite of metrics and features to assess LLM performance, including hallucination detection, answer relevancy and hyperparameter optimization. It offers integration with pytest and, along with its assertions, you can easily integrate the test suite in a continuous integration (CI) pipeline. If you're working with LLMs, consider trying DeepEval to improve your testing process and ensure the reliability of your applications.

Related blips
- LlamaIndex
  
  Trial
  
  Languages & Frameworks
  October 2024
- LangChain
  
  Hold
  
  Languages & Frameworks
  April 2024
94. DSPy

Most language model-based applications today rely on prompt templates hand-tuned for specific tasks. DSPy, a framework for developing such applications, takes a different approach that does away with direct prompt engineering. Instead, it introduces higher-level abstractions oriented around program flow (through modules that can be layered on top of each other), metrics to optimize towards and data to train or test with. It then optimizes the prompts or weights of the underlying language model based on those defined metrics. The resulting codebase looks much like the training of neural networks with PyTorch. We find its approach refreshingly different and think it's worth experimenting with.

Related blips
- PyTorch
  
  Adopt
  
  Languages & Frameworks
  April 2023
95. Flutter for Web

Flutter is known for its cross-platform support for iOS and Android applications. Now, it has expanded to more platforms. We've evaluated Flutter for Web previously; it allows us to build apps for iOS, Android and the browser from the same codebase. Not every web application makes sense in Flutter, but we think Flutter is particularly suited for cases like progressive web apps, single-page apps and converting existing Flutter mobile apps to the web. Flutter previously supported WebAssembly (WASM) as a compilation target only in its experimental channel, which meant it was under active development with potential bugs and performance issues. The most recent releases have made it stable. The performance of a Flutter web application compiled to its WASM target is far superior to that of its JavaScript compilation target. The near-native performance on different platforms is also why many developers initially choose Flutter.

96. kotaemon

kotaemon is an open-source RAG-based tool and framework for building Q&A apps for knowledge base documents. It can understand multiple document types, including PDF and DOC formats, and provides a web UI, based on Gradio, which allows users to organize and interact with a knowledge base via a chat interface. It has built-in RAG pipelines with a vector store and can be extended with SDKs. kotaemon also cites the source documents in its responses, along with web-based inline previews and a relevance score. For anyone wanting to build a RAG-based document Q&A application, this customizable framework is a very good starting point.

Related blips
- Gradio
  
  Trial
  
  Tools
  April 2024
97. Lenis

Lenis is a lightweight and powerful smooth scrolling library designed for modern browsers. It enables smooth scrolling experiences, such as WebGL scroll syncing and parallax effects, which makes it ideal for teams building pages with fluid, seamless scroll interactions. Our developers found Lenis simple to use, offering a streamlined approach for creating smooth scrolls. However, the library can have issues with accessibility, particularly with vertical and horizontal scrolling interactions, which could confuse users with disabilities. While visually appealing, it needs careful implementation to maintain accessibility.

98. LLMLingua

LLMLingua enhances LLM efficiency by compressing prompts using a small language model to remove nonessential tokens with minimal performance loss. This approach allows LLMs to maintain reasoning and in-context learning while efficiently processing longer prompts, which addresses challenges like cost efficiency, inference latency and context handling. Compatible with various LLMs without additional training and supporting frameworks like LlamaIndex, LLMLingua is great for optimizing LLM inference performance.

Related blips
- LlamaIndex
  
  Trial
  
  Languages & Frameworks
  October 2024
99. Microsoft Autogen

Microsoft Autogen is an open-source framework that simplifies the creation and orchestration of AI agents, enabling multi-agent collaboration to solve complex tasks. It supports both autonomous and human-in-the-loop workflows, while offering compatibility with a range of LLMs and tools for agent interaction. One of our teams used Autogen for a client to build an AI-powered platform where each agent represented a specific skill, such as code generation, code review or summarizing documentation. The framework enabled the team to create new agents seamlessly and consistently by defining the right model and workflow. They leveraged LlamaIndex to orchestrate workflows, allowing agents to manage tasks like product search and code suggestions efficiently. While Autogen has shown promise, particularly in production environments, concerns about scalability and managing complexity as more agents are added remain. Further assessment is needed to evaluate its long-term viability in scaling agent-based systems.

Related blips
- LlamaIndex
  
  Trial
  
  Languages & Frameworks
  October 2024
100. Pingora

Pingora is a Rust framework to build fast, reliable and programmable network services. Originally developed by Cloudflare to address Nginx's shortcomings, Pingora is already showing great potential, as newer proxies like River are being built on its foundation. While most of us don't face Cloudflare's level of scale, we do encounter scenarios where flexible application-layer routing is essential for our network services. Pingora's architecture allows us to leverage the full power of Rust in these situations without sacrificing security or performance.

101. Ragas

Ragas is a framework designed to evaluate the performance of retrieval-augmented generation (RAG) pipelines, addressing the challenge of assessing both retrieval and generation components in these systems. It provides structured metrics such as faithfulness, answer relevance and context utilization which help evaluate the effectiveness of RAG-based systems. Our developers found it useful for running periodic evaluations to fine-tune parameters like top-k retrievals and embedding models. Some teams have integrated Ragas into pipelines that run daily, whenever the prompt template or the model changes. While its metrics offer solid insights, we’re concerned that the framework may not capture all the nuances and intricate interactions of complex RAG pipelines, and we recommend considering additional evaluation frameworks. Nevertheless, Ragas stands out for its ability to streamline RAG assessment in production environments, offering valuable data-driven improvements.

Related blips
- Retrieval-augmented generation (RAG)
  
  Adopt
  
  Techniques
  October 2024
102. Score

Many organizations that implement their own internal development platforms tend to create their own platform orchestration systems to enforce organizational standards among developers and their platform hosting teams. However, the basic features of a paved-road deployment platform for hosting container workloads in a safe, consistent and compliant manner are similar from one organization to another. Wouldn't it be nice if we had a shared language for specifying those requirements? Score is showing some promise of becoming a standard in this space. It’s a declarative language in the form of YAML that describes how a containerized workload should be deployed and which specific services and parameters it will need to run. Score was originally developed by Humanitec as the configuration language for their Platform Orchestrator product, but it is now under custodianship of the Cloud Native Computing Foundation (CNCF) as an open-source project. With the backing of the CNCF, Score has the potential to be more widely used beyond the Humanitec product. It has been released with two reference implementations: Kubernetes and Docker Compose. The extensibility of Score will hopefully lead to community contributions for other platforms. Score certainly bears a resemblance to the Open Application Model (OAM) specification for Kubevela, but it's more focused on the deployment of container workloads than the entire application. There is also some overlap with SST, but SST is more concerned with deployment directly into a cloud infrastructure rather than onto an internal engineering platform. We're watching Score with interest as it evolves.

Related blips
- Platform orchestration
  
  Assess
  
  Techniques
  September 2023
- Open Application Model (OAM)
  
  Assess
  
  Techniques
  April 2021
103. shadcn

shadcn challenges the traditional concept of component libraries by offering reusable, copy-and-paste components that become part of your codebase. This approach gives teams full ownership and control, enabling easier customization and extension — areas where more popular conventional libraries like MUI and Chakra UI often fall short. Built with Radix UI and Tailwind CSS, shadcn integrates seamlessly into any React-based application, which makes it a good fit for projects prioritizing control and extensibility. It includes a CLI to help in the process of copying and pasting the components into your project. Its benefits also include reducing hidden dependencies and avoiding tightly coupled implementations, which is why shadcn is gaining traction as a compelling alternative for teams seeking a more hands-on, adaptable approach to front-end development.

Related blips
- Chakra UI
  
  Trial
  
  Languages & Frameworks
  October 2021
- Tailwind CSS
  
  Trial
  
  Languages & Frameworks
  October 2021
104. Slint

Slint is a declarative GUI framework for building native user interfaces for Rust, C++ or JavaScript applications. Although it’s a multiplatform UI framework with important features such as live preview, responsive UI design, VS Code integration and a native user experience, we particularly want to highlight its usefulness for embedded systems. Teams developing embedded applications have traditionally faced a limited number of options for UI development, each with its own trade-offs. Slint strikes a strong balance between developer experience and performance, using an easy-to-use, HTML-like markup language and compiling directly to machine code. At run time, it also boasts a low resource footprint, which is critical for embedded systems. In short, we like Slint because it brings proven practices from web and mobile development to the embedded ecosystem.

105. SST

SST is a framework for deploying applications into a cloud environment along with provisioning all the services that the application needs to run. SST is not just an IaC tool; it combines a TypeScript API for defining your application environment, a service that deploys your application when triggered by a Git push and a GUI console for managing the resulting application and invoking the SST management features. Although SST was originally based on AWS CloudFormation and CDK, its latest version has been implemented on top of Terraform and Pulumi so that, in theory, it's cloud agnostic. SST has native support for deploying several standard web application frameworks, including Next.js and Remix, but also supports headless API applications. SST appears to be in a category of its own. While it bears some resemblance to platform orchestration tools like Kubevela, it also provides developer conveniences like a live mode that proxies AWS Lambda invocations back to a function running on the developer's local machine. Right now, SST remains a bit of a curiosity, but it is a project and part of a category of tools worth watching as it evolves.


Hold

No blips

Unable to find something you expected to see?

Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you are looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.


Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

Assess: Worth exploring with the goal of understanding how it will affect your enterprise.

Hold: Proceed with caution.
