Using AI to unleash the potential of preclinical data
As one of the biggest names in the life sciences market, Bayer has an extensive drug development process spanning multiple stages to ensure drug safety and efficacy. It’s a process that can take more than a decade to complete, and each step generates a significant amount of data that must be stored responsibly while remaining easy for researchers to access and reference.
For almost four years, Thoughtworks has worked with Bayer to make the most of its preclinical data by improving access to, and analysis of, past and ongoing drug development programs. Bayer initially built a preclinical information center, PRINCE, a modern data platform on the AWS cloud that serves as a one-stop shop for all preclinical data. One strength of the platform is that it combines structured and unstructured data from around 17,000 completed study reports and their associated metadata. Whenever researchers need to refer to a previous study of a specific compound or species, they can easily find it in PRINCE.
When the platform was introduced, it significantly accelerated the way scientists worked with preclinical data and made it more convenient for researchers to find the insights they needed. But with a goldmine of preclinical information available through the platform, Bayer recognized this was just the beginning of what could be achieved.
A more intuitive way to search preclinical data
Bayer and Thoughtworks set out to capitalize on the wealth of data available in PRINCE and create a fast and accessible way to search the platform’s content.
While PRINCE was already filled with valuable data, much of it was stored in unstructured formats such as PDFs, so its value was largely unexplored. Generative AI presented an opportunity to unlock the value trapped within that unstructured data and enable teams to easily explore the research.
At the end of 2023, the team started developing a chatbot on top of the PRINCE platform with the following aims:
- Allow researchers and data scientists to search unstructured data by asking simple questions.
- Help project managers find the information they need to create documents for health authority interactions.
- Support the design of new studies based on existing knowledge.
- Uncover specific findings and other crucial information from a vast pool of documents.
Due to the sensitive nature of the data, the chatbot needed to conform to rigorous data governance and compliance standards. And with search results potentially guiding the outcomes of future projects, the chatbot’s answers needed to remain accurate and as close as possible to the specific wording used in the source documents.
An intelligent chatbot for effortless searching
Thoughtworks collaborated with Bayer’s team to build a chatbot on PRINCE, using LLMs hosted by myGenAssist, Bayer’s GenAI platform. Initially rolled out to a pilot group of researchers, the chatbot was later productionized after incorporating their feedback.
Through an intuitive and convenient interface, researchers can ask the chatbot questions about previous projects, specific compounds, or historical reports stored in the platform. The chatbot then generates answers based on Bayer content stored in PRINCE. The context for each answer is supplied by a Retrieval-Augmented Generation (RAG) pipeline, which represents the preclinical reports as embeddings and uses them to identify the passages relevant to the question.
Every answer from the chatbot also includes the original source of the information, so researchers can explore the data in greater detail and verify its legitimacy and relevance. Meanwhile, the pipeline includes the Langfuse observability platform to continuously monitor the chatbot’s output and ensure it delivers consistently reliable results.
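As a rough illustration of the retrieval and source-attribution steps described above, the following is a minimal sketch and not Bayer's implementation. It assumes a toy bag-of-words "embedding" in place of a learned embedding model, and the report identifiers, report text, and function names (`embed`, `retrieve`, `build_prompt`) are hypothetical placeholders invented for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical report identifier, kept so answers can cite it
    text: str


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector.

    Assumption: a real pipeline would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[Chunk]) -> str:
    """Assemble an LLM prompt in which every passage carries its source tag."""
    passages = "\n".join(f"[{c.source}] {c.text}" for c in context)
    return (
        "Answer using only the passages below and cite the source tags.\n"
        f"{passages}\n"
        f"Question: {question}"
    )


# Made-up placeholder passages, not real study content.
CHUNKS = [
    Chunk("report-001", "Comet assay in rats showed DNA strand breaks in liver tissue."),
    Chunk("report-002", "Repeat-dose toxicity study in dogs with no adverse findings."),
    Chunk("report-003", "Exposure in female rats was higher than in male rats."),
]

if __name__ == "__main__":
    question = "DNA strand breaks in rats"
    top = retrieve(question, CHUNKS)
    print(build_prompt(question, top))
```

Keeping the source identifier attached to each retrieved passage is what would let a generated answer point back to the original report, so a researcher can check the wording against the document itself.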
Bayer’s chatbot is particularly useful in cases where researchers from multiple functions collaborate on projects. In a hypothetical project review meeting, for example, researchers could ask the chatbot to bring up any studies related to the topic under discussion, whether that is a safety-related finding or a specific compound, and present the information to the group. The chatbot correlates any findings with other reports and visualizations stored in PRINCE.
“The Chatbot uplifts PRINCE to the next level of usability and performance. The Chatbot helps to unravel the wealth of our internal knowledge, which cannot be easily retrieved by conventional query strategies using established key words and search fields. The Chatbot’s capabilities go far beyond existing controlled vocabulary and will greatly facilitate the checking of safety-related hypotheses. A safety-search engine is at the gates.”
Evolving the chatbot into an agent-like research assistant
While the chatbot currently acts as a Q&A tool to simplify searching in PRINCE, it will evolve into a much more intelligent assistant for researchers over time. In the future, the chatbot will also be able to draw on external content from reliable sources such as PubMed, enabling Bayer’s researchers to consider a much wider range of sources than the company’s own data sets.
The project team also plans to develop the PRINCE chatbot into an assistant that offers proactive support for researchers, project managers, and data scientists. Beyond producing readily available insights during projects, the chatbot will be able to generate ideas for drug discovery plans, answer more complex queries, and create report summaries for Bayer’s teams.
“I’ve been using the PRINCE Chatbot for a few weeks and it’s fantastic! It quickly finds studies, summarizes them, and extracts the main conclusions. It also allows complex comparisons. For example, I used it to find a study on DNA strand breaks with higher exposure in female rats than in male rats, which greatly aided in planning my current genotoxicity study.”