Fine-tuning is a valuable technique in the world of generative AI. It’s used to adapt large language models (LLMs) for specialized use cases, allowing them to produce more relevant and useful responses.
It’s helping expand the possible use cases and applications of LLMs and providing a way for businesses to properly leverage the opportunities of generative AI.
Fine-tuning can make LLMs more reliable and also more domain-specific.
Fine-tuning can be resource-intensive and limit model flexibility.
It’s helping organizations develop more relevant chatbots and improve search tools, among other things.
What is LLM fine-tuning?
Imagine a large language model (LLM) as a very capable student with a broad knowledge of language. Fine-tuning is like giving this student extra training for a specific job. It works in two steps: first, the LLM goes through a general language-learning phase (pre-training); next, it’s given further training on data (documents, for example) that is specific to a given domain, field or organization.
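To make the two phases concrete, here is a deliberately tiny sketch. It is not how a real LLM is trained: the `BigramModel` class and both corpora are hypothetical stand-ins, and word-pair counting takes the place of neural network training. The point it illustrates is that fine-tuning continues training the same model on domain data, which shifts what the model predicts.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Toy stand-in for an LLM: predicts the next word from word-pair counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        # Each call adds to the existing counts, so training can be continued.
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def predict_next(self, word):
        followers = self.counts.get(word.lower())
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Hypothetical data: general text, then text from a banking domain.
general_corpus = [
    "the bank of the river was muddy",
    "the bank of the stream was steep",
    "we sat on the river bank",
]
domain_corpus = [
    "your bank account is active",
    "open a bank account online",
    "the bank account was closed",
]

model = BigramModel()
model.train(general_corpus)          # phase 1: general "pre-training"
before = model.predict_next("bank")

model.train(domain_corpus)           # phase 2: "fine-tuning" on domain data
after = model.predict_next("bank")

print(before, after)
```

Before the second phase the model continues "bank" with "of" (as in river banks); afterwards it continues it with "account". The model is the same one throughout, but its behavior has been pulled toward the domain.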
It’s worth comparing fine-tuning with a related technique, retrieval-augmented generation (RAG). With RAG you don’t train the LLM; you give it sources to refer to in addition to what it learned in training. With fine-tuning, you continue to train the LLM itself.
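The RAG side of that comparison can be sketched in a few lines. Everything here is illustrative: the function names, the sample documents and the word-overlap scoring are assumptions (real systems typically use embedding-based search). What the sketch shows is that RAG changes the prompt sent to the model, not the model itself.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (a crude stand-in for search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question, documents):
    """Attach the retrieved sources to the prompt; no model weights are touched."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical company documents.
documents = [
    "Our support desk is open on weekdays from 9am to 5pm.",
    "Refunds are processed within 14 days of a return request.",
]

prompt = build_rag_prompt("How long do refunds take?", documents)
print(prompt)
```

The resulting prompt would be sent to an unmodified, off-the-shelf LLM. Updating the company's knowledge means editing `documents`, whereas with fine-tuning it would mean another round of training.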
What’s in it for you?
Fine-tuning LLMs brings several advantages to businesses:
Accuracy and relevance. By training LLMs on specific data, those LLMs will be better suited to helping solve more context-specific problems (like answering queries that are specific to an organization). A chatbot, for instance, could be fine-tuned to help customers with problems that are highly specific to your product or service.
Reduced risk of hallucinations. Fine-tuning can help make LLM applications more accurate, reliable and relevant to a given domain, which means businesses can have more confidence using them in real-world scenarios.
New use cases. Fine-tuning can help you to bring generative AI to new use cases, like internal knowledge management, that wouldn’t be possible using only existing public models like ChatGPT.
What are the trade-offs of fine-tuning LLMs?
Fine-tuning LLMs can be very useful in many contexts, but it's not without some drawbacks:
Fine-tuned models can become overly focused on the training data and struggle with new situations outside that specific context, like a student who only studies past exam papers. (This is referred to as overfitting.)
Fine-tuning often requires a lot of high-quality data for a specific task. This can be expensive or difficult to obtain, limiting its use for some applications.
Although great for specific tasks, fine-tuned models might lose the broader knowledge gained during pre-training. They become specialists, but not generalists.
Fine-tuning takes additional time and expertise compared to using a pre-trained LLM "off-the-shelf." There's a learning curve to using this technology effectively.
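Overfitting, the first drawback listed above, is usually spotted by comparing performance on the training data with performance on examples held back from training. The sketch below is a caricature under stated assumptions: the "model" is a lookup table that has memorized its training questions, and both datasets are made up for illustration.

```python
def accuracy(model, examples):
    """Fraction of examples where the model's answer matches the expected one."""
    correct = sum(1 for question, answer in examples if model.get(question) == answer)
    return correct / len(examples)


# Hypothetical "past exam papers" used for training.
training_set = [("2 + 2", "4"), ("3 + 5", "8"), ("7 + 1", "8")]

# New questions of the same kind, never seen during training.
held_out_set = [("2 + 3", "5"), ("4 + 4", "8")]

# A model that has simply memorized its training data.
memorized_model = dict(training_set)

train_accuracy = accuracy(memorized_model, training_set)
held_out_accuracy = accuracy(memorized_model, held_out_set)

print(train_accuracy, held_out_accuracy)
```

A perfect score on the training set alongside a poor score on the held-out set is the warning sign. In a real fine-tuning project the same check is run with a validation dataset that is kept out of training.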
Fine-tuning offers precision but can be resource-intensive and limit overall flexibility. It's a good fit when accuracy in a specific domain is crucial, but consider the trade-offs before diving in: sometimes a smaller model can do a better job, faster, than one that has been fine-tuned. In Technology Radar Vol. 30, we specifically warned against rushing to fine-tune LLMs.
How is LLM fine-tuning being used?
Fine-tuning is being used across many industries to leverage the power of LLMs for specific tasks. Here are a few examples:
Customer service: Chatbots can be fine-tuned to understand a company's specific products, services, and brand voice, offering more helpful and personalized interactions with customers.
Content creation: LLMs can be fine-tuned to generate different creative text formats, like marketing copy, poems, or even scripts, tailored to a specific brand or audience.
Legal industry: Law firms can fine-tune LLMs to analyze legal documents, identify relevant clauses, and summarize complex legal contracts, saving time and resources.
Software development: LLMs can be fine-tuned to write code, translate programming languages, or identify and fix bugs.
Scientific research: Researchers can fine-tune LLMs to analyze scientific data, generate research hypotheses, or even write scientific reports in a specific field.
Medicine: Med-PaLM is a fine-tuned version of Google’s PaLM large language model designed specifically for the medical domain.
At Thoughtworks, we used fine-tuning to build a job matching tool for recruitment organization Bolt.Works. As we explain in our client story, “we fine-tuned an open-source, multilingual large language model which learned to match semantically similar jobs and workers based on the structured data generated by ChatGPT.”