Open-source LLMs for coding

Technology Radar

Published : Sep 27, 2023

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.

Sep 2023

Assess

GitHub Copilot is a valuable tool for coding assistance while developing software. Under the hood, LLMs can power seamless developer experiences through inline code assistance, code fine-tuning, conversational support in the IDE and much more. Most of these models are proprietary and can only be used via subscription services. The good news is that several open-source LLMs for coding are available. If you need to build your own coding assistance service (for example, because you work in a highly regulated industry), look at models like StarCoder and WizardCoder. StarCoder is trained on a large data set maintained by BigCode, and WizardCoder is a StarCoder model tuned with Evol-Instruct.

We've used StarCoder in our experiments and found it useful for generating structured software engineering elements such as code, YAML, SQL and JSON. In those experiments, both models were receptive to in-context learning using few-shot examples in the prompt. Nonetheless, for specific downstream tasks (such as SQL generation for a specific database like Postgres) the models needed fine-tuning. Recently, Meta unveiled Code Llama, a code-specialized version of Llama 2. Proceed with caution when using these open-source models: consider their license, the license of the code and the license of the data set used to train the model. Carefully assess these aspects before you choose any of these coding LLMs for your organization.
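
To make the few-shot, in-context learning mentioned above more concrete, here is a minimal Python sketch of how such a prompt for SQL generation might be assembled. Everything in it is an illustrative assumption rather than the setup used in the experiments: the `Example` class, the `build_few_shot_prompt` helper, the comment-style prompt layout and the sample schema are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One worked example shown to the model inside the prompt (hypothetical)."""
    request: str
    sql: str


def build_few_shot_prompt(schema: str, examples: list[Example], request: str) -> str:
    """Assemble a plain-text few-shot prompt for SQL generation.

    The layout (SQL-comment headers, blank-line separators) is an
    assumption for illustration, not a format any model requires.
    """
    parts = [f"-- Schema:\n{schema.strip()}"]
    for ex in examples:
        parts.append(f"-- Request: {ex.request}\n{ex.sql.strip()}")
    # End on an open request so the model continues with the SQL.
    parts.append(f"-- Request: {request}\n")
    return "\n\n".join(parts)


# Hypothetical schema and examples, for illustration only.
SCHEMA = "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC);"
EXAMPLES = [
    Example("List all orders", "SELECT * FROM orders;"),
    Example("Total revenue", "SELECT SUM(total) FROM orders;"),
]

prompt = build_few_shot_prompt(SCHEMA, EXAMPLES, "Count orders per customer")
print(prompt)
```

The resulting string would then be sent to whichever model you are assessing. That call is left out here because it depends on how the model is hosted and served.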