This article describes how we optimized the Gradle build on our CI (continuous integration) pipelines for a server-side program written in Java/Kotlin with Spring Boot.
Gradle build takes too long on CI agents
The CI agents that execute the pipeline scripts are shared by several teams, and each service uses a different technology stack, so the agents must be able to build many kinds of programs. For example, some projects require Java while others require NodeJS, and even projects that all use Java may need different Java versions. To avoid installing multiple Java versions on every CI agent, and to keep the Java version used in CI tests consistent with the one in the deployed environment, we run the Gradle build inside a Docker container. As long as the CI agent has Docker, it can run each build on exactly the Java version that build needs.
The simplest and the most straightforward command is as follows:
docker run -v "$(pwd):/app" -w /app eclipse-temurin:11-jdk ./gradlew clean build
If you use this command in every pipeline run, you will find it excruciatingly slow: even an empty Spring Boot repo can take five minutes to finish the build step. After adding the -i (--info) logging flag to the Gradle command above, the logs show that most of the time is spent downloading dependencies. Because the Gradle build runs in a new container every time, the dependencies downloaded by one build cannot be reused by the next.
1. Caching the Downloaded Dependencies
To reuse the downloaded dependencies, two things need to be done: keep the cache on each CI agent between builds, and share it across agents.
1.1 Reusing the Build Caches Locally
We considered two alternatives: the Docker build cache and the Gradle dependency cache. The Docker build cache invalidates a whole layer whenever the content of a file that layer depends on changes, while Gradle has its own logic for deciding which dependencies can be reused. With the Gradle dependency cache, adding a new dependency to the build.gradle file makes Gradle download only that new dependency. With the Docker build cache, the same change invalidates the layer and causes all the dependencies to be downloaded again.
With the Gradle cache kept on the CI agent, all we need to do is mount it into the build container.
docker run -v "$(pwd):/app" -v "$HOME/.gradle:/root/.gradle" -w /app eclipse-temurin:11-jdk ./gradlew clean build
With this solution, our Gradle build time dropped to one minute seven seconds. The downloaded dependencies are reused as much as possible, which is much more efficient than relying on the Docker build cache. If a change only touches business code, the cache is fully reused and the build is very fast. Even when we added a new dependency, the build took one minute fifty-one seconds in total, only somewhat slower than the fully cached build and far better than invalidating all the build layers. As a result, the Gradle cache became our final solution.
However, these benefits depend on several conditions. The main one is that the CI agent must have recently executed the same pipeline. If the repo is not very active, we can rarely take advantage of the improvement, because CI agents are usually cleaned up or replaced periodically. Without the cache, the pipeline still needs five minutes.
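The cache mount can be wrapped in a small shell function so that every pipeline invokes the build the same way. This is only a minimal sketch: the names gradle_build, GRADLE_CACHE_DIR, BUILD_IMAGE and DRY_RUN are hypothetical and not part of our pipeline, the --rm flag is an addition that removes the container after the build, and the default image assumes a JDK variant of the Temurin image is wanted.

```shell
#!/bin/sh
# Sketch: run the Gradle build in a container with the agent's Gradle cache mounted.
# gradle_build, GRADLE_CACHE_DIR, BUILD_IMAGE and DRY_RUN are hypothetical names.

gradle_build() {
  cache_dir="${GRADLE_CACHE_DIR:-$HOME/.gradle}"
  image="${BUILD_IMAGE:-eclipse-temurin:11-jdk}"  # assumed default image

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Prefix the command with echo so it is printed instead of executed.
    runner="echo"
  else
    runner=""
    # Create the cache directory up front so the bind mount has a source.
    mkdir -p "$cache_dir"
  fi

  # Mount the source tree at /app and the agent's cache at the
  # container user's Gradle home (/root/.gradle when running as root).
  $runner docker run --rm \
    -v "$(pwd):/app" \
    -v "$cache_dir:/root/.gradle" \
    -w /app "$image" ./gradlew clean build
}
```

With DRY_RUN=1 set, calling gradle_build only prints the docker command, which can help when debugging a pipeline step without starting a container.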
1.2 Sharing the Caches across the CI Agents
Sharing caches or files across CI agents is a common requirement for CI tools. Many pipeline services can pass artifacts between agents through their native features. Manually uploading to and downloading from a shared location, such as a static server or an AWS S3 bucket, is also feasible. This can be done with a dedicated hook script around the build step or by wrapping the logic in a plugin. As long as the cost of transferring the cache is lower than downloading fresh dependencies, either is a good solution.
Adding an average of fifteen seconds for uploading and downloading the cache, we got a total time of one minute twenty-two seconds, which is still fast enough. With this in place, almost every build benefits from the cache, even when the CI agent is brand new.
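As an illustration of the hook-script approach with an S3 bucket, the following sketch restores the cache before the build and saves it afterwards. It assumes the AWS CLI is installed on the agent with credentials for the bucket. The bucket name, the object key and the function names are hypothetical, and packing only the caches and wrapper directories of the Gradle home is a choice made for this example, not necessarily what our pipeline does.

```shell
#!/bin/sh
# Sketch: share the Gradle cache between CI agents through an S3 bucket.
# CACHE_BUCKET, CACHE_KEY, GRADLE_CACHE_DIR and the function names are hypothetical.

cache_url() {
  echo "s3://${CACHE_BUCKET:-example-ci-cache}/${CACHE_KEY:-gradle-cache.tar.gz}"
}

# Pre-build hook: download and unpack the shared cache.
# A missing archive is not an error; the build just starts with a cold cache.
restore_gradle_cache() {
  cache_dir="${GRADLE_CACHE_DIR:-$HOME/.gradle}"
  archive="$(mktemp)"
  mkdir -p "$cache_dir"
  if aws s3 cp "$(cache_url)" "$archive"; then
    tar -xzf "$archive" -C "$cache_dir"
  else
    echo "No shared cache found, building with a cold cache" >&2
  fi
  rm -f "$archive"
}

# Post-build hook: pack the dependency cache and the wrapper distributions,
# then upload the archive for the next agent.
save_gradle_cache() {
  cache_dir="${GRADLE_CACHE_DIR:-$HOME/.gradle}"
  archive="$(mktemp)"
  tar -czf "$archive" -C "$cache_dir" caches wrapper &&
    aws s3 cp "$archive" "$(cache_url)"
  status=$?
  rm -f "$archive"
  return "$status"
}
```

A pipeline step would call restore_gradle_cache before the containerized build and save_gradle_cache after it. Whether this pays off depends on the archive staying small enough that the transfer is cheaper than downloading the dependencies again.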
In the next part, we’ll explore other potential solutions.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.