Getting Smart: Applying Continuous Delivery to Data Science to Drive Car Sales
Published: March 01, 2017
Pricing second-hand cars is a complex process: many factors affect a vehicle’s worth, and customers’ tastes change quickly.
AutoScout24, Europe’s largest online car marketplace, wanted to get ahead of the field by developing an accurate price evaluation tool that updated continuously. Many companies use this kind of predictive analytics capability internally, but shy away from using it for customer-facing services because of the complexity involved.
Working together, AutoScout24 and Thoughtworks were able to develop a constantly updated price evaluation tool that delivers superb performance and scalability, using a Continuous Delivery approach to predictive analytics. We implemented automated verification using live test data sets in a continuous delivery pipeline to enable us to release model improvements with confidence at any given time.
Keeping price evaluations up to speed
AutoScout24 has listings for more than 2.4 million vehicles across Europe. That’s given it a huge trove of current and historical data. But how could it use that data to help sellers determine a fair price and buyers make good decisions?

AutoScout24 had previously developed a price evaluation tool which based its recommendations on currently active listings. This pricing engine used a machine learning approach that drew a linear relationship between the vehicle price and certain factors, such as vehicle age or mileage.
But this tool couldn’t take real-world prices into account. AutoScout24 needed a pricing tool that could make accurate price predictions based on constantly changing information.
Data science and continuous delivery
Before the price evaluation program, AutoScout24 had used predictive analytics primarily for internal decision making, answering questions based on historical data.

But now, with the price evaluation tool, we needed a prediction model that would be continuously integrated into live operations. This posed a significant challenge for our data science team: we needed to ensure that the system could meet its performance requirements without manual performance optimization and without sacrificing prediction accuracy.
To achieve that, we realized there was an opportunity to take a Continuous Delivery approach to predictive analytics. While concepts such as Continuous Delivery, Test-Driven Development and Consumer-Driven Contracts are increasingly common in software engineering, they’re almost unheard of in data science practice.
Accelerating delivery through service teams
We had the opportunity to try something new because AutoScout24 had begun a large-scale migration of its technical infrastructure from its previous self-hosted, .NET-based monolithic system to a cloud-hosted, JVM-based microservices architecture. The aim of this tech stack migration was to enable innovations to be released more quickly.

The first stage of this transition was to divide the monolith into verticals managed by autonomous development teams, each covering specific functions. These verticals are so-called self-contained systems.
We decided to implement the price evaluation tool as a single microservice, running on Amazon Web Services (AWS) and built with the Play Framework, to give us the greatest flexibility.
[German homepage of the new price evaluation tool]
Using Random Forest to make better predictions
To improve price evaluation performance, AutoScout24’s data science team evaluated various machine learning approaches. Given the challenges they faced, they decided that a Random Forest approach would work best.

Random Forest is a supervised machine learning approach based on decision trees which effectively counteracts the overfitting seen in other decision tree-based approaches. One big benefit is that it minimizes the chance that the prediction model only produces good results for input very similar to the training data.
[An exemplary price evaluation decision tree]
We used the statistical programming language R, which has Random Forest libraries readily available, to develop an initial model training script. This script processes and cleans the raw vehicle data from recent years and then generates a price prediction model.
From a price prediction model to a car evaluation product
This initial price prediction model only needed to provide accurate predictions. The final product would also need to deliver exceptional performance: it had to be responsive, highly available and able to support a high volume of users. What’s more, the predictions had to reflect the current market situation, so we needed to be able to rapidly integrate model improvements and new training data.

That meant the price model, whose training we specified in R, needed to transfer automatically to the production system without changing its behavior.
One obvious solution would have been to serve the price model directly from an R runtime via an appropriate service, accessed from a Play Framework front-end application via a REST API. Unfortunately, the open source version of the R runtime is not capable of multi-threading, which makes it virtually impossible to scale to multiple parallel user requests.
We therefore decided to use H2O, an open source Java-based predictive analytics engine that can be easily integrated with the Apache projects Hadoop and Spark. It also connects to other programming languages popular in the big data field, such as Python and R.
As H2O provides its own implementation of Random Forest, it was a straightforward task to train a random forest-based price prediction model using H2O. That prediction model can then be executed in a cluster using the H2O engine and can be accessed via an API.
H2O also offers the option of exporting a fully trained prediction model to Java source code. In the case of our Random Forest price models, however, the compiled JAR files are very large, running to several gigabytes per country.
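To illustrate how such an exported model can be used, here is a minimal Java sketch based on H2O’s h2o-genmodel library; the generated class name (PriceModelDePrivate) and the vehicle attributes are our own assumptions for illustration, not the actual AutoScout24 model.

```java
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.RegressionModelPrediction;

public class PricePredictionExample {

    public static void main(String[] args) throws Exception {
        // "PriceModelDePrivate" stands in for the class that H2O generates when
        // the trained random forest is exported to Java source code.
        EasyPredictModelWrapper model =
                new EasyPredictModelWrapper(new PriceModelDePrivate());

        // Vehicle attributes are passed as a map of feature name to value.
        RowData vehicle = new RowData();
        vehicle.put("make", "volkswagen");
        vehicle.put("model", "golf");
        vehicle.put("mileage", "85000");
        vehicle.put("firstRegistration", "2013");

        // The exported random forest is a regression model that returns a price.
        RegressionModelPrediction prediction = model.predictRegression(vehicle);
        System.out.println("Predicted price: " + prediction.value);
    }
}
```

Because the model is simply compiled Java code on the classpath, scoring is an in-process method call rather than a network round trip.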
This happens because the decision trees encode both the model's logic and its data. The accuracy and the size of the model are linked, because both are largely determined by the configured maximum depth of the decision trees.
Millisecond response times need the right approach
Overall, our approach of exporting the trained prediction models to Java source code offered a key advantage: the compiled price model JAR file can be executed together with a Play web application (also deployable as a JAR) on one and the same Amazon EC2 instance and in a single JVM. This significantly reduces maintenance complexity, because only the memory utilization of this one JVM needs to be configured and monitored.

Furthermore, we could fully utilize the scaling mechanisms already available in both the JVM (thread pools, concurrency) and AWS. The Play Framework builds heavily on Java's non-blocking I/O support. With Elastic Load Balancers (ELBs) and Auto Scaling Groups (ASGs), AWS can automatically spin up new EC2 instances running the same web application in response to load and distribute that load across them.
[How a price prediction model is trained, exported, and deployed together with the web application]
This enabled us to deliver price predictions within a few milliseconds, because the Java code generated from the prediction model consists almost exclusively of very large “if-else” statements. As a result, no objects have to be created during the calculation and heap space usage remains consistently low.
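To make this concrete, the generated code is structurally similar to the following hand-written illustration; the class name, feature indices and threshold values here are purely illustrative and not taken from the real model:

```java
// Purely illustrative sketch of the shape of one generated decision tree.
// Real H2O-generated model classes extend hex.genmodel.GenModel and contain
// thousands of such nested branches across hundreds of trees.
final class IllustrativeDecisionTree {

    // data[0] = mileage in km, data[1] = vehicle age in years (illustrative indices)
    static double score(double[] data) {
        if (data[0] < 72500.0) {
            if (data[1] < 3.5) {
                return 21300.0;  // leaf node: predicted price in EUR
            } else {
                return 15200.0;
            }
        } else {
            if (data[1] < 6.0) {
                return 11800.0;
            } else {
                return 7400.0;
            }
        }
    }
}
```

Scoring a request therefore boils down to walking these branches with primitive comparisons, which is why no objects need to be allocated per prediction.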
On the other hand, loading the unusually large number of big classes requires a lot of memory. But once these classes are fully loaded, memory utilization hardly changes during operation. That further reduces maintenance and monitoring complexity and simplifies the handling of load changes.
Our approach enabled us to quickly release an initial simple price evaluation product for a launch country and for one user segment.
To extend that, we separated the service that implements the web front-end from the prediction model service. The latter was deployed as an independent web service with a REST interface.
The main reason for this service split was the different iteration speeds of the web interface and the prediction model: while the latter was only updated occasionally, improvements to the web interface were rolled out several times a day. Combining the web interface and price model service resulted in unnecessarily long deployment times.
The separation also allowed us to run one price-model service per country and user segment, so we could partition the prediction model over several machines. And because AWS autoscaling replicates the prediction models at the same time, we could dispense with a cluster database system, even though the combined size of all the price models exceeded the storage capacity of a single machine.
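As a sketch of what the stand-alone prediction service could look like, the following Play (Java) controller exposes the exported model behind a small REST endpoint; the controller, route and field names are assumptions made for illustration, not the actual AutoScout24 code:

```java
import com.fasterxml.jackson.databind.JsonNode;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.exception.PredictException;
import hex.genmodel.easy.prediction.RegressionModelPrediction;
import play.libs.Json;
import play.mvc.BodyParser;
import play.mvc.Controller;
import play.mvc.Result;

// One instance of this service would run per country and user segment.
public class PricePredictionController extends Controller {

    // The generated model class (hypothetical name) is wrapped once at startup.
    private final EasyPredictModelWrapper model =
            new EasyPredictModelWrapper(new PriceModelDePrivate());

    @BodyParser.Of(BodyParser.Json.class)
    public Result predict() {
        JsonNode json = request().body().asJson();

        RowData vehicle = new RowData();
        vehicle.put("make", json.get("make").asText());
        vehicle.put("model", json.get("model").asText());
        vehicle.put("mileage", json.get("mileage").asText());
        vehicle.put("firstRegistration", json.get("firstRegistration").asText());

        try {
            RegressionModelPrediction prediction = model.predictRegression(vehicle);
            return ok(Json.newObject().put("predictedPrice", prediction.value));
        } catch (PredictException e) {
            return badRequest("Could not evaluate vehicle: " + e.getMessage());
        }
    }
}
```

The web front-end service then only needs to call this endpoint over HTTP, so the two services can be deployed and scaled independently.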
The price of Continuous Delivery: extensive end-to-end testing
Initially, our prediction model interface would frequently change without warning, for instance when model parameters were renamed. That meant the model couldn’t be transferred automatically to production environments.

Together with our data science team, we therefore developed both test suites for the model generation script implemented in R and Consumer-Driven Contracts (CDCs), which automatically verify, before every deployment, that the price model behaves as the price evaluation service expects.
We also introduced extensive end-to-end tests, which ensure that the web application provides the same prices as the originally generated price model. To do this, the price model generated using H2O is initially queried with a large number of test price evaluations. The input data for these evaluations is taken from a test data set that was not used for the price model training.
The results of these test price evaluations serve two purposes: firstly, model quality can be determined by comparing the predicted prices with the actual prices in the test data set. Secondly, the results can be compared with those obtained from the price model that was converted to Java bytecode. Altogether, the actions described so far allow us to release model improvements directly to production in a fully automated way, so our users benefit from these improvements immediately.
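A simplified version of such an end-to-end check might look like the following JUnit sketch; the reference-data helpers (ReferenceCases, ReferenceCase) and the file name are hypothetical, and the real pipeline runs these checks against the deployed web application rather than the model class alone:

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;

public class PriceModelEndToEndTest {

    @Test
    public void javaModelReproducesOriginalH2oPredictions() throws Exception {
        // Model compiled from the exported Java source (hypothetical class name).
        EasyPredictModelWrapper javaModel =
                new EasyPredictModelWrapper(new PriceModelDePrivate());

        // Held-out vehicles together with the prices the original H2O model
        // predicted for them; loading them from a recorded CSV is assumed here.
        List<ReferenceCase> cases =
                ReferenceCases.loadHeldOutSet("reference-predictions-de.csv");

        for (ReferenceCase c : cases) {
            RowData vehicle = c.toRowData();
            double predicted = javaModel.predictRegression(vehicle).value;

            // The exported model must reproduce the original model's behavior.
            assertEquals(c.originalH2oPrediction(), predicted, 1.0); // tolerance in EUR
        }
    }
}
```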
[Test and validation steps during model deployment]
Conclusions
Using practices from Continuous Delivery, such as automated end-to-end testing during deployment, allowed us to release model improvements directly to production in a fully automated way. As a result, our data scientists don't need to wait for their improvements to be integrated into live operations, and users benefit from improvements instantly. Applying Continuous Delivery to data science accelerates its impact on your business.
In addition to ongoing improvements, the prediction model needs to be retrained at least every month to accurately reflect the market. To encourage experimentation, and to improve the model further, we found that it helped to automatically validate prediction accuracy prior to deployment.

We have since used Continuous Delivery principles in other data science products. In some of those products the prediction model grows stale quickly, for instance when current user behavior data is used for model training. Here, applying Continuous Delivery to data science has delivered even better results.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.