Kubeflow is a Kubernetes-native machine learning (ML) platform that simplifies building, training and deploying models on diverse infrastructure. We've used its Pipelines component extensively to encode ML workflows for several models across experimentation, training and serving use cases. Kubeflow ships with several other components besides Pipelines; among these, we find hyperparameter tuning with Katib and multi-tenancy support particularly useful.
Kubeflow is interesting for two reasons. First, it is an innovative use of Kubernetes Operators, which we spotlighted in the April 2019 edition of the Radar. Second, it provides a way to encode and version machine-learning workflows so that they can be more easily ported from one execution environment to another. Kubeflow consists of several components, including Jupyter notebooks, data pipelines and control tools. Several of these components are packaged as Kubernetes operators, which lets them draw on Kubernetes's ability to react to events generated by the pods that implement the various stages of a workflow. Because the individual programs and data are packaged as containers, entire workflows can be ported from one environment to another. This is helpful when moving a valuable but computationally demanding workflow developed in the cloud to a custom supercomputer or tensor processing unit cluster.
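
The idea of encoding and versioning a workflow as a set of containerized steps can be illustrated without Kubeflow itself. The sketch below is plain Python and deliberately does not use the Kubeflow Pipelines SDK; the `Step` class, the image names and the helper functions are all hypothetical and exist only for illustration. It shows a workflow definition that names container images rather than machine-specific paths, a content hash that serves as a version identifier, and a dependency-respecting execution order that any environment could derive from the same definition.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Step:
    """One workflow stage packaged as a container (hypothetical model, not a Kubeflow type)."""
    name: str
    image: str              # container image reference; the names used below are made up
    command: tuple          # command executed inside the container
    depends_on: tuple = ()  # names of steps that must finish first


def workflow_version(steps):
    """Return a short content hash of the workflow definition.

    The same definition yields the same identifier in every environment,
    and any change to a step yields a different one.
    """
    canonical = json.dumps([asdict(s) for s in steps], sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def execution_order(steps):
    """Order step names so that each step comes after its dependencies."""
    by_name = {s.name: s for s in steps}
    ordered, done = [], set()

    def visit(step, path=()):
        if step.name in path:
            raise ValueError(f"dependency cycle through {step.name!r}")
        if step.name in done:
            return
        for dep in step.depends_on:
            visit(by_name[dep], path + (step.name,))
        done.add(step.name)
        ordered.append(step.name)

    for s in steps:
        visit(s)
    return ordered


# Illustrative three-stage workflow, intentionally listed out of order.
PIPELINE = [
    Step("serve", "registry.example/serve:1.0", ("python", "serve.py"), ("train",)),
    Step("train", "registry.example/train:1.0", ("python", "train.py"), ("preprocess",)),
    Step("preprocess", "registry.example/prep:1.0", ("python", "prep.py")),
]

if __name__ == "__main__":
    print("version:", workflow_version(PIPELINE))
    print("order:", execution_order(PIPELINE))
```

Because the definition refers only to images and commands, nothing in it ties the workflow to a particular cluster. In a real Kubeflow setup the pipeline definition and the operators play these roles; this sketch only mirrors the shape of the idea.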