Watch Part 1 and Part 2
About the event:
Today’s organizations face increasing pressure to become more “data-driven” or “AI-driven.” However, incorporating data science and data engineering approaches into the software development process presents a myriad of challenges. Thoughtworks’ industry-leading Continuous Delivery (CD) principles and practices can be applied to machine learning to solve these issues, uniting data scientists, developers, data engineers, and business stakeholders.
In this two-part series, you’ll learn from Thoughtworks’ seasoned data experts how to maintain productivity, collaborate effectively, and deliver value continuously and seamlessly in practice. These interactive sessions are geared toward tech and business practitioners.
Part One
What is CD4ML?
Attempts to get machine learning applications into production often fail because proofs of concept are not conducive to delivering real production applications at scale.
Machine learning is usually taught through tutorials that use small, clean datasets loaded into data frames and orchestrated with Jupyter notebooks, all in a single, in-memory, local environment. While this format is fine for teaching, real industrial situations involve multiple environments and data drawn from databases or other data stores rather than file-based input. These applications interact with live production systems and must be coordinated with software delivery teams and product owners. They must be production quality: well designed, well tested, and maintainable. Data scientists are left to choose between the environment they are used to and one that is suitable for delivery to production, which leads to an awkward migration from one to the other.
Part Two
CD4ML In Practice
In this session, we apply the CD4ML approach from Part One in a hands-on, practical demonstration environment that runs on your own laptop. We will demonstrate and guide participants through CI/CD (Continuous Integration/Continuous Delivery) practices for machine learning and a new pattern of working that avoids most of the pitfalls of typical proof-of-concept approaches.
We’ll use an open-source environment with common ML tools. Participants will learn how to use new patterns of repeatable, continuous model development to collaborate effectively and deliver value continuously and seamlessly in industrial data science projects using CI/CD practices.
*All sessions will be held in English. Closed captioning will be provided in the recordings after each session.
Speakers
Global Head of Artificial Intelligence
Christoph Windheuser is the Global Head of Artificial Intelligence at Thoughtworks Inc. Before joining Thoughtworks, he gained more than 20 years of experience in the industry in several positions at SAP and Capgemini. Prior to that, he completed his Ph.D. in Neural Networks with a focus on Speech Recognition at the University of Bonn, Germany, Carnegie Mellon University in Pittsburgh, USA, Waseda University in Tokyo, Japan, and France Telekom (E.N.S.T.) in Paris, France.
Lead Data Scientist
As Lead Data Scientist, David creates the statistical models and predictive algorithms at the core of innovative data science applications for Thoughtworks clients. He has a Ph.D. in physics and over 20 years of experience working with data. Prior to his career in consulting, he conducted cosmological research at academic institutions, NASA, and government laboratories, developing data processing pipelines, statistical algorithms, and optimization strategies for space missions and astronomical experiments.
Lead Data Engineer
Eric is an experienced Data Engineer focused on big data architecture, design, solution integration, and implementation with cloud and data-intensive computing technologies. He enjoys researching and testing new cutting-edge technologies and bringing his experiences to teams to enhance data-driven solutions.