Federated learning is a machine learning technique that allows you to train AI models without data having to be stored and processed centrally. With it, a model can be trained collaboratively across many different locations (i.e. on many different devices).
This offers greater privacy protection: a model can be trained on someone's data without that data ever leaving their phone or laptop.
A machine learning technique in which a model can be trained without the data being moved to a centralized location.
It allows you to train machine learning models in a way that respects privacy and data protection laws.
Federated learning is a balancing act between accuracy and privacy. It can also have a significant overhead in terms of bandwidth and energy.
Federated learning is used in products from big tech firms, like Google Messages and speech recognition, but it is also being explored in sectors that handle sensitive data.
What is federated learning?
Federated learning is a machine learning technique designed to improve privacy when training a model. It does this through decentralization: rather than the training being done in a single centralized location (like a single cloud region or an on-premises server), datasets remain local to each device or site.
Through local, decentralized training, a global model can be created by aggregating the model updates each participant produces, without data ever having to be transferred to a centralized place.
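The mechanism above can be sketched in a few lines. This is a minimal, illustrative version of federated averaging with a toy one-parameter model; the function names, learning rate, and data are all assumptions made for the example, not part of any particular framework.

```python
# A minimal sketch of federated averaging, assuming a toy model whose
# "weights" are a list of floats. Each client trains locally on its
# own private data; only model weights travel to the server.

def local_train(weights, data, lr=0.05):
    """Hypothetical local step: one pass of gradient descent on a
    mean-squared-error objective for a single-parameter model y = w*x."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: average each client's model weighted
    by its local dataset size, so larger datasets count proportionally."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

# Two clients hold disjoint samples of the same underlying trend y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],               # client A's private data
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],  # client B's private data
]

global_model = [0.0]
for _ in range(50):
    # Each client trains locally; raw data never leaves the client.
    updates = [local_train(global_model, data) for data in clients]
    global_model = fed_avg(updates, [len(d) for d in clients])

print(round(global_model[0], 2))  # converges toward the true slope, 2.0
```

Real systems add secure aggregation, partial client participation and many local steps per round, but the shape is the same: local training, then a weighted average of updates on the server.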
What’s in it for you?
Because federated learning helps protect privacy, it can help organizations leverage data and AI in new ways without compromising the trust of users. It allows data to be controlled by those who actually own it, reducing the pressure on organizations to stretch user consent around how data is used.
Not only does it open up new opportunities with AI, it can also drive adoption in domains and industries where data governance and regulation are particularly tight.
What are the trade-offs of federated learning?
Despite the advantages of federated learning, there are still a few things that are important to consider:
There's always a balancing act between accuracy and privacy. Because each participant trains on only its own limited data, the resulting model may be less accurate than one trained on the pooled data.
Bandwidth and battery usage. Because federated learning involves devices ‘at the edge’ exchanging model updates with a centralized server, it can be intensive in terms of both bandwidth and energy consumption.
Different kinds of data. Federated learning assumes devices have similar data distributions — what’s termed ‘independent and identically distributed’ (IID). If data on different devices varies significantly (non-IID), it can hinder the training process.
Finally, federated learning systems can be more complex to design and manage compared to traditional centralized learning approaches.
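To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. The helper name and the figures are illustrative assumptions, not measurements: it simply assumes each client downloads the global model and uploads an update of the same size as 32-bit floats each round.

```python
# Illustrative estimate of per-round communication cost, assuming
# every client downloads the full global model and uploads a
# same-sized update, with parameters stored as 32-bit floats.

def round_traffic_mb(num_params, num_clients, bytes_per_param=4):
    """Total megabytes moved in one round: two transfers per client
    (download the global model, upload the local update)."""
    per_client = num_params * bytes_per_param * 2
    return per_client * num_clients / 1e6

# e.g. a 5-million-parameter model trained across 100 devices
print(round_traffic_mb(5_000_000, 100))  # -> 4000.0 MB per round
```

Multiply that by hundreds of training rounds and the appeal of techniques like update compression and quantization becomes obvious.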
How is federated learning being used?
Federated learning is being used in a number of different areas. First, it’s particularly useful in the world of mobile app development. Google uses the technique in its Messages application, for example, as it makes it possible to provide users with greater personalization (things like autocomplete and smart reply) without compromising their privacy.
It’s also incredibly useful in fields such as healthcare and other public sector services, where data can be immensely valuable but organizations are dealing with sensitive personal information. At Thoughtworks we’ve used federated learning to help us develop anonymesh, a method that combines decentralization with more efficient data sharing. Specifically, we’ve helped a UK local authority share critical information while also protecting highly sensitive data.