Artificial intelligence (AI) can bolster clinical trials by optimising patient recruitment, improving study retention, and developing digital biomarkers. But many large healthcare datasets that could train AI algorithms remain locked away in silos over patient privacy concerns. Federated learning, however, could unlock machine learning by training on these datasets locally while maintaining patient privacy.
With federated learning, local versions of an AI train on data from individual sites without the data ever leaving its local server, explains Sarthak Pati, a biomedical AI software developer at the University of Pennsylvania. The local AI versions then share what they learn from local datasets—and not the data itself—to form a collaborative, predictive model.
By avoiding the need to transfer data, experts say federated learning models could incorporate many more datasets. However, the technology is far from a silver bullet. Like any new AI technology, federated learning will require diligent implementation to avoid the bias and security pitfalls pervasive in many current AI models, explains Dr. Eric Perakslis, chief science and digital officer at Duke Clinical Research Institute.
“Most technologies are agnostic…they aren’t good or evil,” Perakslis says. “It comes down to how you use them.”
Decentralising healthcare AI
Federated learning could upend traditional machine learning through a decentralised approach that does not externally aggregate patient data, Pati explains. Under traditional machine learning, sites must transfer patient data to a centralised server for an AI to train, he adds. This opens the door to third-party privacy violations, causing many sites and data sources to forgo sharing data altogether.
By decentralising machine learning, however, federated learning programs can train on a broader range of datasets, explains Dr. Ittai Dayan, CEO of Rhino Health, an AI healthcare company. Without the need to transfer data and contend with the resulting privacy risks, many more datasets become available. Rather than relying on increasingly complex tools and contracts to ethically de-identify patient data, federated learning eliminates the need to share patient data in the first place.
In clinical trials, federated learning can greatly improve the scope of predictive models, Dayan says. For example, an AI could be tasked with predicting when patients are likely to progress to a second-line breast cancer therapy to optimise clinical trial recruitment. A centralised AI tackling this question would likely only use data from one or two hospitals given privacy concerns under the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), he notes. By contrast, a federated learning program could train on many more datasets, including ones previously unavailable to machine learning algorithms, since it does not require data transfer. Decentralised machine learning, Dayan explains, is a much more scalable technology.
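As a rough illustration of the decentralised approach described above, the sketch below runs a toy federated training loop in plain Python. Everything in it is an assumption made for illustration: the three sites, their data points, the simple straight-line model, and the function names (`train_locally`, `average_models`, `federated_round`) are hypothetical and do not come from any of the programs discussed in this article. The point it tries to show is only that each site returns fitted model parameters, never its records.

```python
from typing import List, Tuple

Model = Tuple[float, float]          # (slope, intercept) of a straight-line model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay at the site


def train_locally(model: Model, data: Dataset,
                  lr: float = 0.01, epochs: int = 50) -> Model:
    """Refine the shared model on one site's data with gradient descent."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def average_models(models: List[Model]) -> Model:
    """Combine the sites' parameters with a plain, unweighted mean."""
    n = len(models)
    return (sum(m[0] for m in models) / n, sum(m[1] for m in models) / n)


def federated_round(global_model: Model, sites: List[Dataset]) -> Model:
    """One round: every site trains locally, only parameters are returned."""
    local_models = [train_locally(global_model, data) for data in sites]
    return average_models(local_models)


# Hypothetical site data, all drawn from the line y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b = [(1.0, 3.0), (3.0, 7.0)]
site_c = [(0.5, 2.0), (2.5, 6.0), (4.0, 9.0)]

model: Model = (0.0, 0.0)
for _ in range(200):
    model = federated_round(model, [site_a, site_b, site_c])

print(model)  # slope and intercept learned without pooling the raw data
```

In a real deployment the local training step would run on each institution's own servers and the parameter exchange would happen over a network; here it is collapsed into a single process to keep the example short.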
Can federated learning eliminate bias?
Because federated learning programs can utilise more datasets, they tend to minimise the effect of a biased dataset, Dayan says. However, a poorly designed program could serve as an engine for biased data propagation, Perakslis notes.
In healthcare, biased and unconsented datasets are often the cheapest to access, and this is no different in federated learning, Perakslis says. For example, even broad swaths of biometric data collected on patients’ mobile phones—a key feature of many decentralised clinical trials—can carry significant biases. Apple devices have much stronger privacy protections than Android devices, which use an open-source operating system, he explains. But choosing to use only Apple devices creates a data bias problem, as only a specific portion of the population can afford these higher-priced products, he notes.
To minimise the risk of data bias, local copies of a federated learning program should share model weights with one another, Pati explains. This ensures each dataset’s influence on the overall program is proportional to the strength of that specific data, he notes.
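The article does not spell out how that proportional influence is calculated. One common choice, used in the federated averaging (FedAvg) algorithm, is to weight each site's model parameters by the number of samples the site trained on. The sketch below assumes that scheme; the function name `weighted_average` and the example numbers are hypothetical, and this is not necessarily the weighting Pati's team used.

```python
from typing import List, Sequence


def weighted_average(site_weights: Sequence[Sequence[float]],
                     site_sizes: Sequence[int]) -> List[float]:
    """Average model parameters, weighting each site by its sample count."""
    if len(site_weights) != len(site_sizes):
        raise ValueError("need exactly one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: a small site and a site three times its size.
site_weights = [[1.0, 0.0], [3.0, 2.0]]
site_sizes = [100, 300]

combined = weighted_average(site_weights, site_sizes)
print(combined)  # [2.5, 1.5]
```

The larger site pulls the combined parameters three times as hard as the smaller one. Weighting by sample count alone does not correct for a dataset that is large but unrepresentative, so it addresses only part of the bias concern raised above.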
Meanwhile, to minimise concerns of security breaches, federated learning programs should utilise edge computing, Dayan explains. Edge computing is a type of information technology in which data is processed close to the “edge” of a network, near where it is generated, rather than in an external cloud or data centre.
“On its own, federated learning is just a computational method,” Dayan says. “Creating the supporting infrastructure that preserves data quality and data privacy is how you make it impactful.”
Where is federated learning in healthcare headed?
As public concerns over data privacy increase, data sharing regulations will likely tighten in parallel. The EU recently enacted GDPR, and the US Congress has proposed a new data privacy bill that would strengthen existing HIPAA regulations. Experts say these new laws could provide a runway for federated learning to take off in healthcare.
Because datasets never leave their source in federated learning, they are not subject to GDPR, HIPAA, or any laws in the works that focus on data-sharing, Pati explains. Earlier this year, Pati and his team used a federated learning program to help define clinical outcomes in glioblastoma by establishing volumetric measurements of tumours. The program trained on data from 71 sites across six continents, in large part because these sites never had to actually share data and deal with the resulting regulatory and legal hurdles, he explains.
Meanwhile, Dayan says there is growing demand from pharma companies to collaborate with each other on clinical trial design data. As real-world data becomes more pervasive, industry players realise they can share data insights in a mutually beneficial way, he notes. Federated learning will allow companies to share data insights that can improve patient selection, adverse event prediction, and more, all while maintaining full ownership of their proprietary datasets, he explains.
Still, Perakslis urges caution in implementing federated learning programs, despite the many possibilities for improving clinical trial and healthcare outcomes. “It’s very easy to get excited about what technologies can do,” he says. “It can take much longer to figure out what harms they cause.”
- Federated learning allows AIs to train on datasets from multiple trial sites and institutions, without data ever leaving those sites.
- Since no patient data is transferred, this technology can incorporate a much larger number of datasets.
- Federated learning programs should appropriately weight individual datasets and employ edge computing, reducing the risk of biased or insecure data.
- With data-sharing regulations tightening and pharma’s appetite for collaboration growing, federated learning will play an important role in the future of AI-driven drug development and healthcare.