How the Cancer AI Alliance operationalizes secure, multi-site research with federated learning


Dec 5

- CAIA accelerates cancer research by using federated learning, ensuring sensitive patient data never leaves the participating cancer centers' firewalls; only model summaries and weights are shared.
- Cancer centers maintain complete autonomy over their data, actively defining security parameters, selecting specific datasets for projects, and mandating that only pre-approved code is executed on their servers.
- The platform leverages Rhino FCP (Rhino Federated Computing Platform) and NVIDIA FLARE to orchestrate the secure workflow, managing the exchange of code and model updates between the orchestration layer and edge nodes.
- By combining vast amounts of data from across the Alliance with rigorous data standardization, the platform enables the creation of robust, global AI models.

The Cancer AI Alliance (CAIA) is committed to accelerating cancer research using AI while upholding the highest standards of patient privacy and security. A core principle of our work is that sensitive patient data must remain behind participating cancer centers' firewalls.

To achieve this, CAIA’s participating cancer centers use federated learning, a machine learning method in which AI models are trained on data across multiple cancer centers while sensitive clinical data remain behind institutional firewalls. In this approach, only model summaries and weights are shared and updated across the federated learning network.
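To make "only model summaries are shared" concrete, here is a minimal sketch of how a coordinator could compute a global mean and variance from per-site aggregates without ever seeing individual records. It is an illustration only, not CAIA's implementation: `SiteSummary`, `summarize_locally`, `combine`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site shares; contains no patient-level rows."""
    count: int
    total: float
    total_sq: float


def summarize_locally(values: List[float]) -> SiteSummary:
    """Runs inside a cancer center; only the returned summary leaves the site."""
    return SiteSummary(
        count=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Runs at the coordinator; returns the global mean and population variance."""
    n = sum(s.count for s in summaries)
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, variance


# Hypothetical per-site measurements; the values are invented for illustration.
site_a = [10.0, 12.0, 14.0]
site_b = [20.0, 22.0]

global_mean, global_variance = combine(
    [summarize_locally(site_a), summarize_locally(site_b)]
)
```

The coordinator receives three numbers per site, yet the result matches what it would get by pooling all five measurements in one place.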

In this blog post, we explore the federated learning components that help CAIA securely operationalize this multi-site AI training approach.

Cancer centers retain control over their data

CAIA’s participating cancer centers always maintain complete control over their clinical data, actively defining the security and access parameters during the federated learning process. This includes:

- Data Selection: Each cancer center specifies which data is made available for any given research project.
- Tight Access Control: Cancer centers can mandate that only pre-approved code is run, ensuring that only specific, validated AI models can be trained on their data.
- Local Execution: All model training is executed locally on the cancer centers’ servers, governed by their existing security protocols.
- Security and Privacy: Additional technologies to ensure the privacy and security of sensitive patient data and trained models (including differential privacy, model encryption, and confidential computing) are available for specific projects requiring enhanced protection.
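One common way to enforce a "pre-approved code only" rule is to compare a cryptographic fingerprint of submitted code against a list of fingerprints recorded at review time. The sketch below shows that general idea under stated assumptions. It does not describe how Rhino FCP or any cancer center implements approval, and `CodeAllowlist`, `fingerprint`, and the sample code strings are hypothetical.

```python
import hashlib


def fingerprint(code: bytes) -> str:
    """Return the SHA-256 hex digest of a code artifact."""
    return hashlib.sha256(code).hexdigest()


class CodeAllowlist:
    """Hypothetical site-side registry of reviewed code fingerprints."""

    def __init__(self) -> None:
        self._approved = set()

    def approve(self, code: bytes) -> str:
        """Record the fingerprint of code that passed the site's review."""
        digest = fingerprint(code)
        self._approved.add(digest)
        return digest

    def is_approved(self, code: bytes) -> bool:
        """True only if this exact byte sequence was approved earlier."""
        return fingerprint(code) in self._approved


# Invented placeholders standing in for real training code.
reviewed = b"def train(model, data): ..."
tampered = b"def train(model, data): upload(data)"

allowlist = CodeAllowlist()
allowlist.approve(reviewed)

can_run_reviewed = allowlist.is_approved(reviewed)
can_run_tampered = allowlist.is_approved(tampered)
```

Because the check is on the exact bytes, any change to the code after review produces a different digest and the job is refused until it is reviewed again.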

To enable the federated learning process, CAIA’s platform uses the Rhino Federated Computing Platform (Rhino FCP), which facilitates scalable, privacy-preserving data collaboration and confidential computing. Rhino FCP uses NVIDIA FLARE (Federated Learning Application Runtime Environment), an open-source library that allows researchers to adapt existing AI training approaches to a federated framework:

- Rhino provides the infrastructure for the orchestration layer that oversees the federated learning processes. This includes facilitating code object creation and execution (the flow of AI model training code between cancer centers’ edge nodes) and providing governance controls for secure data exploration and quality management. Rhino also serves as the user-facing portal that enables participating cancer centers to configure their projects, select training datasets, and manage collaborators.
- Approved researchers log in to the federated network via the Rhino FCP user interface and submit their AI model training code to launch the training process. The resulting models are accessible only to those approved individuals.
- Leveraging NVIDIA FLARE, Rhino coordinates the secure exchange of model training code and weights between the orchestration layer and participating cancer centers’ edge nodes, then aggregates the locally trained weights into a final global model.
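The round trip described above (send the current model to each edge node, train locally, return weights, aggregate) can be simulated in a few lines. This is a plain-Python sketch of federated averaging on a one-parameter linear model. It deliberately uses no Rhino FCP or NVIDIA FLARE APIs, and the function names, learning rate, and toy datasets are assumptions made for illustration.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay at one site


def local_train(w: float, data: Dataset, lr: float = 0.01, steps: int = 5) -> float:
    """Runs at an edge node: a few gradient steps on y ~ w * x using local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, sites: List[Dataset]) -> float:
    """Runs at the orchestrator: collect (weight, sample count) and average."""
    updates = [(local_train(w_global, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Toy data following y = 3x, split across two hypothetical sites.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
```

After 20 rounds `w` is very close to 3, the slope shared by both sites' data, even though the orchestrator only ever handled weights and sample counts. Weighting by sample count is one common aggregation choice; a production system may use a different rule.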

Diagram of CAIA's federated learning workflow using Rhino SDK and NVIDIA FLARE to securely aggregate local cancer center models.

CAIA’s platform enables the development of robust models due to the sheer volume of training data available across the alliance. However, data quality is just as critical as volume. To that end, CAIA’s participating cancer centers employ a comprehensive approach to data standardization to build high-quality AI models.
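As a rough illustration of what standardization can involve, the sketch below maps records that use different field names, units, and stage labels onto one shared shape before training. The schema, field names, and alias table are hypothetical and are not taken from CAIA's data model.

```python
from typing import Any, Dict

# Hypothetical mapping from numeric stage labels to Roman numerals.
STAGE_ALIASES = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def normalize_stage(raw: str) -> str:
    """Turn labels such as 'stage 2', '2', or 'II' into 'II'."""
    token = raw.strip().upper().replace("STAGE", "").strip()
    return STAGE_ALIASES.get(token, token)


def standardize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a site-specific record into the shared (assumed) schema."""
    if "tumor_size_cm" in record:
        size_mm = record["tumor_size_cm"] * 10.0  # centimeters to millimeters
    else:
        size_mm = float(record["tumor_size_mm"])
    return {"tumor_size_mm": size_mm, "stage": normalize_stage(record["stage"])}


# Two invented records describing the same finding in different local formats.
from_site_a = standardize({"tumor_size_cm": 2.5, "stage": "stage 2"})
from_site_b = standardize({"tumor_size_mm": 25, "stage": "II"})
```

Both records come out identical, so a model trained across sites sees one consistent representation rather than learning site-specific quirks.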

If you’d like to learn more about CAIA, subscribe to our newsletter for updates and follow us on LinkedIn and X.
