AI models generally learn best from vast and diverse datasets, but cross-institutional cancer data can't be easily brought together due to privacy and regulatory guidelines. How do we train AI models on the collective knowledge of the nation's top cancer centers without exposing sensitive patient information?

CAIA’s solution is an AI training approach called federated learning. Instead of bringing sensitive patient data to a centralized location, federated learning brings the AI model to the data. This ensures that sensitive data remains in its secure location (at the cancer centers) and is never shared centrally.

How federated learning works

The technical setup:

Each participating cancer center hosts an edge node, a device that serves as a secure gateway between the cancer center and the rest of the alliance. Patient data remains safely behind the center's firewall and never leaves the cancer center.
A central orchestration layer, which CAIA manages, acts like a conductor for an orchestra. It sends the AI model and instructions to each of the edge nodes (at the cancer centers). The orchestration layer doesn't see the patient data.
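To make the boundary between the two components concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not CAIA's implementation: the class names `EdgeNode` and `Orchestrator`, the `run` and `dispatch` methods, and the idea of passing a training function are all hypothetical. The point it shows is that the orchestrator only ever receives model weights and a record count, never the records themselves.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Weights = List[float]
# A training function takes current weights plus local records and returns new weights.
TrainFn = Callable[[Weights, Sequence[tuple]], Weights]


@dataclass
class EdgeNode:
    """Hypothetical edge node: holds local records and returns only model updates."""
    name: str
    # Local records stay on the node; repr=False keeps them out of logs and printouts.
    _records: Sequence[tuple] = field(repr=False)

    def run(self, train_fn: TrainFn, weights: Weights) -> Tuple[Weights, int]:
        """Train on local data and return (updated weights, number of local records)."""
        new_weights = train_fn(list(weights), self._records)
        return new_weights, len(self._records)


@dataclass
class Orchestrator:
    """Hypothetical orchestration layer: sends the model out, collects updates."""
    nodes: List[EdgeNode]

    def dispatch(self, train_fn: TrainFn, weights: Weights) -> List[Tuple[Weights, int]]:
        # Only weights and counts cross this boundary; the orchestrator never reads records.
        return [node.run(train_fn, weights) for node in self.nodes]
```

In this sketch the only values that leave `EdgeNode.run` are the updated weights and a count, which mirrors the description above: the model travels to the data, and the orchestration layer coordinates without seeing patient records.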

Figure: An overview of the federated learning process (diagram showing three cancer centers collaborating to train a global AI model).

The training process:

For each project, a researcher provides the model code for evaluation by the federated learning system. Think of this as the sheet music that the conductor uses to guide the orchestra.
Each edge node trains the AI model using its own local, secure data.
Once the model is trained locally, the edge node sends a summary of its learnings (the updated model) back to the central orchestration layer. This summary contains no private patient information.
The orchestration layer then combines the learnings from all the cancer centers to create a more powerful model. This new, improved model is then sent back out to the edge nodes to repeat the process.

The final model can be thought of as the recording of the full performance of the orchestra.
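The training steps above can be sketched in a few lines of Python. This is a toy example, assuming a simple weighted-averaging scheme (often called federated averaging) as the way updates are combined; the article does not say which aggregation method CAIA uses. The function names, the one-parameter model, the learning rate, and the synthetic numbers are all made up for illustration.

```python
from typing import List, Tuple

Weights = List[float]
Sample = Tuple[float, float]  # (feature x, label y)


def local_train(weights: Weights, data: List[Sample],
                lr: float = 0.05, epochs: int = 5) -> Weights:
    """Edge node step: fit y ~ w * x by gradient descent on local data only."""
    w = weights[0]
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Orchestration step: average the weights, weighted by each node's sample count."""
    total = sum(count for _, count in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * count for wts, count in updates) / total
            for i in range(dim)]


def run_rounds(local_datasets: List[List[Sample]], rounds: int = 20) -> Weights:
    """Repeat: send the model out, train locally, combine, send the result back out."""
    global_weights = [0.0]
    for _ in range(rounds):
        updates = [(local_train(global_weights, data), len(data))
                   for data in local_datasets]
        global_weights = federated_average(updates)
    return global_weights


# Three synthetic "centers" whose data all follow y = 2 * x (made-up numbers).
centers = [
    [(0.0, 0.0), (1.0, 2.0)],
    [(2.0, 4.0), (3.0, 6.0), (4.0, 8.0)],
    [(0.5, 1.0), (1.5, 3.0)],
]

final_weights = run_rounds(centers)
print(final_weights)
```

Notice that `run_rounds` never touches the samples directly outside of `local_train`: each round, only the weights and a sample count travel back to be combined. Weighting by sample count is one common design choice, so that a center with more patients contributes proportionally more to the combined model.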

The benefits of federated learning in cancer research

Patient privacy is protected: Patient data never leaves its home institution, so it is never pooled or exposed in a central location.
More powerful, equitable models: By learning from a diverse and representative sample of patients across the country, federated learning helps build more robust and equitable models that are not over-reliant on a specific demographic or patient type. This allows the model to pick up signals and make predictions for a broader group of people.
Accelerating research for rare cancers: For a rare cancer, no single cancer center sees enough patients to study the disease effectively. Federated learning allows researchers to combine insights from small patient populations across the country to uncover new patterns and potential therapies.

