One year in: Cancer AI Alliance unveils collaborative AI platform at IA40

    • Launch of CAIA’s federated learning platform: Marking its one-year anniversary, CAIA announced the successful launch of its collaborative AI platform. Built on federated learning, the platform is designed to overcome institutional data silos and accelerate cancer discovery by enabling secure, multi-institutional collaboration.

    • Federated learning technology: The platform uses federated learning to train AI models on a Gen 1 dataset of over one million structured clinical records from the four founding cancer centers, so that private patient data remains secure within its home institution.

    • Immediate impact and future focus: CAIA is currently running eight unique pilot projects targeting complex areas such as rare cancer analysis and treatment response prediction. The alliance is now focused on expanding to multimodal data (genomic and pathology imaging) and on inviting new members from across the nation.

On October 1, 2025, the Cancer AI Alliance (CAIA) returned to Madrona’s IA Summit to mark its one-year anniversary and announce the successful launch of its collaborative AI platform.

Several CAIA team members participated in the event, including Jeff Leek (Fred Hutch Cancer Center), Srinivasan Yegnasubramanian (Johns Hopkins Kimmel Cancer Center), Rafael Irizarry (Dana-Farber Cancer Institute), Alexis Battle (Johns Hopkins Whiting School of Engineering), Sohrab Shah (Memorial Sloan Kettering Cancer Center), and Brian Bot (Fred Hutch Cancer Center). Each represents one of CAIA’s founding member institutions, which have joined together to solve one of the biggest challenges in cancer research: sharing critical insights without sharing private patient data.


CAIA was created to break down the institutional silos that keep clinical insights locked within individual centers. Rafael Irizarry, Chair of the Data Science Department at Dana-Farber, explained that with advances in AI methodology, researchers could “really increase the number of discoveries they make” while following regulations that keep clinical data private and secure. He added that CAIA’s collaborative AI platform also supports cross-cloud computing, which can “change the way we do research today and lead to a new era of discovery.”

A brief summary of CAIA’s first year

In just one year, CAIA has reached significant milestones and rapidly established the infrastructure for multi-institutional AI development. Alongside the federated learning platform itself, CAIA has put in place a new legal framework and successfully navigated regulatory pathways at all four participating centers.

CAIA’s technical foundation now includes a Gen 1 dataset comprising structured clinical data from over one million patients, which supports eight pilot use cases designed to demonstrate the platform's immediate impact.

Overview of CAIA’s progress in one year

Showcasing the power of federated learning

CAIA’s mission to accelerate cancer discovery from years to months is powered by federated learning. This approach sends AI models to the data, which stays securely at the four founding cancer centers, so private patient data never leaves its home institution. Development of CAIA’s federated learning platform was made possible by generous technological and philanthropic support from several AI industry leaders.
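The article does not describe CAIA's training algorithm, so the following is only a minimal sketch of one common federated learning technique, federated averaging (FedAvg), to illustrate what "models travel to the data" can mean. It assumes a toy logistic-regression model, synthetic data at three hypothetical sites, and hypothetical function names (`local_train`, `federated_average`, `run_fedavg`). Only model weights and record counts leave each site; the raw records are read only inside `local_train`.

```python
import math
import random

# Hypothetical FedAvg illustration; not CAIA's implementation.
# Each "site" is a list of (features, label) records that stay local.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_train(weights, records, lr=0.1, epochs=5):
    """Runs at one site: per-record stochastic gradient descent on logistic
    loss. Returns updated weights and the record count (the only shared values)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in records:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, len(records)


def federated_average(updates):
    """Coordinator step: average site weights, weighted by each site's record count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_fedavg(sites, dim, rounds=10):
    global_w = [0.0] * dim
    for _ in range(rounds):
        # The current global model is sent to every site; only weights come back.
        updates = [local_train(global_w, records) for records in sites]
        global_w = federated_average(updates)
    return global_w


def make_site(n, seed):
    """Synthetic toy data: label is 1 when the first feature exceeds the second.
    A constant 1.0 is appended as a bias feature."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        records.append(([a, b, 1.0], 1 if a > b else 0))
    return records


if __name__ == "__main__":
    sites = [make_site(n, seed) for n, seed in [(200, 1), (50, 2), (120, 3)]]
    w = run_fedavg(sites, dim=3)
    print("global weights:", [round(v, 3) for v in w])
```

Weighting the average by record count keeps a small site from pulling the shared model as hard as a large one; production systems typically layer secure aggregation and other privacy protections on top of this basic loop.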

Jeff Leek, CAIA’s Scientific Director, demonstrated the platform's immediate value with Asta DataVoyager, a tool CAIA prototyped with Ai2 (the Allen Institute for AI) that lets scientists and researchers ask questions in plain language and receive clearly cited, explainable answers drawn from across the federated platform. Ongoing work is exploring how Asta DataVoyager and trained researchers can work together, to better understand how such a tool can best be used to answer important scientific questions.

A view of Asta DataVoyager’s response to a clinical query in plain language

This could unlock insights from more diverse AI models trained on each cancer center’s data. It also allows clinicians and researchers to pose complex questions and instantly receive analyzed results derived from the combined, federated data of the participating cancer centers. Most importantly, because of CAIA’s federated learning approach, the responses researchers receive will not contain any sensitive clinical data.
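The article does not say how query responses are kept free of sensitive data. One common pattern, shown here as a hypothetical sketch rather than CAIA's or Asta DataVoyager's actual mechanism, is a federated aggregate query: each site evaluates the question against its own records and returns only summary values (here, a count and a sum), and a coordinator pools them. The record fields (`age`, `stage`), the `SiteSummary` type, and the function names are invented for illustration. Real deployments usually add further safeguards, such as suppressing results for very small groups, which this sketch omits.

```python
from dataclasses import dataclass

# Hypothetical federated aggregate query; not CAIA's or Asta DataVoyager's code.


@dataclass
class SiteSummary:
    """Aggregates a site shares instead of patient-level records."""
    count: int    # number of matching patients at the site
    total: float  # sum of the requested value over those patients


def local_summary(records, predicate, value_key):
    """Runs inside one institution: filters its own records, returns only aggregates."""
    matched = [r[value_key] for r in records if predicate(r)]
    return SiteSummary(count=len(matched), total=float(sum(matched)))


def pooled_mean(summaries):
    """Coordinator step: combine per-site aggregates into one pooled mean."""
    n = sum(s.count for s in summaries)
    if n == 0:
        return None
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    site_a = [{"age": 60, "stage": 3}, {"age": 50, "stage": 1}]
    site_b = [{"age": 70, "stage": 3}, {"age": 80, "stage": 3}]

    def is_stage_3(record):
        return record["stage"] == 3

    summaries = [local_summary(s, is_stage_3, "age") for s in (site_a, site_b)]
    print(pooled_mean(summaries))  # 70.0
```

Pooling sums and counts, rather than averaging the per-site means, gives the same answer as if all matching records had been analyzed in one place.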

Proven impact: Eight pilot projects underway

Alexis Battle, Director of the Malone Center for Engineering in Healthcare at Johns Hopkins Whiting School of Engineering, confirmed that CAIA’s federated platform is supporting eight unique pilot projects. 

“With this platform,” she explained, “we sourced ideas, scientific questions… across all of these four [participating] institutions from scientists, oncologists, clinicians, and machine learning researchers. We came up with many [questions] that we're going to pursue, but we started with eight that we're piloting now.”

These projects target critical, complex areas that no single institution could tackle alone, such as:

  • Predicting treatment response

  • Identifying novel biomarkers

  • Analyzing trends in rare cancers, providing the scale necessary to uncover new therapies

  • Fine-tuning large language models on cancer patient data to predict future diagnoses

As Alexis emphasized, these eight projects “range from asking specific clinical questions about rare events or rare diseases that we can only ask with data this large to training large language models. So we're really excited that we're actually running this now on our federated platform."

An overview of the eight research projects being piloted on CAIA’s platform

The road to precision cancer medicine

CAIA is now focused on its next phase of expansion and data integration.

Sohrab Shah, Chief of Computational Oncology at Memorial Sloan Kettering Cancer Center, outlined the plan to evolve the platform from using structured clinical records to incorporating multimodal data sets. “[Every] patient goes through a diagnostic odyssey that encompasses medical imaging, molecular tests that yield genomic data, pathology imaging data. We can harness that data by bringing it together to start building predictive models about what will happen to that patient in the future if they're treated with a particular therapy,” he said. 

Combining these diverse data types at scale is essential for building the next generation of highly accurate predictive models for precision cancer medicine.

What’s next for CAIA?

CAIA has proved that this level of collaboration among cancer centers is possible. The alliance is now extending an invitation to additional cancer centers and technology partners across the nation to join the effort.

Brian M. Bot, Director of AI and Data Science partnerships at Fred Hutch Cancer Center and Director of CAIA’s Strategic Coordinating Center, emphasized that by joining CAIA, cancer centers will not only help “expand the types of data that we're collecting, but also the breadth and diversity of data that is going to be important to build and validate these models over time."

With the federated platform in place, the insights and benefits it generates can be scaled globally to advance cancer care for everyone.

If you’d like to learn more about CAIA, subscribe to our newsletter for updates and follow us on LinkedIn and X.
