Beyond AI chatbots: Why CAIA is leaning into federated learning to accelerate cancer discovery
Have you asked a chatbot a health question lately?
You are not alone. AI tools are now allowing consumers to upload their own medical data to help them better understand their health and advocate for themselves. We are in what NVIDIA’s Kimberly Powell calls a "transformer moment in biology," where the convergence of AI and healthcare is creating new possibilities for patients.
For common health conditions that have a well-understood and accessible evidence base, these AI tools can often offer helpful responses. On the healthcare provider side, integrating AI into workflows offers clear benefits by lessening administrative and clinical burdens. But as Tom Lynch, President and Director of Fred Hutch Cancer Center, noted, the third and perhaps most critical promise of AI is its ability to operate at scale—a capability that is currently missing from many popular tools, especially for rare health conditions.
Currently, if you ask a chatbot about a rare side effect from cancer treatments, you are unlikely to get a satisfactory response. Without access to data from hundreds or thousands of patients with this rare condition, current AI models will not have seen the patterns that lead to or arise from this side effect. The answers to such complex queries likely reside inside the secure firewalls of the world’s leading cancer centers, hidden in the vast troves of their patients' data.
This is where the Cancer AI Alliance (CAIA) steps in. By bringing together four leading cancer centers, CAIA is building an entirely new infrastructure for cancer discovery and research.
As part of a panel at the recent J.P. Morgan Healthcare Conference, CAIA’s scientific director, Jeff Leek, chatted with Alexis Battle (JHU), Tom Lynch (Fred Hutch), Kimberly Powell (NVIDIA), and Matt McIlwain (Madrona Ventures) about how CAIA is accelerating the future of cancer care by federating data across cancer centers, helping researchers solve problems together instead of in isolation.
1. Solving the "blind spot" problem
Current AI models have blind spots. As Fred Hutch’s Tom Lynch noted, if a severe side effect occurs in only 2% of patients at a single hospital, that data is too sparse to train an AI model. For example, how many breast cancer patients will develop a rare, debilitating side effect called osteonecrosis of the jaw? Some of the most popular AI models fail to predict these kinds of conditions because they lack access to evidence of clinical patterns across large numbers of patients.
To see the full picture, you need scale. By connecting data across four major institutions, CAIA turns rare anomalies into clear patterns. As Alexis Battle, the Director of the Malone Center for Engineering in Healthcare at Johns Hopkins, explains, "Bringing together diverse data can help clarify these weak signals, a new variant you've never seen or a slight aberration on an imaging study that is unlike things that you've observed before." Via federated learning, CAIA aims to harness the multimodal data (genomics, imaging, and patient history) that standard models simply cannot see.
But it isn't just about finding rare cases; it's about predicting the future. Alexis highlighted that CAIA’s models are trained on full "patient trajectories," tracking the journey from pre-diagnosis through years of treatment. By analyzing these longitudinal patterns, the AI can help clinicians predict what might happen next, flagging potential missed lab tests or forecasting outcomes in a way that static data cannot.
2. The rise of the "dry lab"
For decades, cancer breakthroughs happened almost exclusively in the "wet lab": at the bench, with pipettes and petri dishes. But as NVIDIA’s VP of Healthcare, Kimberly Powell, observed, we are now entering the era of the "dry lab."
This isn't about replacing biologists; it's about creating a digital environment where we can test hypotheses at lightning speed. An "AI infrastructure lab" such as CAIA serves as a faster, closed loop of science, allowing researchers to simulate and validate discoveries digitally before moving them to clinical trials.
3. World-leading cancer centers collaborate, with patients and privacy top of mind
Historically, academic medical centers have been cautious about sharing their data for fear of compromising patient privacy. Negotiating a data-sharing agreement between just two centers can take years, but CAIA has overcome this barrier by taking a different approach: developing federated access models, governance structures, and streamlined regulatory pathways that accelerate multi-institutional AI research.
CAIA’s federated learning approach allowed us to launch a secure, multi-cloud platform representing one million patients in less than a year, enabling not just one but several scientific projects to run in parallel within just a couple of months. Instead of moving patient data, the platform shares only the mathematical learnings. Patients’ private data never leaves the hospitals’ secure firewalls.
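For readers curious what "sharing only the mathematical learnings" can look like in practice, here is a minimal sketch of federated averaging, one common federated learning technique. It is an illustration only and not a description of CAIA's actual platform: the `Site` class, the function names, the tiny linear model, and the synthetic numbers are all assumptions made up for this example. Each hypothetical site trains on its own records and returns only model parameters and a record count, which a coordinator then averages.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """A hypothetical center holding private (x, y) records (synthetic here)."""
    name: str
    records: List[Tuple[float, float]]  # stays inside this object

    def local_update(self, w: float, b: float,
                     lr: float = 0.5, epochs: int = 5) -> Tuple[float, float, int]:
        """Fit y ~ w*x + b on local records with a few gradient steps.

        Returns only the updated parameters and the record count,
        never the records themselves.
        """
        n = len(self.records)
        for _ in range(epochs):
            # Gradients of the mean squared error, computed before updating.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self.records) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self.records) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n


def federated_average(updates: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine site parameters, weighting each site by its record count."""
    total = sum(n for _, _, n in updates)
    w = sum(w_i * n for w_i, _, n in updates) / total
    b = sum(b_i * n for _, b_i, n in updates) / total
    return w, b


def train(sites: List[Site], rounds: int = 100) -> Tuple[float, float]:
    """Run several rounds: broadcast the model, train locally, then average."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [site.local_update(w, b) for site in sites]
        w, b = federated_average(updates)
    return w, b


if __name__ == "__main__":
    # Synthetic data generated from y = 2x + 1, split across two made-up sites.
    sites = [
        Site("site_a", [(0.0, 1.0), (0.5, 2.0)]),
        Site("site_b", [(0.25, 1.5), (0.75, 2.5), (1.0, 3.0)]),
    ]
    w, b = train(sites)
    print(f"shared model: w={w:.3f}, b={b:.3f}")
```

In this toy setup the coordinator only ever sees two numbers and a count from each site. Real deployments typically layer additional protections and governance on top of this basic pattern.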
4. Are reasoning machines the future?
While a single AI model may not be the solution, Matt McIlwain of Madrona Ventures suggests a different path: reasoning machines, where the answer isn't just one algorithm, but a complex system. With this approach, regular clinician engagement creates a system that learns from expert feedback, becoming smarter and more valuable in ways a generic chatbot never could.
Matt also emphasized that unlike general AI models, which can become commodities over time, domain-specific models like those built by CAIA are “appreciating assets” where human clinicians and AI collaborate to improve patient outcomes continuously.
If you’d like to learn more about CAIA, subscribe to our newsletter for updates and follow us on LinkedIn and X.