Federated Learning: Columbia Workshop Showcases the Future of AI Collaboration in Healthcare

Leaders explore how federated learning enables secure, collaborative AI in healthcare.

[Photo: people at the workshop]

Today’s most promising medical research, especially in AI and data-driven discovery, requires access to large, diverse datasets. However, much of this data is locked away in isolated systems, protected by privacy laws and institutional boundaries. This is where federated learning promises to change the game: it enables researchers to learn from data across many institutions without ever transferring sensitive patient information.

On May 16, 2025, researchers, clinicians, and industry partners convened at Columbia University’s Roy and Diana Vagelos Education Center for the “Federated Learning for Health: Why, What, How?” workshop. Organized by the AI at VP&S Initiative, the event presented the challenges and successes of applying federated learning to real-world healthcare.

“Large AI models are data-hungry, but in healthcare, data lives in silos,” said Dr. Gamze Gürsoy, Herbert Irving Assistant Professor of Biomedical Informatics. “Federated learning offers a way to break those silos without compromising privacy.”

The workshop showcased detailed use cases across specialties, from radiation oncology to neurocritical care, and featured a panel of CUIMC expert stakeholders and industry leaders from NVIDIA, Johnson & Johnson, and Rhino Federated Computing.

Gürsoy opened the event with a primer on how federated learning works. Traditional AI systems require pooling large volumes of data in a single server to train models. But in healthcare, this is often impossible due to institutional barriers and patient privacy concerns. Federated learning flips the model: each institution trains the AI locally on its own data, and only the model parameters—not the data itself—are sent to a central server for aggregation. It’s a simple idea. But in healthcare, the simplicity unravels quickly.
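The round trip Gürsoy described — train locally, send only parameters, aggregate centrally — is the federated averaging (FedAvg) algorithm, and it can be sketched in a few lines. The toy below is illustrative only: the linear model, synthetic "hospital" datasets, and hyperparameters are invented for the example, not drawn from any system shown at the workshop.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train locally by gradient descent; only the weights leave the site."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three "hospitals" with private data drawn from the same underlying model.
true_w = np.array([1.0, -2.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

# Federated averaging: the server aggregates parameters, never raw data.
w_global = np.zeros(2)
for _ in range(10):
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    counts = [len(y) for _, y in sites]
    # weight each site's update by its sample count
    w_global = np.average(local_ws, axis=0, weights=counts)

print(w_global)  # close to the true weights [1.0, -2.0]
```

Note that the server only ever sees each site's weight vector, weighted by sample count; the patient-level rows in `X` and `y` never leave their site.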

Medical data is notoriously non-IID: it is rarely independent and identically distributed across institutions. Two hospitals may serve different demographics, follow different diagnostic criteria, or use entirely different equipment. In federated learning, this statistical mismatch can derail performance. Despite these and other challenges (e.g., communication overhead), Gürsoy emphasized that federated learning still holds tremendous promise. As a framework that respects patient privacy while unlocking the power of collective data, it may ultimately enable more equitable, robust, and generalizable AI in healthcare.
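To make the non-IID problem concrete: federated learning benchmarks commonly simulate it with Dirichlet label-skew partitioning, where a small concentration parameter gives each site a very different class mix. The sketch below is a generic illustration of that standard technique, not a dataset or method from the workshop; the three "diagnosis codes" are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def dirichlet_partition(labels, n_sites, alpha):
    """Split sample indices across sites with per-class proportions drawn
    from a Dirichlet distribution; small alpha -> heavy label skew."""
    sites = [[] for _ in range(n_sites)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * n_sites)      # class c's share per site
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for site, part in zip(sites, np.split(idx, cuts)):
            site.extend(part)
    return [np.array(s) for s in sites]

labels = rng.integers(0, 3, size=300)                 # 3 toy diagnosis codes
skewed = dirichlet_partition(labels, n_sites=4, alpha=0.2)
for i, idx in enumerate(skewed):
    print(f"site {i}: class counts {np.bincount(labels[idx], minlength=3)}")
```

With `alpha=0.2`, some sites end up seeing almost none of certain classes, which is exactly the kind of statistical mismatch that can derail naive parameter averaging.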

From Concept to Practice: Real-World Use Cases

Dr. Kaveri Thakoor, Assistant Professor in the Department of Ophthalmology, presented three case studies that underscored both the promise and complexity of federated learning in ophthalmic imaging. While performance was strong when data were consistent across sites, differences in imaging methods introduced significant challenges. In one case, such differences were reduced using domain adaptation, which aligns the data distributions produced by different device manufacturers, enabling more effective federated learning across sites. Another case highlighted how encrypted model sharing enabled privacy-preserving collaboration across institutions and improved model performance as more sites joined the federated learning network. Across all examples, federated learning proved effective, but only with careful attention to data heterogeneity, alignment, and institutional trust.
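One common way to align distributions across device manufacturers is correlation alignment (CORAL), which recolors one domain's features so their first- and second-order statistics match the other's. This is a generic sketch of that technique under invented "two scanners" data; it is not necessarily the specific domain adaptation method used in the case studies Thakoor presented.

```python
import numpy as np

def coral_align(source, target):
    """CORAL-style correlation alignment (Sun et al., 2016): transform
    source features so their mean and covariance match the target domain."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + 1e-6 * np.eye(d)
    ct = np.cov(target, rowvar=False) + 1e-6 * np.eye(d)
    whiten = np.linalg.inv(np.linalg.cholesky(cs))   # remove source covariance
    color = np.linalg.cholesky(ct)                   # impose target covariance
    centered = source - source.mean(axis=0)
    return centered @ whiten.T @ color.T + target.mean(axis=0)

# Toy "two scanners": same anatomy, different intensity statistics.
rng = np.random.default_rng(2)
scanner_a = rng.normal(size=(500, 3)) @ np.diag([1.0, 3.0, 0.5]) + 5.0
scanner_b = rng.normal(size=(500, 3))
aligned = coral_align(scanner_a, scanner_b)
# `aligned` now carries scanner_b's mean and covariance,
# so a model trained on one domain transfers more gracefully.
```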

Dr. Yading Yuan, Herbert and Florence Associate Professor of Radiation Oncology (in the Data Science Institute), took federated learning a step further into decentralization. In his work on radiation therapy planning, the centralized server was removed entirely; instead, hospitals shared updates peer-to-peer using a protocol called gossip learning. His team developed a framework for a variant called personalized federated learning, in which each institution trains a model customized to its own data while still benefiting from insights learned across all participating sites. The system proved not only more accurate than centralized federated learning on some tasks but also more robust when sites dropped out of the network, a realistic challenge in distributed health systems.
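The serverless idea behind gossip learning can be illustrated with decentralized averaging over a peer graph: each site repeatedly mixes its parameters with its neighbors', and all sites drift toward a shared consensus with no central aggregator. The toy below uses synchronous neighbor averaging on a ring of four invented hospitals, a deliberate simplification of the asynchronous gossip protocols Yuan described.

```python
import numpy as np

rng = np.random.default_rng(3)

def gossip_round(weights, topology):
    """One decentralized averaging round: each site replaces its model with
    the mean of its own and its neighbors' parameters. No server involved."""
    return {
        site: np.mean([weights[site]] + [weights[p] for p in peers], axis=0)
        for site, peers in topology.items()
    }

# Four hospitals in a ring, each starting from its own locally trained model.
topology = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
weights = {s: rng.normal(size=4) for s in topology}

for _ in range(30):
    weights = gossip_round(weights, topology)

# After enough rounds, every site holds (numerically) the same consensus
# model -- the average of the initial models.
```

Because mixing is purely local, a site that drops out simply stops contributing to its neighbors' averages, which is one intuition for why such schemes tolerate churn better than a single central aggregator.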

[Photo: Despina Kontos talks at the workshop]

Dr. Despina Kontos, Director of Columbia University's newly founded Center for Innovation in Imaging Biomarkers and Integrated Diagnostics (CIMBID), offered a stark, firsthand account of the challenges of traditional AI in medicine. After spending six years coordinating a centralized study across four hospitals to build a breast cancer risk model, she found the process so laborious, due to data transfers, formatting, and harmonization, that much of the data was obsolete by the time the analysis was done.

“The modeling took months. The data transfer and cleaning took years,” she said. “By the time we finished, 3D tomosynthesis had replaced 2D scans. And I swore I would never do that again.”

Now, her team is part of a new multi-institutional collaboration using federated learning to build a next-generation breast cancer risk assessment tool based on 3D data. Given that the new data modality is even more complex and voluminous than before, federated learning becomes not just a convenience but a necessity.

Dr. Soojin Park, Director of the Program for Hospital and Intensive Care Informatics for Columbia Neurology, and Medical Director of Critical Care Data Science and Artificial Intelligence at NewYork-Presbyterian Hospital, brought the conversation into the intensive care unit, where physiological monitoring data is abundant but largely untapped. Her team is using federated learning to predict deterioration in patients with brain injuries across four hospitals. The models are trained locally on each hospital’s data, helping identify patterns that precede life-threatening events.

Building on Web-Scale Privacy for Mobile Health

Dr. Roxana Geambasu, Associate Professor of Computer Science, closed the research talks by connecting federated learning in healthcare to her work on web-scale privacy infrastructure. Her team’s system, Cookie Monster, helps power new browser standards for private ad tracking by running code locally and never transmitting raw data. She proposed adapting these architectures to mobile health apps and wearables, opening up possibilities for consumer-facing federated learning systems that are secure, transparent, and clinically useful.

Institutional Realities and Industry Perspectives on Deploying Federated AI

[Photo: panel at the workshop]

The panel explored what it will take to make federated learning viable at scale. Industry experts from NVIDIA (Dr. Holger Roth), Johnson & Johnson (Asha Mahesh), and Rhino Federated Computing (Dr. Ittai Dayan) joined Columbia experts, including Dr. Muredach Reilly, Associate Dean for Clinical and Translational Research; IT expert David Wentsler; and Brenda Ruotolo, AVP for Human Research Protection, HRPO/IRBs, to tackle implementation hurdles, from privacy safeguards to institutional coordination.

They emphasized the opportunity for academia-industry partnerships to advance research in sensitive areas like rare diseases and precision medicine, highlighting the value of federated learning in optimizing clinical trial site selection and validating AI tools using decentralized real-world data.

The panelists agreed that the future of federated learning will depend as much on building social and institutional trust as on advancing technical infrastructure. Standardized practices, better communication within and across institutions, and sustained industry-academic partnerships will be key to unlocking the full potential of federated AI in healthcare.

The workshop concluded with live demonstrations from Johnson & Johnson, Rhino Federated Computing, and NVIDIA, showcasing real-world federated learning applications and tools in action.