Abstract
Healthcare institutions face a critical challenge in training and deploying machine learning applications: data are scarce, and stringent privacy regulations restrict how they can be shared. In this case study on breast cancer identification, we evaluated four experimental scenarios under conditions of limited data availability and strict privacy requirements. Specifically, we compared: (i) federated learning with distributed real data, (ii) federated learning with synthetic data, (iii) centralized learning on aggregated synthetic datasets generated locally, and (iv) multi-step synthetic data generation. Our results indicate that when local datasets are too small to be useful independently, federated learning with real data achieves the highest performance of the four scenarios, outperforming federated learning with synthetic data. In contrast, models developed on aggregated synthetic datasets, or via centralized generation of synthetic data based on local synthetic samples, yielded suboptimal results. Although federated learning with real data was the best-performing of these strategies, it still fell behind centralized learning with pooled real data. These results indicate that federated learning is preferable to synthetic data approaches in low-data scenarios. Additionally, the modest performance gap relative to the centralized real-data benchmark underscores the importance of further research into improved federated methods.
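The abstract does not say which aggregation rule the federated scenarios used. As a minimal sketch, assuming FedAvg-style weighted averaging (a common choice, not confirmed by the source), the server-side step of scenario (i) could look like the following. The function name, the client parameter vectors, and the sample counts are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def federated_average(
    client_updates: Sequence[Tuple[List[float], int]],
) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    Each entry is (parameters, n_samples). Clients with more local samples
    contribute proportionally more to the global parameters. This is an
    assumed aggregation rule, not necessarily the one used in the study.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0][0])
    global_params = [0.0] * dim
    for params, n in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, p in enumerate(params):
            global_params[i] += p * n / total
    return global_params


# Hypothetical example: three sites, each holding a small local dataset.
# Only parameters and sample counts leave a site; raw records stay local.
updates = [([0.2, 1.0], 10), ([0.4, 0.0], 30), ([0.6, 2.0], 60)]
global_params = federated_average(updates)
print(global_params)
```

In a full training loop this aggregation would run once per communication round, with each site retraining locally from the returned global parameters before sending its next update.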
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025 |
| Editors | Rajashree Paul |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 185-190 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798331565053 |
| DOIs | |
| Publication status | Published - 16 Feb 2026 |
| Event | 16th Annual IEEE Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025 - Berkeley, United States; Duration: 29 Oct 2025 → 31 Oct 2025 |
Publication series
| Name | 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025 |
|---|---|
Conference
| Conference | 16th Annual IEEE Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025 |
|---|---|
| Country/Territory | United States |
| City | Berkeley |
| Period | 29/10/25 → 31/10/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Breast cancer research
- Data Privacy
- Federated learning
- Distributed databases
- Medical services
- Benchmark testing
- Breast cancer
- Generators
- Protection
- Synthetic data
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Information Systems and Management
- Health Informatics
Fingerprint
Dive into the research topics of 'A comparative study of federated learning and synthetic data for privacy-aware machine learning'. Together they form a unique fingerprint.