
A comparative study of federated learning and synthetic data for privacy-aware machine learning

  • Akhtar Hussain*
  • Atiquer Rahman Sarkar
  • Eunjin Kim
  • Muhammad Habib Ur Rehman
  • Noman Mohammed
  • *Corresponding author for this work
  • University of North Dakota
  • University of Manitoba

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Healthcare institutions face a critical challenge in training and deploying machine learning applications: data scarcity compounded by stringent privacy regulations. In this case study on breast cancer identification, we evaluated four experimental scenarios under conditions of limited data availability and strict privacy requirements. Specifically, we compared: (i) federated learning with distributed real data, (ii) federated learning with synthetic data, (iii) centralized learning on aggregated synthetic datasets generated locally, and (iv) multi-step synthetic data generation. Our results indicate that when local datasets are too small to be useful independently, federated learning with real data achieves the highest performance among the four scenarios, outperforming federated learning with synthetic data. In contrast, models developed on aggregated synthetic datasets, or via centralized generation of synthetic data based on local synthetic samples, yielded suboptimal results. Although federated learning with real data was the best-performing of these strategies, it still fell behind centralized learning with pooled real data. These results indicate that federated learning is preferable to synthetic data approaches in low-data scenarios. Additionally, the modest performance gap relative to the centralized real-data benchmark underscores the importance of further research into improved federated methods.

Original language: English
Title of host publication: 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025
Editors: Rajashree Paul
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 185-190
Number of pages: 6
ISBN (Electronic): 9798331565053
DOIs
Publication status: Published - 16 Feb 2026
Event: 16th Annual IEEE Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025 - Berkeley, United States
Duration: 29 Oct 2025 to 31 Oct 2025

Publication series

Name: 2025 IEEE 16th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025

Conference

Conference: 16th Annual IEEE Information Technology, Electronics and Mobile Communication Conference, IEMCON 2025
Country/Territory: United States
City: Berkeley
Period: 29/10/25 to 31/10/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Breast cancer research
  • Data Privacy
  • Federated learning
  • Distributed databases
  • Medical services
  • Benchmark testing
  • Breast cancer
  • Generators
  • Protection
  • Synthetic data

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems and Management
  • Health Informatics
