
Little is Enough: Boosting Privacy in Federated Learning with Hard Labels

Can we train high-quality models on distributed, privacy-sensitive data without compromising security? Federated learning aims to solve this problem by training models locally and aggregating updates, but there’s a catch—shared model parameters or probabilistic predictions (soft labels) still leak private information. Our new work, “Little is Enough: Boosting Privacy by Sharing Only Hard Labels in Federated Semi-Supervised Learning,” presents an alternative: federated co-training (FedCT), where clients share only hard labels on a public dataset.

This simple change significantly enhances privacy while maintaining competitive performance. Even better, it enables the use of interpretable models like decision trees, boosting the practicality of federated learning in domains such as healthcare.

This work was presented at AAAI 2025 and was done in collaboration with Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoffrey I. Webb, and Michael Kamp.


Federated Learning: A Privacy Illusion?

Federated learning (FL) is often praised for enabling collaborative model training without centralizing sensitive data. However, model parameters can still leak information through various attacks, such as membership inference and gradient inversion. Even differentially private FL, which adds noise to mitigate these risks, suffers from a trade-off—stronger privacy means weaker model quality.

Another alternative, distributed distillation, reduces communication costs by sharing soft labels on a public dataset. While this improves privacy compared to sharing model weights, soft labels still expose patterns that can be exploited to infer private data.

This brings us to an essential question: Is there a way to improve privacy without sacrificing model performance?


A Minimalist Solution: Federated Co-Training (FedCT)

Instead of sharing soft labels, FedCT takes a radical yet intuitive approach: clients only share hard labels—definitive class assignments—on a shared, public dataset. The server collects these labels, forms a consensus (e.g., majority vote), and distributes the pseudo-labels back to clients for training.
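To make the protocol concrete, here is a minimal sketch of one FedCT round in Python. The client interface (local_train, predict_hard_labels, set_pseudo_labels) is purely illustrative and not taken from our implementation; the point is that only integer class labels ever leave a client.

    import numpy as np

    def fedct_round(clients, public_X, num_classes):
        """One communication round of federated co-training (illustrative sketch)."""
        all_labels = []
        for client in clients:
            client.local_train()                         # train on private + pseudo-labeled data
            hard = client.predict_hard_labels(public_X)  # integer class indices, no probabilities
            all_labels.append(hard)                      # only hard labels leave the client

        # Server: majority vote over the clients' hard labels on the public data.
        votes = np.stack(all_labels)                     # shape (num_clients, num_public_samples)
        counts = np.apply_along_axis(
            lambda col: np.bincount(col, minlength=num_classes), 0, votes)
        consensus = counts.argmax(axis=0)                # one pseudo-label per public sample

        # Clients: add the consensus pseudo-labels to their training data for the next round.
        for client in clients:
            client.set_pseudo_labels(public_X, consensus)
        return consensus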


This approach has three key advantages:

  1. Stronger Privacy – Hard labels leak significantly less information than model parameters or soft labels, drastically reducing the risk of data reconstruction attacks.
  2. Supports Diverse Model Types – Unlike standard FL, which relies on neural networks for parameter aggregation, FedCT works with models like decision trees, rule ensembles, and gradient-boosted decision trees.
  3. Scalability – Since only labels are communicated, the approach dramatically reduces bandwidth usage—by up to two orders of magnitude compared to FedAvg!

Empirical Results: Privacy Without Performance Trade-Offs

We evaluated FedCT on various benchmark datasets, including CIFAR-10, Fashion-MNIST, and real-world medical datasets like MRI-based brain tumor detection and pneumonia classification.

Key Findings:

  • Privacy wins: FedCT achieves near-optimal privacy (VUL ≈ 0.5), making membership inference attacks no better than random guessing.
  • Competitive accuracy: FedCT performs as well as, or better than, standard FL and distributed distillation in most scenarios.
  • Interpretable models: It successfully trains decision trees, random forests, and rule ensembles, unlike other FL approaches.
  • Robust to data heterogeneity: Even when clients have diverse (non-iid) data, FedCT performs comparably to FedAvg.

Notably, FedCT also shines in fine-tuning large language models (LLMs), where its pseudo-labeling mechanism outperforms standard federated learning.


Privacy Through Simplicity

The core idea behind FedCT is beautifully simple: share less, retain more privacy. By moving away from parameter sharing and probabilistic labels, we mitigate privacy risks while keeping performance intact.

That said, some open questions remain:

  • How does the choice of public dataset impact FedCT’s effectiveness?
  • Can we further refine the consensus mechanism for extreme data heterogeneity?
  • How does FedCT behave when applied to real-world hospital collaborations?

For now, our results suggest that in federated learning, little is enough. Sharing only hard labels provides a surprisingly strong privacy-utility trade-off—especially in critical applications like healthcare.

For more details, check out our paper on arXiv or our GitHub repository.

Stay tuned for more research updates from our Trustworthy Machine Learning group at IKIM and Ruhr University Bochum!

Layer-Wise Linear Mode Connectivity

We presented our work on layer-wise linear mode connectivity at ICLR 2024, led by Linara Adilova, in collaboration with Maksym Andriushchenko, Michael Kamp, Asja Fischer, and Martin Jaggi.

We know that linear mode connectivity doesn’t hold for two independently trained models. But what about *layer-wise* LMC? Well, it is very different!

We investigate layer-wise averaging and discover that, across multiple networks, tasks, and setups, averaging only a single layer does not affect performance! This is in line with research showing that re-initializing individual layers does not change accuracy.
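As a rough illustration of this experiment, the following sketch (in PyTorch) replaces a single named layer of one trained model with an interpolation of the two models' parameters for that layer and leaves everything else untouched; an evaluate helper for measuring the resulting loss or accuracy is assumed.

    import copy

    def average_single_layer(model_a, model_b, layer_name, alpha=0.5):
        """Copy model_a, but replace the parameters of `layer_name` by an
        interpolation between model_a's and model_b's parameters for that layer."""
        merged = copy.deepcopy(model_a)
        state_a, state_b = model_a.state_dict(), model_b.state_dict()
        new_state = {}
        for name, param_a in state_a.items():
            if name.startswith(layer_name):
                new_state[name] = (1 - alpha) * param_a + alpha * state_b[name]
            else:
                new_state[name] = param_a.clone()
        merged.load_state_dict(new_state)
        return merged

    # Usage (two independently trained models, an assumed evaluate(model, loader) helper):
    # for layer_name, _ in model_a.named_children():
    #     print(layer_name, evaluate(average_single_layer(model_a, model_b, layer_name), test_loader))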

Nevertheless, is there some critical number of layers that need to be averaged to reach a high-loss point? It turns out that barrier-prone layers are concentrated in the middle of a model.

Is there a way to gain more insight into this phenomenon? Let's see what it looks like for a minimalistic example: a deep linear network. After all, a deep linear network is convex with respect to any single one of its layers.
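A quick way to see this (a sketch, assuming a convex loss such as the squared error): with all other layers fixed, the network output is linear in the remaining layer, so the loss restricted to that layer is convex and interpolating it cannot create a barrier.

    Deep linear network $f(x) = W_L \cdots W_1 x$, convex loss $\ell$.
    Fix all layers except $W_k$ and write $A = W_L \cdots W_{k+1}$, $B = W_{k-1} \cdots W_1$.
    The prediction $A W_k B x$ is linear in $W_k$, hence
    \[
        L(W_k) = \mathbb{E}_{(x,y)}\bigl[\ell(A W_k B x,\, y)\bigr]
    \]
    is convex in $W_k$. Along $W_k(\alpha) = (1-\alpha) W_k^{(1)} + \alpha W_k^{(2)}$ we therefore get
    \[
        L(W_k(\alpha)) \le (1-\alpha)\,L\bigl(W_k^{(1)}\bigr) + \alpha\,L\bigl(W_k^{(2)}\bigr)
        \le \max\bigl\{L\bigl(W_k^{(1)}\bigr),\, L\bigl(W_k^{(2)}\bigr)\bigr\},
    \]
    so averaging a single layer of a deep linear network can never create a loss barrier.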

Can robustness explain this property, i.e., do neural networks have a particular robustness to weight changes that allows them to compensate for modifications of a single layer? For some layers the answer is yes: it is indeed much harder to reach a high loss for a more robust model.

It also means that we cannot treat random directions as uniformly representative of the loss surface: our experiments show that particular subspaces are more stable than others. In particular, single-layer subspaces have a different tolerance to noise!

Federated Daisy-Chaining

How can we learn high-quality models when data is inherently distributed across sites and cannot be shared or pooled? In federated learning, the solution is to iteratively train models locally at each site and share these models with the server, where they are aggregated into a global model. As only models are shared, data usually remains undisclosed. This process, however, requires sufficient data to be available at each site in order for the locally trained models to achieve a minimum quality – even a single bad model can render the aggregated model arbitrarily bad.

In healthcare settings, however, we often have as few as a few dozen samples per hospital. How can we still collaboratively train a model across a federation of hospitals without infringing on patient privacy?

At this year’s ICLR, my colleagues Jonas Fischer, Jilles Vreeken and I presented a novel building block for federated learning called daisy-chaining. This approach trains models consecutively on local datasets, much like a daisy chain. Daisy-chaining alone, however, violates privacy, since a client can draw inferences about the data of the client it received a model from. Moreover, performing daisy-chaining naively leads to overfitting, which can cause learning to diverge. In our paper “Federated Learning from Small Datasets”, we propose to combine daisy-chaining of local datasets with aggregation of models, both orchestrated by the server, and term this method Federated Daisy-Chaining (FedDC).
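Schematically, FedDC alternates daisy-chaining rounds, in which the server permutes the models among clients, with aggregation rounds, in which it averages them. The sketch below is only illustrative: local_train, average_models, and the round periods d and b are assumed placeholders, not our exact algorithm or hyperparameters.

    import copy
    import random

    def feddc(clients, models, num_rounds, d=1, b=10):
        """Sketch of Federated Daisy-Chaining: every round each model trains on one
        client's local data; every d-th round the server permutes the models among
        clients (daisy-chaining); every b-th round it averages them into a global model."""
        for t in range(1, num_rounds + 1):
            # Local training: model i is trained on the (small) dataset of client i.
            models = [client.local_train(model) for client, model in zip(clients, models)]

            if t % b == 0:
                # Aggregation round: average all models (e.g., parameter-wise mean).
                global_model = average_models(models)
                models = [copy.deepcopy(global_model) for _ in clients]
            elif t % d == 0:
                # Daisy-chaining round: each model moves on to another client's data.
                random.shuffle(models)
        return average_models(models)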

This approach allows us to train models successfully from as few as 2 samples per client. Our results on image data (Table 1) show that FedDC not only outperforms standard federated averaging (FedAvg), but also state-of-the-art federated learning approaches, achieving a test accuracy close to centralized training.

Nothing but Regrets – Federated Causal Discovery

Discovering causal relationships enables us to build more reliable, robust, and ultimately trustworthy models. It requires large amounts of observational data, though. In healthcare, for most diseases the amount of available data is large, but this data is scattered over thousands of hospitals worldwide. Since this data in most cases mustn’t be pooled for privacy reasons, we need a way to learn a structural causal model in a federated fashion.

At this year’s AISTATS, my co-authors Osman Mian, David Kaltenpoth, Jilles Vreeken and I presented the paper “Nothing but Regrets – Privacy-Preserving Federated Causal Discovery”, in which we show that causal relationships can be discovered by sharing only regret values with a server: the server sends a candidate causal model to each client, and the clients reply with how much worse single-edge extensions of this global model fit their local data compared to the original global model. From this information alone, the server can compute the best extension of the current global model.
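In pseudocode, a single greedy server step of this protocol could look as follows. This is a sketch under the assumption of a decomposable, MDL-style score where lower is better; local_score and with_edge are illustrative names, not our actual implementation.

    def federated_greedy_step(global_dag, clients, candidate_edges):
        """One server step of regret-based federated causal discovery (sketch):
        the server proposes single-edge extensions of the current global model,
        each client reports only the regret of every extension on its local data,
        and the server keeps the extension with the best aggregate regret."""
        base_scores = {client: client.local_score(global_dag) for client in clients}

        best_edge, best_gain = None, 0.0
        for edge in candidate_edges:
            extended = global_dag.with_edge(*edge)   # candidate single-edge extension
            # Each client shares one number per candidate: its local regret.
            regrets = [client.local_score(extended) - base_scores[client] for client in clients]
            gain = -sum(regrets)                     # negative total regret = improvement
            if gain > best_gain:
                best_edge, best_gain = edge, gain

        if best_edge is not None:
            global_dag = global_dag.with_edge(*best_edge)
        return global_dag, best_edge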

In practice, the environments at the local clients are not the same. We should expect local differences that could be modeled by interventions into the global causal structure. In our AAAI paper “Information-Theoretic Causal Discovery and Intervention Detection over Multiple Environments” we have shown how to discover a global causal structure as well as local interventions in a centralized setting. Our current goal is to combine these two works to provide an approach to federated causal discovery from heterogeneous environments.

Causal Discovery from Multiple Environments

At AAAI 2023, my colleagues Osman Mian, Jilles Vreeken and I presented our paper “Information-Theoretic Causal Discovery and Intervention Detection over Multiple Environments”, in which we learn a global structural causal model over multiple environments and discover potential local interventions that change some causal relationships within particular environments.

For medical data this has an enormous impact: being able to reliably detect causal relationships in medical data, such as gene expressions or patient records, allows us not only to build more reliable and trustworthy models, but also to gain novel insights into diseases and risk factors.

Reliably detecting causal relationships requires large amounts of observational data, though. Therefore, it is paramount to develop privacy-preserving methods to tap into the large, but inherently distributed medical datasets in hospitals all over the world. What we need is federated causal discovery.