Publications

2025
Jonske, Frederic; Kim, Moon; Nasca, Enrico Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries (Journal Article) In: Computer Methods and Programs in Biomedicine, 2025. (Links | BibTeX | Tags: deep learning, fine-tuning, foundation models, Medical AI)
2024
Singh, Sidak Pal; Adilova, Linara; Kamp, Michael; Fischer, Asja; Schölkopf, Bernhard; Hofmann, Thomas Landscaping Linear Mode Connectivity (Proceedings Article) In: ICML Workshop on High-dimensional Learning Dynamics: The Emergence of Structure and Reasoning, 2024. (BibTeX | Tags: deep learning, linear mode connectivity, theory of deep learning)
Adilova, Linara; Andriushchenko, Maksym; Kamp, Michael; Fischer, Asja; Jaggi, Martin Layer-wise Linear Mode Connectivity (Proceedings Article) In: International Conference on Learning Representations (ICLR), Curran Associates, Inc., 2024. (Abstract | Links | BibTeX | Tags: deep learning, layer-wise, linear mode connectivity) Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models. It is most prominently used in federated learning. If models are averaged at the end of training, this can only lead to a well-performing model if the loss surface of interest is very particular, i.e., the loss in the exact middle between the two models needs to be sufficiently low. This is impossible to guarantee for the non-convex losses of state-of-the-art networks. For averaging models trained on vastly different datasets, it was proposed to average only the parameters of particular layers or combinations of layers, resulting in better-performing models. To get a better understanding of the effect of layer-wise averaging, we analyse the performance of the models that result from averaging single layers or groups of layers. Based on our empirical and theoretical investigation, we introduce a novel notion of layer-wise linear connectivity and show that deep networks do not have layer-wise barriers between them. We additionally analyze layer-wise personalized averaging and conjecture that, in a particular problem setup, all partial aggregations result in approximately the same performance.
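To make the layer-wise averaging described in this abstract concrete, below is a minimal sketch (not the authors' code) of interpolating a single layer between two trained PyTorch models and measuring the loss barrier along that path. The names average_layers, layerwise_barrier, and the evaluate callback are hypothetical and introduced here for illustration; they assume two models with identical architectures.

```python
import copy


def average_layers(model_a, model_b, layer_names, alpha=0.5):
    """Return a copy of model_a whose parameters listed in `layer_names`
    are replaced by the convex combination (1 - alpha) * a + alpha * b."""
    fused = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    fused_state = fused.state_dict()
    for name in layer_names:
        fused_state[name] = (1 - alpha) * state_a[name] + alpha * state_b[name]
    fused.load_state_dict(fused_state)
    return fused


def layerwise_barrier(model_a, model_b, layer_name, evaluate,
                      alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Loss barrier along the path where only `layer_name` is interpolated:
    the largest gap between the interpolated model's loss and the linear
    interpolation of the two endpoint losses. `evaluate` is a user-supplied
    function returning a model's loss on a fixed dataset."""
    loss_a, loss_b = evaluate(model_a), evaluate(model_b)
    gaps = []
    for a in alphas:
        fused = average_layers(model_a, model_b, [layer_name], a)
        gaps.append(evaluate(fused) - ((1 - a) * loss_a + a * loss_b))
    return max(gaps)
```

A barrier value close to zero for every individual layer corresponds to the layer-wise linear connectivity the paper investigates; averaging whole models instead would amount to passing all parameter names to average_layers at once.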
2022
Abourayya, Amr; Kamp, Michael; Ayday, Erman AIMHI: Protecting Sensitive Data through Federated Co-Training (Workshop) 2022. (Links | BibTeX | Tags: aimhi, co-training, deep learning, federated learning, privacy)
2021
Li, Xiaoxiao; Jiang, Meirui; Zhang, Xiaofei; Kamp, Michael; Dou, Qi FedBN: Federated Learning on Non-IID Features via Local Batch Normalization (Proceedings Article) In: Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021. (Abstract | Links | BibTeX | Tags: batch normalization, black-box parallelization, deep learning, federated learning) The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data, hence improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and even hurt training convergence. Most of the previous work has focused on differences in the distribution of labels or on client shifts. Unlike those settings, we address an important problem of FL, e.g., different scanners/sensors in medical imaging or different scenery distributions in autonomous driving (highway vs. city), where local clients store examples whose distributions differ from those of other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg and the state-of-the-art for non-iid data (FedProx) in our extensive experiments. These empirical results are supported by a convergence analysis showing, in a simplified setting, that FedBN has a faster convergence rate than FedAvg. Code is available at https://github.com/med-air/FedBN.
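As a rough illustration of the aggregation rule described in this abstract, the sketch below averages all client parameters except those belonging to batch-normalization layers, which remain local to each client. It assumes PyTorch clients with identical architectures; the function name fedbn_aggregate and the name-based BN filter are illustrative simplifications, not the reference implementation linked above.

```python
import torch


def fedbn_aggregate(client_models):
    """Average non-batch-norm parameters across clients, in place."""
    states = [m.state_dict() for m in client_models]
    # Name-based heuristic: any parameter belonging to a BN layer stays local.
    shared_keys = [k for k in states[0] if "bn" not in k.lower()]
    # Element-wise mean of every shared parameter across all clients.
    avg = {k: torch.stack([s[k].float() for s in states]).mean(dim=0)
           for k in shared_keys}
    for model, state in zip(client_models, states):
        state.update(avg)                # overwrite shared parameters with the average
        model.load_state_dict(state)     # BN statistics and affine parameters stay local
```

Keeping the batch-norm statistics and affine parameters local is what lets each client normalize with respect to its own feature distribution, which is the core idea behind FedBN's robustness to feature shift.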