My research aims to leverage foundation models for solving hard tasks through specialization and reinforcement learning. Beyond this, I have broad interests including (approximate) probabilistic inference, optimization, and online learning.
Always feel free to reach out to me with things you find exciting.
We released three papers on self-distillation: [1] on self-distillation from demonstrations enabling continual learning, [2] on reinforcement learning via self-distillation, and [3] on online self-distillation from raw user interactions.
@article{kleinebuening2026aligning,
  title   = {Aligning Language Models from User Interactions},
  author  = {Kleine Buening, Thomas and Hübotter, Jonas and Pásztor, Barna and Shenfeld, Idan and Ramponi, Giorgia and Krause, Andreas},
  year    = {2026},
  journal = {arXiv preprint arXiv:2603.12273}
}
Oral
Reinforcement Learning via Self-Distillation
Jonas Hübotter, Frederike Lübeck, Lejs Behric, and 8 more authors
arXiv preprint arXiv:2601.20802, 2026
Oral Presentation at ICLR 2026 Workshops on Scaling Post-training for LLMs and on Test-Time Updates.
@article{hubotter2026reinforcement,
  title   = {Reinforcement Learning via Self-Distillation},
  author  = {Hübotter, Jonas and Lübeck, Frederike and Behric, Lejs and Baumann, Anton and Bagatella, Marco and Marta, Daniel and Hakimi, Ido and Shenfeld, Idan and Kleine Buening, Thomas and Guestrin, Carlos and Krause, Andreas},
  year    = {2026},
  journal = {arXiv preprint arXiv:2601.20802}
}
Oral
Self-Distillation Enables Continual Learning
Idan Shenfeld, Mehul Damani, Jonas Hübotter, and 1 more author
arXiv preprint arXiv:2601.19897, 2026
Oral Presentation at ICLR 2026 Workshop on Lifelong Agents.
@article{shenfeld2026self,
  title   = {Self-Distillation Enables Continual Learning},
  author  = {Shenfeld, Idan and Damani, Mehul and Hübotter, Jonas and Agrawal, Pulkit},
  year    = {2026},
  journal = {arXiv preprint arXiv:2601.19897}
}
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter*, Leander Diaz-Bone*, Ido Hakimi, and 2 more authors
@article{hubotter2025learning,
  title   = {Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning},
  author  = {Hübotter, Jonas and Diaz-Bone, Leander and Hakimi, Ido and Krause, Andreas and Hardt, Moritz},
  year    = {2025},
  journal = {arXiv preprint arXiv:2510.04786}
}
ICLR ’26 Oral
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Jonas Hübotter*, Patrik Wolf*, Alexander Shevchenko*, and 3 more authors
In International Conference on Learning Representations (2026), 2025
Oral Presentation at NeurIPS 2025 Workshop on Continual and Compatible Foundation Model Updates.
@inproceedings{hubotter2025specialization,
  title     = {Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models},
  author    = {Hübotter, Jonas and Wolf, Patrik and Shevchenko, Alexander and Jüni, Dennis and Krause, Andreas and Kur, Gil},
  year      = {2025},
  booktitle = {International Conference on Learning Representations (2026)}
}
NeurIPS ’25
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone*, Marco Bagatella*, Jonas Hübotter*, and 1 more author
In Advances in Neural Information Processing Systems (2025), 2025
@inproceedings{diazbone2025discover,
  title     = {DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning},
  author    = {Diaz-Bone, Leander and Bagatella, Marco and Hübotter, Jonas and Krause, Andreas},
  year      = {2025},
  booktitle = {Advances in Neural Information Processing Systems (2025)}
}
COLM ’25
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
Ryo Bertolissi*, Jonas Hübotter*, Ido Hakimi, and 1 more author
@inproceedings{bertolissi2025local,
  title     = {Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging},
  author    = {Bertolissi, Ryo and Hübotter, Jonas and Hakimi, Ido and Krause, Andreas},
  year      = {2025},
  booktitle = {Conference on Language Modeling (2025)}
}
ICLR ’25 Best Paper
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi, and 1 more author
In International Conference on Learning Representations (2025), 2024
Best Paper Award at NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning.
@inproceedings{hubotter2024efficiently,
  title     = {Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs},
  author    = {Hübotter, Jonas and Bongni, Sascha and Hakimi, Ido and Krause, Andreas},
  year      = {2024},
  booktitle = {International Conference on Learning Representations (2025)}
}
NeurIPS ’24 Oral
Transductive Active Learning: Theory and Applications
Jonas Hübotter, Bhavya Sukhija, Lenart Treven, and 2 more authors
In Advances in Neural Information Processing Systems (2024), 2024
Oral Presentation at ICML 2024 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists.
@inproceedings{hubotter2024transductive,
  title     = {Transductive Active Learning: Theory and Applications},
  author    = {Hübotter, Jonas and Sukhija, Bhavya and Treven, Lenart and As, Yarden and Krause, Andreas},
  year      = {2024},
  booktitle = {Advances in Neural Information Processing Systems (2024)}
}
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models. NeurIPS Workshop on Continual and Compatible Foundation Model Updates, San Diego.
Supervision
I have had the privilege of advising several BSc and MSc students during their theses and semester projects. Some of these projects have led to publications.
Lejs Behric (MSc): Reinforcement Learning via Self-Distillation
Tim Launer (MSc): Evaluating Training Paradigms and Consensus Mechanisms For Learning To Reason (with Marco Bagatella and Ido Hakimi)
Dennis Jüni (MSc): Meta Test-Time Training for Image Classification (with Frederike Lübeck, ICLR'26)
Matthias Otth (MSc): Efficient Fine-Tuning and Test-Time Training of Large Language Models for Reasoning Tasks (with Ido Hakimi, SCALR@COLM'25)
Ryo Bertolissi (BSc): Test-Time Model Merging for Mixture of Local Experts (with Ido Hakimi, COLM'25)
Nicolas Menet (MSc): Efficiently Estimating Gaussian Probability of Maximality (with Parnian Kassraie, AISTATS'25)
Sascha Bongni (BSc): Active Fine-Tuning of Large Language Models (ICLR'25)
Pablo Lahmann (MSc): Safe Control as Inference (with Yarden As)
Anh Duc Nguyen (BSc): Safe Bayesian Optimization without Regret
You can find a list of our research group's potential projects here. If you want to work with me, please send me an email describing your area of interest, and attach your CV and up-to-date transcripts.