T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Haixin Wang, Hejie Cui, Chenwei Zhang, Xin Liu, Shuowei Jin, Shijie Geng, Xinyang Zhang, Nasser Zalmout, Zhenyu Shi, Yizhou Sun

May, 2026

Publication

Forty-Third International Conference on Machine Learning

My research interests include efficient algorithms and systems for LLM inference and training, LLM post-training recipes, and general machine learning systems.