T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Publication
Forty-Third International Conference on Machine Learning
Shuowei Jin
Applied Scientist at Amazon

My research interests include algorithms and systems for efficient LLM inference and training, LLM post-training recipes, and general machine learning systems.