Publications

(2026). T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning. ICML 2026 Spotlight
(2026). CoMem: Context Management with A Decoupled Long-Context Model. ICML 2026
(2025). LLMVisor: A Real-Time Latency Attribution Model for Multi-Tenant LLM Serving. NeurIPS 2025 ML4Sys-Wksp
(2025). Plato: Plan to Efficiently Decode for Large Language Model Inference. COLM 2025
(2025). Compute Or Load KV Cache? Why Not Both? ICML 2025
(2025). HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs. Preprint
(2025). MARS: Harmonizing Multimodal Convergence via Adaptive Rank Search. arXiv
(2024). Eagle: Efficient Training-Free Router for Multi-LLM Inference. ML for Systems@NeurIPS 2024
(2024). AutoSpec: Automated Generation of Neural Network Specifications. Preprint
(2024). On Data Fabrication in Collaborative Vehicular Perception: Attacks and Countermeasures. USENIX Security 2024
(2024). OASIS: Collaborative Neural-Enhanced Mobile Video Streaming. MMSys 2024 Best Paper Award
(2024). QUIC is not Quick Enough over Fast Internet. WWW 2024
(2024). The Case for Boosting Mobile Application QoE via Smart Band Switching in 5G/xG Networks. HotMobile 2024
(2022). Vivisecting Mobility Management in 5G Cellular Networks. SIGCOMM 2022
(2021). ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases. SIGMOD 2021
(2021). A Variegated Look at 5G in the Wild: Performance, Power, and QoE Implications. SIGCOMM 2021