Publications

(2025). Compute Or Load KV Cache? Why Not Both? ICML 2025
(2025). Plato: Plan to Efficiently Decode for Large Language Model Inference Preprint
(2024). Eagle: Efficient Training-Free Router for Multi-LLM Inference ML for Systems @ NeurIPS 2024
(2024). AutoSpec: Automated Generation of Neural Network Specifications Preprint
(2024). On Data Fabrication in Collaborative Vehicular Perception: Attacks and Countermeasures USENIX Security 2024
(2024). OASIS: Collaborative Neural-Enhanced Mobile Video Streaming MMSys 2024 Best Paper Award
(2024). QUIC is not Quick Enough over Fast Internet WWW 2024
(2024). The Case for Boosting Mobile Application QoE via Smart Band Switching in 5G/xG Networks HotMobile 2024
(2022). Vivisecting Mobility Management in 5G Cellular Networks SIGCOMM 2022
(2021). ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases SIGMOD 2021
(2021). A Variegated Look at 5G in the Wild: Performance, Power, and QoE Implications SIGCOMM 2021