Shuowei Jin

Applied Scientist at Amazon

University of Michigan

Biography

Hi there, I am currently an Applied Scientist at Amazon, working on LLM post-training algorithms and systems. Previously, I obtained my PhD from the Computer Science and Engineering Department at the University of Michigan under the supervision of Prof. Z. Morley Mao. I received my bachelor’s degree from the School of the Gifted Young at the University of Science and Technology of China, majoring in computer science.

My research spans efficient LLM inference/training algorithms and systems, LLM post-training recipes, and general machine learning systems.

Interests
  • Efficient LLM Inference/Training Algorithms/Systems
  • LLM Post-Training Recipes
  • General Machine Learning Systems
Education
  • PhD in Computer Science and Engineering, 2020-2025

    University of Michigan

  • BEng in Computer Science, 2016-2020

    School of the Gifted Young, University of Science and Technology of China

Research Snapshot

Efficient LLM Inference/Training

  • Efficient multi-agent training framework: AstraFlow
  • Low-latency KV cache loading: Cake ICML'25
  • Parallel inference for LLMs: Plato COLM'25
  • Multi-tenant LLM serving: LLMVisor NeurIPS'25
  • MoE training on heterogeneous GPUs: HeterMoE
  • Efficient router for LLMs: Eagle NeurIPS'24

LLM Post-Training Recipes

  • LLM context management: CoMem ICML'26
  • Stable multi-turn agent training: T2PO ICML'26 Spotlight
  • Cumulative importance sampling: CTPO
  • Reflection-enhanced distillation: RESD

General Machine Learning Systems

  • Neural-enhanced mobile video streaming: Oasis MMSys'24 Best Paper
  • ML-driven cloud database optimization: Restune SIGMOD'21

Recent Publications

(2026). CoMem: Context Management with A Decoupled Long-Context Model ICML 2026
(2026). T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning ICML 2026 Spotlight
(2025). LLMVisor: A Real-Time Latency Attribution Model for Multi-Tenant LLM Serving ML for Systems@NeurIPS 2025
(2025). Plato: Plan to Efficiently Decode for Large Language Model Inference COLM 2025
(2025). Compute Or Load KV Cache? Why Not Both? ICML 2025
(2025). HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Preprint
(2025). MARS: Harmonizing Multimodal Convergence via Adaptive Rank Search arXiv
(2024). Eagle: Efficient Training-Free Router for Multi-LLM Inference ML for Systems@NeurIPS 2024

Services

  • Conference Reviewer: ICLR 2025, MM 2024, WWW 2024, NeurIPS 2023, ICML 2023.
  • Journal Reviewer: Neural Networks, IEEE TMC, IEEE Network Magazine.
  • Artifact Evaluation: SIGCOMM 2024.

Contact

Miscellaneous

Ars longa, vita brevis (Art is long, life is short).
Things I love.