Shuowei Jin
Applied Scientist at Amazon

University of Michigan

Biography

Hi there, I am currently an Applied Scientist at Amazon working on LLM post-training algorithms and systems. Previously, I obtained my PhD from the Computer Science and Engineering Department at the University of Michigan under the supervision of Prof. Z. Morley Mao. I received my bachelor's degree from the School of the Gifted Young at the University of Science and Technology of China, majoring in computer science.

My research interests lie at the intersection of machine learning systems and network systems, with a current focus on enhancing the efficiency of large language model (LLM) inference through the co-design of algorithms and system architectures.

Interests
  • Efficient LLM inference algorithms/systems
  • Machine learning systems
Education
  • PhD in Computer Science and Engineering, 2020-2025

    University of Michigan

  • BEng in Computer Science, 2016-2020

    School of the Gifted Young, University of Science and Technology of China

Research Snapshot

Recent topics of my research:

  • Efficient LLM Inference and Training Algorithms/Systems

    • Low-latency KV cache fetching: Cake
    • Semantic-Aware Parallel Decoding for LLMs: Plato
    • MoE Training on Heterogeneous GPUs: HeterMoE
    • Multi-model routing for LLM serving: Eagle
  • Machine Learning Systems

    • Neural-enhanced video streaming on mobile devices: Oasis
    • Specification-guided ML system safety: AutoSpec
    • ML-driven cloud database optimization: Restune
  • Network Systems

    • Performance analysis of 5G networks: 5G
    • Performance analysis of QUIC protocol: QUIC

Recent Publications

(2025). Plato: Plan to Efficiently Decode for Large Language Model Inference COLM2025
(2025). Compute Or Load KV Cache? Why Not Both? ICML2025
(2025). HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Preprint
(2024). Eagle: Efficient Training-Free Router for Multi-LLM Inference ML for Systems@NeurIPS24
(2024). AutoSpec: Automated Generation of Neural Network Specifications Preprint

Services

  • Conference Reviewer: ICLR 2025, MM 2024, WWW 2024, NeurIPS 2023, ICML 2023.
  • Journal Reviewer: Neural Networks, IEEE TMC, IEEE Network Magazine.
  • Artifact Evaluation: SIGCOMM 2024.

Contact

Miscellaneous

Ars longa, vita brevis (art is long, life is short).

Things I love.