Shuowei Jin
LLM
Compute Or Load KV Cache? Why Not Both?
Large Language Models (LLMs) are increasingly deployed in large-scale online services, enabling sophisticated applications. However, …
Shuowei Jin, Xueshen Liu, Qingzhao Zhang, Z. Morley Mao
Plato: Plan to Efficiently Decode for Large Language Model Inference
Large Language Models (LLMs) are increasingly deployed in large-scale online services, enabling sophisticated applications. However, …
Shuowei Jin, Xueshen Liu, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Matthew Lentz, Danyang Zhuo, Feng Qian, Z. Morley Mao
Eagle: Efficient Training-Free Router for Multi-LLM Inference
The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection …
Zesen Zhao, Shuowei Jin, Z. Morley Mao
AutoSpec: Automated Generation of Neural Network Specifications
The increasing adoption of neural networks in learning-augmented systems highlights the importance of model safety and robustness, …
Shuowei Jin, Francis Y. Yan, Cheng Tan, Anuj Kalia, Xenofon Foukas, Z. Morley Mao