Biography
I am an R&D engineer at the Platform of Artificial Intelligence (PAI), Alibaba Cloud, where I started in 07/2024. I am currently building Llumnix v1, a cloud-native dynamic scheduling system for LLM serving deployed on PAI-EAS (Alibaba Cloud’s model service platform). Previously, I contributed to and maintained EasyCkpt, a high-performance asynchronous checkpointing framework for LLM pre-training that accelerated the pre-training of Qwen2.5 and Qwen3.
Before that, I received my M.Eng. degree from the School of Computer Science and Engineering at Beihang University in June 2024, advised by Prof. Hailong Yang, and my B.Eng. degree from the School of Software Engineering at South China University of Technology in June 2021.
Education
- M.Eng., Computer Science, Beihang University, Beijing. (09/2021 - 06/2024)
- Supervisor: Prof. Hailong Yang
- Research Interests: Machine Learning Systems, Heterogeneous Computing
- B.Eng., Software Engineering, South China University of Technology, Guangzhou. (09/2017 - 06/2021)
Work Experience
- Platform of Artificial Intelligence, Alibaba Cloud, Hangzhou. (07/2024 - Now)
R&D Engineer
- Core contributor to Llumnix v1, a cloud-native dynamic scheduling system for LLM serving, deployed on PAI-EAS (Alibaba Cloud’s model service platform).
- Core contributor to Llumnix v0, a Ray-native dynamic scheduling system for LLM serving. [Repo]
- Contributor and core maintainer of EasyCkpt, a high-performance asynchronous checkpointing framework for LLM pre-training, accelerating the pre-training of Qwen2.5 and Qwen3. [Doc]
Intern Experience
- Platform of Artificial Intelligence, Alibaba Cloud, Beijing. (06/2023 - 06/2024)
Intern R&D Engineer
- Designed and developed the prototype system of Llumnix. Llumnix proposes a dynamic scheduling policy that reschedules requests and their in-memory states with an efficient and scalable live migration mechanism, improving load balancing and isolation, mitigating resource fragmentation, and differentiating request priorities and SLOs.
- Llumnix: Dynamic Scheduling for Large Language Model Serving (OSDI’24, co-first author) [Paper/Slides/Talk]
- Model-ToolChain Team, SenseTime, Beijing. Intern System Researcher. (03/2023 - 05/2023)
- Platform of Artificial Intelligence, Alibaba Cloud, Beijing. Intern System Researcher. (07/2021 - 12/2021)
Publications
- Qwen3 Technical Report. Qwen Team (contributor). arXiv preprint 2025 (arXiv:2505.09388). [PDF]
- Qwen2.5 Technical Report. Qwen Team (contributor). arXiv preprint 2024 (arXiv:2412.15115). [PDF]

- Llumnix: Dynamic Scheduling for Large Language Model Serving
Biao Sun*, Ziming Huang*, Hanyu Zhao*, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin (*co-first authors)
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI’24)
- Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU
Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, Depei Qian
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS’23)
- EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs
Mingzhen Li, Wencong Xiao, Hailong Yang, Biao Sun, Hanyu Zhao, Shiru Ren, Zhongzhi Luan, Xianyan Jia, Yi Liu, Yong Li, Depei Qian, Wei Lin
International Conference for High Performance Computing, Networking, Storage, and Analysis 2023 (SC’23)
- Adapting Combined Tiling to Stencil Optimizations on Sunway Processor
Biao Sun, Mingzhen Li, Hailong Yang, Jun Xu, Huaitao Zhang, Zhongzhi Luan, Depei Qian
CCF Transactions on High Performance Computing 2023 (THPC’23)
Awards
- Top 10 Patents of Alibaba Cloud & Tongyi (Qwen) Lab, Alibaba Cloud, 2025
- CCF HPCChina Outstanding Paper Award, Beihang University, 2022
- National Scholarship, South China University of Technology, 2018