I am a Ph.D. candidate in the School of Computer Science at Nanjing University. My research focuses on building efficient systems for Large Language Models.

Ph.D. in Computer Science
School of Computer Science
Advised by Prof. Chen Tian and Prof. Zhibin Wang

B.S. in Computer Science
School of Future Technology

Research Intern
LLM Inference Optimization, Scheduling Optimization

Research Intern
Network Innovation Engineering, Multi-Agent Diagnosis

Developing speculative decoding approaches that accelerate large language model inference. Diffusion-inspired methods generate multiple tokens simultaneously while maintaining output quality, which speeds up real-world serving.

Building efficient systems for serving large language models at scale, optimizing throughput and latency through system-level work on scheduling, memory management, and parallelism.