Publications

(* denotes equal contributions)

HexGen-Text2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow
You Peng*, Youhe Jiang*, Wenqi Jiang, Chen Wang, Binhang Yuan
ICDE 2026
| paper | code |

BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization
Youhe Jiang, Fangcheng Fu, Eiko Yoneki
MLSys 2026
| paper | code |

FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Ran Yan*, Youhe Jiang*, Zhuoming Chen, Haohui Mai, Beidi Chen, Binhang Yuan
ICLR 2026
| paper | code |

Cascadia: A Cascade Serving System for Large Language Models
Youhe Jiang*, Fangcheng Fu*, Wanru Zhao*, Stephan Rabanser, Nicholas D. Lane, Binhang Yuan
ICLR 2026
| paper | code |

HexiScale: Accommodating Large Language Model Training over Heterogeneous Environment
Ran Yan*, Youhe Jiang*, Xiaonan Nie, Fangcheng Fu, Bin Cui, Binhang Yuan
MLSys 2026
| paper | code |

Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
Guoliang He*, Youhe Jiang*, Wencong Xiao, Jiang Kaihua, Shuguang Wang, Jun Wang, Du Zixian, Zhuo Jiang, Xinlei Zhang, Binhang Yuan, Eiko Yoneki
NeurIPS 2025
| paper | code |

Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
Yuhang Wang, Youhe Jiang, Bin Cui, Fangcheng Fu
FAISys 2025
| paper | code |

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki
ICML 2025
| paper | code |

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao*, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki
MLSys 2025
| paper | code |

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
Youhe Jiang*, Ran Yan*, Binhang Yuan
ICLR 2025
| paper | code |

HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment
Youhe Jiang*, Ran Yan*, Xiaozhe Yao*, Yang Zhou, Beidi Chen, Binhang Yuan
ICML 2024
| paper | code |

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui
IJCAI 2023
| paper | code |

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao*, Yujie Wang*, Youhe Jiang*, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
VLDB 2023
| paper | code |

Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
TKDE 2024
| paper | code |

Revisiting the Time Cost Model of AllReduce
Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang
arXiv
| paper | code |

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Xupeng Miao, Xiaonan Nie, Bin Cui
ICML 2023 Workshop
| paper | code |

2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning
Youhe Jiang, Huaxi Gu, Yunfeng Lu, Xiaoshan Yu
IEEE Access 2020
| paper | code |