Publications


(* denotes equal contributions)

HexiScale: Accommodating Large Language Model Training over Heterogeneous Environment
Ran Yan*, Youhe Jiang*, Xiaonan Nie, Fangcheng Fu, Bin Cui, Binhang Yuan
arXiv
| paper | code |

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki
arXiv
| paper | code |

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao*, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki
MLSys 2025
| paper | code |

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
Youhe Jiang*, Ran Yan*, Binhang Yuan
ICLR 2025
| paper | code |

HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment
Youhe Jiang*, Ran Yan*, Xiaozhe Yao*, Yang Zhou, Beidi Chen, Binhang Yuan
ICML 2024
| paper | code |

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui
IJCAI 2023
| paper | code |

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao*, Yujie Wang*, Youhe Jiang*, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
VLDB 2023
| paper | code |

Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
TKDE 2024
| paper | code |

Revisiting the Time Cost Model of AllReduce
Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang
arXiv
| paper | code |

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Xupeng Miao, Xiaonan Nie, Bin Cui
ICML 2023 Workshop
| paper | code |

2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning
Youhe Jiang, Huaxi Gu, Yunfeng Lu, Xiaoshan Yu
IEEE Access 2020
| paper | code |