I just left Kuaishou Technology (SEHK: 1024), where I spent a few years leading multiple engineering and algorithm teams building the company's large-scale AI platform. I also served as a Senior Staff Research Scientist at the Kuaishou Seattle AI Lab. We built reliable, scalable, distributed systems with cutting-edge technologies, including large-scale ranking/personalization engines, high-performance distributed deep learning infrastructure, and model compression frameworks.
I received my PhD from the University of Rochester in 2019. We published influential results on distributed machine learning, including the theoretical justification of asynchronous SGD (NeurIPS 2015 spotlight) and the first decentralized SGD with linear speedup (NeurIPS 2017 oral).
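The decentralized SGD mentioned above replaces a central parameter server with gossip averaging among neighboring workers. Here is a minimal illustrative toy (my own sketch, not the paper's code): each worker on a ring averages its parameters with its two neighbors, then takes a local gradient step on a simple quadratic objective.

```python
import numpy as np

# Toy decentralized SGD: worker i minimizes f_i(x) = 0.5 * ||x - t_i||^2,
# so the global optimum is the mean of the targets t_i. Workers only
# communicate with their ring neighbors (no central server).

def decentralized_sgd(targets, steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    x = rng.normal(size=(n, d))          # one parameter copy per worker
    # Ring mixing matrix: average self with left and right neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    for _ in range(steps):
        x = W @ x                        # gossip averaging with neighbors
        grads = x - targets              # local gradient of f_i
        x = x - lr * grads               # local SGD step
    return x

targets = np.arange(8.0).reshape(4, 2)   # 4 workers, 2-dim parameters
x = decentralized_sgd(targets)
consensus = x.mean(axis=0)               # close to the optimum [3, 4]
```

With a constant learning rate, the workers converge to a small neighborhood of the global optimum; a diminishing learning rate would drive them to exact consensus.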
I got started with programming before I went to preschool. Comprehensive engineering knowledge, together with a deep understanding of algorithms, helps me keep teams headed in the right direction to solve challenging real-world problems.
Curriculum Vitæ: HTML.
- Ph.D. (2015-2019) in Computer Science, University of Rochester
- B.S. (2011-2015) in Physics, University of Science and Technology of China
Our advanced GPU-based large-scale learning system for ad recommendation
and CTR prediction tasks. PERSIA was launched in 2018 and open-sourced in
2021. PERSIA supports models with up to 100 trillion parameters and is by
far the fastest public recommendation model training framework.
(in collaboration with DS3 Lab)
- Training Deep Learning-based recommender models of 100 trillion parameters over Google Cloud.
- PERSIA, by far the largest recommendation model training system ever open-sourced.
- Story: 640x Faster GPU-Based Learning System for Ad Recommendation.
- Story: Innovation, Balance, and Big Picture: The Speed of Kwai Commercialization.
Our deep learning training acceleration framework for speeding up large-scale
training tasks, covering data loader optimization, advanced distributed
training algorithms, network communication optimization, and more. Bagua is
our solution to the training bottleneck at Kuaishou Technology (more
than a million videos uploaded per hour). It is now open-sourced.
(in collaboration with DS3 Lab)
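One flavor of the communication optimization mentioned above is gradient compression. The sketch below shows top-k gradient sparsification, a common technique in this space (the helper names are illustrative, not Bagua's actual API): each worker transmits only the k largest-magnitude gradient entries plus their indices, shrinking communication volume.

```python
import numpy as np

# Illustrative top-k gradient sparsification (not Bagua's real API):
# send only the k largest-magnitude entries with their indices.

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, size):
    """Scatter the sparse entries back into a dense zero vector."""
    dense = np.zeros(size)
    dense[idx] = vals
    return dense

grad = np.array([0.1, -4.0, 0.02, 3.0, -0.5, 0.3])
idx, vals = topk_compress(grad, k=2)
restored = topk_decompress(idx, vals, grad.size)
# Only the two largest-magnitude entries (-4.0 and 3.0) survive.
```

In practice such schemes are paired with error feedback (accumulating the dropped residual locally) so the compression error does not hurt convergence.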
- Hammer: an automatic deep learning model compression tool we built, useful for trimming down large models while keeping accuracy as high as possible. Hammer has saved thousands of GPU cards and enabled hundreds of complex models to be successfully deployed at Kuaishou Technology.
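A core building block of compression tools like Hammer is weight pruning. The toy below (my own sketch, not Hammer's code) shows magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached.

```python
import numpy as np

# Illustrative magnitude pruning (not Hammer's actual implementation):
# zero the fraction `sparsity` of smallest-magnitude weights.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries to reach `sparsity`."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pruned = magnitude_prune(w, sparsity=0.5)  # half the entries become zero
```

Real compression pipelines typically combine pruning with fine-tuning (and other techniques such as quantization or distillation) to recover the accuracy lost by removing weights.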
We released a new game AI for DouDizhu (a popular Chinese card game).
The corresponding research paper was accepted by ICML 2021.
(in collaboration with DATA Lab)
- cproxy: a handy little tool I created to transparently apply a proxy to individual processes.
- Our team won 2nd prize in the IEEE Low-Power Computer Vision Challenge. Congrats!
- I received the 30 New Generation Digital Economy Talents (30 位新生代数字经济人才) award from the World Internet Conference and Big Data Digest. Thanks!
- How we reduced the communication cost of training deep neural networks by 95%.
- IBM cuts AI speech recognition training time from a week to 11 hours using our AD-PSGD algorithm.
- IBM achieves 10x performance improvement with our decentralized training algorithms.
- Our research on staleness-aware ASGD is applied in Deeplearning4j, the largest Java deep learning framework.
Academic Professional Activities
- Senior Program Committee of AAAI
- Program Committee of ICML, NeurIPS, ICLR, AISTATS, AAAI, and ScaDL.
Journal Reviewer of:
- Journal of Machine Learning Research (JMLR)
- Machine Learning
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- IEEE Transactions on Information Theory
- IEEE Transactions on Network Science and Engineering
- IEEE Transactions on Neural Networks and Learning Systems
- IEEE Transactions on Knowledge and Data Engineering
- IEEE Transactions on Signal Processing
- IEEE Internet of Things Journal
- Data Mining and Knowledge Discovery
- BIT Numerical Mathematics
- Computational Optimization and Applications
- Optimization Methods and Software
- European Journal of Operational Research
- Journal of Parallel and Distributed Computing
- Pattern Recognition
- Neural Networks
- Parallel Computing
- International Journal of Electrical Power & Energy Systems
- Journal of Optimization Theory and Applications