
Large-scale pre-trained models (PTMs), such as Transformer models, have advanced deep learning (DL) on a variety of complex tasks, including natural language processing (e.g., BERT [9], GPT [6], T5 [41]), computer vision (e.g., ViT [10], Swin [25]), advertising recommendation (e.g., M6 [24]), and so on.
These models are also known as foundation models, since they are trained on hundreds of gigabytes of data and can be adapted (e.g., via task-specific fine-tuning) to a wide range of downstream tasks.

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

https://zhuanlan.zhihu.com/p/338817680
