Model Collection


This section is under heavy development.

This section contains a collection and summary of notable and foundational LLMs. The data is adapted from Papers with Code and from the recent work by Zhao et al. (2023).


| Model | Release Date | Description |
| --- | --- | --- |
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| Cerebras-GPT | 2023 | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
| PaLM 2 | 2023 | A language model with better multilingual and reasoning capabilities that is also more compute-efficient than its predecessor, PaLM. |
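The Chinchilla entry's claim can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on two simplifying assumptions that are not stated in the table: training compute is approximated as C ≈ 6 × N × D FLOPs (N parameters, D training tokens), and a compute-optimal model is trained on roughly 20 tokens per parameter. The helper names `training_flops` and `compute_optimal_split` are hypothetical, and the Gopher and Chinchilla sizes are the publicly reported approximate figures.

```python
import math

# Simplifying assumptions (illustrative, not exact):
#   training compute C ≈ 6 * N * D FLOPs (N = parameters, D = training tokens)
#   compute-optimal training uses roughly 20 tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6
TOKENS_PER_PARAM = 20


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ≈ 6ND heuristic."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) assuming D ≈ 20 N.

    Substituting D = 20 N into C = 6 N D gives C = 120 N**2,
    so N = sqrt(C / 120) and D = 20 N.
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params


if __name__ == "__main__":
    # Approximate reported sizes: Gopher ~280B params / ~300B tokens,
    # Chinchilla ~70B params / ~1.4T tokens.
    gopher = training_flops(280e9, 300e9)
    chinchilla = training_flops(70e9, 1.4e12)
    print(f"Gopher:     {gopher:.2e} FLOPs")
    print(f"Chinchilla: {chinchilla:.2e} FLOPs")

    n, d = compute_optimal_split(chinchilla)
    print(f"Compute-optimal for that budget: {n / 1e9:.0f}B params, {d / 1e12:.1f}T tokens")
```

Under these assumptions, Gopher and Chinchilla land on training budgets of the same order (about 5.0e23 and 5.9e23 FLOPs), but Chinchilla spends that budget on a model four times smaller trained on more than four times as many tokens, which is the trade-off the table entry describes.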