Model Collection
⚠️
This section is under heavy development.
This section contains a collection and summary of notable foundation language models (LLMs). (The list is compiled from Papers with Code and the recent work of Zhao et al. (2023).)
Models
| Model | Release Year | Description |
|---|---|---|
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| PaLM 2 | 2023 | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. |
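
Several of the openly released checkpoints in this list (for example BERT, GPT-2, T5, BLOOM, and Flan-T5) can be loaded through the Hugging Face `transformers` library. The snippet below is a minimal sketch, not part of any of the papers above: it assumes `transformers` and `torch` are installed and uses the publicly available `google/flan-t5-small` checkpoint purely as an illustrative choice.

```python
# Minimal sketch: loading an instruction-finetuned seq2seq model (Flan-T5)
# with the Hugging Face transformers library. Assumes `transformers` and
# `torch` are installed; the checkpoint name is an illustrative choice.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"  # small, openly available checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "Translate English to German: The house is wonderful."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding; decoder-only models (e.g., GPT-2) would instead be
# loaded with AutoModelForCausalLM.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```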