Collections
Collections including paper arxiv:2306.11644
- Attention Is All You Need
  Paper • 1706.03762 • Published • 108
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 18
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 20
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 248
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 25
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 2
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2
- AgentInstruct: Toward Generative Teaching with Agentic Flows
  Paper • 2407.03502 • Published • 51
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 71
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 31
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 152
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 434
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 8
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 99