Llama 3.1 405B Instruct Model Card

Overview

	Llama 3.1 405B Instruct
Provider: the entity that provides this model.	Meta
Input Context Window: the number of tokens supported by the input context window.	128K tokens
Maximum Output Tokens: the number of tokens that can be generated by the model in a single request.	2,048 tokens
Open Source: whether the model's code is available for public use.	Yes
Release Date: when the model was first released.	July 23, 2024
Knowledge Cut-off Date: when the model's knowledge was last updated.	December 2023
API Providers: the providers that offer this model (not an exhaustive list).	Azure AI, AWS Bedrock, Google Cloud Vertex AI Model Garden, NVIDIA NIM, IBM watsonx
Empirical Throughput: the number of tokens the model can generate per second.	Unknown
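
The two token limits in the table can be checked before a request is sent. The sketch below is illustrative only: the function name `check_request` is hypothetical, "128K" is assumed to mean 128,000 tokens, and the input and output limits are assumed to be enforced independently. Individual API providers may apply different limits.

```python
# Limits copied from the overview table.
# Assumption: "128K" is read as 128,000 tokens (some providers use 131,072).
INPUT_CONTEXT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 2_048


def check_request(prompt_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits.

    Hypothetical helper: assumes the input and output limits are independent.
    """
    problems = []
    if prompt_tokens > INPUT_CONTEXT_TOKENS:
        problems.append(
            f"prompt has {prompt_tokens} tokens, limit is {INPUT_CONTEXT_TOKENS}"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested {requested_output_tokens} output tokens, "
            f"limit is {MAX_OUTPUT_TOKENS}"
        )
    return problems


if __name__ == "__main__":
    # A 1,000-token prompt asking for 512 output tokens fits both limits.
    print(check_request(1_000, 512))
    # An oversized prompt and an oversized output request each report a violation.
    print(check_request(200_000, 4_096))
```

Token counts depend on the tokenizer, so `prompt_tokens` should come from the same tokenizer the serving provider uses.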

Pricing

	Llama 3.1 405B Instruct
Input: cost of input data provided to the model.	Pricing not available.
Output: cost of output tokens generated by the model.	Pricing not available.


Benchmarks

	Llama 3.1 405B Instruct
MMLU: evaluates LLM knowledge acquisition in zero-shot and few-shot settings.	85.2 (5-shot)
MMMU: a wide-ranging multi-discipline, multimodal benchmark.	Benchmark not available.
HellaSwag: a challenging sentence completion benchmark.	Benchmark not available.
HumanEval: measures functional correctness when synthesizing programs from docstrings.	Benchmark not available.
MATH: performance on math problems spanning 5 levels of difficulty and 7 sub-disciplines.	73.8 (0-shot)
