llama-4-scout

llama-4-scout

Lightweight MoE Open-Source Model
2025-05-07
LLM
Model capability: function_call
Input:
$0.5/1M tokens
Output:
$0.5/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Llama-4-Scout-17B-16E-Instruct is an efficient Mixture-of-Experts (MoE) language model released by unsloth. Its core positioning is as a lightweight MoE flagship characterized by "small activation, big capability," balancing high performance with low inference costs.

  • MoE Architecture Design: With a total parameter size of 17 billion and 16 activated experts, it activates only a subset of parameters during inference, significantly reducing computational overhead.
  • Ultra-long Context Support: Natively supports context lengths of up to 128K tokens, making it ideal for scenarios such as long-document understanding and multi-turn complex dialogues.
  • Instruction Fine-tuning Optimization: Specifically trained for high-quality instruction following, delivering more accurate and reliable outputs in tasks such as logic reasoning, creative writing, and question answering.
  • Efficient Inference Acceleration: Deeply optimized by unsloth, it supports FlashAttention and INT4 quantization, enabling smooth operation even on consumer-grade GPUs.
  • Open-source and Commercially Usable: Released under a permissive license, it supports both research and commercial deployment, with a complete inference and fine-tuning toolchain provided as part of the package.

───────────────────────────────────────────────────────────────────

Core Capabilities

High-energy-efficiency inference: The MoE architecture achieves "large-model capabilities at small-model costs," delivering higher intelligence density per unit of compute power.

🧠 Precise task execution: After meticulous alignment training, it can accurately understand and execute fine-grained instructions covering format, style, and logic.

🧩 Strong ability to handle long texts: Maintains information coherence and captures key details even in ultra-long contexts, avoiding forgetting or distortion.

🛠️ Developer-friendly rapid adoption: Natively compatible with the Hugging Face ecosystem, paired with the unsloth acceleration library, boosting fine-tuning and deployment efficiency manifold.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(LLaMA4)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

llama-4-scout

-
128000

Input$0.5 / 1M tokens
Output$0.5 / 1M tokens

Input$0.5/ 1M tokens
Output$0.5/ 1M tokens
Original Price