llama3.2-90b

llama3.2-90b

An open-source model with powerful visual understanding capabilities
2024-09-25
LLM
Model capability: image
Input:
$2/1M tokens
Output:
$2/1M tokens
Bulk order? Contact your manager for exclusive deals

API Overview

Llama 3.2 90B Vision is Meta’s flagship multimodal language model, designed as an integrated intelligent engine that combines powerful visual understanding with versatile language capabilities.

  • Native Multimodal Architecture: Deeply integrates visual and language modules, enabling it to understand and reason about image content without any additional adaptation.
  • Ultra-Long Context Support: Supports up to 128K tokens of context, effortlessly handling long sequences that mix text and images.
  • Extensive Multilingual Coverage: Supports over 100 languages, catering to the localized expression needs of global users in text-and-image scenarios.
  • Efficient Inference Optimization: Achieves high throughput and low latency multimodal responses on mainstream GPUs such as A100 and H100.
  • Agent-Ready Design: Supports structured outputs and tool calls, making it suitable for scenarios like visual question answering, content moderation, and creative assistance.

───────────────────────────────────────────────────────────────────

Core Capabilities

👁️ Deep Visual Understanding: Not only can it recognize objects and scenes in images, but it can also reason about the relationships between text and images, interpret charts, and understand interface layouts.

🧠 Text-and-Image Joint Reasoning: Combines visual cues with textual instructions to accomplish complex tasks such as “writing code based on a screenshot” or “analyzing product images to generate copy.”

🌍 Multi-Language Text-and-Image Generation: Supports cross-language text-and-image description, translation, and creation, producing outputs that are both natural and culturally appropriate.

🧩 Seamless Agent Integration: Natively compatible with Function Calling and structured responses, allowing easy integration into automated multimodal workflows.

Playground

Log in to explore more features! Click to Log In

API Analytics

API Reference (1)

API DescriptionAPI EndpointRequest MethodStabilityParameter Description
Chat(LLaMA3.2 multimodal)
POST
Stable
View Details

API Pricing

$
ModelDescriptionContextOfficial Price302.AI Price

llama3.2-90b

-
131072

Input$2 / 1M tokens
Output$2 / 1M tokens

Input$2/ 1M tokens
Output$2/ 1M tokens
Original Price