
gemma-2-27b
Google's high-performing and efficient open-source model
2024-06-27
Input:
$0.18/1M tokens
Output:
$0.18/1M tokens
Bulk order? Contact your manager for exclusive deals
API Overview
Basic Information
Gemma-2-27B is an open-source large language model released by Google on June 27, 2024. It belongs to the Gemma 2 series, with a total of 27 billion parameters. Its training data comprises 13 trillion tokens (covering web documents, code, scientific texts, and more, predominantly in English), and it features an 8,192-token context window. Under the commercially friendly Gemma License, it can be used for commercial purposes free of charge. It is already publicly accessible via Google AI Studio, Kaggle, and Hugging Face, and will soon support deployment on Vertex AI. It is optimized for full-precision inference on a single NVIDIA A100/H100 GPU or Google Cloud TPU instance.
Core Features
- Leading performance: In the LMSYS Chatbot Arena, it outperforms larger models such as Llama 3 70B. It scores 75.2 points on the MMLU 5-shot benchmark, achieves a 51.8% pass rate on HumanEval at 1 shot, and attains a 74.0% accuracy rate on the GSM8K math problem set—demonstrating outstanding performance among open-source models of similar size.
- Excellent efficiency: Full-precision inference can run on a single GPU or TPU, significantly reducing costs compared to previous generations. It also supports 4-bit and 8-bit quantization, making it suitable for devices such as gaming laptops and desktop computers.
- Enhanced security: The training data has been filtered using CSAM and sensitive information detection techniques, meeting policy thresholds in benchmarks for content safety and harm characterization. SynthID text watermarking technology will be integrated in future updates.
Technical Highlights
- The model adopts a redesigned architecture that alternates between local sliding windows (4,096 tokens) and global attention (8,192 tokens), striking a balance between capturing fine-grained details and achieving global understanding.
- It supports compatibility with multiple frameworks, including Hugging Face Transformers, JAX, PyTorch, TensorFlow (Keras 3.0), and vLLM. It is optimized for NVIDIA hardware through integration with NVIDIA TensorRT-LLM.
- A Gemma Cookbook example library and an LLM Comparator evaluation tool are provided, supporting Keras and Hugging Face fine-tuning and lowering the barrier to entry for developers.
Applicable Scenarios
- Development: Code generation and debugging for small- and medium-sized projects, lightweight agent development, ideal for enterprises seeking cost-effective development solutions.
- Research and Education: As a foundational model for NLP research, literature summarization, and knowledge exploration; academic researchers can apply for Google Cloud credits to support their work.
- Commercial Applications: Customer service chatbots, text generation (marketing copy, emails), and RAG-based question answering—helping businesses rapidly implement AI solutions.
Playground
Log in to explore more features! Click to Log In
API Analytics
API Reference (1)
API Pricing
$¥ 円 ₽