LLaMA 3.3 70B Versatile
A state-of-the-art LLaMA model served on Groq’s hardware, optimized for general-purpose use at high speed.

Use Cases:
- High-speed chat agents
- Real-time customer interaction bots
- Lightweight RAG-based knowledge assistants

Key Features:
- Ultra-fast response time (token streaming in milliseconds)
- Balanced accuracy and generation speed
- Great for user-facing experiences
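As a minimal sketch of how an agent might target this model: Groq exposes an OpenAI-compatible chat completions endpoint, and a request is just a JSON body naming the model. The model ID string and the endpoint URL below are assumptions for illustration; check your Groq/Lyzr configuration for the exact values. The payload is only constructed here, not sent.

```python
import json

# Assumed OpenAI-compatible endpoint; verify against your Groq account docs.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(user_message, model="llama-3.3-70b-versatile", stream=True):
    """Build the JSON body for a chat completion request (not sent here)."""
    return {
        "model": model,  # assumed model ID for LLaMA 3.3 70B Versatile
        "messages": [
            {"role": "system", "content": "You are a helpful support agent."},
            {"role": "user", "content": user_message},
        ],
        "stream": stream,  # stream tokens for a responsive, user-facing UX
    }

payload = build_chat_request("Where is my order?")
print(json.dumps(payload, indent=2))
```

Setting `"stream": true` is what makes the millisecond token streaming visible to the end user: the client renders tokens as they arrive instead of waiting for the full completion.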
LLaMA 3.1 8B Instant
A smaller LLaMA model optimized for extremely lightweight, instant responses.

Use Cases:
- Instant query resolution
- Embedded LLM features in apps
- Low-cost multi-turn agents

Key Features:
- Minimal latency (ideal for mobile/web)
- Lightweight for cost-effective scaling
- Suitable for basic inferencing needs
LLaMA 4 Scout 17B (16E)
Meta’s natively multimodal model using a Mixture-of-Experts (MoE) architecture, offering exceptional performance for its size.

Use Cases:
- Multimodal assistants (text + image reasoning)
- High-speed coding and debugging tools
- Multilingual chat support (12+ languages)

Key Features:
- Native vision support (early-fusion architecture)
- 128K context window for long-form analysis
- Optimized for “assistant-like” conversational flow
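A multimodal turn combines text and an image in one message. The sketch below uses the OpenAI-style content-array format that vision-capable chat endpoints accept; the model ID is an assumption, so verify the exact string in your Groq/Lyzr model list.

```python
import json

def build_vision_request(question, image_url,
                         model="meta-llama/llama-4-scout-17b-16e-instruct"):
    """Combine text and an image in a single user turn (early-fusion input)."""
    return {
        "model": model,  # assumed model ID for LLaMA 4 Scout
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

req = build_vision_request("What trend does this chart show?",
                           "https://example.com/chart.png")
print(json.dumps(req, indent=2))
```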
LLaMA 4 Maverick 17B (128E)
The high-capacity variant of the LLaMA 4 series, featuring a massive 128-expert MoE architecture for deeper reasoning.

Use Cases:
- Complex decision-making and policy-based agents
- Enterprise-scale orchestration
- Knowledge-intensive research and reasoning

Key Features:
- Large-scale reasoning with 512K context support
- Superior coding and technical problem-solving
- Maintains sub-100ms latency on Groq hardware
GPT-oss-20B
An open-source GPT variant optimized for balanced performance and efficiency on Groq hardware.

Use Cases:
- General-purpose conversational agents
- Fast inference for customer support and FAQs
- Lightweight reasoning at scale

Key Features:
- Optimized for Groq inferencing speed
- Ideal for real-time interactive applications
GPT-oss-120B
A large-scale open-source model optimized for Groq, delivering deeper reasoning and broader coverage.

Use Cases:
- Knowledge-heavy assistants
- Multi-turn conversational flows
- Enterprise orchestration agents

Key Features:
- Supports complex queries with high accuracy
- Extremely low latency for a 100B+ parameter model
Moonshot Kimi K2 Instruct
A frontier Mixture-of-Experts (MoE) model designed for autonomous agentic intelligence.

Use Cases:
- Autonomous agents requiring tool-calling
- Complex multi-step reasoning tasks
- Interactive data visualization and frontend coding

Key Features:
- 256K context window for long-horizon tasks
- State-of-the-art “agentic” reasoning and tool use
- High-tier performance in math and technical logic
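Tool-calling works by declaring the tools an agent may invoke alongside the conversation. The sketch below builds a request body using the OpenAI-style tool schema; the model ID and the `search_web` tool are assumptions for illustration, not part of any real API.

```python
import json

def build_agent_request(task, model="moonshotai/kimi-k2-instruct"):
    """Request body for an agent turn that may call a (hypothetical) tool."""
    return {
        "model": model,  # assumed model ID for Kimi K2 Instruct
        "messages": [{"role": "user", "content": task}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "search_web",  # hypothetical tool for this sketch
                "description": "Search the web and return top results.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

agent_req = build_agent_request("Find and summarize today's top AI news.")
print(json.dumps(agent_req, indent=2))
```

When the model decides a tool is needed, the response carries a structured tool call (name plus JSON arguments) that the agent executes before sending the result back in a follow-up turn.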
⚡ With Groq, agents in Lyzr get sub-100ms inference latency, making them ideal for real-time apps where user experience and responsiveness are critical.
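A common pattern is to route each agent workload to the cheapest model that meets its needs. The routing table below is a hypothetical sketch, not a Lyzr feature; the model IDs are assumptions drawn from the catalog above.

```python
# Hypothetical routing helper: pick a Groq-hosted model by workload type.
# All model ID strings are assumptions; verify them against your model list.
MODEL_ROUTES = {
    "instant": "llama-3.1-8b-instant",                        # lowest cost/latency
    "general": "llama-3.3-70b-versatile",                     # balanced chat quality
    "vision": "meta-llama/llama-4-scout-17b-16e-instruct",    # text + image
    "deep": "openai/gpt-oss-120b",                            # knowledge-heavy reasoning
    "agentic": "moonshotai/kimi-k2-instruct",                 # tool-calling agents
}

def pick_model(workload: str) -> str:
    """Return the model ID for a workload, defaulting to the general model."""
    return MODEL_ROUTES.get(workload, MODEL_ROUTES["general"])

print(pick_model("deep"))     # openai/gpt-oss-120b
print(pick_model("unknown"))  # falls back to llama-3.3-70b-versatile
```

Routing this way keeps instant, low-stakes queries on the 8B model while reserving the large MoE models for reasoning-heavy or agentic work.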