#AI horizons 25-04 – models releases


I no longer view the AI model race as the primary source of value. Benchmarks, while useful, are often influenced by prompting techniques and should not be treated as definitive measures of quality. Moreover, the value comes less from the model itself than from how you use it or, better yet, how you mix and match models in your solutions. That said, I’ve compiled this report to capture the notable AI model releases of April 2025. OpenAI and Meta introduced major new models, while Google, Microsoft, Alibaba, and others delivered upgrades focused on reasoning, multimodality, cost-efficiency, and deployment flexibility. This report begins with OpenAI’s comprehensive update and Meta’s new Llama 4 series.


OpenAI Launches Five New Models: GPT-4.1 and the O-Series

OpenAI introduced a new family of general-purpose and reasoning-optimized models: GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, and two new reasoning models, o3 and o4-mini. The company plans to retire the GPT-4.5 preview from its API in July, making these newer models the cornerstone of its platform moving forward (OpenAI announcement).

GPT-4.1 Series

Expanded context window: Supports up to 1 million tokens, enabling users to process entire books, extensive codebases, and lengthy transcripts in a single pass.
Improved coding and instruction-following: GPT-4.1 scored 54.6% on SWE-bench Verified, outperforming GPT-4o by over 21 percentage points (OpenAI blog).
Lower latency and cost: GPT-4.1 mini offers similar quality to GPT-4o with half the latency and 83% lower costs, while GPT-4.1 nano is tuned for high-throughput, low-cost applications.
Training focus: These models were trained to better ignore irrelevant information in long inputs and to follow nuanced instructions more reliably.
Pricing: GPT-4.1 is priced at $2/$8 per million input/output tokens, with mini and nano tiers available at $0.40/$1.60 and $0.10/$0.40 respectively. Cached token discounts can reduce input costs by 75% (OpenAI pricing).
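
To make the pricing above concrete, here is a minimal cost-estimation sketch. The per-million-token prices and the 75% cached-input discount come from the bullet above; the function name, the dictionary keys, and the idea of passing a `cached_fraction` are illustrative assumptions, not part of any OpenAI SDK, and real billing may apply the discount differently.

```python
# Prices in USD per million tokens, copied from the pricing bullet above.
# The keys are labels for this sketch, not necessarily API model identifiers.
PRICES = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

# Assumption: the 75% discount applies uniformly to every cached input token.
CACHED_INPUT_DISCOUNT = 0.75


def estimate_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one request (hypothetical helper)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    price = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens are billed at the discounted input rate.
    billable_input = fresh + cached * (1 - CACHED_INPUT_DISCOUNT)
    input_cost = billable_input * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    for name in PRICES:
        print(name, round(estimate_cost(name, 1_000_000, 100_000), 4))
```

Under these assumptions, a full 1-million-token prompt with 100,000 output tokens costs about $2.80 on GPT-4.1, and roughly $2.05 if half of the prompt is served from cache.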

o3 and o4-mini: Tool-Enabled Reasoning Models

Multimodal tool use: These models integrate with external tools including code interpreters, web search, and image editors (OpenAI blog).
Performance: o3 scored 98.4% on AIME 2025 and reached a 2700+ Elo rating on Codeforces.
Cost-performance balance: o4-mini balances performance and cost, outperforming earlier reasoning models across most benchmarks.
Context and reasoning effort: Both models support a context window of up to 200,000 tokens and adjust reasoning effort dynamically based on task complexity.
Deployment: Available via API and included in ChatGPT Plus, Pro, and Team plans.

Meta Debuts Llama 4 Series: Multimodal, Open, and Efficient

Meta released two new open-weight multimodal models—Llama 4 Scout and Llama 4 Maverick—with a third, Llama 4 Behemoth, still under internal testing. The models are available on llama.meta.com and Hugging Face.

Llama 4 Scout: A lightweight model with 17B active parameters designed to run on a single NVIDIA H100.
Llama 4 Maverick: A more powerful model that exceeds GPT-4o in several coding and reasoning benchmarks (Artificial Analysis).
Architecture: All three models use a Mixture-of-Experts (MoE) architecture for efficiency.
Benchmarks: Maverick ranks high in GPQA Diamond, MMMU, and LiveCodeBench; Scout surpasses models like Mistral 3.1 and Gemini 2.0 Flash-Lite.
Pricing: Inference costs for Maverick range from $0.19–$0.495 per million tokens (Meta research note).

Meta claims Behemoth will outperform GPT-4.5 and Claude 3.7 on STEM benchmarks. However, developers noted discrepancies between the publicly released models and the versions that were benchmarked, prompting criticism over transparency.

Google’s Gemini 2.5 Pro: Best-in-Class Reasoning, at a Price

Google launched Gemini 2.5 Pro, its flagship model, boasting state-of-the-art reasoning:

Multimodal capabilities: Supports text, images, audio, and video.
Benchmarks: Scored 86.7% on AIME 2025 and 84.0% on GPQA Diamond (Helicone Gemini analysis).
Context window: Up to 1 million tokens.
Adoption: Became Google’s most requested model, with an 80% increase in API traffic.
Pricing: $1.25/$10 per million tokens (input/output) for up to 200K tokens; $2.50/$15 beyond (Vertex AI pricing).
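
The tiered pricing above can be sketched as a small calculation. The rates and the 200K threshold come from the bullet above; the function name is hypothetical, and the sketch assumes that the tier is chosen by prompt length and then applies to the whole request, which should be checked against the current Vertex AI pricing page.

```python
# Threshold and rates (USD per million tokens) from the pricing bullet above.
TIER_THRESHOLD = 200_000  # prompt tokens
STANDARD_RATES = (1.25, 10.00)  # (input, output) for prompts up to 200K tokens
LONG_PROMPT_RATES = (2.50, 15.00)  # (input, output) for longer prompts


def gemini_25_pro_cost(prompt_tokens, output_tokens):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: the prompt length selects the tier, and that tier's rates
    apply to every input and output token in the request.
    """
    if prompt_tokens <= TIER_THRESHOLD:
        input_rate, output_rate = STANDARD_RATES
    else:
        input_rate, output_rate = LONG_PROMPT_RATES
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(gemini_25_pro_cost(100_000, 10_000))  # short prompt, standard tier
    print(gemini_25_pro_cost(400_000, 10_000))  # long prompt, higher tier
```

Under this reading, crossing the 200K threshold raises the cost of the entire request, so splitting or trimming very long prompts can matter more than the headline rate suggests.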

Google also previewed Gemini 2.5 Flash, a hybrid model with adjustable reasoning effort, ideal for latency-sensitive applications (Google AI Studio).

Additional Noteworthy Releases

Alibaba Qwen3: Open-source models with hybrid reasoning and support for 119 languages (Alibaba Qwen GitHub).
Zhipu AI GLM-4-32B: Open-weight models with strong results on coding and analysis (Zhipu AI).
IBM Granite Speech 3.3: A new speech-to-text model with multilingual support (IBM Research).
Microsoft BitNet b1.58: A highly efficient model that uses ternary weights, roughly 1.58 bits per weight (arXiv preprint).
Midjourney V7: Alpha release of its upgraded image model with improved draft, turbo, and relax modes (Midjourney changelog).

Why It Matters

April’s model launches underline key trends shaping the AI landscape:

Multimodality is foundational: All leading models now support multiple input types.
Agentic capabilities are emerging: With tool integration and long-context understanding, models are evolving into autonomous problem-solvers.
Open-source remains competitive: Meta, Alibaba, and Zhipu offer strong alternatives to closed models from OpenAI and Google.
Efficiency is a differentiator: Models like GPT-4.1 mini/nano and Llama 4 Scout highlight growing demand for performance at reduced cost.

For decision-makers, these advancements suggest a richer toolkit for AI integration. Organizations can now match model capabilities to use case needs—ranging from code generation and research to multimodal analysis and real-time conversation—all with greater flexibility in cost and deployment.
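
As a rough illustration of matching models to use cases on cost alone, the sketch below picks the cheapest option from a shortlist for a given token mix. The prices are the ones quoted in this report (the Gemini 2.5 Pro rate is the one for prompts up to 200K tokens); the function name and dictionary keys are hypothetical, and the shortlist of models that are good enough for a task is assumed to come from your own evaluation.

```python
# (input, output) prices in USD per million tokens, as quoted in this report.
# Keys are labels for this sketch, not necessarily API model identifiers.
PRICES = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
    "gemini-2.5-pro": (1.25, 10.00),  # rate for prompts up to 200K tokens
}


def cheapest_model(candidates, input_tokens, output_tokens):
    """Return the cheapest candidate for the expected token mix.

    `candidates` is the shortlist of models already judged good enough
    for the task; quality is deliberately not modeled here.
    """
    def cost(name):
        input_price, output_price = PRICES[name]
        return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    return min(candidates, key=cost)


if __name__ == "__main__":
    shortlist = ["gpt-4.1", "gemini-2.5-pro"]
    print(cheapest_model(shortlist, 100_000, 1_000))  # input-heavy workload
    print(cheapest_model(shortlist, 1_000, 100_000))  # output-heavy workload
```

With these list prices, the cheaper of the two flagship models depends on the workload: the input-heavy request favors Gemini 2.5 Pro, while the output-heavy one favors GPT-4.1.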

This entry was posted on May 7, 2025, 1:43 pm and is filed under AI.
