Tony Kim
Jun 12, 2026 21:58
NVIDIA GB300 NVL72 leads AgentPerf benchmark, running 20x more AI agents per megawatt than Hopper. Here’s why it matters for AI infrastructure.
NVIDIA’s Blackwell GB300 NVL72 platform has emerged as the top performer in the inaugural AgentPerf benchmark, designed by Artificial Analysis to evaluate infrastructure for agentic AI workloads. According to results published on June 12, 2026, the GB300 NVL72 runs up to 20 times more agents per megawatt of power than NVIDIA’s Hopper architecture. This efficiency leap underscores Blackwell’s potential to redefine AI infrastructure for enterprises scaling agentic systems.
Agentic AI differs fundamentally from traditional conversational AI. Instead of single-turn interactions, agentic workloads involve complex, multi-step tasks where agents chain together dozens or even hundreds of large language model (LLM) calls, integrating tools like database searches, code execution, or web browsing at each step. This complexity makes conventional AI benchmarks inadequate, as they focus on isolated LLM performance rather than the real-world demands of continuous, tool-augmented workflows.
Why NVIDIA Blackwell Dominated
Key to Blackwell’s dominance is its full-stack optimization. The GB300 NVL72 integrates 72 GPUs in a single rack-scale system, enabling efficient distribution of large mixture-of-experts (MoE) models like DeepSeek V4 Pro, which powers leading agentic applications. NVIDIA’s CUDA kernels and TensorRT LLM software further enhance performance by overlapping compute and communication tasks, minimizing latency and improving scalability as concurrent agent sessions grow.
AgentPerf’s methodology emphasizes this scalability. The benchmark simulates real-world agentic coding tasks, measuring how many tasks a system can support simultaneously while maintaining stringent performance thresholds for responsiveness and output speed. Blackwell’s ability to sustain high concurrency levels while meeting these thresholds highlights its edge in delivering cost-efficient, high-performance AI infrastructure.
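As a rough illustration of the kind of measurement described above, the Python sketch below picks the highest concurrency level at which a system still meets both a responsiveness threshold and an output-speed threshold. The field names, threshold values, and sample numbers are hypothetical and are not taken from AgentPerf or Artificial Analysis, whose actual methodology may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LoadPoint:
    """One hypothetical measurement at a fixed number of concurrent agents."""
    concurrent_agents: int
    time_to_first_token_s: float   # responsiveness
    output_tokens_per_s: float     # per-agent output speed


def max_supported_agents(points: Iterable[LoadPoint],
                         max_ttft_s: float,
                         min_tokens_per_s: float) -> int:
    """Return the highest concurrency reached before any threshold is missed."""
    supported = 0
    for p in sorted(points, key=lambda p: p.concurrent_agents):
        meets = (p.time_to_first_token_s <= max_ttft_s
                 and p.output_tokens_per_s >= min_tokens_per_s)
        if not meets:
            break  # stop at the first level that violates a threshold
        supported = p.concurrent_agents
    return supported


# Made-up sample data, for illustration only.
sample = [
    LoadPoint(64, 0.8, 95.0),
    LoadPoint(128, 1.1, 80.0),
    LoadPoint(256, 1.9, 55.0),
    LoadPoint(512, 4.2, 30.0),
]
print(max_supported_agents(sample, max_ttft_s=2.0, min_tokens_per_s=50.0))
```

Stopping at the first failing level, rather than taking the largest passing level anywhere in the sweep, is a conservative choice made for this sketch: it avoids crediting a system for a high-concurrency point that passes only by chance.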
Implications for AI Infrastructure
The results of this benchmark are significant for enterprises deploying AI at scale. As inference workloads grow, the energy efficiency of running agentic tasks, measured in agents per megawatt, becomes a critical metric alongside cost. For companies investing in AI infrastructure, supporting more agents within the same power budget can translate to lower operational costs and higher productivity per dollar spent.
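To make the agents-per-megawatt metric concrete, here is a minimal Python sketch of the arithmetic. The function names and all input figures are invented for illustration; they are not measured AgentPerf results, and they were chosen only so that the ratio works out to the 20x figure cited in the article.

```python
def agents_per_megawatt(concurrent_agents: int, system_power_kw: float) -> float:
    """Normalize supported agents by power draw (1 MW = 1000 kW)."""
    if system_power_kw <= 0:
        raise ValueError("system_power_kw must be positive")
    return concurrent_agents * 1000.0 / system_power_kw


def efficiency_ratio(candidate: float, baseline: float) -> float:
    """How many times more agents per megawatt the candidate supports."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


# Hypothetical inputs, not benchmark data.
baseline = agents_per_megawatt(concurrent_agents=100, system_power_kw=50.0)
candidate = agents_per_megawatt(concurrent_agents=4000, system_power_kw=100.0)
print(baseline, candidate, efficiency_ratio(candidate, baseline))
```

Normalizing by power rather than by GPU count reflects the fact that data center capacity is typically limited by available power, so a system that draws more power can still come out ahead if it supports proportionally more agents.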
The broader market context further underscores this point. With inference already accounting for a growing share of AI infrastructure expenditure, tools like AgentPerf shift the industry’s focus from raw model quality to infrastructure capability. This transition mirrors the challenges enterprises face in scaling agentic systems, where orchestration, memory management, and deployment topology often determine success.
Adoption and Ecosystem Integration
Early adopters of NVIDIA Blackwell include companies like Together AI and DeepInfra, which are leveraging its performance to power real-world agentic applications. Together AI, for instance, uses Blackwell to support Cursor, a coding platform where agents debug, refactor, and generate code in real time. Similarly, DeepInfra powers Pam.ai, an AI workforce platform for car dealerships, which uses agents to handle tasks like scheduling and sales outreach.
As NVIDIA continues to refine its software stack and introduces new architectures like Vera Rubin, the company aims to further enhance performance for agentic workloads. Given the benchmark results, Blackwell appears poised to set the standard for next-generation AI infrastructure.
The AgentPerf benchmark signals a shift toward workload-specific evaluation in AI deployments. For enterprises, the ability to run more agents with less power could be the difference between scalable success and operational bottlenecks in the era of agentic AI.