CatchTheBull
NVIDIA Surpasses 1,000 TPS/User with Llama 4 Maverick and Blackwell GPUs

By WebDesk | May 23, 2025 | 3 Mins Read


Lawrence Jengar
May 23, 2025 02:10

NVIDIA achieves a world-record inference speed of over 1,000 TPS/user using Blackwell GPUs and Llama 4 Maverick, setting a new standard for AI model performance.





NVIDIA has broken the 1,000 tokens per second (TPS) per user barrier on the Llama 4 Maverick model using Blackwell GPUs, setting a new benchmark for AI inference performance. The result was independently verified by the AI benchmarking service Artificial Analysis and marks a significant milestone in large language model (LLM) inference speed.

Technological Advancements

The record was set on a single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs, which delivered over 1,000 TPS per user on Llama 4 Maverick, a 400-billion-parameter model. According to NVIDIA, this makes Blackwell the best-performing hardware for deploying Llama 4 whether the goal is minimizing latency, as in this record, or maximizing throughput, where it reaches up to 72,000 TPS/server in high-throughput configurations.
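
To put the two headline figures in perspective, the arithmetic below converts a decode rate into per-token latency and streaming time. It is only a back-of-the-envelope sketch: the 500-token response length is a made-up example value, and the helper names are illustrative rather than taken from NVIDIA's material.

```python
# Figures quoted in the article.
TPS_PER_USER = 1_000      # tokens/s seen by a single user (low-latency record)
TPS_PER_SERVER = 72_000   # tokens/s per server (high-throughput configuration)

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 500


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average gap between consecutive output tokens, in milliseconds."""
    return 1_000.0 / tokens_per_second


def response_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate.

    Ignores time-to-first-token and any queuing delay.
    """
    return num_tokens / tokens_per_second


# At 1,000 TPS/user a new token arrives roughly every millisecond.
print(per_token_latency_ms(TPS_PER_USER))
# A 500-token answer streams in about half a second for that user.
print(response_time_s(RESPONSE_TOKENS, TPS_PER_USER))
# Aggregate tokens a server could emit per minute in the throughput setup.
print(TPS_PER_SERVER * 60)
```

The per-user figure describes how fast one conversation feels; the per-server figure describes how much total work the machine completes. They come from different configurations and should not be multiplied together.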

Optimization Techniques

NVIDIA applied extensive software optimizations with TensorRT-LLM to make full use of the Blackwell GPUs. The company also trained a speculative decoding draft model using EAGLE-3 techniques. Together, these approaches produced a fourfold speed increase over previous baselines. The optimizations use FP8 data types for operations such as GEMMs (general matrix multiplications) and Mixture of Experts (MoE), while keeping response accuracy comparable to BF16 results.
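
The following pure-Python sketch illustrates why an 8-bit float can stay close to a higher-precision result. It is not NVIDIA's implementation: it assumes the E4M3 flavor of FP8 (4 exponent bits, 3 mantissa bits, maximum finite value 448), simulates the rounding in software, and uses simple per-tensor scaling. The function names `quantize_e4m3` and `fp8_dot` are hypothetical.

```python
import math

E4M3_MAX = 448.0      # largest finite value in FP8 E4M3 (assumed format)
MIN_FREXP_EXP = -5    # math.frexp exponent of the smallest normal value, 2**-6


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    ax = min(abs(x), E4M3_MAX)        # saturate instead of overflowing
    _, exp = math.frexp(ax)           # ax = m * 2**exp with 0.5 <= m < 1
    # One implicit plus three stored mantissa bits give a spacing of
    # 2**(exp - 4); below the normal range the spacing stays fixed.
    step = 2.0 ** (max(exp, MIN_FREXP_EXP) - 4)
    return sign * round(ax / step) * step   # round() is half-to-even


def fp8_dot(a, b):
    """Dot product with both inputs quantized to E4M3 (per-tensor scales).

    Assumes neither input is all zeros.
    """
    scale_a = E4M3_MAX / max(abs(v) for v in a)
    scale_b = E4M3_MAX / max(abs(v) for v in b)
    qa = [quantize_e4m3(v * scale_a) for v in a]
    qb = [quantize_e4m3(v * scale_b) for v in b]
    # Accumulate in full precision, then undo both scales.
    return sum(x * y for x, y in zip(qa, qb)) / (scale_a * scale_b)


a = [0.1 * (i + 1) for i in range(16)]
b = [1.0 / (i + 1) for i in range(16)]
exact = sum(x * y for x, y in zip(a, b))
approx = fp8_dot(a, b)
rel_err = abs(approx - exact) / exact
print(exact, approx, rel_err)
```

Scaling each tensor so that its largest value maps to 448 keeps every element in the format's normal range, where the relative rounding error per element is at most 1/16. Accumulating the products in higher precision is what keeps the final result close to the unquantized one.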

Importance of Low Latency

In generative AI applications, throughput and latency must be balanced against each other. For critical applications that depend on rapid decision-making, low latency matters most, and the TPS/user record shows that NVIDIA’s Blackwell GPUs can deliver it. Because the same hardware also supports high-throughput configurations, it suits a wide range of AI workloads.

CUDA Kernels and Speculative Decoding

NVIDIA optimized CUDA kernels for GEMM, MoE, and Attention operations, using spatial partitioning and efficient memory loading to maximize performance. Speculative decoding accelerates LLM inference by having a smaller, faster draft model propose several tokens ahead, which the larger target LLM then verifies. The speed-up is largest when the draft model’s predictions are accurate, because more drafted tokens are accepted per verification step.
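
The draft-then-verify loop described above can be sketched in a few lines. This is a generic greedy variant with toy stand-in models, not EAGLE-3 and not TensorRT-LLM code: `target_model` and `draft_model` are hypothetical deterministic functions over integer tokens, and a real system would verify all drafted positions in one batched forward pass of the target.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # token prefix -> next token (greedy)


def target_model(prefix: List[int]) -> int:
    """Hypothetical 'large' model: a fixed rule standing in for an LLM."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical 'small' model: wrong whenever len(prefix) % 4 == 0."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[int], num_new: int, target: Model) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(prompt: List[int], num_new: int, k: int,
                       draft: Model, target: Model) -> Tuple[List[int], int]:
    """Return (tokens, number of target verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    passes = 0
    while len(tokens) < goal:
        # 1. The cheap model drafts k tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))
        # 2. The target checks the drafts; counted as a single pass.
        passes += 1
        accepted: List[int] = []
        for tok in drafted:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if tok != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft matched, so the pass also yields one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], passes


prompt = [1, 2, 3]
baseline = greedy_decode(prompt, 20, target_model)
spec_tokens, spec_passes = speculative_decode(prompt, 20, 4, draft_model, target_model)
print(spec_tokens == baseline, spec_passes)
```

Every emitted token is one the target model would have chosen itself, so the output matches plain greedy decoding. The gain comes from needing fewer target passes than tokens, which is why a more accurate draft model translates directly into a larger speed-up.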

Programmatic Dependent Launch

To further enhance performance, NVIDIA used Programmatic Dependent Launch (PDL) to reduce GPU idle time between consecutive CUDA kernels. PDL lets a dependent kernel begin executing before the preceding kernel has fully finished, so the two overlap. This improves GPU utilization and closes the gaps that would otherwise appear between kernels.
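
The effect of overlapping consecutive kernels can be illustrated with a toy timing model. This is not CUDA code and does not measure real hardware: each kernel is assumed to split into a "prologue" (launch and setup work that does not depend on the previous kernel's output) and a "body" (work that does), the previous kernel is assumed to release its successor as soon as its own body starts, and all timings are invented.

```python
from typing import List, Tuple

Kernel = Tuple[float, float]  # (prologue_us, body_us), hypothetical timings


def serialized_time(kernels: List[Kernel]) -> float:
    """Each kernel starts only after the previous one has fully finished."""
    return sum(prologue + body for prologue, body in kernels)


def overlapped_time(kernels: List[Kernel]) -> float:
    """A kernel's prologue may run while the previous body is still executing.

    The body still waits for the previous body to finish, which models the
    data dependency between consecutive kernels.
    """
    prev_body_start = 0.0
    prev_body_end = 0.0
    for prologue, body in kernels:
        prologue_end = prev_body_start + prologue
        body_start = max(prologue_end, prev_body_end)
        prev_body_start = body_start
        prev_body_end = body_start + body
    return prev_body_end


kernels = [(4.0, 20.0), (4.0, 15.0), (4.0, 10.0)]
print(serialized_time(kernels))   # → 57.0
print(overlapped_time(kernels))   # → 49.0
```

In this model only the first prologue remains on the critical path; later prologues hide behind the preceding kernel's body. For short decode-time kernels, where setup is a noticeable share of the total, removing those gaps matters more than it would for long-running kernels.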

NVIDIA’s achievements underscore its leadership in AI infrastructure and data center technology, setting new standards for speed and efficiency in AI model deployment. The innovations in Blackwell architecture and software optimization continue to push the boundaries of what’s possible in AI performance, ensuring responsive, real-time user experiences and robust AI applications.

For more detailed information, visit the NVIDIA official blog.


