NVIDIA Run:ai GPU Fractioning Delivers 77% Throughput at Half Allocation

By WebDesk | February 18, 2026 | 3 Mins Read

Darius Baruo
Feb 18, 2026 18:31

NVIDIA and Nebius benchmarks show GPU fractioning retains 86% of full-allocation user capacity at a 0.5 GPU allocation and triples concurrent users for mixed AI workloads.





NVIDIA’s Run:ai platform delivered 77% of full-GPU throughput using half the GPU allocation, according to joint benchmarks with cloud provider Nebius released February 18. The results suggest that enterprises running large language model inference can substantially expand serving capacity without a proportional increase in GPU investment.

The tests, conducted on clusters with 64 NVIDIA H100 NVL GPUs and 32 NVIDIA HGX B200 GPUs, showed fractional GPU scheduling achieving near-linear performance scaling across 0.5, 0.25, and 0.125 allocations.
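
To put the three fraction sizes in perspective, the sketch below counts how many equal-sized slots each cluster would expose at each allocation. It assumes, for illustration only, that every GPU is divided uniformly; the benchmark summary does not say how fractions were laid out across the clusters, and `replica_slots` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical illustration: assumes every GPU is split into equal fractions,
# which the benchmark write-up does not state.
def replica_slots(num_gpus: int, fraction: Fraction) -> int:
    """Number of equal-sized fractional slots if each GPU is divided evenly."""
    if not (0 < fraction <= 1):
        raise ValueError("fraction must be in (0, 1]")
    per_gpu = 1 / fraction  # Fraction arithmetic keeps this exact
    if per_gpu.denominator != 1:
        raise ValueError("fraction must divide one GPU evenly")
    return num_gpus * per_gpu.numerator

clusters = {"H100 NVL": 64, "HGX B200": 32}
fractions = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]

for name, gpus in clusters.items():
    print(name, [replica_slots(gpus, f) for f in fractions])
# H100 NVL [128, 256, 512]
# HGX B200 [64, 128, 256]
```

Using `Fraction` rather than floats keeps the "does this fraction divide a GPU evenly" check exact.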

Hard Numbers from Production Testing

At 0.5 GPU allocation, the system supported 8,768 concurrent users while keeping time-to-first-token under one second, 86% of the 10,200 users supported at full allocation. Token generation reached 152,694 tokens per second, compared with 198,680 at full allocation.
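
The headline percentages are plain ratios of the reported figures. The short check below recomputes them; the last line is a derived reading (throughput per unit of GPU allocation), not a number the benchmark itself reports.

```python
# Figures reported in the Run:ai / Nebius benchmark at 0.5 vs 1.0 GPU allocation.
full = {"users": 10_200, "tokens_per_s": 198_680}
half = {"users": 8_768, "tokens_per_s": 152_694}

def ratio(metric: str) -> float:
    """Fraction of full-allocation capacity retained at 0.5 allocation."""
    return half[metric] / full[metric]

user_ratio = ratio("users")
throughput_ratio = ratio("tokens_per_s")

print(f"users retained:      {user_ratio:.1%}")        # 86.0%
print(f"throughput retained: {throughput_ratio:.1%}")  # 76.9%
# Derived: throughput per unit of GPU allocation, relative to a full GPU
print(f"throughput per allocated GPU: {throughput_ratio / 0.5:.2f}x")  # 1.54x
```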

Smaller models pushed these gains further. Phi-4-Mini running on 0.25 GPU fractions handled 72% more concurrent users than a full-GPU deployment, reaching approximately 450,000 tokens per second with P95 latency under 300 milliseconds on 32 GPUs.
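
P95 latency is the latency that 95% of requests stay at or under. The article does not say which percentile definition the benchmark used, so the sketch below assumes the common nearest-rank method; `meets_slo` and its 300 ms default are hypothetical, chosen to mirror the stated target.

```python
import math
from typing import Sequence

def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile using the nearest-rank method:
    the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def meets_slo(latencies_ms: Sequence[float], p: float = 95, limit_ms: float = 300) -> bool:
    """True if the p-th percentile latency is strictly under limit_ms (hypothetical SLO check)."""
    return percentile_nearest_rank(latencies_ms, p) < limit_ms

print(percentile_nearest_rank(list(range(1, 101)), 95))  # 95
```

Other definitions (for example, linear interpolation between ranks) can give slightly different values on small samples.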

The mixed-workload scenario was the most striking. Running Llama 3.1 8B, Phi-4-Mini, and Qwen-Embeddings simultaneously on fractional allocations tripled total concurrent system users compared with single-model deployment. Combined throughput exceeded 350,000 tokens per second at full scale with no cross-model interference.
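
Running several models side by side is at heart a packing problem: fractional requests have to fit onto whole physical GPUs. The sketch below is a generic first-fit-decreasing bin packer, offered as an illustration only; it is not Run:ai's scheduling algorithm, which the article does not describe, and the per-model fractions are hypothetical.

```python
from typing import Dict, List

# Generic first-fit-decreasing packer for fractional GPU requests.
# Illustrative bin-packing sketch, NOT Run:ai's scheduling algorithm.
def pack_fractions(requests: Dict[str, float], eps: float = 1e-9) -> List[Dict[str, float]]:
    """Place each (name -> GPU fraction) request on the first GPU with room.

    Requests are sorted largest-first; returns one dict of placements per GPU.
    """
    gpus: List[Dict[str, float]] = []
    for name, frac in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if not 0 < frac <= 1:
            raise ValueError(f"{name}: fraction must be in (0, 1]")
        for gpu in gpus:
            if sum(gpu.values()) + frac <= 1 + eps:
                gpu[name] = frac
                break
        else:
            gpus.append({name: frac})  # no existing GPU had room: open a new one
    return gpus

# Hypothetical fractions; the article does not say what each model was given.
mixed = {"llama-3.1-8b": 0.5, "phi-4-mini": 0.25, "qwen-embeddings": 0.25}
print(pack_fractions(mixed))  # all three models fit on a single GPU
```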

Why This Matters for GPU Economics

Traditional Kubernetes schedulers allocate whole GPUs to individual models, leaving substantial capacity stranded. The benchmarks noted that even Qwen3-14B, the largest model tested at 14 billion parameters, occupies only 35% of an H100 NVL’s 80GB capacity.
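
The 35% figure is consistent with a back-of-the-envelope estimate, assuming 16-bit (FP16/BF16) weights at 2 bytes per parameter and counting weights only. KV cache, activations, and framework overhead all add to the real footprint, and the article does not state the precision used. `weight_memory_gb` is a hypothetical helper.

```python
# Back-of-envelope weight-memory estimate. Assumes 2 bytes per parameter
# (FP16/BF16) and ignores KV cache, activations, and runtime overhead.
def weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate model-weight footprint in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9

qwen3_14b = weight_memory_gb(14e9)  # ~28 GB of weights
share_of_80gb = qwen3_14b / 80      # fraction of an 80 GB card
print(f"{qwen3_14b:.0f} GB -> {share_of_80gb:.0%} of 80 GB")  # 28 GB -> 35% of 80 GB
```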

Run:ai’s scheduler eliminates this waste through dynamic memory allocation. Users specify requirements directly; the system handles resource distribution without preconfiguration. Memory isolation happens at runtime while compute cycles distribute fairly among active processes.
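
The article says compute cycles are shared fairly among active processes without defining "fairly." One simple reading, sketched below purely as an assumption, is a weighted share computed over active processes only, so an idle process's share flows to the others. The weights and process names are hypothetical, and this is not Run:ai's documented policy.

```python
from typing import Dict, Set

# Toy model of fair compute sharing: each active process gets a share
# proportional to its weight; idle processes get nothing, so their share
# is redistributed. An illustrative assumption, not Run:ai's policy.
def compute_shares(weights: Dict[str, float], active: Set[str]) -> Dict[str, float]:
    """Return each process's fraction of GPU compute cycles."""
    total = sum(w for name, w in weights.items() if name in active)
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: (w / total if name in active else 0.0) for name, w in weights.items()}

weights = {"llama": 1.0, "phi": 1.0, "qwen-emb": 1.0}  # hypothetical equal weights
print(compute_shares(weights, {"llama", "phi", "qwen-emb"}))  # each gets about 1/3
print(compute_shares(weights, {"llama", "phi"}))  # idle qwen-emb's share is redistributed
```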

The results arrive amid a broader industry move toward GPU partitioning. SoftBank and AMD announced validation testing on February 16 for similar fractioning capabilities on AMD Instinct GPUs, which can split a single GPU into up to eight logical devices.

Autoscaling Without Latency Spikes

Nebius tested automatic scaling with Llama 3.1 8B configured to add GPUs when concurrent users exceeded 50. Replicas scaled from 1 to 16 with clean ramp-up, stable utilization during pod warm-up, and negligible HTTP errors.
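
The article does not say which autoscaler drove this test. As an assumption, the sketch below applies the Kubernetes Horizontal Pod Autoscaler's documented rule, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), reading the 50-user threshold as a per-replica target and capping replicas at 16 to match the observed range. The real HPA also applies a tolerance and stabilization windows, which are omitted here.

```python
import math

def desired_replicas(current_replicas: int, avg_users_per_replica: float,
                     target_users_per_replica: float = 50,
                     min_replicas: int = 1, max_replicas: int = 16) -> int:
    """HPA-style replica calculation, clamped to [min_replicas, max_replicas].

    Assumption: 50 concurrent users per replica is the scaling target.
    """
    desired = math.ceil(current_replicas * avg_users_per_replica / target_users_per_replica)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(1, 120))  # 3: load of 120 users needs three replicas at 50 each
print(desired_replicas(16, 80))  # 16: the raw result (26) is capped at the maximum
```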

The practical implication: enterprises can run multiple inference models on existing GPU inventory, scale dynamically during peak demand, and reclaim idle capacity during off-hours for other workloads. For organizations facing fixed GPU budgets, fractioning transforms capacity planning from hardware procurement into software configuration.

Run:ai v2.24 is available now. NVIDIA plans to discuss the Nebius implementation at GTC 2026.


