NVIDIA CUDA 13.2 Expands Tile Programming to Ampere and Ada GPUs

By WebDesk | March 9, 2026 | 3 Mins Read


Iris Coleman
Mar 09, 2026 23:00

CUDA 13.2 extends tile-based GPU programming to older architectures, adds Python profiling tools, and delivers up to 5x speedups with new Top-K algorithms.





NVIDIA’s CUDA 13.2 release extends its tile-based programming model, which the company describes as the largest update to the CUDA platform in two decades, to Ampere and Ada GPUs, making it available on a significantly broader hardware base. The update also introduces native Python profiling tools and new algorithms that deliver up to 5x speedups on specific workloads.

Previously limited to Blackwell-class GPUs, CUDA Tile now supports compute capability 8.x architectures (Ampere and Ada) in addition to the existing 10.x and 12.x support. NVIDIA indicated that a future toolkit release will extend full support to all GPU architectures starting with Ampere, which could bring tile programming to millions of deployed professional and consumer GPUs.
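To make the support matrix concrete, here is a minimal sketch of a capability check. The helper name `cuda_tile_supported` and the set of supported major versions are assumptions taken directly from the paragraph above (8.x, 10.x, 12.x); this is not an NVIDIA API. In practice the (major, minor) pair would come from a device query through the driver or a Python GPU library.

```python
# Hypothetical helper, not an NVIDIA API. The supported majors mirror the
# article's list for CUDA 13.2: 8.x (Ampere/Ada), 10.x and 12.x (Blackwell).
CUDA_TILE_MAJORS_13_2 = frozenset({8, 10, 12})


def cuda_tile_supported(compute_capability: tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is in a listed family."""
    major, _minor = compute_capability
    return major in CUDA_TILE_MAJORS_13_2


if __name__ == "__main__":
    devices = [("A100", (8, 0)), ("RTX 4090", (8, 9)), ("H100", (9, 0)), ("T4", (7, 5))]
    for name, cc in devices:
        print(f"{name} (sm_{cc[0]}{cc[1]}): {cuda_tile_supported(cc)}")
```

Note that the article's list does not mention 9.x (Hopper), so this sketch reports it as unsupported; NVIDIA's stated plan to cover every architecture from Ampere onward in a future release would change that set.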

Python Gets First-Class Treatment

The release significantly expands Python tooling. cuTile Python, the DSL implementation of NVIDIA’s tile programming model, now supports recursive functions, closures with capture, lambda functions, and custom reduction operations. Installation has been simplified to a single pip command that pulls all dependencies without requiring a system-wide CUDA Toolkit installation.
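A custom reduction generalizes sum or max to any user-supplied associative combine function. The plain-Python sketch below shows the pairwise (tree) pattern that parallel reductions typically follow, with a lambda and a named function as combine operators. It does not use the cuTile API; `tree_reduce` and `argmax_combine` are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tree_reduce(values: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Reduce values pairwise, level by level, as a parallel reduction would.

    combine must be associative: the grouping differs from a left-to-right fold.
    """
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    level = list(values)
    while len(level) > 1:
        # Combine neighbours; on a GPU each pair could go to a separate lane.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element carries over to the next level
        level = nxt
    return level[0]


def argmax_combine(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    # Custom operator on (value, index) pairs; ties keep the leftmost index.
    return a if a[0] >= b[0] else b


if __name__ == "__main__":
    data = [3.0, 9.5, 2.0, 9.5, 7.0]
    print(tree_reduce(data, lambda x, y: x + y))  # 31.0
    print(tree_reduce(list(zip(data, range(len(data)))), argmax_combine))  # (9.5, 1)
```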

A new profiling interface called Nsight Python brings kernel profiling directly to Python developers. Using decorators, developers can automatically configure, profile, and plot kernel performance comparisons across multiple configurations. The tool exposes performance data through standard Python data structures for custom analysis.
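Nsight Python's actual decorator names and signatures are not described here, so the following is only a hypothetical stand-in for the workflow the paragraph outlines: a decorator sweeps a function over several configurations, times each run, and returns the results as ordinary Python dictionaries ready for sorting, filtering, or plotting. It measures CPU wall-clock time with `time.perf_counter`, not GPU kernel metrics, and the names `sweep` and `configs` are invented for illustration.

```python
import functools
import statistics
import time
from typing import Any, Callable


def sweep(configs: list[dict[str, Any]], repeats: int = 5) -> Callable:
    """Hypothetical decorator: run the wrapped function once per config and time it."""
    def decorator(fn: Callable[..., Any]) -> Callable[[], list[dict[str, Any]]]:
        @functools.wraps(fn)
        def run_all() -> list[dict[str, Any]]:
            results = []
            for cfg in configs:
                timings = []
                for _ in range(repeats):
                    start = time.perf_counter()
                    fn(**cfg)
                    timings.append(time.perf_counter() - start)
                # Plain dicts: easy to sort, filter, or pass to a plotting library.
                results.append({**cfg, "median_s": statistics.median(timings)})
            return results
        return run_all
    return decorator


@sweep(configs=[{"n": 1_000}, {"n": 10_000}, {"n": 100_000}])
def sum_of_squares(n: int) -> int:
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for row in sum_of_squares():
        print(row)
```

Returning a list of dictionaries keeps the sketch dependency-free; converting it to a table or chart is left to whatever analysis tools a project already uses.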

Perhaps more significant for debugging workflows: Numba-CUDA kernels can now be debugged on actual GPU hardware for the first time. Developers can set breakpoints, step through statements, and inspect program state using CUDA-GDB or Nsight Visual Studio Code Edition.

Algorithm Performance Gains

The CUDA Core Compute Libraries (CCCL) 3.2 release introduces several optimized algorithms. The new cub::DeviceTopK provides up to 5x speedups over full radix sort when selecting the K largest or smallest elements from a dataset—a common operation in recommendation systems and search applications.
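The speedup comes from doing less work: a full sort orders all n elements, while top-K selection only needs the K most extreme ones. The CPU sketch below compares Python's `heapq.nlargest`, which keeps a bounded heap (roughly O(n log K)), with `sorted` (O(n log n)) and checks that both return the same result. It illustrates the general idea only and is not how `cub::DeviceTopK` is implemented on the GPU.

```python
import heapq
import random


def top_k_by_sort(values: list[float], k: int) -> list[float]:
    # Baseline: sort every element, then keep the first k (O(n log n)).
    return sorted(values, reverse=True)[:k]


def top_k_by_heap(values: list[float], k: int) -> list[float]:
    # Selection: maintain only a k-sized heap while scanning (about O(n log k)).
    return heapq.nlargest(k, values)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.random() for _ in range(100_000)]
    assert top_k_by_heap(scores, 10) == top_k_by_sort(scores, 10)
    print(top_k_by_heap(scores, 3))
```

For the K smallest elements, `heapq.nsmallest` plays the same role.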

Fixed-size segmented reduction shows even more dramatic improvements: up to 66x faster for small segment sizes and 14x for large segments compared to the existing offset-based implementation. The cuSOLVER library adds FP64-emulated calculations that leverage INT8 throughput, achieving up to 2x performance gains for QR factorization on B200 systems when matrix sizes approach 80K.
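The two segmented-reduction variants differ in how segment boundaries are found. With the offset-based form, each segment's start and end are read from an offsets array; with fixed-size segments, the boundaries are simply multiples of the segment size, so no offsets need to be stored or loaded. One plausible reason the gain is largest for small segments is that the offset reads make up a bigger share of memory traffic there. The pure-Python sketch below shows both forms producing the same sums; the function names are hypothetical and this is not the CCCL API.

```python
from typing import Sequence


def segmented_sum_offsets(data: Sequence[float], offsets: Sequence[int]) -> list[float]:
    # Offset-based: segment i spans data[offsets[i]:offsets[i + 1]], so the
    # boundaries must be loaded from memory.
    return [sum(data[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


def segmented_sum_fixed(data: Sequence[float], segment_size: int) -> list[float]:
    # Fixed-size: boundaries are computed as multiples of segment_size.
    if segment_size <= 0 or len(data) % segment_size != 0:
        raise ValueError("data length must be a positive multiple of segment_size")
    return [sum(data[start:start + segment_size]) for start in range(0, len(data), segment_size)]


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(segmented_sum_fixed(values, 2))                  # [3.0, 7.0, 11.0, 15.0]
    print(segmented_sum_offsets(values, [0, 2, 4, 6, 8]))  # [3.0, 7.0, 11.0, 15.0]
```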

Enterprise and Embedded Updates

Windows compute drivers now default to MCDM instead of TCC mode starting with driver version R595. This change addresses compatibility issues where some systems displayed errors at startup. MCDM enables WSL2 support, native container compatibility, and advanced memory management APIs previously reserved for WDDM mode. NVIDIA acknowledged that MCDM currently has slightly higher submission latency than TCC and is working to close that gap.

For embedded systems, the same Arm SBSA CUDA Toolkit now works across all Arm targets, including Jetson Orin devices. Jetson Thor gains Multi-Instance GPU support, allowing the integrated GPU to be partitioned into two isolated instances—useful for robotics applications that need to separate safety-critical motor control from heavier perception workloads.

The toolkit is available now through NVIDIA’s developer portal. Developers using Ampere, Ada, or Blackwell GPUs can access the cuTile Python Quickstart guide to begin experimenting with tile-based programming.
