NVIDIA Launches Granary Dataset to Enhance Multilingual Speech AI

By WebDesk | August 15, 2025 | 3 Mins Read


Jessie A Ellis
Aug 15, 2025 09:01

NVIDIA introduces the Granary dataset and models designed to improve speech recognition and translation across 25 European languages, addressing data scarcity in AI language models.





NVIDIA has released an open dataset and two models for multilingual speech AI, targeting the narrow language coverage of existing speech models. According to NVIDIA’s blog, the Granary dataset and the NVIDIA Canary and Parakeet models are intended to improve speech recognition and translation for 25 European languages, including underrepresented ones such as Croatian, Estonian, and Maltese.

Granary Dataset: A New Resource for AI Developers

The Granary dataset is a collection of multilingual speech data comprising approximately one million hours of audio: nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation. It is available on Hugging Face, where developers can use it to build multilingual chatbots, customer service voice agents, and real-time translation services.

Developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the Granary dataset was built with NVIDIA’s NeMo Speech Data Processor toolkit, which turns unlabeled audio into structured, high-quality training data. Because the pipeline works on public speech data without extensive human annotation, it can reach languages for which labeled data is scarce. The dataset covers nearly all of the European Union’s official languages, plus Russian and Ukrainian.
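To make the idea of turning raw audio into structured data more concrete, here is a minimal sketch of one filtering step that such a pipeline might include. It is not Granary's actual procedure: the thresholds, the file names, and the sample records are hypothetical, and the only convention borrowed from NeMo is the JSON-lines manifest layout with `audio_filepath`, `duration`, and `text` fields.

```python
import json

# Hypothetical thresholds for illustration only; not Granary's actual settings.
MIN_SECONDS = 1.0
MAX_SECONDS = 40.0


def keep_record(record: dict) -> bool:
    """Return True if a pseudo-labeled record passes two simple checks:
    the transcript is non-empty and the clip length is within bounds."""
    text = record.get("text", "").strip()
    duration = record.get("duration", 0.0)
    return bool(text) and MIN_SECONDS <= duration <= MAX_SECONDS


def to_manifest_lines(records: list[dict]) -> list[str]:
    """Serialize the surviving records as JSON lines, one utterance per line."""
    return [json.dumps(r, ensure_ascii=False) for r in records if keep_record(r)]


# Made-up candidate records standing in for automatically transcribed clips.
candidates = [
    {"audio_filepath": "clip_001.wav", "duration": 6.2, "text": "Dobar dan svima."},
    {"audio_filepath": "clip_002.wav", "duration": 0.3, "text": "Da."},
    {"audio_filepath": "clip_003.wav", "duration": 12.8, "text": "   "},
]

manifest = to_manifest_lines(candidates)
print(len(manifest))
```

Only the first record survives here: the second clip is too short and the third has an empty transcript. A real pipeline would presumably apply many more checks than these two.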

Introducing NVIDIA Canary and Parakeet Models

The NVIDIA Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models, trained on the Granary dataset, offer powerful tools for transcription and translation. Canary-1b-v2, a billion-parameter model, supports high-quality transcription of European languages and translation between English and 24 other languages. Meanwhile, Parakeet-tdt-0.6b-v3, with 600 million parameters, is optimized for real-time or large-volume transcription tasks.
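The division of labor between the two models can be sketched as a small routing helper. This is an illustration under assumptions, not NVIDIA's recommended usage: `choose_model` and `run` are hypothetical names, routing same-language requests to Parakeet is a design choice made here, and the `source_lang`/`target_lang` arguments follow the pattern shown on the Canary model card and should be checked against the installed NeMo version.

```python
# Model identifiers as published on Hugging Face.
CANARY = "nvidia/canary-1b-v2"
PARAKEET = "nvidia/parakeet-tdt-0.6b-v3"


def choose_model(source_lang: str, target_lang: str) -> str:
    """Pick a checkpoint for a request (hypothetical helper).

    Same-language requests are plain transcription and go to Parakeet.
    Translation goes to Canary, which the article describes as translating
    between English and 24 other languages, so one side must be English.
    """
    if source_lang == target_lang:
        return PARAKEET
    if "en" not in (source_lang, target_lang):
        raise ValueError("translation must have English as source or target")
    return CANARY


def run(audio_path: str, source_lang: str = "en", target_lang: str = "en"):
    """Transcribe or translate one audio file with NeMo.

    Requires the NeMo ASR toolkit; the import is deferred so that
    choose_model can be used without it.
    """
    import nemo.collections.asr as nemo_asr

    name = choose_model(source_lang, target_lang)
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=name)
    if name == CANARY:
        # Language arguments assumed from the Canary model card.
        return model.transcribe(
            [audio_path], source_lang=source_lang, target_lang=target_lang
        )
    return model.transcribe([audio_path])


print(choose_model("de", "de"))
print(choose_model("en", "fr"))
```

Canary can also transcribe, so sending everything to it would work as well; the split above simply favors the smaller model when no translation is needed.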

Both models are designed to provide accurate punctuation, capitalization, and word-level timestamps in their outputs. Canary-1b-v2 is particularly notable for its efficiency, offering transcription and translation quality comparable to models three times its size, while running inference up to ten times faster.
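One practical use of word-level timestamps is subtitle generation. The sketch below groups timed words into SubRip (SRT) cues. The input layout, a list of dictionaries with `word`, `start`, and `end` keys in seconds, is an assumption modeled on what the NeMo model cards show for timestamped output, and the sample words and times are invented.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group word timestamps into numbered subtitle cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # A cue spans from the first word's start to the last word's end.
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Invented sample standing in for a model's word-level output.
words = [
    {"word": "Hello,", "start": 0.24, "end": 0.56},
    {"word": "world.", "start": 0.64, "end": 1.12},
    {"word": "Welcome", "start": 1.60, "end": 2.08},
]

srt = words_to_srt(words, max_words=2)
print(srt)
```

Because both models emit punctuation and capitalization, the cue text needs no further cleanup in this sketch; splitting on a fixed word count is a simplification, and a production tool would more likely break cues at sentence boundaries or pauses.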

Advancing Speech AI Innovation

NVIDIA is also sharing the methodology behind Granary and its associated models, so that speech AI developers can adapt the same data processing workflow to other automatic speech recognition (ASR) or automatic speech translation (AST) models. The models and dataset are publicly available under a permissive license that allows use and adaptation.

Together, the Granary dataset and the new models address data scarcity in speech AI, particularly for languages that have historically been underrepresented in AI language models, and extend multilingual speech recognition and translation to more of them.

The Granary dataset and models are available for exploration on Hugging Face, and further details can be accessed on NVIDIA’s blog.

