Together AI Launches DSGym Framework for Training Data Science AI Agents

By WebDesk | January 26, 2026 | 2 Mins Read


Rebeca Moen
Jan 26, 2026 23:09

Together AI’s DSGym framework benchmarks LLM agents on 90+ bioinformatics tasks and 92 Kaggle competitions. Their 4B parameter model matches larger rivals.





Together AI has released DSGym, a comprehensive framework for evaluating and training AI agents designed to perform data science tasks autonomously. The framework includes over 90 bioinformatics challenges and 92 Kaggle competition datasets, providing standardized benchmarks that address fragmentation issues plaguing existing evaluation methods.

The standout claim: Together AI’s 4 billion parameter model, trained using DSGym’s synthetic trajectory generation, achieves performance competitive with models 50 times its size on certain benchmarks.

Benchmark Results Show Surprising Efficiency

The published benchmarks show that a small fine-tuned model can close much of the gap to far larger ones. Together AI’s Qwen3-4B-DSGym-SFT-2k model, fine-tuned using the framework, scored 59.36% on QRData-Verified and 77.78% on DABStep-easy. That puts it ahead of the base Qwen3-4B-Instruct model (45.27% and 58.33% respectively) and makes it competitive with models like DeepSeek-V3.1 and GPT-OSS-120B on several metrics.
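To put the fine-tuning gains in perspective, the short sketch below computes the absolute and relative improvement of the fine-tuned 4B model over its base model, using only the four scores quoted above. The `SCORES` table and the `gains` helper are illustrative names, not part of DSGym.

```python
from typing import Dict, Tuple

# Scores (percent) quoted in the article for the two Qwen3-4B variants.
SCORES: Dict[str, Dict[str, float]] = {
    "QRData-Verified": {"base": 45.27, "fine_tuned": 59.36},
    "DABStep-easy": {"base": 58.33, "fine_tuned": 77.78},
}


def gains(base: float, tuned: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = tuned - base
    relative = 100.0 * absolute / base
    return round(absolute, 2), round(relative, 1)


if __name__ == "__main__":
    for name, s in SCORES.items():
        abs_gain, rel_gain = gains(s["base"], s["fine_tuned"])
        print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

On these figures, fine-tuning adds roughly 14 percentage points on QRData-Verified and roughly 19 on DABStep-easy, which is about a third over the base model's score in both cases.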

Claude 4.5 Sonnet currently leads the pack on harder tasks, hitting 37.04% on DABStep-hard compared to the fine-tuned 4B model’s 33.07%. That gap of roughly four percentage points is small given the massive difference in model scale.

Kimi-K2-Instruct posted the highest QRData-Verified score at 63.68%, while GPT-4o achieved 92.26% on DAEval-Verified, suggesting that different models excel at different task types.

Why This Matters for AI Development

DSGym tackles a real problem in the AI agent space. Current benchmarks suffer from inconsistent evaluation interfaces and limited task diversity, making it difficult to compare agent performance meaningfully. The framework’s modular architecture allows researchers to add new tasks, agent scaffolds, and tools without rebuilding from scratch.
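One common way to get this kind of extensibility is a task registry behind a single evaluation interface. The sketch below illustrates the general idea only. It is not DSGym's actual API: `Task`, `register_task`, `evaluate`, and the toy tasks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Task:
    """A hypothetical task: a prompt plus a checker for the agent's answer."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the answer is accepted


# Hypothetical global registry; new tasks plug in without touching evaluate().
TASKS: Dict[str, Task] = {}


def register_task(task: Task) -> None:
    """Add a task to the registry, rejecting duplicate names."""
    if task.name in TASKS:
        raise ValueError(f"duplicate task: {task.name}")
    TASKS[task.name] = task


def evaluate(agent: Callable[[str], str]) -> float:
    """Run every registered task through the agent; return accuracy in [0, 1]."""
    if not TASKS:
        return 0.0
    passed = sum(task.check(agent(task.prompt)) for task in TASKS.values())
    return passed / len(TASKS)


# Two toy tasks standing in for real benchmark items.
register_task(Task("mean-of-list", "What is the mean of 2, 4, 6?",
                   lambda a: a.strip() == "4"))
register_task(Task("max-of-list", "What is the max of 2, 4, 6?",
                   lambda a: a.strip() == "6"))


def toy_agent(prompt: str) -> str:
    """A stand-in agent that answers the two toy prompts correctly."""
    return "4" if "mean" in prompt else "6"


if __name__ == "__main__":
    print(f"accuracy: {evaluate(toy_agent):.2f}")
```

Because every agent is scored through the same `evaluate` call, results stay comparable as tasks are added, which is the property the article attributes to a standardized interface.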

The execution-verified data synthesis pipeline is particularly notable. Rather than training on static datasets, the system generates synthetic training trajectories that are validated through actual code execution—reducing the garbage-in-garbage-out problem that hampers many AI training pipelines.
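The article does not describe the pipeline's internals, so the following is only a minimal sketch of what execution-based filtering could look like: each candidate solution is run in a fresh interpreter and kept only if it exits cleanly and prints the reference answer. `Trajectory`, `run_code`, and `keep_verified` are hypothetical names, and a real system would run untrusted code inside a proper sandbox rather than a bare subprocess.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    """A hypothetical synthetic training example."""
    question: str
    code: str      # candidate solution produced by a model
    expected: str  # reference answer for the question


def run_code(code: str, timeout: float = 10.0) -> Optional[str]:
    """Execute code in a fresh interpreter; return stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def keep_verified(candidates: List[Trajectory]) -> List[Trajectory]:
    """Keep only trajectories whose code runs and prints the expected answer."""
    return [t for t in candidates if run_code(t.code) == t.expected]


if __name__ == "__main__":
    candidates = [
        Trajectory("Sum of 1..3?", "print(sum([1, 2, 3]))", "6"),
        Trajectory("Sum of 1..3?", "print(1 + 2)", "6"),          # wrong answer
        Trajectory("Sum of 1..3?", "raise RuntimeError()", "6"),  # crashes
    ]
    print(len(keep_verified(candidates)), "of", len(candidates), "kept")
```

Only the first candidate survives the filter here, since the other two either print the wrong answer or fail to run.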

For companies building AI-powered data analysis tools, DSGym provides a standardized way to measure progress. The bioinformatics focus (DSBio) and prediction task coverage (DSPredict) extend beyond generic coding benchmarks into domain-specific applications where AI agents could deliver real productivity gains.

What’s Next

The framework is positioned as an evolving testbed rather than a static benchmark suite. Together AI has emphasized the extensibility angle, suggesting they’ll continue adding task categories and evaluation metrics. With AI agent development accelerating across the industry, having a common evaluation standard could help separate genuine capability improvements from benchmark gaming—though that’s always easier said than done.

