Top 10 Most Popular AI Models

Together AI Platform Rankings · February 2026

🚀 Together AI Platform

📈 Key Insights

TL;DR: Chinese models dominate, with DeepSeek-R1 leading the pack (87.5% on AIME). Together AI hosts 200+ open-source models with a focus on reasoning, coding, and agent capabilities.

- 10 models ranked
- 70% Chinese models (DeepSeek ×3, Moonshot ×2, Qwen ×2)
- 256K max context
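Most of the chat models below are reachable through Together's OpenAI-compatible chat completions endpoint. A minimal sketch of assembling such a request, offline (the endpoint URL and the model ID `deepseek-ai/DeepSeek-R1` follow Together's published conventions, but verify both against the model page before use):

```python
import json

# Assumed Together AI endpoint (OpenAI-compatible); check current docs.
TOGETHER_CHAT_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat-completions payload for Together AI."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("deepseek-ai/DeepSeek-R1", "What is 7 * 8?")
print(json.dumps(payload, indent=2))
```

To actually send it, POST the payload to `TOGETHER_CHAT_URL` with an `Authorization: Bearer <your API key>` header.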
#1
DeepSeek-R1-0528
DeepSeek · Chat / Reasoning · 87.5% AIME

Upgraded DeepSeek-R1 with better reasoning, function calling, and coding. Uses ~23K-token reasoning chains to hit 87.5% on AIME. The reasoning champion.
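DeepSeek-R1-style reasoning models typically emit their long chain of thought inside `<think>…</think>` tags before the final answer. A small sketch of separating the two (assuming that tag convention; adjust if your serving stack strips the tags):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a reasoning-model response into (thinking, answer).

    Assumes the DeepSeek-R1 convention of wrapping the chain of
    thought in a single <think>...</think> block.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # no thinking block found
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

thinking, answer = split_reasoning("<think>7 * 8 = 56.</think>The answer is 56.")
```

Useful when you want to show users only the answer while logging the reasoning trace separately.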

#2
Kimi K2 Instruct-0905
Moonshot AI · Chat / Agent · 1T params

State-of-the-art mixture-of-experts agentic intelligence model with 1T parameters, 256K context, and native tool use.
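Native tool use means the model can emit structured function calls when the request declares tool schemas. A sketch of one tool declaration in the OpenAI-style `tools` format that agentic models like Kimi K2 are generally served with (the `get_weather` tool itself is a made-up example):

```python
# Hypothetical tool declaration in the OpenAI-style "tools" format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Attached to a chat request as:
#   {"model": ..., "messages": [...], "tools": [get_weather_tool]}
# The model then responds with a structured tool_call instead of free text
# when it decides the tool is needed.
```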

#3
Qwen3 235B A22B FP8
Alibaba/Qwen · Chat / Reasoning · 235B / 22B-active MoE

Hybrid instruct + reasoning model optimized for high-throughput, cost-efficient inference and distillation.

#4
Qwen3 235B A22B Instruct
Alibaba/Qwen · Chat · 262K context

235B MoE model with 22B activation featuring enhanced instruction following, reasoning, and 262K context.

#5
Llama 4 Maverick
Meta · Chat / Vision · 128-expert MoE

SOTA 128-expert MoE powerhouse for multilingual image/text understanding, creative writing, and enterprise-scale applications.

#6
Arcana v2
Together AI · Audio / TTS · 300+ voices

Expressive TTS with 300+ voices across English, Spanish, French, and German, featuring multilingual code-switching and paralinguistic control.

#7
Gemma 3 27B
Google · Chat / Vision

Lightweight model with vision-language input, multilingual support, visual reasoning, and top-tier performance per size.

#8
DeepSeek-V3-0324
DeepSeek · Chat

DeepSeek's latest open Mixture-of-Experts model, challenging top proprietary models at a much lower cost.

#9
DeepSeek-V3.1
DeepSeek · Chat / Agent · 671B params

671B parameters (37B activated), 128K context, hybrid thinking/non-thinking modes, advanced tool calling, and agent capabilities.

#10
Kimi K2.5
Moonshot AI · Chat / Agent / Vision · 50.2% HLE

1T-parameter native multimodal thinking agent achieving 50.2% on HLE with tools, with Agent Swarm orchestration, vision-grounded coding, and 256K context.