⏳ In Brief
- Ovis 2.5 released, with 2B and 9B variants for developers.
- Uses NaViT for native-resolution image processing, preserves fine details.
- Adds reflective reasoning training for stronger multimodal chains.
- Targets STEM, chart, grounding, and video tasks.
- Available on Hugging Face, Apache-2.0 licensed.
Alibaba Releases Ovis 2.5 For Native-Resolution Vision
Ovis 2.5 is the latest open multimodal release from Alibaba’s AIDC-AI team. It adds native-resolution vision and improved reasoning across images, text, and video, in a compact 2B configuration and a mainstream 9B configuration aimed at developers.
The models are promoted for dense visual content, including charts, documents, and diagrams, with the repository noting leading results under 40B parameters across STEM, chart analysis, grounding, and video understanding benchmarks.
Models, Availability, And Licensing
Two builds, Ovis 2.5-2B and Ovis 2.5-9B, are live on Hugging Face, with demo Spaces that let developers run quick image-plus-text trials and early evaluations directly in the browser.
Both releases carry an Apache-2.0 license via the main GitHub repository, which also documents the August 15, 2025 release date and the history of earlier Ovis series updates.
How Ovis 2.5 Works
At the core is NaViT, a native-resolution vision transformer that processes images at their original sizes, which avoids tiling, keeps global layout, and retains fine local details that matter for charts and technical figures.
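The native-resolution idea can be sketched in a few lines: instead of resizing every image to one fixed square, pad only to the nearest patch multiple and emit a variable-length token sequence. This is an illustrative sketch, not Ovis’s actual implementation, and the patch size of 16 is an assumption.

```python
import numpy as np

PATCH = 16  # assumed patch size for illustration; the real encoder's value may differ

def native_resolution_patches(image: np.ndarray) -> np.ndarray:
    """Split an H x W x C image into a variable-length sequence of patch tokens,
    padding only to the nearest patch multiple instead of squashing to a fixed square."""
    h, w, c = image.shape
    pad_h = (-h) % PATCH
    pad_w = (-w) % PATCH
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    gh, gw = padded.shape[0] // PATCH, padded.shape[1] // PATCH
    # Rearrange into (num_patches, PATCH*PATCH*C): each row is one visual token.
    patches = padded.reshape(gh, PATCH, gw, PATCH, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, PATCH * PATCH * c)

# A 480x640 chart yields 30*40 = 1200 tokens; a small 97x200 crop yields 7*13 = 91,
# so detail-dense figures keep more tokens while small crops stay cheap.
wide = native_resolution_patches(np.zeros((480, 640, 3)))
small = native_resolution_patches(np.zeros((97, 200, 3)))
```

Because the sequence length tracks the input size, fine strokes in charts and technical figures are never averaged away by a fixed-resolution resize.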
Training adds reflective reasoning on top of linear chain-of-thought, encouraging the model to check and revise its own steps so it can improve answers to complex visual prompts that combine layout understanding with text extraction.
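Ovis 2.5 bakes this reflect-then-revise behaviour into training, but teams can approximate the loop at the prompt level when experimenting. The helper below is a hypothetical sketch of that pattern, not an Ovis API.

```python
def reflective_prompt(question: str, draft: str) -> str:
    """Wrap a model's first-pass draft in a self-check instruction, so a second
    pass can critique and revise it -- the reflect-then-correct loop described above."""
    return (
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        "Re-examine the image evidence step by step. "
        "If any step is unsupported, correct it; otherwise confirm the draft.\n"
        "Final answer:"
    )

# First pass produces a draft; the second pass is asked to verify it.
prompt = reflective_prompt("What is the 2024 revenue in the bar chart?", "$4.2M")
```

In practice you would feed `prompt` back to the model along with the original image and keep the revised answer.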

Key Technical Stats
- NaViT image encoder for variable resolutions
- 2B and 9B vision-language variants
- Reflective reasoning plus standard CoT
- Targeted at charts, STEM, grounding, video
Benchmarks And Intended Use
Project notes highlight strong OpenCompass-style results for under-40B models, particularly on STEM reasoning, chart QA, and grounding. The team positions Ovis 2.5 as competitive among compact open VLMs for developer experimentation.
Inference guides are being updated in the repository. The Hugging Face cards and Space enable quick image-question tests, which help teams gauge quality before integrating Ovis 2.5 into prototyping pipelines or internal tools.
For enterprise buyers, compact architectures, permissive licensing, and browser-based demos reduce trial friction, while NaViT helps preserve fine detail on invoices, charts, and posters where down-sampling would otherwise erase critical signals.
Leadership Voice On The Release
“Ovis 2.5 processes images at their original, variable resolutions, preserving fine details and global layout, which is crucial for visually dense content.” — AIDC-AI Ovis Team, model card.
A team statement also outlines reflective reasoning training to improve self-correction on hard problems, tying the approach to better multimodal reliability in developer use, including documents and structured graphics.
Implementation Notes And Open Questions
Early adopters should validate OCR behaviour, chart grounding, and stepwise reasoning on in-house samples, then compare latency and costs against existing VLMs under realistic sequence lengths and concurrent users.
Practical steps include testing NaViT at native sizes, checking tokenisation of embedded text, and evaluating reflective prompts for measurable gains, before committing to production workloads that depend on fine visual fidelity.
Conclusion
Ovis 2.5 advances open multimodal tooling with NaViT for native detail and reflective training, giving developers credible VLM choices beyond heavyweight stacks, while keeping licensing and distribution friendly for rapid trials.
If the models maintain accuracy on dense documents, complex charts, and mixed video, they could become a practical baseline for teams seeking compact, open alternatives that still handle high-detail visual reasoning.
📈 Trending News
18th August 2025
- Ex-Twitter CEO Parag Agrawal launches his own AI startup
- Curio arrives with voice-driven AI Plush Toys
- Cognition raises $500 million after the Windsurf deal to scale AI coding
- Studio Atelico raises $5 million to build AI-first games
- Inside Woohoo: Where an AI chef designs menus and ambience
For more AI stories, visit AI News on our site.