
Alibaba’s AIDC-AI releases Ovis 2.5 on Hugging Face, targeting Charts, STEM tasks, and video understanding

  • August 22, 2025 (Updated)

⏳ In Brief

  • Ovis 2.5 released, with 2B and 9B variants for developers.
  • Uses NaViT for native-resolution image processing, preserves fine details.
  • Adds reflective reasoning training for stronger multimodal chains.
  • Positioned for STEM, chart analysis, grounding, and video tasks.
  • Available on Hugging Face, Apache-2.0 licensed.


Alibaba Releases Ovis 2.5 For Native-Resolution Vision

Ovis 2.5 is the latest open multimodal release from Alibaba’s AIDC-AI team, adding native-resolution vision and improved reasoning across images, text, and video in compact 2B and mainstream 9B configurations aimed at developers.

The models are promoted for dense visual content, including charts, documents, and diagrams, with the repository claiming leading results among models under 40B parameters on STEM, chart analysis, grounding, and video understanding benchmarks.


Models, Availability, And Licensing

Two builds, Ovis 2.5-2B and Ovis 2.5-9B, are live on Hugging Face, with demo Spaces that allow quick, in-browser trials of image-plus-text prompts for early evaluation.
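
For developers who want to go beyond the browser demo, a minimal loading sketch in the style of earlier Ovis model cards might look like the following; the repo ID, dtype choice, and trust_remote_code path are assumptions to verify against the official Hugging Face cards.

```python
# Minimal loading sketch, assuming the trust_remote_code convention
# used by earlier Ovis releases; take the exact repo IDs and any
# custom chat or preprocessing helpers from the official model cards.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2.5-2B",        # or "AIDC-AI/Ovis2.5-9B"
    torch_dtype=torch.bfloat16,  # halves memory on supported GPUs
    trust_remote_code=True,      # Ovis ships custom modeling code
).cuda().eval()
```

From there, the model card’s own chat template and image-preprocessing helpers apply; those are custom to Ovis, so copying them verbatim from the card beats guessing at signatures.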

Both releases are distributed under the Apache-2.0 license through the main GitHub repository, which also documents the August 15, 2025 release date and a history of earlier Ovis series updates for context.


How Ovis 2.5 Works

At the core is NaViT, a native-resolution vision transformer that processes images at their original sizes, which avoids tiling, keeps global layout, and retains fine local details that matter for charts and technical figures.
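
To make the native-resolution idea concrete, here is an illustrative sketch, not Ovis’s actual code, of how a patch grid can follow an image’s own dimensions instead of a fixed square resize; the patch size and token budget below are placeholder values.

```python
# Illustrative sketch of native-resolution patching: the grid follows
# the image's own dimensions, so a wide chart stays wide instead of
# being squashed into a fixed square. The patch size and token budget
# are placeholders, not Ovis's actual values.
def native_patch_grid(width: int, height: int, patch: int = 16,
                      max_tokens: int = 4096) -> tuple[int, int]:
    cols = -(-width // patch)   # ceil division: patches per row
    rows = -(-height // patch)  # ceil division: patches per column
    if cols * rows > max_tokens:
        # Scale both axes proportionally only when the token budget
        # is exceeded, preserving aspect ratio throughout.
        scale = (max_tokens / (cols * rows)) ** 0.5
        cols = max(1, int(cols * scale))
        rows = max(1, int(rows * scale))
    return cols, rows

# A 1600x512 chart maps to a (100, 32) grid, keeping its 3.125:1
# aspect ratio; a fixed 224x224 resize would have destroyed it.
print(native_patch_grid(1600, 512))  # -> (100, 32)
```

The point is the behaviour, not the numbers: token count scales with input size and aspect ratio survives, which is exactly what dense charts and documents need.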

Training adds reflective reasoning that goes beyond linear chain-of-thought, encouraging self-checking and revision so the model can improve its multimodal answers on complex visual prompts that combine structural understanding with text extraction.
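
The reflective behaviour is trained into the model rather than prompted, but as a rough illustration of the pattern, a hypothetical scaffold might stage a draft, a self-check, and a revision like this; the wording and structure are assumptions, not the model’s trained format.

```python
# Hypothetical scaffold illustrating draft -> self-check -> revise,
# the reflective pattern described in the release notes. This is an
# illustration of the concept, not Ovis's trained chat format; use
# the model card's own template in practice.
REFLECTIVE_TEMPLATE = (
    "Question: {question}\n"
    "Step 1: Reason step by step from the image.\n"
    "Step 2: Re-check every value you extracted against the axes, "
    "legends, and labels, and flag any step that does not hold.\n"
    "Step 3: Revise flagged steps and state the corrected answer."
)

prompt = REFLECTIVE_TEMPLATE.format(
    question="Which quarter shows the largest revenue change?"
)
```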


Key Technical Stats

  • NaViT image encoder for variable resolutions
  • 2B and 9B vision-language variants
  • Reflective reasoning plus standard CoT
  • Targeted at charts, STEM, grounding, video


Benchmarks And Intended Use

Project notes highlight strong OpenCompass-style results for under-40B models, particularly on STEM reasoning, chart QA, and grounding. The team positions Ovis 2.5 as competitive among compact open VLMs for developer experimentation.

Inference guides are being updated in the repository. The Hugging Face cards and Space enable quick image-question tests, which help teams gauge quality before integrating Ovis 2.5 into prototyping pipelines or internal tools.

For enterprise buyers, compact architectures, permissive licensing, and browser-based demos reduce trial friction, while NaViT helps preserve fine detail on invoices, charts, and posters where down-sampling would otherwise erase critical signals.


Leadership Voice On The Release

“Ovis 2.5 processes images at their original, variable resolutions, preserving fine details and global layout, which is crucial for visually dense content.” — AIDC-AI Ovis Team, model card.

A team statement also outlines reflective reasoning training to improve self-correction on hard problems, tying the approach to better multimodal reliability in developer use, including documents and structured graphics.


Implementation Notes And Open Questions

Early adopters should validate OCR behaviour, chart grounding, and stepwise reasoning on in-house samples, then compare latency and costs against existing VLMs under realistic sequence lengths and concurrent users.

Practical steps include testing NaViT at native sizes, checking tokenisation of embedded text, and evaluating reflective prompts for measurable gains, before committing to production workloads that depend on fine visual fidelity.
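
As a starting point for that comparison, a minimal harness along these lines records exact-match accuracy and latency over an in-house sample set; `ask_model` is a hypothetical stand-in for whichever inference call your candidate VLM exposes.

```python
# Minimal evaluation harness sketch for the checks suggested above.
# `ask_model(image_path, question) -> str` is a hypothetical stand-in
# for the candidate VLM's inference call.
import time

def evaluate(ask_model, samples):
    """samples: non-empty iterable of (image_path, question, expected)."""
    correct, latencies = 0, []
    for image_path, question, expected in samples:
        start = time.perf_counter()
        answer = ask_model(image_path, question)
        latencies.append(time.perf_counter() - start)
        correct += int(answer.strip() == expected.strip())
    n = len(latencies)
    return {
        "exact_match": correct / n,
        "mean_latency_s": sum(latencies) / n,
        "p95_latency_s": sorted(latencies)[int(0.95 * (n - 1))],
    }
```

Swapping different models into `ask_model` gives a like-for-like accuracy and latency comparison on the same in-house samples.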


Conclusion

Ovis 2.5 advances open multimodal tooling with NaViT for native detail and reflective training, giving developers credible VLM choices beyond heavyweight stacks, while keeping licensing and distribution friendly for rapid trials.

If the models maintain accuracy on dense documents, complex charts, and mixed video, they could become a practical baseline for teams seeking compact, open alternatives that still handle high-detail visual reasoning.



Khurram Hanif

Reporter, AI News

Khurram Hanif, AI Reporter at AllAboutAI.com, covers model launches, safety research, regulation, and the real-world impact of AI with fast, accurate, and sourced reporting.

He’s known for turning dense papers and public filings into plain-English explainers, quick on-the-day updates, and practical takeaways. His work includes live coverage of major announcements and concise weekly briefings that track what actually matters.

Outside of work, Khurram squads up in Call of Duty and spends downtime tinkering with PCs, testing apps, and hunting for thoughtful tech gear.

Personal Quote

“Chase the facts, cut the noise, explain what counts.”

Highlights

  • Covers model releases, safety notes, and policy moves
  • Turns research papers into clear, actionable explainers
  • Publishes a weekly AI briefing for busy readers

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *