Key Takeaways:
Microsoft isn’t resting on the laurels of its partnership with OpenAI.
Instead, the company, often known as Redmond for its headquarters in Washington state, has boldly released three new models in its evolving Phi series of language and multimodal AI.
These models are available for developers to download, use, and fine-tune on Hugging Face under the permissive MIT License, allowing for unrestricted commercial usage and modification.
Really powerful model! But Nex still has unique advantage☺️
— Nex – AI Summarizer (100% FREE) (@nex_social) August 21, 2024
All three models boast near state-of-the-art performance across several third-party benchmarks, surpassing models from other AI providers, including Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and even OpenAI’s GPT-4o in some cases.
This performance and the permissive open license have garnered praise for Microsoft across social networks, particularly on X.
The Phi-3.5 Mini Instruct model is a lightweight AI model with 3.8 billion parameters, optimized for instruction adherence and supporting a 128k token context length.
The MoE model looks fascinating! We are excited to experiment with it.
🤞 Fingers crossed it won’t be as challenging to fine-tune as the Mistral MoE models, but we have a feeling it might be.
— Simform (@simform) August 21, 2024
This model is ideal for scenarios that demand strong reasoning capabilities in memory- or compute-constrained environments, such as code generation, mathematical problem-solving, and logic-based reasoning.
Despite its compact size, the Phi-3.5 Mini Instruct model demonstrates competitive performance in multilingual and multi-turn conversational tasks, reflecting significant improvements from its predecessors.
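To make the "instruction adherence" point concrete, here is a minimal sketch of how a multi-turn conversation is rendered into the Phi-3-style prompt format before generation. The `<|system|>` / `<|user|>` / `<|assistant|>` / `<|end|>` tokens follow the format shown on the Phi model cards; treat the exact token spelling as an assumption to verify against the tokenizer's own chat template.

```python
# Sketch of the Phi-3-style chat prompt format (assumed from the model
# card; in practice, tokenizer.apply_chat_template does this for you).

def build_phi_prompt(messages):
    """Render a list of {'role', 'content'} dicts into a single
    prompt string ending with the assistant tag, ready for generation."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # the model continues from here
    return "".join(parts)

prompt = build_phi_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve 12 * 7 step by step."},
])
```

With the real model, the equivalent call is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which guarantees the template matches the checkpoint.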
The Phi-3.5 MoE model, or Mixture of Experts, is the first in this model class from Microsoft. Rather than one dense network, it combines multiple specialized sub-networks, or “experts,” with a router that activates only the experts most relevant to each input.
This model leverages an architecture with 42 billion total parameters and supports a 128k token context length, providing scalable AI performance for demanding applications.
The vision instruct model for multi-frame tasks seems extremely cool for its size! 😮
— Bram (@BramVanroy) August 20, 2024
However, according to Hugging Face documentation, it operates with only 6.6 billion active parameters per token.
The MoE model’s unique architecture allows it to maintain efficiency while handling complex AI tasks across multiple languages.
It impressively beats GPT-4o mini on the 5-shot MMLU (Massive Multitask Language Understanding) across subjects such as STEM, the humanities, and social sciences, at varying levels of expertise.
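The efficiency described above comes from top-k routing: a gating network scores every expert, but only the highest-scoring few actually run for each token, which is why roughly 6.6B of the 42B parameters are active at a time. Here is a toy, pure-Python sketch of that routing step; the dimensions, gate weights, and `k=2` are illustrative, not Microsoft's actual configuration.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, gate_weights, experts, k=2):
    """Route one scalar 'token' through only the top-k experts."""
    scores = softmax([w * token for w in gate_weights])
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top)  # renormalize over selected experts
    # Weighted sum of the selected experts' outputs; the others never run.
    return sum(scores[i] / norm * experts[i](token) for i in top)

# Four toy "experts" (each just scales its input by a different factor).
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
gates = [0.1, 0.4, -0.2, 0.3]
out = moe_forward(1.0, gates, experts)

# Active fraction implied by the reported figures: 6.6B of 42B params.
active_fraction = 6.6 / 42
```

Only experts 2 and 4 fire for this input; the other two contribute nothing, mirroring how only ~16% of the MoE's parameters are active per token.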
Completing the trio is the Phi-3.5 Vision Instruct model, which integrates both text and image processing capabilities.
This multimodal model is particularly suited for general image understanding, optical character recognition, chart and table comprehension, and video summarization.
Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to manage complex, multi-frame visual tasks.
Nice! Just when I get used to working with a vision model, a new one is deployed 🤩
— Mihai Chirculescu (@m_chirculescu) August 20, 2024
Microsoft highlights that this model was trained with synthetic and filtered publicly available datasets, focusing on high-quality, reasoning-dense data.
Training these models required massive computational resources.
The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs over ten days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs over six days.
Love it!
I find the Phi models to be perfect for my local gpu-poor setup (GTX3060).
— Frédéric Branchaud-Charron (@fbranchaud1) August 20, 2024
The Phi-3.5 MoE model, which features a mixture of experts architecture, was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days.
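The reported figures imply per-GPU training throughputs that are easy to back out. The arithmetic below uses only the token counts, GPU counts, and durations stated above; the per-GPU rates are derived estimates, not numbers Microsoft reported.

```python
# Back-of-envelope throughput implied by the reported training runs.
SECONDS_PER_DAY = 86_400

def tokens_per_gpu_second(total_tokens, gpus, days):
    return total_tokens / (gpus * days * SECONDS_PER_DAY)

mini = tokens_per_gpu_second(3.4e12, 512, 10)    # Mini Instruct, H100-80G
vision = tokens_per_gpu_second(5.0e11, 256, 6)   # Vision Instruct, A100-80G
moe = tokens_per_gpu_second(4.9e12, 512, 23)     # MoE, H100-80G
```

The result is on the order of a few thousand tokens per GPU per second for each run, with the MoE's larger total parameter count offset by its sparse activation.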
Microsoft’s commitment to the open-source community is evident as all three Phi-3.5 models are available under the MIT license.
This license allows developers to freely use, modify, merge, publish, distribute, sublicense, or sell copies of the software.
It’d be a lot cooler if they released some base models, too.
— Matthew Powers (@mpowers206) August 20, 2024
The license also includes a disclaimer that the software is provided “as is” without warranties. Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities arising from the software’s use.
The release of the Phi-3.5 series represents a huge step forward in developing multilingual and multimodal AI.
I’ve already installed the 4-bit quantized version of Phi 3.5 Mini on my phone, and it’s excellent. It outperforms Phi 3 Mini in every way and will replace it for me. It even handles other languages impressively well now. I’m really impressed.
— Christian Schoppe (@ChristianS26469) August 20, 2024
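The 4-bit experience described in the post above is plausible on paper: weight memory scales linearly with bits per parameter, so quantizing a 3.8B-parameter model from 16-bit to 4-bit shrinks the weights roughly fourfold. A rough estimate (ignoring activation and KV-cache overhead, and any extra bytes real quantized checkpoints carry):

```python
# Rough weight-memory estimate for Phi-3.5 Mini at different precisions.
PARAMS = 3.8e9  # parameter count from the article

def weight_gib(bits_per_param):
    """Bytes of weight storage, expressed in GiB."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)  # half precision
int4 = weight_gib(4)   # 4-bit quantized
```

At 16-bit the weights alone need about 7 GiB, while 4-bit brings them under 2 GiB, which is why quantized Phi-3.5 Mini fits comfortably on a modern phone.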
By offering these models under an open-source license, Microsoft empowers developers to integrate cutting-edge AI capabilities into their applications, fostering innovation across both commercial and research domains.