Microsoft launched several new “open” AI models on Wednesday, the most capable of which is roughly competitive with OpenAI’s o3-mini on at least one benchmark.
The new permissively licensed models (Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus) are all “reasoning” models, meaning they’re able to spend more time fact-checking solutions to complex problems. They expand Microsoft’s Phi “small model” family, which the company launched a year ago to offer a foundation for AI developers building apps at the edge.
Phi 4 mini reasoning was trained on roughly 1 million synthetic math problems generated by Chinese AI startup DeepSeek’s R1 reasoning model. At around 3.8 billion parameters in size, Phi 4 mini reasoning is designed for educational applications, Microsoft says, like “embedded tutoring” on lightweight devices.
Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
Phi 4 reasoning, a 14-billion-parameter model, was trained using “high-quality” web data as well as “curated demonstrations” from OpenAI’s aforementioned o3-mini. It’s best suited for math, science, and coding applications, according to Microsoft.
As for Phi 4 reasoning plus, it’s Microsoft’s previously released Phi-4 model adapted into a reasoning model to achieve better accuracy on particular tasks. Microsoft claims that Phi 4 reasoning plus approaches the performance of R1, a model with significantly more parameters (671 billion). The company’s internal benchmarking also has Phi 4 reasoning plus matching o3-mini on OmniMath, a math skills test.
Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus are available on the AI dev platform Hugging Face, accompanied by detailed technical reports.
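For developers who want to experiment, a minimal sketch of loading one of the models with the Hugging Face transformers library might look like the following. The repository ID microsoft/Phi-4-mini-reasoning is an assumption based on Microsoft’s usual Hugging Face naming, not something confirmed in this article; check the model cards for the exact IDs and recommended settings.

```python
# A minimal sketch, assuming the model is published under
# "microsoft/Phi-4-mini-reasoning" on Hugging Face (unverified here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit lighter hardware
    device_map="auto",
)

# Reasoning models are prompted like chat models; the tokenizer's chat
# template handles the role formatting and generation prompt.
messages = [{"role": "user", "content": "What is 15% of 240?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```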
“Using distillation, reinforcement learning, and high-quality data, these [new] models balance size and performance,” Microsoft wrote in a blog post. “They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently.”