Apple has announced updates to the AI models that power its Apple Intelligence features across iOS, macOS, and more. However, according to the company’s own benchmarks, the models underperform older models from rival tech companies, including OpenAI.
In a blog post Monday, Apple said that human testers rated the quality of text generated by its latest “Apple On-Device” model, which runs offline on products including the iPhone, as comparable to, but not better than, text from similarly sized Google and Alibaba models. Meanwhile, those same testers rated Apple’s more capable new model, which is called “Apple Server” and is designed to run in the company’s data centers, behind OpenAI’s year-old GPT-4o.
According to Apple, in a separate test assessing the ability of its models to analyze images, human raters preferred Meta’s Llama 4 Scout model over Apple Server. That’s a bit surprising: on a number of tests, Llama 4 Scout performs worse than leading models from AI labs such as Google, Anthropic, and OpenAI.
The benchmark results lend credence to reports suggesting that Apple’s AI research division is struggling to keep up with its competitors in the cutthroat AI race. Apple’s AI capabilities have underwhelmed in recent years, and a promised Siri upgrade has been delayed indefinitely. Some customers have sued Apple, accusing the company of marketing AI capabilities for its products that it has not yet delivered.
In addition to generating text, Apple On-Device, which is roughly 3 billion parameters in size, drives features such as summarization and text analysis. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer.) As of Monday, third-party developers can tap into it via Apple’s Foundation Models framework.
Apple says that both Apple On-Device and Apple Server offer improved tool use and efficiency compared to their predecessors, and that they can understand around 15 languages. This is thanks to an expanded training dataset that includes image data, PDFs, documents, manuscripts, infographics, tables, and charts.