A team at AI development platform Hugging Face has released models that can analyze images, short videos, and text.
The models, SmolVLM-256M and SmolVLM-500M, are designed to run well on “constrained devices” such as laptops with less than 1GB of RAM. The team says they are also ideal for developers looking to process large amounts of data very cheaply.
SmolVLM-256M and SmolVLM-500M are just 256 million and 500 million parameters in size, respectively. (Parameter count roughly corresponds to a model's problem-solving ability, for example its performance on math tests.) Both models can perform tasks such as describing images or video clips and answering questions about PDFs and the elements within them, including scanned text and charts.
To train SmolVLM-256M and SmolVLM-500M, the Hugging Face team used a collection of 50 “high-quality” image and text datasets, along with Docmatix, a set of file scans paired with detailed captions. Both were created by Hugging Face's M4 team, which develops multimodal AI technology.

The team claims that both models outperform a much larger model, Idefics 80B, on benchmarks including AI2D, which tests a model's ability to analyze grade-school-level science diagrams. SmolVLM-256M and SmolVLM-500M are available on the web and for download from Hugging Face under an Apache 2.0 license, meaning they can be used without restrictions.
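For developers curious what running one of these models looks like in practice, below is a minimal sketch using Hugging Face's transformers library. The repository name "HuggingFaceTB/SmolVLM-256M-Instruct" and the example image path are assumptions for illustration and should be verified against the listings on the Hugging Face Hub; they are not details from this article.

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Assumed repository name; check the Hugging Face Hub for the exact model ID.
model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Placeholder path; substitute any local image, e.g. a scanned chart.
image = Image.open("example_chart.png")

# Build a chat-style prompt that pairs the image with a question.
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "What does this chart show?"}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate and decode the model's answer.
output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

Because the smaller model is only 256 million parameters, a sketch like this can plausibly run on CPU-only hardware, which is the "constrained device" use case the team describes.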
Small models such as SmolVLM-256M and SmolVLM-500M may be cheap and versatile, but they can also have flaws that are less pronounced in larger models. A recent study by Google DeepMind, Microsoft Research, and Quebec's Mila research institute found that many small models perform worse than expected on complex reasoning tasks. The researchers speculated that this is because smaller models recognize surface-level patterns in data but struggle to apply that knowledge in new contexts.