Microsoft’s artificial intelligence researchers have recently unveiled a new model that can analyze and understand the content of images. It can also solve visual puzzles, recognize text in images and follow natural-language instructions.
What are the features of Kosmos-1, Microsoft’s new artificial intelligence?
Microsoft’s new multimodal artificial intelligence is called Kosmos-1. Its creators believe this tool is a key step toward artificial general intelligence (AGI), which would be able to perform general tasks at a human level. Notably, Microsoft’s new experimental model can receive commands through different types of input, such as text, sound, images and even video, which sets it somewhat apart from current tools.
Example images published in the accompanying research paper show that the model can examine images and answer questions about their content. It can also read text in photos and write captions for them. Although large language models (LLMs) currently attract most of the media attention, some experts believe that multimodal tools have more potential to develop into general artificial intelligence. Building such a tool is the ultimate goal of many companies in this field, including OpenAI, the maker of ChatGPT, which also works closely with Microsoft.
It appears, however, that Kosmos-1 was developed by Microsoft alone, independently of OpenAI. Its creators describe it as a multimodal large language model (MLLM), because despite supporting several kinds of input, its core processing is still done by a text-based language model, similar to current tools such as ChatGPT. For this reason, images must first be converted into a sequence of tokens that the language model can process before the model can understand them.
Microsoft trained its new model on data available on the Internet, including The Pile (a collection of about 800 GB of English text) and Common Crawl. After training, the model was evaluated on a range of tests, such as language comprehension, optical character recognition in images, image captioning, and answering questions about images or web pages. According to Microsoft, the model outperformed existing state-of-the-art models in many of these tests.
One of the more interesting tests in which Kosmos-1 performed reasonably well was the Raven test. Raven’s Progressive Matrices measure IQ by asking the test-taker to predict the next item in a sequence of images, and they belong to the category of non-verbal group intelligence tests that are often used in schools. Microsoft’s model answered 22% to 26% of the questions correctly. Although that may seem low, it is clearly above the 17% expected from random guessing.
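The 17% random baseline follows from simple arithmetic if, as assumed here, each Raven question offers six candidate answers: uniform guessing is right one time in six. The short Python sketch below (the helper names are illustrative, not from Microsoft's paper) computes that baseline and shows how far the reported 22% and 26% accuracies sit above it.

```python
def chance_rate(num_choices: int) -> float:
    """Expected accuracy when picking one of num_choices answers uniformly at random."""
    return 1 / num_choices


def lift_over_chance(accuracy: float, num_choices: int) -> float:
    """How many times higher an observed accuracy is than the random-guess rate."""
    return accuracy / chance_rate(num_choices)


# Assumption: six candidate answers per Raven question.
NUM_CHOICES = 6

print(f"random baseline: {chance_rate(NUM_CHOICES):.1%}")
for acc in (0.22, 0.26):  # accuracy range reported for Kosmos-1
    print(f"{acc:.0%} accuracy is {lift_over_chance(acc, NUM_CHOICES):.2f}x chance")
```

Under this assumption the baseline rounds to about 17%, and the reported scores are roughly 1.3 to 1.6 times the chance rate.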
Kosmos-1 is still at an early stage, and further optimization is expected to improve its performance. Models like this, which accept several kinds of input, have great potential as digital assistants. Microsoft’s researchers may also extend the model with further capabilities, such as speech. Microsoft has announced plans to make Kosmos-1 available to developers, but it has not yet given a release date for the code.