Vision intelligence can understand what it sees, contextualize that information, make decisions based on it, and change or alter the appearance of what is there. Credit: eXeX

From its DarwinAI acquisition to recent reports that Apple may work with Google and others to support a wider array of generative AI (genAI) tools than it plans to build itself, it's clear the company is being selective about where it creates its own AI technologies. At least one of those focus areas reflects work the company has been doing since before AI became a buzzword: vision intelligence.

Intimations of life

By this, I specifically mean AI that can understand what it sees, contextualize that information, make decisions based on it, and change or alter the view. You might already be making use of this kind of AI:

- Each time you photograph a document and copy the text out of it to paste into another document.
- When your iPhone can tell you where the doors of a building are.
- When you tap the 'i' button in Photos to get descriptions of what is visible.
- When your iPhone tells you the meaning of a laundry label you show it.
- When you use Translate to decipher text on signs around you.
- When the LiDAR sensor provides you with a room map.

There are many other examples, and there may be better illustrations still of the direction of travel; two short Swift sketches below show how much of this machinery developers can already call.

Electron blues

Apple's researchers recently published a white paper that has generated consternation and comment since its release. It describes a technology called MM1, a family of multimodal models trained on both text and image data. That means it can train large language models (LLMs) using both text and images, and it is being called a "significant advance" for AI. Models built with the technique performed excellently at tasks such as image captioning, visual question answering, and natural language inference, and the system also showed strong in-context learning. In other words, it learns fast from exposure to words and images together, which suggests the tech could eventually handle genuinely complex, open-ended problems. That is a holy grail of AI research, because achieving it means machines capable of solving problems in a highly contextual way.

That's all good, but what matters here is the use of images. This is not the first time in recent months Apple has harnessed machine vision intelligence this way. Toward the end of 2023, word of its Keyframer animation tool emerged, and even earlier that year we heard that part of what the company intended to build was AI capable of creating realistic immersive scenes for use in Vision Pro.

Automated for the people

Vision Pro is, of course, the space in which Apple's generative visual AI could make the biggest difference, and the implications are profound. Think how it could let one person wearing a Vision Pro enter an environment (any environment) and, while exploring that space, build a faithful digital replica of it that can be shared with others. The thing is, this wouldn't just be a dumb representation of the place. Armed with vision intelligence, the shared experience wouldn't merely look like the space you explored, with a few parameter tweaks to correct any errors; it would effectively be a fully functioning digital representation of that space.
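It's worth pausing on how much of this machinery is already in developers' hands. The document text capture in the list above, for instance, rides on Apple's Vision framework, which any iOS or macOS app can call today. Here is a minimal Swift sketch of Live Text-style recognition; the recognizeText helper is my own naming, but VNRecognizeTextRequest is the real API:

```swift
import Vision
import CoreGraphics

// Live Text-style recognition: pull readable strings out of a
// photographed document, roughly as the Camera and Photos apps do.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // slower, document-quality results
    request.usesLanguageCorrection = true  // fix OCR slips with a language model

    // Run the request synchronously against the supplied image.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Each observation is a line of text; keep its best candidate.
    return (request.results ?? []).compactMap {
        $0.topCandidates(1).first?.string
    }
}
```

In practice you'd run this off the main thread and feed it a CGImage from the camera or Photos, but the point stands: "understanding what it sees" is already a one-request affair.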
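The LiDAR room map in that earlier list is another case in point, and it hints at how the Vision Pro capture scenario could work. Apple's RoomPlan framework already turns a LiDAR walkthrough of a room into a parametric 3D model on today's iPhones. A minimal sketch follows; the RoomScanner wrapper and the output file name are my own, but RoomCaptureSession, RoomBuilder, and the delegate callback are Apple's:

```swift
import Foundation
import RoomPlan

// Scan a room with the LiDAR sensor and export the result as a
// shareable USDZ model: a small-scale cousin of the digital twin idea.
final class RoomScanner: NSObject, RoomCaptureSessionDelegate {
    private let session = RoomCaptureSession()
    private let builder = RoomBuilder(options: [.beautifyObjects])

    func start() {
        session.delegate = self
        session.run(configuration: RoomCaptureSession.Configuration())
    }

    func stop() {
        session.stop()
    }

    // Called when scanning ends; turn the raw capture into a model.
    func captureSession(_ session: RoomCaptureSession,
                        didEndWith data: CapturedRoomData,
                        error: Error?) {
        guard error == nil else { return }
        Task {
            let room = try await builder.capturedRoom(from: data)
            let url = FileManager.default.temporaryDirectory
                .appendingPathComponent("room.usdz")
            try room.export(to: url)  // defaults to a parametric USD export
        }
    }
}
```

Scale that capture from a single room to any environment a Vision Pro wearer moves through, add generative AI to fill the gaps, and you arrive at the digital replica scenario above.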
Such twins would be useful in all kinds of situations, from traffic management to building and facilities management, and the capacity to build true-to-life, intelligent representations of spaces also extends to architecture and design. There are evident implications for health, too. None of these ideas may work out quite the way I'm articulating them, but I'm certain Vision Pro's role in building digital twins for multiple industries will turn out to be set in stone.

Everybody hurts

But the combination of a new, highly visual operating system (visionOS) with a highly visual AI capable of deep contextual understanding and response isn't just catching up with the famed Tom Cruise movie, Minority Report. It is a tech deployment about to happen in real time, one that moves beyond the visions of the futurologists who advised on that movie. No wonder the entire industry now wants to move in Apple's direction; it has to hurt to see the company get there fastest. But everybody hurts, sometimes.

Please follow me on Mastodon, or join me in the AppleHolic's bar & grill and Apple Discussions groups on MeWe.