Google has taken a significant step forward in the AI chatbot race by introducing multimodal capabilities to its search-centric AI Mode chatbot.
The tech giant announced today that the upgraded system can now “see” and interpret images, combining the power of a custom Gemini AI model with Google’s established Lens image recognition technology.
The enhanced AI Mode allows users to either take a photo or upload an existing image and receive detailed, comprehensive responses about what’s in the picture, complete with relevant web links for further exploration. This feature is now accessible to users of the Google app on both Android and iOS platforms.
Robby Stein, VP of product for Google Search, emphasized how the update builds upon the company’s extensive background in visual search technology: “AI Mode builds on our years of work on visual search and takes it a step further.
With Gemini’s multimodal capabilities, AI Mode can understand the entire scene in an image, including the context of how objects relate to one another and their unique materials, colors, shapes, and arrangements.”

The upgraded system uses what Google calls a “fan-out technique,” which allows it to generate multiple queries about different aspects of an image simultaneously.
This approach enables the AI to provide nuanced and contextually relevant responses tailored to the visual input. For example, when shown a bookshelf, the system can identify specific titles, suggest similar books with positive ratings, and answer follow-up questions to refine recommendations further.
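Google has not published implementation details, but the general idea of a query fan-out can be illustrated with a short, purely hypothetical Python sketch. The scene description, the sample queries, and the helper functions (describe_scene, run_query) below are illustrative assumptions rather than Google’s actual API; the point is simply that several narrower queries about one image can be issued in parallel and merged into a single answer.

```python
import concurrent.futures

def describe_scene(image_path):
    """Hypothetical stand-in for an image-understanding step.
    In a real system this output would come from a multimodal model;
    here the bookshelf example is hard-coded for illustration."""
    return {
        "scene": "a wooden bookshelf holding several novels",
        "objects": ["The Overstory", "Project Hail Mary", "Circe"],
    }

def run_query(query):
    """Hypothetical stand-in for a single search query.
    A production system would call a real search backend instead."""
    return f"results for: {query}"

def fan_out(image_path):
    """Issue one broad query about the whole scene plus one query per
    detected object, run them concurrently, and collect the results
    so they can be combined into a single response."""
    scene = describe_scene(image_path)
    queries = [f"what is {scene['scene']}"]
    queries += [
        f"books similar to '{title}' with good reviews"
        for title in scene["objects"]
    ]

    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = list(pool.map(run_query, queries))
    return dict(zip(queries, results))

if __name__ == "__main__":
    for query, result in fan_out("bookshelf.jpg").items():
        print(query, "->", result)
```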
This development represents Google’s strategic response to competing AI search platforms like Perplexity and ChatGPT Search, which have gained popularity for their ability to generate AI-powered summaries drawn from vast online information sources. Google’s AI Mode similarly offers a conversational interface that can pull information from the company’s extensive search index to create comprehensive answers.
Access to AI Mode is also expanding significantly. After initially launching the feature exclusively for Google One AI Premium subscribers through Google Labs last month, the company has now begun rolling out access to “millions more” Labs users across the United States, beyond the paying subscriber base.
This move suggests Google is gaining confidence in the system’s capabilities and reliability.
The multimodal update represents a breakthrough in how people interact with search technology. Rather than typing out text descriptions of what they’re looking for, users can now simply point their camera at an object of interest and receive rich information in a natural, conversational form.
The technology could prove particularly valuable in scenarios ranging from shopping (finding products and alternatives) to education (identifying landmarks or objects) to general problem-solving.
The inclusion of visual understanding also reflects Google’s determination to maintain its leadership in search as AI continues to redefine user expectations. By combining its established strengths in image recognition with newer AI language models, Google is building a more natural search experience that understands content in many forms.
As competition in the AI search space heats up, this update reflects Google’s determination to push its core search functionality beyond the traditional text-based query. The expansion to a broader user base will also give Google valuable feedback to further refine the system ahead of a possible global launch.
To use the feature, users need to open AI Mode in the Google mobile app, where a new option lets them snap or upload a photo to start a search.