Google Assistant is about to get a lot smarter in how it speaks to you, understands you, and sees the world. Google engineer Behshad Behzadi gave us a peek at some of the assistant’s improved natural language understanding and computer vision this week at Google Developer Days in Krakow, Poland.
Behzadi called his demo a “mixture of things which are live and launched,” and said each new feature would become available “within the next few months or the next year.” Here are some of the features that do not yet appear to be live or widely available.
Google Lens computer vision
We’ve known Google Assistant would work with Lens computer vision since it was first announced at the company’s I/O developer conference in June.
Google Lens can identify objects, text, and buildings just by pointing your camera at something. With Lens, Google Assistant gets the ability to see, then talk to you about what it identifies with your camera.
One example showcased at I/O was pointing Lens at a theater and then searching online for tickets to shows there. Behzadi showed Lens doing even more, including pointing the camera at an apple to get its calorie count and at paper money to convert its value into another currency. Beyond augmented-reality ecommerce like selling theater tickets, the ability to tell a person how much sugar or how many calories a food contains is a more everyday use, and therefore more powerful for a personal assistant.
In another demo, Behzadi told Google Assistant “Be my Vietnamese translator,” and the assistant performed real-time translation, both as text on the phone and through the assistant’s voice. No details were provided about how many languages will be available for on-the-spot translation.
This feature surfaces shortly before Apple is expected to announce the launch of iOS 11 later this month, which will include real-time translation with Siri.
Better contextual understanding
Google Assistant is also learning how to focus on the intent of your first question, then continue to answer follow-up questions about the initial topic. Once available, this feature will mean users can go deeper into understanding a topic without the need to restate their intent after every question.
So you will be able to say “Where is the Empire State Building?” then “I want to see pictures” or “Who built it?” and you’ll get Empire State Building results. Say “What are the Italian restaurants around there?” and Google Assistant will serve up listings near the Empire State Building.
Following the same logic with image searches, say “Show me Thomas” and you’ll get a picture of the most popular result, Thomas the Tank Engine. But say “Bayern München team roster” and then “Show me Thomas,” and you’ll get photos of Thomas Müller, a player on that roster.
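The carryover behavior described above can be illustrated with a toy sketch. This is not Google’s implementation, just a minimal, hypothetical model of the idea: the assistant remembers the most recent entity and substitutes it back in when a follow-up query refers to it indirectly.

```python
# Toy sketch of conversational context carryover. Purely illustrative:
# a real system would use learned coreference resolution, not word lists.

class ContextTracker:
    """Remembers the entity the conversation is currently about."""

    REFERRING_WORDS = {"it", "there", "that"}  # simplistic stand-in for coreference

    def __init__(self):
        self.current_entity = None

    def resolve(self, query, entity=None):
        """Return the query with stored context attached, updating context."""
        if entity:
            # The query names an entity explicitly; it becomes the new context.
            self.current_entity = entity
            return query
        words = {w.strip("?.,!") for w in query.lower().split()}
        if self.current_entity and words & self.REFERRING_WORDS:
            return f"{query} [context: {self.current_entity}]"
        return query

tracker = ContextTracker()
print(tracker.resolve("Where is the Empire State Building?",
                      entity="Empire State Building"))
print(tracker.resolve("Who built it?"))
# → Who built it? [context: Empire State Building]
print(tracker.resolve("What are the Italian restaurants around there?"))
# → What are the Italian restaurants around there? [context: Empire State Building]
```

The “Show me Thomas” example is the harder version of the same problem: the context (a team roster) disambiguates a name rather than a pronoun.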
The ability to answer follow-up questions was first spotted in Amazon’s Alexa in late 2016.
Longer, vaguer queries will also be handled better thanks to improved natural language understanding.
Say “What is the name of the movie where Tom Cruise acts in it and he plays pool and while he plays pool he dances?” and Google Assistant will respond with the name of the movie (it’s The Color of Money), a summary, and the cast.
“This is possible by merging the power of search — the signals coming from Google search — with machine learning,” Behzadi said.
You can already tell Google Assistant to remember the name of your favorite sports team. In the future you will be able to ask “How is my team doing?,” and in updates to come, you will be able to teach Google Assistant more about your preferences. Onstage, Behzadi told Google Assistant “When the weather is more than 25 degrees (Celsius) I can swim in the lake of Zurich,” to which the assistant replied “OK, understood.” He then asked “Can I go swim in the lake of Zurich this weekend?” and Google replied “No, you can’t. The temperature is less than 25 degrees.”
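The swimming demo amounts to storing a user-taught rule and evaluating it against a forecast. The sketch below is a hypothetical illustration of that logic; the function names and response strings are assumptions, not Google APIs.

```python
# Illustrative sketch of a user-taught preference rule, as in the demo:
# "When the weather is more than 25 degrees I can swim in the lake of Zurich."
# The rule is stored as a temperature threshold and checked against a forecast.

def can_swim(forecast_celsius, threshold_celsius=25):
    """Evaluate the taught rule against a forecast temperature."""
    return forecast_celsius > threshold_celsius

def answer(forecast_celsius):
    if can_swim(forecast_celsius):
        return "Yes, you can. The temperature is more than 25 degrees."
    return "No, you can't. The temperature is less than 25 degrees."

# A weekend forecast below the threshold produces the demo's reply.
print(answer(18))
# → No, you can't. The temperature is less than 25 degrees.
```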
Better understanding in loud environments
No specific details about improved speech recognition were provided, but Behzadi said Google Assistant is getting better at understanding voices in loud environments.
“We actually have spent lots of time on trying to improve the speech recognition in noisy environments, added lots of data to the machine learning systems, automatically generated noise, like fake noise of a stadium or people or cars, and that’s actually how we’ve managed to significantly improve this,” he said.