Google DeepMind has launched two new artificial intelligence models designed to let robots handle complex, general-purpose tasks and carry out multi-step reasoning that was previously out of reach.
Earlier this year, the firm unveiled the first version of Gemini Robotics, adapted from its Gemini large language model for robotic applications. That version allowed machines to reason about and carry out basic tasks in the real world: in the banana test, for example, the AI could follow a simple command such as "place this banana in the basket" and direct a robotic arm to complete it. The new models go further, enabling a robot to sort different fruits into containers by color, as when the Aloha 2 robot sorted a banana, an apple, and a lime while explaining its actions aloud.
As Jie Tan, a senior staff research scientist at DeepMind, put it, the design lets the robot think: it observes its surroundings, works through tasks step by step, and completes complex operations. The demonstration may look simple, but it points to what more advanced humanoid robots could do on far harder tasks.
The robot can identify items in space, recognize colors, match fruits to plates, and articulate its reasoning in natural language. This comes from two AI models cooperating in a supervisor-worker arrangement: the upgraded Gemini Robotics-ER 1.5 model serves as the brain, interpreting spatial data and verbal commands, while Gemini Robotics 1.5 interprets the visual scene, plans its actions, and reports its reasoning. Both models improve on their predecessors and allow the robots to draw on external resources such as Google Search, as demonstrated when Aloha sorted items according to local recycling guidelines.
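To make that division of labor concrete, the following is a minimal, purely illustrative Python sketch of a supervisor-worker loop: a planner model breaks an instruction into subtasks (optionally consulting an external lookup, as in the recycling example), and an action model executes and narrates each step. Every class and method name here (PlannerModel, ActionModel, lookup_rules, and so on) is a hypothetical placeholder, not the actual Gemini Robotics API.

```python
"""Illustrative supervisor-worker sketch; names are placeholders, not a real API."""

from dataclasses import dataclass


@dataclass
class Step:
    description: str  # natural-language subtask, e.g. "pick up the banana"
    done: bool = False


class PlannerModel:
    """Stands in for the high-level 'brain': it reads the instruction and scene,
    optionally consults an external resource, and emits an ordered plan."""

    def plan(self, instruction: str, scene: str) -> list[Step]:
        if "recycl" in instruction.lower():
            # Placeholder for a tool call such as a web search for local rules.
            print("Planner consulted:", self.lookup_rules("local recycling guidelines"))
        return [
            Step("locate each fruit in the scene"),
            Step("match each fruit to the container of the same color"),
            Step("move each fruit into its matching container"),
        ]

    def lookup_rules(self, query: str) -> str:
        return f"(stub result for '{query}')"


class ActionModel:
    """Stands in for the lower-level vision-language-action model: it carries out
    one subtask at a time and narrates what it is doing."""

    def execute(self, step: Step) -> None:
        print(f"Robot: I am going to {step.description}.")
        step.done = True  # a real system would issue motor commands here


def run(instruction: str, scene: str) -> None:
    planner, actor = PlannerModel(), ActionModel()
    for step in planner.plan(instruction, scene):
        actor.execute(step)


if __name__ == "__main__":
    run("Sort the fruits into the containers by color",
        "a banana, an apple, and a lime on a table with colored containers")
```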


