A creepy demo of “Figure 01,” a humanoid, conversational robot, has hit the internet — and I can’t believe it’s not a deleted scene from I, Robot.
In the demo, Figure 01, packed with OpenAI tech, is asked what it can “see.” Showing off its visual recognition prowess, the avant-garde robot accurately explains what’s in front of it: a red apple, a drying rack with dishes, and the man who asked Figure 01 the question.
OK, a bit uncanny, but it’s nothing we haven’t seen before, right? For example, last year, Google showed off how the AI model Gemini could recognize stimuli placed in front of it, from a blue rubber duck to various hand-drawn illustrations (though it was later discovered that slick editing slightly exaggerated its capabilities).
But then, the man asks, “Can I have something to eat?” Figure 01 grabs the apple, clearly recognizing that it’s the only edible object on the table, and hands it to him.
Er, are we sure that Will Smith isn’t going to pop up any time soon?
How does the Figure 01 robot work?
What, exactly, is underpinning Figure 01’s seamless interaction with a human? It’s a new Visual Language Model (VLM) transforming Figure 01 from a clunky hunk of junk to a sci-fi-esque, futuristic robot that is a little too human-like. (The VLM stems from a collaboration between OpenAI and Figure, the startup behind Figure 01.)
After handing over the apple, Figure 01 reveals that it can tackle several tasks at the same time when asked, “Can you explain why you [gave me the apple] while you pick up this trash?”
While recognizing what’s trash (and what’s not) and placing the proper items into what Figure 01 identifies as a bin, the robot explains that it offered the man an apple because it was the only thing in front of him that could be eaten. That’s some impressive multitasking!
Finally, the man asks Figure 01 how well it thinks it did. In a conversational manner, the robot says, “I-I think I did pretty well. The apple found its new owner, the trash is gone, and the tableware is right where it belongs.”
According to Brett Adcock, the founder of Figure, Figure 01 has onboard cameras that feed the VLM data that helps it “understand” the scene in front of it, allowing the robot to interact smoothly with a human. Alongside Adcock, Figure 01 is the brainchild of several key players from Boston Dynamics, Tesla, Google DeepMind, and Archer Aviation.
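To make the perceive-then-act loop concrete, here’s a toy sketch of the pipeline Adcock describes: camera input goes to a model that describes the scene, and a spoken request gets mapped to an action on the recognized objects. Everything here — the object labels, function names, and decision logic — is a hypothetical stand-in, not Figure’s or OpenAI’s actual interface.

```python
# Toy perceive -> reason -> act loop. All names and logic here are
# illustrative stand-ins, not the real Figure 01 / OpenAI system.
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    edible: bool


def describe_scene(objects):
    """Stand-in for the VLM's visual-recognition step."""
    return "I see " + ", ".join(o.name for o in objects) + "."


def choose_action(request, objects):
    """Stand-in for the VLM mapping a spoken request to an action."""
    if "eat" in request.lower():
        edible = [o for o in objects if o.edible]
        if edible:
            # Like the demo: the apple is the only edible thing on the table.
            return f"hand over the {edible[0].name}"
    return "do nothing"


scene = [
    SceneObject("red apple", edible=True),
    SceneObject("drying rack", edible=False),
    SceneObject("dishes", edible=False),
]

print(describe_scene(scene))
print(choose_action("Can I have something to eat?", scene))
```

The key design point the demo illustrates is that one model handles both the description (“what do you see?”) and the action selection (“can I have something to eat?”), rather than two separate systems.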
Taking a dig at Elon Musk’s Optimus robot, Adcock boasted that Figure 01 is not teleoperated. In other words, unlike Optimus, which went viral for folding a shirt, Figure 01 can operate independently.
Adcock’s ultimate goal? To train a super-advanced AI system to control billions of humanoid robots, potentially revolutionizing multiple industries. Looks like I, Robot is a lot more real than we thought.