Google researchers claim they have built a Transporter Network AI model architecture that allows object-grasping robots to reason about which visual cues in a scene are relevant and how the objects they correspond to should be rearranged. The researchers say their Transporter Networks achieved “superior” efficiency during experiments on a variety of tasks, including stacking a pyramid of blocks, assembling kits, manipulating ropes, and pushing piles of small objects.
Grasping remains a challenge in robotics. Robots, for instance, struggle with what is called “mechanical search,” in which they have to locate an object within a pile of other objects and pick it up. Most robots are not particularly adaptable, and there is a shortage of AI models capable enough at mechanical search to guide robot grippers, an issue that has come to the fore as the pandemic pushes businesses to consider automation.
The Google study’s coauthors say Transporter Networks don’t require any prior knowledge of the 3D models, poses, or class categories of the objects to be manipulated, relying only on information contained in partial depth camera data. They’re also capable of generalizing to new objects and configurations and, for certain tasks, of learning from a single demonstration. In fact, trained from scratch on 100 expert video demonstrations per task, Transporter Networks reportedly achieved over 90% success on most of 10 tabletop manipulation tasks with objects in new configurations.
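To make the core idea concrete, here is a minimal sketch, in plain NumPy, of how a Transporter-style model can recover a spatial displacement from visual features: features cropped around a chosen pick point are cross-correlated against the rest of the scene, and the highest-scoring location becomes the place target. The function and variable names below are illustrative assumptions rather than the authors’ released code, and the real model learns its feature encoders and also searches over rotations of the crop.

```python
import numpy as np

def transporter_place_scores(scene_feat, pick_yx, crop_size=8):
    """Score every candidate place location by cross-correlating a feature
    crop centered on the pick point against the full scene feature map."""
    h, w, c = scene_feat.shape
    half = crop_size // 2
    # Zero-pad so crops near the border stay in bounds.
    padded = np.pad(scene_feat, ((half, half), (half, half), (0, 0)))
    y, x = pick_yx
    kernel = padded[y:y + crop_size, x:x + crop_size, :]  # query crop around the pick

    scores = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + crop_size, j:j + crop_size, :]
            scores[i, j] = float(np.sum(window * kernel))  # dot-product similarity
    return scores

# Toy usage: random arrays stand in for the output of a learned encoder
# over a top-down depth image of the workspace.
rng = np.random.default_rng(0)
scene_feat = rng.normal(size=(32, 32, 16)).astype(np.float32)
heatmap = transporter_place_scores(scene_feat, pick_yx=(10, 20))
place_yx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
print("predicted place location:", place_yx)
```

Because the same crop is slid over every candidate location, the scoring is inherently translation-aware, which is one way the architecture can exploit spatial structure rather than memorizing specific objects.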
The researchers trained Transporter Networks on datasets ranging from 1 to 1,000 demonstrations per task. They first deployed them on Ravens, a simulated benchmark learning environment consisting of a Universal Robot UR5e arm with a suction gripper overlooking a 0.5 x 1 meter workspace. Then they validated the Transporter Networks on kit assembly tasks using real UR5e robots with suction grippers and cameras, including an Azure Kinect.
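For a feel of how such a policy is exercised in a simulated benchmark of this kind, the sketch below runs a generic pick-and-place evaluation loop: reset a task, observe a top-down image, predict pick and place positions, and score the episode. The environment and policy classes here are hypothetical stand-ins for illustration only, not the Ravens API.

```python
import numpy as np

class StubTabletopEnv:
    """Stand-in for a simulated suction-gripper tabletop task (not the Ravens API)."""
    def __init__(self, n_objects=3):
        self.n_objects = n_objects
        self.placed = 0

    def reset(self):
        self.placed = 0
        return np.zeros((64, 64), dtype=np.float32)  # top-down height map observation

    def step(self, pick_yx, place_yx):
        # A real environment would move the arm; here we just count placements.
        self.placed += 1
        obs = np.zeros((64, 64), dtype=np.float32)
        reward = 1.0 / self.n_objects          # partial credit per placement
        done = self.placed >= self.n_objects
        return obs, reward, done

class StubPolicy:
    """Stand-in for a trained pick-and-place policy."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        h, w = obs.shape
        pick = (int(self.rng.integers(h)), int(self.rng.integers(w)))
        place = (int(self.rng.integers(h)), int(self.rng.integers(w)))
        return pick, place

def evaluate(env, policy, episodes=10):
    """Average per-episode score over repeated rollouts."""
    scores = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            pick, place = policy.act(obs)
            obs, reward, done = env.step(pick, place)
            total += reward
        scores.append(total)
    return float(np.mean(scores))

print("mean task score:", evaluate(StubTabletopEnv(), StubPolicy()))
```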
Because of pandemic-related lockdowns, the researchers ran their experiments through a Unity-based application that lets users teleoperate the robots remotely. For one experiment, the teleoperators were tasked with repeatedly assembling and disassembling a kit of five small bottled mouthwashes or nine uniquely shaped wooden toys, using either a virtual reality headset or a mouse and keyboard to mark picking and placing poses. Trained on data from 13 human operators across all tasks, totaling 11,633 pick-and-place actions, the Transporter Networks achieved 98.9 percent success at assembling the bottled mouthwash kits.
“In this work we presented the Transporter Network, a simple model architecture that infers spatial displacements, which can parameterize robot actions from visual input,” the researchers wrote. “It makes no assumptions about objects, exploits spatial symmetries, and is orders of magnitude more sample efficient in learning vision-based manipulation tasks than end-to-end alternatives … In terms of its current limitations: it is sensitive to camera-robot calibration, and it remains unclear how to integrate torque and force actions with spatial action spaces. Overall, we are excited about this direction and plan to extend it to real-time high-rate control, as well as tasks involving tool use.”
The coauthors say they plan to open-source the code for Ravens (and an associated API) in the near future.