Silicon Valley’s next great leap may be built on videos of people folding laundry.

Start-ups and entrepreneurs including Tesla CEO Elon Musk are trying to make robots smart enough to help with chores around the home. But adapting artificial intelligence to new tasks requires example data, like the online text and photos that enabled chatbots to start generating high-quality documents and images.

DoorDash, a food-delivery service, has joined a cottage industry of companies and researchers gathering data for the robot revolution in the form of videos of people doing tasks like folding clothes or washing dishes. Gig workers can now earn as much as $25 an hour by recording themselves doing chores for DoorDash.

Here’s how that video is used and why it’s so valuable.

How to train robots to fold clothes

People fold laundry while wearing a head-mounted smartphone. The phone records video, which is processed to track movement of the head, hands and fingers.
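
That processing step typically relies on off-the-shelf computer-vision landmark detectors rather than anything exotic. Here is a minimal sketch of the hand-tracking part in Python, assuming Google's open-source MediaPipe library and the OpenCV video toolkit; the clip name is a placeholder.

```python
# Sketch: extract per-frame hand landmarks from egocentric video.
# Assumes the open-source mediapipe and opencv-python packages;
# "laundry_clip.mp4" is a hypothetical recording.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # treat input as a video stream
    max_num_hands=2,
    min_detection_confidence=0.5,
)

cap = cv2.VideoCapture("laundry_clip.mp4")
trajectories = []  # 21 (x, y, z) landmarks per detected hand, per frame

while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV decodes frames as BGR.
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            trajectories.append([(lm.x, lm.y, lm.z) for lm in hand.landmark])

cap.release()
print(f"captured {len(trajectories)} hand poses")
```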

To capture the variety of situations a robot may need to understand, people around the world are recruited to record themselves as they fold different types of clothes on different surfaces.

To help AI algorithms translate the skill from human fingers into robot movement commands, data is also collected from expert robot operators folding clothes using remote-controlled robotic limbs.

Researchers feed all the data into machine learning algorithms, which learn to predict, from what a robot sees through its cameras and sensors, what movements it should make to fold an item of clothing.

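Researchers call this general recipe imitation learning, or behavior cloning: a neural network learns to map what the camera sees to the action a human demonstrator or teleoperator took next. A minimal PyTorch sketch, with an illustrative architecture standing in for whatever the labs actually use:

```python
# Sketch: behavior cloning on (camera image, action) pairs.
# The architecture and dummy data are illustrative, not any lab's actual model.
import torch
import torch.nn as nn

class FoldingPolicy(nn.Module):
    def __init__(self, action_dim=7):  # e.g., 7 joint targets for one arm
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, action_dim)

    def forward(self, image):
        return self.head(self.encoder(image))

policy = FoldingPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_step(images, expert_actions):
    """One gradient step: push predicted actions toward demonstrated ones."""
    predicted = policy(images)
    loss = nn.functional.mse_loss(predicted, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with dummy data: a batch of 8 RGB frames and 7-D actions.
loss = train_step(torch.randn(8, 3, 96, 96), torch.randn(8, 7))
```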

Finally, the AI model is loaded into robots that attempt to fold clothes autonomously.

Videos from EgoMimic, EgoVerse and Nvidia


The household chores data grab is a bet on what AI insiders call scaling laws. Researchers have found that AI models for working with text or images get progressively better the more data they are trained on, and they hope the same is true for robotics.
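
Scaling laws are typically written as power laws: a model's error falls smoothly in proportion to a power of the training-set size. A toy illustration, with made-up coefficients (real values are fit empirically for each domain):

```python
# Toy power-law scaling curve: error(N) = a * N**(-b).
# The coefficients a and b are made up for illustration; in practice
# they are fit to measurements from models trained at varying data sizes.
a, b = 5.0, 0.1

def expected_error(num_examples: int) -> float:
    return a * num_examples ** (-b)

for n in (10**6, 10**8, 10**10):
    print(f"{n:>14,} examples -> error {expected_error(n):.3f}")
```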

“There is evidence that a lot of data would help” robots do more complex tasks, said Ken Goldberg, a roboticist and distinguished chair of engineering at the University of California at Berkeley.

But unlike for chatbots, there isn’t an easy place to get oceans of relevant data. “There’s no internet for robot data,” Goldberg said.

Chatbots learn to generate coherent sentences by analyzing human-written text, raw material that is readily available from the web, books or numerous other sources.

Training robot control software is more complicated. To take on household chores, a robot needs to decipher data from its sensors, predict which actions will achieve a goal like folding a shirt, and send commands to limbs and grippers to make the appropriate motions. There’s no ready-made repository of data demonstrating how to do that. Even videos of people doing chores don’t have all the elements needed.
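
In code terms, that pipeline is a closed loop: read the sensors, predict the next action, command the motors, repeat. A schematic sketch, in which the robot interface is entirely hypothetical:

```python
# Schematic sense-predict-act loop. The `robot` object and its methods
# are hypothetical stand-ins for a real hardware interface.
def control_loop(robot, policy, max_steps=1000):
    for _ in range(max_steps):
        observation = robot.read_sensors()   # camera frames, joint angles, etc.
        action = policy(observation)         # model predicts the next motion
        robot.send_commands(action)          # drive the limbs and grippers
        if robot.task_complete():            # e.g., shirt detected as folded
            break
```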

[Chart: Sizes of AI training datasets, measured in years of human effort (one square = 10 years). The video data available for training robots amounts to roughly 5 years of watching (half a square); the dataset for training chatbots fills vastly more squares. The chatbot estimate assumes a 1.5 trillion token dataset, at 1.33 tokens per word and a reading pace of 238 words per minute. Source: Estimate by robotics researcher Kevin Black of UC-Berkeley and Physical Intelligence]
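
The footnote's arithmetic is easy to check, and it underscores the gap: at a human reading pace, a chatbot-scale text dataset represents millennia of effort.

```python
# Reading time for a chatbot-scale text dataset, using the chart's assumptions.
tokens = 1.5e12            # 1.5 trillion tokens
words = tokens / 1.33      # 1.33 tokens per word
minutes = words / 238      # reading pace of 238 words per minute
years = minutes / (60 * 24 * 365)
print(f"about {years:,.0f} years of nonstop reading")  # roughly 9,000 years
```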

One way to gather training material is to record data while humans manually operate robots. “Robot teleoperation data is probably considered the highest quality of data,” because it includes robot motion commands, said Simar Kareer, a robotics researcher at Georgia Tech who helped pioneer training robots on human videos.

But “it’s just the most expensive to collect,” Kareer said, because you have to pay people to operate an expensive robot, and “the person is completing these tasks much, much slower than they would if they were using their own hands.”

Kareer is working to show that a large collection of cheaper human video data can give AI a baseline understanding of how tasks are done, which can then be refined with a smaller pool of expensive teleoperation data that teaches the software to produce specific robotic actions.
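
A sketch of that two-stage recipe, with every name and interface here an illustrative placeholder rather than a published pipeline:

```python
# Sketch of the two-stage recipe: pretrain on plentiful human video,
# then fine-tune on scarce teleoperation data. All interfaces are
# hypothetical placeholders, not any lab's actual training code.
def train(policy, human_videos, teleop_episodes):
    # Stage 1: cheap, abundant data teaches the broad structure of the task
    # (what a half-folded shirt looks like, how hands move around it).
    for clip in human_videos:
        policy.update(observations=clip.frames, targets=clip.hand_trajectories)

    # Stage 2: expensive, scarce data grounds that knowledge in the robot's
    # own action space (actual motor commands recorded during teleoperation).
    for episode in teleop_episodes:
        policy.update(observations=episode.frames, targets=episode.motor_commands)
    return policy
```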

Other researchers and companies are trying different tactics to reduce the cost of gathering the training data needed for a robot revolution.

One is to give humans a handheld version of a robot gripper, making it easier and quicker to demonstrate tasks in a form easily translated to robot control software. Others build robots to be as similar to humans as possible: if a machine has the same number of fingers and joints as a human, the thinking goes, it will be easier for AI software to transfer skills from human videos to robots. Another idea is to let robots experiment and learn in a simulated environment, like a video game, before transferring the control software onto real robots.
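
The simulation approach can be sketched with the real Gymnasium reinforcement-learning API, though the cloth-folding environment named here is hypothetical:

```python
# Sketch: let a policy practice in a physics simulator before it ever
# touches hardware. Uses the real Gymnasium API; the environment id
# "FoldCloth-v0" is hypothetical.
import gymnasium as gym

env = gym.make("FoldCloth-v0")
obs, info = env.reset(seed=0)
for _ in range(10_000):
    action = env.action_space.sample()   # stand-in for a learning policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```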

Ultimately, the best data for making robots better at folding your clothes will come once they are deployed and doing real tasks in the world. But it's not clear how soon that will be possible.

How long until a robot can do your laundry? “Maybe in two years, three, five, 10, 20,” Goldberg said. “Or longer.”