Advances in physical AI mean machines are learning skills previously thought impossible
“This is not science fiction,” declared Jensen Huang, boss of chip giant Nvidia, in June, referring to the use of AI to instruct robots to carry out real world tasks. “The next wave of AI is physical AI. AI that understands the laws of physics, AI that can work among us.”
In many ways this robotics revolution seems long overdue. For decades, people have envisioned living alongside humanoid domestic droids capable of doing their mundane chores.
But after years of slow progress, researchers now appear to be on the cusp of the dramatic advances required to create a new generation of automatons. AI has powered a series of recent research breakthroughs, bringing within reach complex tasks that had previously separated humans from robots.
While the multi-purpose machine helper capable of doing everything a human can, only better, is still a way off, the fact that a robot can now put a T-shirt on a coat hanger is a sign of what is becoming possible. Such developments could be transformative in the fields of homecare, health and manufacturing — and investors are taking note.
The excitement around recent advances is attracting growing interest and large sums of cash from a welter of researchers, big tech companies and investors, even if the quantum leap in funding has not quite arrived. More than $11bn of robotics and drone venture capital deals had been done as of late October, surpassing last year’s $9.72bn but not quite reaching the $13.23bn of 2022, according to PitchBook.
“The floodgates have really opened,” says Russ Tedrake, a professor at the Massachusetts Institute of Technology and the vice-president of robotics research at the Toyota Research Institute, reflecting on the interest in the rapid developments in the field. “The tech giants are jumping in, the start-ups are just springing up . . . everybody’s optimistic that it’s coming.”
Multiple breakthroughs
Robots have long inspired both excitement and fear. Science fiction has for a century feasted on the idea of machines as loyal servants and terrible masters. The word robot’s Slavonic roots, roughly translating as “forced labour”, give a sense of that uncomfortable relationship.
Robots play contrasting roles in popular culture, from the save-the-day heroism of C-3PO and R2-D2 in the Star Wars franchise to the relentless T-1000 assassin in Terminator 2: Judgment Day. More recently, in the 2023 Oscar-nominated feature film Robot Dreams, a mail-order automaton becomes a life-changing buddy for a lonely New Yorker, and The Wild Robot’s Rozzum unit 7134 saves the life of a tiny, fluffy gosling while developing the most fundamental of human feelings — love.
Yet in the real world, getting robots to perform even mundane tasks has proved difficult. Interacting with people remains particularly challenging, given robots need to navigate our dynamic spaces and understand the subtle ways humans communicate intentions. The fact that Elon Musk’s humanoid Optimus robots — which were seen serving drinks at a Tesla event — were actually operated remotely by humans is a case in point.
Limitations of hardware and, especially, software have restricted robot abilities, even as they have transformed some industrial processes, such as automating warehouses. Previous generations of machines had to be programmed using complicated code or were taught slowly through trial and error, techniques that resulted in limited abilities in narrowly defined tasks performed in highly controlled environments.
But thanks to advances in AI, the past two years have been different, even for those who have been working in the field for some time. “There’s an excitement in the air, and we all think this is something that is advancing at a much faster pace than we thought,” says Carolina Parada, head of Google DeepMind’s robotics team. “And that certainly has people energised.”
Some of the biggest leaps in the field have been in software, in particular in the way robots are trained.
AI-powered behaviour cloning methods — where a task is demonstrated to a robot multiple times by a human — have produced remarkable results. Researchers at the Toyota Research Institute can now teach robot arms complex movements within hours instead of weeks.
Key to this learning process is “diffusion”. Well-known within the world of AI image generation, the technique has been further developed by roboticists.
Instead of using diffusion to generate images, roboticists have begun to use it to produce actions.
This means robots can learn a new task — such as how to use a hammer or turn a screw — and then apply it in different settings.
When the technique is used for robot manipulation, noise is added to the trajectories in the training data, much as it would be added to the pixels of an image.
The diffusion model then learns to strip that noise away, step by step, until a clear path emerges that the robot can follow.
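To make that concrete, here is a minimal Python sketch of the denoising loop at the heart of such a diffusion policy. The tiny noise-prediction network, the noise schedule and the trajectory dimensions are all assumptions for illustration; a real system would train the network on human demonstrations and condition it on camera images and the robot’s state.

```python
# Minimal sketch of a diffusion policy's sampling loop: start from random
# noise and iteratively denoise it into an action trajectory. The
# noise-prediction "network" below is an untrained stand-in; everything
# here is illustrative rather than a production implementation.
import numpy as np

HORIZON = 16      # number of future action steps to plan
ACTION_DIM = 7    # e.g. seven joint targets for a robot arm
NUM_STEPS = 50    # denoising iterations

# Assumed linear noise schedule (DDPM-style).
betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(HORIZON * ACTION_DIM + 1, 64))
W2 = rng.normal(scale=0.1, size=(64, HORIZON * ACTION_DIM))

def predict_noise(noisy_traj, step):
    """Stand-in for a trained network that predicts the noise in a trajectory."""
    x = np.concatenate([noisy_traj.ravel(), [step / NUM_STEPS]])
    h = np.tanh(x @ W1)
    return (h @ W2).reshape(HORIZON, ACTION_DIM)

def sample_trajectory():
    """Run the reverse diffusion process to produce a clean action plan."""
    traj = rng.normal(size=(HORIZON, ACTION_DIM))   # pure noise to begin with
    for t in reversed(range(NUM_STEPS)):
        eps = predict_noise(traj, t)
        # Remove the predicted noise component (standard DDPM update).
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        traj = (traj - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            traj += np.sqrt(betas[t]) * rng.normal(size=traj.shape)
    return traj   # a sequence of actions the robot could execute

if __name__ == "__main__":
    plan = sample_trajectory()
    print(plan.shape)   # (16, 7): 16 steps of 7-dimensional actions
```

In published systems the network is trained so that, given the current camera view and demonstrations of the task, the denoised trajectory reproduces the kind of motion a human teacher showed.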
Researchers at Stanford University and Google DeepMind have had success with similar diffusion techniques. “We picked three of the most dexterous tasks we could think of . . . and tried to see if we could train a policy [the robot AI system] to do each,” said Stanford assistant professor Chelsea Finn in a post on X. “All of them worked!”
The team had taught a previous version of the robot to autonomously cook shrimp, clean up stains and call a lift.
Building on LLMs
The extraordinary progress in generating text and images with AI over the past two years has been driven by the invention of large language models (LLMs), the systems underpinning chatbots.
Roboticists are now building on these and their cousins, visually conditioned language models, sometimes called vision-language models, which connect textual information and imagery.
With access to huge existing troves of text and image data, researchers can “pre-train” their robot models on the nuances of the physical world and how humans describe it, even before they begin to teach their machine students specific actions.
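As a rough illustration of that recipe, the sketch below keeps a pair of pretrained encoders frozen, one for images and one for text, and trains only a small action head on robot demonstrations. The encoders here are random stand-ins rather than a real vision-language model, and all names and sizes are assumptions for the example.

```python
# Toy "pre-train, then teach actions" recipe: frozen image and text encoders
# (stand-ins for a real vision-language model) feed a small action head that
# would be the only part trained on robot demonstrations.
import numpy as np

rng = np.random.default_rng(1)
EMBED_DIM = 32
ACTION_DIM = 7

# Frozen stand-ins for a pretrained vision-language model's encoders.
W_img = rng.normal(scale=0.1, size=(64 * 64, EMBED_DIM))
W_txt = rng.normal(scale=0.1, size=(128, EMBED_DIM))

def encode_image(pixels):
    """Map a 64x64 camera frame to an embedding (placeholder encoder)."""
    return np.tanh(pixels.ravel() @ W_img)

def encode_text(token_ids):
    """Map instruction token ids (vocabulary of 128) to an embedding."""
    one_hot = np.zeros(128)
    one_hot[token_ids] = 1.0
    return np.tanh(one_hot @ W_txt)

# The only part that would be trained on robot data: a small head mapping the
# combined image/instruction embedding to an action.
W_action = rng.normal(scale=0.1, size=(2 * EMBED_DIM, ACTION_DIM))

def policy(pixels, token_ids):
    features = np.concatenate([encode_image(pixels), encode_text(token_ids)])
    return features @ W_action   # e.g. target joint velocities

camera_frame = rng.random((64, 64))       # placeholder camera image
instruction = [3, 17, 42]                 # placeholder token ids for an instruction
print(policy(camera_frame, instruction))  # a 7-dimensional action
```

The idea is that, because the encoders have already absorbed broad knowledge of images and language during pre-training, the action head should need far fewer robot demonstrations than a model taught from scratch.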
But in the chaotic, ever-changing real world, machines not only need to be able to execute individual tasks such as these, they also need to carry out a multitude of jobs in different settings.
Those at the heart of robotics believe the answer to this challenge of generalisation lies in foundation models for the physical world, which will draw on growing databases of movement — banks of information recording robot actions. The hope is that these large behaviour models, once big enough, will help machines adapt to new and unpredictable environments, such as commercial and domestic settings, rapidly transforming businesses and our home lives.
But these models face numerous challenges beyond those of linguistic generative AI. They have to drive actions that obey the laws of physics in a three-dimensional world and adapt to dynamic environments occupied by other living things.
The immediate obstacle to developing these large behaviour models is a scarcity of data — a difficulty also facing large language models, now that human information sources are close to being exhausted. But a broad collective effort is under way in the robotics community to generate new sets of training data.
“We’ve been seeing more data be available, including data for very dexterous tasks — and seeing a lot of the fruits of putting that data in,” says Stanford’s Finn. “It suggests that if we are able to scale things up further, then we might be able to make significant breakthroughs in allowing robots to be successful in real world environments.”
Some of the latest research has found that this action data does not need to have originated from the same type of robot or even the exact skill a robot is trying to learn in order for it to be successful. It can be from different machines executing a wide range of tasks in different environments. Essentially, robots appear to be able to teach other robots even if they do not look like them.
Tedrake at Toyota Research Institute says the recent proliferation of robot hardware, from automatons on factory floors to home devices such as Roomba vacuum cleaners, should also help fill this data gap. The more robots that exist, the more visual data of them completing actions will be available.
Toyota’s newly announced research partnership with one of the world leaders in hardware, Boston Dynamics, aims to take this a step further — developing whole-body behaviour models for some of the world’s most advanced humanoids.
Embodiment advances
One of these is the hydraulic Atlas humanoid robot, famous on social media for its parkour and dance moves, which illustrates how Boston Dynamics has pushed robot hardware forward over the past three decades.
The story of the company’s dog-like robot Spot is an indication of how much progress has been made — and the challenges that remain. Spot has “athletic intelligence”, its creators say, and with more than 1,500 of the quadruped machines now working for businesses and other organisations, they are already playing a key role in industrial processes. They have also been seen patrolling the grounds of President-elect Donald Trump’s Mar-a-Lago resort in Palm Beach, Florida.
Well suited to jobs that are repetitive, arduous or potentially hazardous for humans, the robot dogs can be deployed to help with disaster search and rescue operations, nuclear decommissioning and bomb disposal — but their number one use is industrial inspections. Pharmaceuticals giant GSK uses a bespoke Spot to check over tanks of the propellant used in the company’s Ventolin inhalers, for example.
Spot’s AI abilities also allow it to learn quickly. When brewing giant AB InBev’s version of the canine machine encountered slippery floors while looking for air leaks in canning lines, its Boston Dynamics masters used machine learning simulations to teach it to cope. The work proved so effective, it has been rolled out to all Spots worldwide.
“So just like your iPhone gets a new software update, your robot does too,” says Nikolas Noel, vice-president of marketing and communications at Boston Dynamics, now owned by South Korean carmaker Hyundai Motor Group.
Yet the goal for many in the field of robotics remains a fully adaptable multi-skilled industrial machine, capable of tasks such as embroidery and assembly for small batch production — a breakthrough that would revolutionise manufacturing industries. Although AI has brought that ambition closer, it has not yet fulfilled it, says Nick Hawes, professor of artificial intelligence and robotics at Oxford university.
“The whole deep learning revolution in AI has unlocked the ability to generalise across a very wide range of inputs, by using these very large neural networks and a huge amount of data and compute,” Hawes says. “[But] I don’t think we’ve seen the general purpose rollout of these things . . . that still is off there in the future.”
Real world help
Boston Dynamics bade farewell earlier this year to the hydraulic Atlas. A video montage showed the machine doing backflips and long leaps, but also falling off a platform and rolling uncontrollably down a grassy slope. The metaphor of robotics as a field of promise and pitfalls under the glare of intense public interest was hard to miss.
Engineers have now introduced an electrified successor, Electric Atlas, complete with agile joints and enhanced AI abilities, raising the prospect of companion robots that can take on the kinds of tasks people now do.
“I think we all have dreams of, like, Rosey the Robot [referring to the android maid from The Jetsons] being able to get our trash and do our dishes and make our meals,” says Noel. “They’ll be future iterations. They’ll look totally different in all likelihood. But yes, that’s the long term ambition.”
The success of the coat hanger task is a sign of these possibilities, says Google DeepMind’s Parada. If these skills can be generalised to allow the robot to work with different clothing and in new settings, “you could start imagining these pieces coming together”, she says. “And then you have a robot that really could do your entire laundry, right?”
DeepMind taught a robot dog to open doors with its paw rather than pushing through them with its body. Once learnt, the skill became part of the foundation model and did not have to be taught again — not so different from training a real pet.
How robots learn to live alongside humans remains a tougher problem. Parada gives the example of a machine putting away groceries in a house full of people. The robot would have to understand signals from living occupants, such as when and how they are about to move. Humans often flag such intentions non-verbally in ways that might be tough for an automaton to detect. “You’re going to have to be able to interact with them [humans] proactively, not just passively, and also be able to read the room — which is extremely hard for a robot to do,” Parada says.
Despite these challenges, the conceptual leap to a world where domestic super-robots play a real part in our lives appears smaller than it once was, she adds.
“Many people, when I started working on robotics early on, would say having a robot in your home that is doing your laundry and cleaning around the house is not feasible in this lifetime. I would say most people think now: ‘Oh, yeah, I think we can do that before I retire’.”
New horizons
Longer term, even more advanced ideas are already percolating. One is “liquid” neural networks, which act more like the biological brain than traditional neural networks and use dynamic connections to continuously adapt to and learn from new data.
Such an approach requires fewer neurons, or signal carriers, and therefore less computational capacity, which should make the hardware more compact and less cumbersome for robots to carry.
“How can we bring these two worlds together?” asks Daniela Rus, leader of neural network research at MIT. “And the thesis is that we can use liquid networks — which is a new model for AI.”
These liquid networks are already showing impressive results in the research sphere of self-driving vehicles.
Traditionally, such vehicles struggle at transition times when conditions change, such as at dawn and dusk. While conventional AI models focus a lot of attention on features at the sides of a road, such as vegetation, to instruct the vehicle to drive between them, liquid neural networks focus more on the middle distance of the highway and potential obstacles.
Research shows that liquid networks are better able to distinguish between the crucial and irrelevant aspects of the driving task, an approach much closer to how humans drive. This has the advantage of making the model, in theory, easier to understand and calibrate.
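For readers who want a feel for the mechanics, below is a toy Python sketch of a liquid time-constant layer, the kind of continuous-time unit liquid networks are built from. The weights, sizes and simple Euler solver are illustrative assumptions; the point is that each neuron’s effective time constant depends on the incoming data, which is what lets the network keep adapting after training.

```python
# Toy liquid time-constant (LTC) layer: the state evolves under an ODE whose
# effective time constant depends on the current input and state, so the
# dynamics keep adapting as conditions change. Sizes and weights are
# illustrative assumptions, integrated with a simple Euler step.
import numpy as np

rng = np.random.default_rng(2)
NUM_INPUTS, NUM_NEURONS = 4, 8

W_in = rng.normal(scale=0.5, size=(NUM_INPUTS, NUM_NEURONS))
W_rec = rng.normal(scale=0.5, size=(NUM_NEURONS, NUM_NEURONS))
bias = np.zeros(NUM_NEURONS)
tau = np.ones(NUM_NEURONS)   # base time constants
A = np.ones(NUM_NEURONS)     # target states the gate pulls towards

def step(state, inputs, dt=0.05):
    """One Euler step of the liquid time-constant update."""
    # The gate depends on both input and state, which is what makes the
    # effective time constant 'liquid' rather than fixed.
    f = np.tanh(inputs @ W_in + state @ W_rec + bias)
    dstate = -(1.0 / tau + f) * state + f * A
    return state + dt * dstate

state = np.zeros(NUM_NEURONS)
for t in range(200):   # feed a slowly changing input signal
    u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0, 0.0])
    state = step(state, u)
print(state)   # the layer's continuously evolving activations
```

A handful of such units can stand in for a much larger conventional network, which is the source of the compactness described above.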
Rus says we are starting to see the physical realisation in robotics of many things previously thought impossible. But challenges remain in ensuring guardrails are in place, she says. After all, there was a reason the first two laws of robotics laid down by I, Robot author Isaac Asimov focused on preventing harm to humans and ensuring robots followed people’s orders.
“All these cool things that we only dreamed of, we can now begin to realise,” says Rus. “Now we have to make sure that what we do with all these superpowers is good.”
Additional work by Caroline Nevitt, Peter Andringa, Dan Clark and Sam Joiner