Are we there yet?
These days, I'm the proud owner of a Roomba vacuum cleaner, and just like the taste of fresh coffee in the morning, there is a taste of victory every time it does its thing on its own; no one spent any time vacuuming, yet floors are always clean. Pure magic.
My awe has been long in the making. Like many kids in my generation, I grew up watching reruns of Hanna-Barbera's “Los Supersónicos” (The Jetsons); I remember craving to build my own “Robotina” to take care of everyone's chores. When the time came to pick a field for college, I jumped at the first sight of Mechatronics, only to switch away in dismay over the prospects in the early 2000s. While I was working in manufacturing, steep investment requirements shattered my automation ambitions. And a decade later, tinkering with Arduinos and Smart-Speaker Assistants led only to orchestrated party tricks.
On paper, the feat tends to appear feasible with the technologies at our disposal; on closer inspection, however, their performance and economics are not where we need them to be. At least not yet.
So every now and then, when the inner kid wakes up asking, “Are we there yet? Are we finally within striking distance of having general-purpose robots?”, I'm happy to go and have a look.
If there is one thing of which we can be confident, it is our ability to transform materials into tools. When a sufficiently motivated individual can 3D-print a wind turbine,1 a car chassis,2 or even print a 3D printer,3 there is little concern about our chances of manufacturing the mechanical and electrical components for general-purpose robots.
Things are looking similarly bright on the silicon side. Fifty-seven years in, Moore's law is still going strong; its vigor has perhaps slipped slightly over the last decade, yet the number of transistors per chip continues to double every two years or so. We expect the momentum to continue not only through miniaturization but also through advances in chip architectures.4
At this point, if you can imagine it, we can figure out how to build it. And while state-of-the-art hardware tends to carry hefty price tags,5 our track record suggests that once sufficient demand materializes, the system reacts with the required capacity and optimizations. After all, capitalism has taught us that where there's money to be made, the opportunities get seized swiftly.
Take the auto industry during its infancy. The Ford Motor Company made purchasing a car 10x cheaper in less than 15 years.6 Given how the U.S. automobile market of the 1920s compares with today's global market for general-purpose robots, a 10x cost reduction within a decade, driven by manufacturing economies of scale, seems like the lower bound of what may be possible.
For a benchmark from this century, look at cloud training. The cost to train a language model the size of the current state of the art dropped from $875 million to $4.6 million between 2015 and 2020, and with a continued ~60% year-over-year reduction it is expected to reach about $500 by 2030.7
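As a rough sanity check on those figures, the sketch below recomputes the yearly decline implied by the 2015 and 2020 costs and then compounds a flat 60% yearly reduction out to 2030. The function names are illustrative, and the constant-rate assumption is a simplification rather than the cited forecast's actual method.

```python
def annual_decline(start_cost: float, end_cost: float, years: int) -> float:
    """Constant year-over-year reduction implied by two cost points."""
    return 1 - (end_cost / start_cost) ** (1 / years)


def project_cost(cost: float, decline: float, years: int) -> float:
    """Compound a constant yearly decline over a number of years."""
    return cost * (1 - decline) ** years


# Figures quoted in the text: $875 million (2015) and $4.6 million (2020).
historical = annual_decline(875e6, 4.6e6, years=5)

# Assumption: the ~60% yearly reduction holds for the whole decade.
projected_2030 = project_cost(4.6e6, decline=0.60, years=10)

print(f"Implied 2015-2020 decline: {historical:.0%} per year")
print(f"Projected 2030 cost: ${projected_2030:,.0f}")
```

Under these assumptions, the historical decline comes out slightly steeper than 60% per year, and the 2030 projection lands a little under $500, in line with the figures above.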
So we can economically build the hardware. But can we power it?
How much energy is necessary to power this reality? Perhaps that is not the right lens to look through.
Take, for example, my beloved Roomba. It charges in about 3 hours at 30 watts, consumes 5 watts in standby mode, and its base station draws 1 watt while the robot is cleaning, totaling ~53 kWh a year when programmed to vacuum every other day. Since it saves me 30 minutes every time it runs, we could say the Roomba's efficiency is 103 human minutes per kilowatt-hour.
My upright vacuum cleaner, for comparison, consumes 1,600 watts for 30 minutes (0.8 kWh) and saves me about 10 minutes per session (using a broom and dustpan as a baseline), for an efficiency of 12.5 human minutes per kilowatt-hour.
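The two efficiency figures can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper name is made up for illustration, the ~53 kWh annual total is taken as given from the text rather than re-derived, and "every other day" is assumed to mean 365 / 2 runs per year.

```python
def human_minutes_per_kwh(minutes_saved: float, energy_kwh: float) -> float:
    """Human time saved per unit of electrical energy consumed."""
    return minutes_saved / energy_kwh


# Roomba: 30 minutes saved per run, one run every other day,
# ~53 kWh per year (annual total as quoted in the text).
RUNS_PER_YEAR = 365 / 2
roomba = human_minutes_per_kwh(30 * RUNS_PER_YEAR, 53)

# Upright vacuum: 1,600 W for half an hour, 10 minutes saved per session.
upright_kwh = 1600 * 0.5 / 1000
upright = human_minutes_per_kwh(10, upright_kwh)

print(f"Roomba:  {roomba:.0f} human minutes per kWh")
print(f"Upright: {upright:.1f} human minutes per kWh")
```

On these numbers, the robot returns roughly eight times more human time per kilowatt-hour than the upright.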
But I wouldn't grab a vacuum every other day of the year (not because I don't enjoy clean surfaces, but because I don't have the time for it). Hence, the robot ends up increasing the frequency of the activity instead of maintaining the existing pattern.
This effect is not exclusive to robotic vacuum cleaners. We see throughout history that the more efficiently we use energy, the more total energy we consume. Not less.8
We can then be confident that robots will take on more than we do today, driven by efficiency and bounded by energy availability. That bound, under our current circumstances, may seem like a showstopper.
However, the current energy crisis driven by the ongoing conflict between nation states is jolting the landscape. Pre-existing plans to decommission nuclear plants are now being reconsidered or delayed,9 and dependency on other nations is an issue of National Security.10
Whether nuclear energy sees a grand awakening, or whether all the investment flowing into fusion energy leads to a breakthrough within the decade, is a topic deserving its own analysis.11 Nevertheless, an industry that has been shackled by regulation is now getting a lot of attention. Both capital and governments are motivated to build up capacity.12 Robots may sweeten the deal by making good use of all the extra juice during demand valleys.
It is a great time to jump into energy development.
After not one but two “Artificial Intelligence Winters,” Deep Neural Networks reinvigorated the field in the early 2010s, from Convolutional Neural Networks achieving super-human performance on vision tasks to the Transformer architecture opening the floodgates for Large Language Models that make headlines every other week. Model architecture breakthroughs and parallelization techniques signal that the limiting factor is no longer compute power, but creativity.13
In robotics, Navigation and Dexterity training may continue to be done primarily by manufacturers, driven by the benefits of a tighter iteration loop between hardware and software. The initial wave of general-purpose robots will show up in controlled industrial settings.14 Meanwhile, community, industry, and academia continue advancing Reinforcement Learning and Virtual World Simulators, which are proving effective at training agents through decades' worth of iterations within hours of real-world time, closing the gap towards environment generalization.15
Similarly, contextual grounding (outside the usual autonomous driving scenario) already shows promise: techniques combining Large Language Models, Reinforcement Learning, and Behavioral Cloning achieve a 61% execution success rate across 101 tasks, choosing actions that are feasible and contextually appropriate for a given robot.16
But of course, we can think of more than 101 things to do. Task training may come from the public, along the lines of what we see today with content creators. However, it's unlikely we'll rely only on blindly crunching through online videos, as we did for language. With the quality and quantity of the data directly correlating to the results,17 we would want the field experts to be the ones programming the explanations.
And yes, anyone who has seen Codex or DALL-E 2 realizes it is no longer a castle in the sky. Natural language-based programming is already a $10-per-month product in the hands of developers,18 where the implementation of algorithms is delegated to artificial intelligence, freeing programmers to focus on higher-level decisions and fine-tuning.
Thus, over this decade, we expect an explosion of abstractions that allow us to offload an increasing share of activities, including those we previously deemed too technical or too closely tied to human understanding and creativity, removing all barriers for anyone to train a robot to do anything.
So, are we there yet?—well, it sure smells like the table is served. Grab a chair.