The race is on to bring the technology behind ChatGPT to the smartphone in your pocket. And to judge from the surprising speed at which the technology is advancing, the latest moves in AI could transform mobile communications and computing far faster than seemed likely just months ago.
As tech companies rush to embed generative AI into their software and services, they face significantly higher computing costs. The concern has weighed in particular on Google, with Wall Street analysts warning that the company’s profit margins could be squeezed if internet search users come to expect AI-generated content in standard search results.
Running generative AI on mobile handsets, rather than through the cloud on servers operated by Big Tech groups, could answer one of the biggest economic questions raised by the latest tech fad.
Google said last week that it had managed to run a version of PaLM 2, its latest large language model, on a Samsung Galaxy handset. Though it did not publicly demonstrate the scaled-down model, called Gecko, the move is the latest sign that a form of AI that has required computing resources only found in a data centre is quickly starting to find its way into many more places.
The shift could make services such as chatbots far cheaper for companies to run and pave the way for more transformative applications using generative AI.
“You need to make the AI hybrid — [running in both] the data centre and locally — otherwise it will cost too much money,” Cristiano Amon, chief executive of mobile chip company Qualcomm, told the Financial Times. Tapping into the unused processing power on mobile handsets was the best way to spread the cost, he said.
When the launch of ChatGPT late last year brought generative AI to widespread attention, the prospect of bringing it to handsets seemed distant. Besides training the so-called large language models behind such services, the work of inferencing — or running the models to produce results — is also computationally demanding. Handsets lack the memory to hold large models like the one behind ChatGPT, as well as the processing power required to run them.
Generating a response to a query on a device, rather than waiting for a remote data centre to produce a result, could also reduce the latency, or delay, from using an application. When a user’s personal data is used to refine the generative responses, keeping all the processing on a handset could also enhance privacy.
More than anything, generative AI could make it easier to carry out common activities on a smartphone, particularly those that involve producing text. “You could embed [the AI] in every office application: You get an email, it suggests a response,” said Amon. “You’re going to need the ability to run those things locally as well as on the data centre.”
Rapid advances in some of the underlying models have changed the equation. The biggest and most advanced, such as Google’s PaLM 2 and OpenAI’s GPT-4, have hogged the headlines. But an explosion of smaller models has made some of the same capabilities available in less technically demanding ways. These have benefited in part from new techniques for tuning language models based on a more careful curation of the data sets they are trained on, reducing the amount of information they need to hold.
According to Arvind Krishna, chief executive of IBM, most companies that look to use generative AI in their own services will get much of what they need by combining a number of these smaller models. Speaking last week as IBM announced a technology platform to help its customers tap into generative AI, he said that many would opt to use open-source models, where the code was more transparent and could be adapted, in part because it would be easier to fine-tune the technology using their own data.
Some of the smaller models have already demonstrated surprising capabilities. They include LLaMA, an open-source language model released by Meta that is claimed to match many of the features of the largest systems.
LLaMA comes in various sizes, the smallest of which has only 7bn parameters, far fewer than the 175bn of GPT-3, the breakthrough language model OpenAI released in 2020; the number of parameters in GPT-4, released this year, has not been disclosed. A research model based on LLaMA and developed at Stanford University has already been shown running on one of Google’s Pixel 6 handsets.
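The memory problem is simple arithmetic: every parameter must be held in memory at some numerical precision. A rough back-of-the-envelope calculation illustrates the gap (the byte widths below are illustrative assumptions, not disclosed figures for either model):

```python
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold a model's weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9  # bytes -> gigabytes

# GPT-3 (175bn parameters) at an assumed 16-bit precision: far beyond any phone
print(model_memory_gb(175, 2))   # 350.0 GB
# The smallest LLaMA (7bn parameters) at an assumed 4-bit precision: phone-plausible
print(model_memory_gb(7, 0.5))   # 3.5 GB
```

The point of the comparison is scale, not the exact figures: shrinking both the parameter count and the bytes per parameter is what moves a model from a data centre rack into a handset's memory.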
Besides their far smaller size, the open-source nature of such models has also made it easier for researchers and developers to adapt them for different computing environments. Qualcomm earlier this year showed off what it claimed was the first Android handset running the Stable Diffusion image-generation model, which has about 1bn parameters. The chipmaker had “quantised” the model, reducing the numerical precision of its weights to cut its size so that it could run more easily on a handset, without losing any of its accuracy, said Ziad Asghar, a senior vice-president at Qualcomm.
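In its simplest form, quantisation maps 32-bit floating-point weights onto 8-bit integers sharing a single scale factor, cutting storage to a quarter. A minimal sketch of the idea follows; Qualcomm's actual pipeline is not public and is certainly more sophisticated than this:

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    """Post-training quantisation: float32 weights -> int8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0            # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights at inference time."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1_000_000).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantise_int8(weights)

print(q.nbytes / weights.nbytes)                      # 0.25: a quarter of the storage
print(np.abs(weights - dequantise(q, scale)).max())   # rounding error, at most scale/2
```

The trade-off is that each weight is now only approximate; the engineering work lies in choosing scales (often per layer or per channel) so that the accumulated error does not visibly degrade the model's output.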
With most of the work on tailoring the models to handsets still at an experimental stage, it was too early to assess whether the efforts would lead to truly useful mobile applications, said Ben Bajarin, an analyst at Creative Strategies. He predicted relatively rudimentary apps, like voice-controlled photo-editing functions and simple question-answering, from the first wave of mobile models with between 1bn and 10bn parameters.
Zoubin Ghahramani, vice-president at Google DeepMind, the internet company’s AI research arm, said that its Gecko mobile model could process 16 tokens per second; tokens are the short units of text that large language models read and generate. Most large models use 1-2 tokens per word generated, suggesting that Gecko might produce about 10-15 words per second on a handset, potentially making it suitable for suggesting text messages or short email responses.
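Converting token throughput into text speed is straightforward division. The 1-2 tokens-per-word range is the figure cited above; the 1.3 average used in the middle line is a common rule-of-thumb assumption for English text, not a figure from Google:

```python
def words_per_second(tokens_per_second: float, tokens_per_word: float) -> float:
    """Convert a model's decoding speed into a rough text output speed."""
    return tokens_per_second / tokens_per_word

print(words_per_second(16, 2.0))   # 8.0 words/sec at the conservative end
print(words_per_second(16, 1.3))   # ~12.3 words/sec at a typical English average
print(words_per_second(16, 1.0))   # 16.0 words/sec at the optimistic end
```

At roughly a dozen words a second, a model could draft a two-sentence reply in a couple of seconds, which is why short-form suggestions are the natural first application.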
The particular requirements of mobile handsets meant that attention was likely to shift quickly to so-called multimodal models, which can work with a range of image, text and other inputs, said Asghar at Qualcomm. Mobile applications were likely to lean heavily on speech and images, he added, rather than the text-heavy applications more common on a personal computer.
The surprising speed with which generative AI is starting to move to smartphones, meanwhile, is set to increase the attention on Apple, which has so far stood apart from the speculative frenzy around the technology.
Well-known flaws in generative AI, such as the tendency of large models to “hallucinate”, or respond with fabricated information, meant Apple was unlikely to embed the technology into the iPhone’s operating system for some time, said Bajarin. Instead, he predicted that the company would look for ways to make it easier for app developers to start experimenting with the technology in their own services.
“This is the posture you’ll see from Microsoft and Google as well: they’ll all want to give developers the tools to go and compete [with generative AI applications],” Bajarin said.
With Apple’s Worldwide Developers Conference set to begin on June 5, preceded by Microsoft’s own event for developers called Build, the fight for developer attention is about to get intense. Generative AI may still be in its infancy, but the rush to get into many more users’ hands — and pockets — is already moving into overdrive.