Apple seemed slow to jump on the generative AI bandwagon, but new research on contextual understanding might make Siri better than ChatGPT at knowing what you're referring to.
The tech giant was conspicuously quiet during the meteoric rise of ChatGPT and the subsequent barrage of generative AI tools and features from companies like Google, Microsoft, and Meta. But Apple researchers have a new model that could give Siri the generative AI upgrade Apple fans have been hoping for.
“Human speech typically contains ambiguous references such as ‘they’ or ‘that,’ whose meaning is obvious (to other humans) given the context,” the researchers wrote. The paper proposes a model called ReALM (Reference Resolution As Language Modeling) to tackle a problem large language models (LLMs) don’t always handle well: resolving on-screen, conversational, and background references (e.g., apps or features running in the background), with the goal of achieving a “true hands-free experience in voice assistants.”
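The name spells out the core idea: treat reference resolution as something an ordinary text-based LLM can do by describing the candidate entities in plain text. The Python snippet below is a minimal, hypothetical sketch of that framing, not Apple's code; the prompt format, entity fields, and example data are all assumptions made up for illustration.

```python
# Hypothetical sketch of "reference resolution as language modeling":
# candidate entities (on-screen, conversational, background) are serialized
# into plain text, and an LLM is asked which one an ambiguous phrase refers to.
from dataclasses import dataclass

@dataclass
class Entity:
    entity_id: int
    source: str   # "on-screen", "conversational", or "background"
    text: str     # textual form of the entity, e.g. a phone number or address

def build_resolution_prompt(utterance: str, candidates: list[Entity]) -> str:
    """Serialize candidates into a numbered list and ask which entity the
    ambiguous request refers to (invented prompt format, for illustration)."""
    lines = [f"{c.entity_id}. [{c.source}] {c.text}" for c in candidates]
    return (
        "Candidate entities:\n"
        + "\n".join(lines)
        + f"\n\nUser request: \"{utterance}\"\n"
        + "Which entity id does the request refer to?"
    )

candidates = [
    Entity(1, "on-screen", "742 Evergreen Terrace, Springfield"),
    Entity(2, "on-screen", "+1 (555) 010-7788"),
    Entity(3, "background", "Alarm set for 7:00 AM"),
]

print(build_resolution_prompt("save that address to my contacts", candidates))
```

The payoff of this framing is that "resolve the reference" becomes "pick an entity id," a task a fine-tuned LLM can answer directly from text.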
While ChatGPT is pretty good at certain kinds of context understanding, the researchers said ReALM outperforms GPT-3.5 and GPT-4 (which power the free and paid versions of ChatGPT) on all of its context tests. Here’s what that could mean for Siri.
1. On-screen context clues
Apple researchers trained ReALM using “on-screen” data from web pages, including contact information, enabling the model to comprehend text within screenshots (e.g., addresses and bank account details). While GPT-4 can also understand images, it wasn’t trained on screenshots, which the paper argues makes ReALM better at understanding on-screen information that Apple users would be asking Siri for help with.
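How does a text-only model "read" a screen? One plausible approach, sketched below purely as an illustration (the article doesn't detail Apple's parsing pipeline, and the element names, coordinates, and row-bucketing heuristic here are assumptions), is to take parsed on-screen elements with their positions and flatten them into text in reading order.

```python
# Rough illustration (not Apple's implementation) of encoding a parsed screen
# as plain text: each detected element carries its text and a position, and
# elements are ordered top-to-bottom, then left-to-right, so a text-only LLM
# can "read" the screen roughly the way a user scans it.
from dataclasses import dataclass

@dataclass
class ScreenElement:
    text: str
    x: float  # left edge, normalized 0-1
    y: float  # top edge, normalized 0-1

def screen_to_text(elements: list[ScreenElement], row_height: float = 0.05) -> str:
    # Bucket elements into rows by vertical position, order each row left to
    # right, and join rows with newlines.
    bucket = lambda e: round(e.y / row_height)
    ordered = sorted(elements, key=lambda e: (bucket(e), e.x))
    rows: dict[int, list[str]] = {}
    for e in ordered:
        rows.setdefault(bucket(e), []).append(e.text)
    return "\n".join(" ".join(texts) for _, texts in sorted(rows.items()))

screen = [
    ScreenElement("Contact", 0.05, 0.02),
    ScreenElement("Sam Rivera", 0.05, 0.10),
    ScreenElement("+1 (555) 010-7788", 0.05, 0.16),
    ScreenElement("742 Evergreen Terrace", 0.05, 0.22),
]

print(screen_to_text(screen))
```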
2. Conversational and background understanding
Conversational references are things that are relevant to the conversation but may not be explicitly mentioned in the prompt. Because ReALM was trained on data like lists of businesses, the model can understand a prompt like “call the bottom one” in reference to a list of nearby pharmacies shown on the screen, without the user needing to give more specific instructions.
ReALM can also understand “background entities,” meaning things happening on a device “that might not necessarily be a direct part of what the user sees on their screen or their interaction with the virtual agent,” such as music playing or an alarm going off.
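To make those two categories concrete, here is a toy stand-in for what the model has to output. ReALM itself is a fine-tuned LLM, not hand-written rules; the hard-coded pharmacy list, background entry, and keyword matching below are invented for illustration only.

```python
# Toy stand-in (not ReALM) showing the kind of mapping the system must
# produce for conversational and background references.
PHARMACIES = [  # list the assistant just showed the user, top to bottom
    "Rite Aid Pharmacy, (555) 010-2233",
    "Walgreens, (555) 010-4455",
    "CVS Pharmacy, (555) 010-6677",
]
BACKGROUND = {"now_playing": "Here Comes the Sun - The Beatles"}

def resolve(utterance: str) -> str:
    """Map an ambiguous request to a concrete entity (keyword rules here,
    a learned model in ReALM)."""
    u = utterance.lower()
    if "bottom" in u:
        return PHARMACIES[-1]               # "call the bottom one"
    if "top" in u or "first" in u:
        return PHARMACIES[0]
    if "playing" in u or "song" in u:
        return BACKGROUND["now_playing"]    # background entity
    return "unresolved"

print(resolve("call the bottom one"))             # CVS Pharmacy, (555) 010-6677
print(resolve("what's the song that's playing?")) # Here Comes the Sun - The Beatles
```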
3. Completely on-device
Last but not least, ReALM is designed to run on-device, which would be a big deal since LLMs require lots of computing power and are therefore mostly cloud-based. ReALM gets around this by being a smaller LLM that is fine-tuned “specifically and explicitly for the task of reference resolution.” Apple has long touted its commitment to privacy as a selling point for its devices, so a generative AI version of Siri that runs entirely on the device would be both very on-brand and a major technical achievement.
Apple has been predictably tight-lipped about its AI plans, but CEO Tim Cook said a big AI announcement is expected later this year, so all eyes are on Apple’s Worldwide Developers Conference (WWDC) on June 10.