Researchers from Apple are probing whether it’s possible to use artificial intelligence to detect when a user is speaking to a device like an iPhone, thereby eliminating the technical need for a trigger phrase like “Siri,” according to a paper published on Friday.
In a study, which was uploaded to arXiv and has not been peer-reviewed, researchers trained a large language model on both speech captured by smartphones and acoustic data from background noise to look for patterns that could indicate when users want help from the device. The model was built in part with a version of OpenAI’s GPT-2, “since it is relatively lightweight and can potentially run on devices such as smartphones,” the researchers wrote. The paper describes over 129 hours of data and additional text data used to train the model, but does not specify the source of the recordings that went into the training set. Six of the seven authors list their affiliation as Apple, and three of them work on the company’s Siri team, according to their LinkedIn profiles. (The seventh author did work related to the paper during an Apple internship.)
The results were promising, according to the paper. The model was able to make more accurate predictions than audio-only or text-only models, and improved further as the models grew larger. Beyond the research question the paper explores, it’s unclear whether Apple plans to eliminate the “Hey Siri” trigger phrase.
Neither Apple nor the paper’s researchers immediately responded to requests for comment.
Currently, Siri functions by holding small amounts of audio and does not begin recording or preparing to answer user prompts until it hears the trigger phrase. Eliminating that “Hey Siri” prompt could increase concerns about our devices “always listening,” said Jen King, a privacy and data policy fellow at the Stanford Institute for Human-Centered Artificial Intelligence.
The way Apple handles audio data has previously come under scrutiny by privacy advocates. In 2019, reporting from The Guardian revealed that Apple’s quality control contractors regularly heard private audio collected from iPhones while they worked with Siri data, including sensitive conversations between doctors and patients. Two years later, Apple responded with policy changes, including storing more data on devices and allowing users to opt out of having their recordings used to improve Siri. A class action suit brought against the company in California in 2021 alleged that Siri was being turned on even when not activated.
The “Hey Siri” prompt can serve an important purpose for users, according to King. The phrase provides a way to know when the device is listening, and getting rid of it might mean more convenience but less transparency from the device, King told MIT Technology Review. The research did not detail whether the trigger phrase would be replaced by any other signal that the AI assistant is engaged.
“I’m skeptical that a company should mandate that form of interaction,” King said.
The paper is one of a number of recent signals that Apple, which is perceived to be lagging behind other tech giants like Amazon, Google, and Facebook in the artificial intelligence race, is planning to incorporate more AI into its products. According to news first reported by VentureBeat, Apple is building a generative AI model called MM1 that can work in text and images, which would be the company’s answer to OpenAI’s ChatGPT and a host of other chatbots from leading tech giants. Meanwhile, Bloomberg reported that Apple is in talks with Google about using Google’s AI model Gemini in iPhones, and on Friday the Wall Street Journal reported that Apple had engaged in talks with Baidu about using that company’s AI products.