This Low-Cost Audio Detector Keeps Voice Assistants from Snooping on You

Nelson Régo

4 years ago

Meet LeakyPick, the low-cost audio spy detector for network-connected devices.

Researchers have developed a device that may one day let users take back their privacy by warning them when voice assistants and other network-connected devices are mistakenly or intentionally snooping on them.

LeakyPick is placed in various rooms of a home or office to detect the presence of devices that stream nearby audio to the Internet. By periodically emitting sounds and monitoring subsequent network traffic (it can be configured to send the sounds when users are away), the ~$40 prototype detects the transmission of audio with 94-percent accuracy. The device monitors network traffic and provides an alert whenever the identified devices are streaming ambient sounds.

LeakyPick also tests devices for wake word false positives, i.e., words that incorrectly activate the assistants. So far, the researchers’ device has found 89 words that unexpectedly caused Alexa to stream audio to Amazon.

“For many privacy-conscious consumers, having Internet-connected voice assistants [with] microphones scattered around their homes is a concerning prospect, despite the fact that smart devices are promising technology to enhance home automation and physical safety,” Ahmad-Reza Sadeghi, one of the researchers who designed the device, said. “The LeakyPick device identifies smart home devices that unexpectedly record and send audio to the Internet and warns the user about it.”

Voice-controlled devices typically use local speech recognition to detect wake words, and for usability, the devices are often programmed to accept similar-sounding words. When a nearby utterance resembles a wake word, the assistants send audio to a server that has more comprehensive speech recognition. Besides being prone to these inadvertent transmissions, assistants are also vulnerable to hacks that deliberately trigger wake words in order to send audio to attackers or carry out other security-compromising tasks.

In a paper published early this month, Sadeghi and other researchers—from Darmstadt University, the University of Paris Saclay, and North Carolina State University—wrote:

“The goal of this paper is to devise a method for regular users to reliably identify IoT devices that 1) are equipped with a microphone, and 2) send recorded audio from the user’s home to external services without the user’s awareness. If LeakyPick can identify which network packets contain audio recordings, it can then inform the user which devices are sending audio to the cloud, as the source of network packets can be identified by hardware network addresses. This provides a way to identify both unintentional transmissions of audio to the cloud, as well as above-mentioned attacks, where adversaries seek to invoke specific actions by injecting audio into the device’s environment.”

Achieving all of that required the researchers to overcome two challenges. The first is that most assistant traffic is encrypted. That prevents LeakyPick from inspecting packet payloads to detect audio codecs or other signs of audio data. Second, with new, previously unseen voice assistants coming out all the time, LeakyPick also has to detect audio streams from devices without prior training for each device.

To clear the hurdles, LeakyPick periodically transmits audio in a room and monitors the resulting network traffic from connected devices. By temporally correlating the audio probes with observed characteristics of the network traffic that follows, LeakyPick enumerates connected devices that are likely to transmit audio. One way the device identifies likely audio transmissions is by looking for sudden bursts of outgoing traffic. Voice-activated devices typically send limited amounts of data when inactive. A sudden surge usually indicates a device has been activated and is sending audio over the Internet.
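
The burst idea described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `find_bursts`, the per-second byte counts, and the thresholds (`baseline_seconds`, `factor`, `min_bytes`) are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def find_bursts(bytes_per_second, baseline_seconds=10, factor=5.0, min_bytes=2000):
    """Return the indices of seconds whose outgoing byte count looks like a burst.

    Assumption: the first `baseline_seconds` samples were recorded while the
    device was idle. A later second counts as a burst when it exceeds both
    `min_bytes` and the idle mean plus `factor` idle standard deviations.
    """
    baseline = bytes_per_second[:baseline_seconds]
    if not baseline:
        raise ValueError("need at least one idle sample")
    threshold = max(min_bytes, mean(baseline) + factor * pstdev(baseline))
    return [
        second
        for second, count in enumerate(bytes_per_second)
        if second >= baseline_seconds and count > threshold
    ]


# Hypothetical outgoing traffic of one device: ten idle seconds,
# then an audio probe is played at second 10.
idle = [120, 90, 150, 110, 100, 95, 130, 105, 115, 100]
after_probe = [110, 9000, 12000, 8000, 100]
print(find_bursts(idle + after_probe))  # [11, 12, 13]
```

A fixed floor such as `min_bytes` keeps a very quiet baseline from turning every small fluctuation into a "burst", but any real deployment would have to tune these numbers per network.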

Using bursts alone is prone to false positives. To weed them out, LeakyPick employs a statistical approach based on an independent two-sample t-test to compare features of a device’s network traffic when idle and when it responds to audio probes. This method has the added benefit of working on devices the researchers have never analyzed. The method also allows LeakyPick to work not only for voice assistants that use wake words, but also for security cameras and other Internet-of-things devices that transmit audio without wake words.
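
To make the statistical step more concrete, here is a minimal sketch of a two-sample t-test on idle versus probed traffic. The article does not say which variant or which traffic features the researchers used, so this example assumes Welch's unequal-variance form and a single made-up feature (outgoing bytes per second). The names `welch_t` and `responds_to_probe` are hypothetical, and the critical value is left to the caller rather than derived from a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(idle, probed):
    """Return (t, degrees_of_freedom) for Welch's two-sample t-test.

    A large positive t means the probed samples are, on average,
    higher than the idle samples relative to their spread.
    """
    n1, n2 = len(idle), len(probed)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    v1, v2 = variance(idle), variance(probed)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(probed) - mean(idle)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def responds_to_probe(idle, probed, critical_t):
    """True when traffic after a probe is significantly higher than idle traffic.

    `critical_t` is supplied by the caller, e.g. looked up in a t-table
    for the chosen significance level and the degrees of freedom.
    """
    t, _ = welch_t(idle, probed)
    return t > critical_t


# Hypothetical bytes-per-second samples for one device.
idle_rate = [100, 120, 90, 110, 105, 95]
probed_rate = [900, 1100, 950, 1200, 1000, 1050]
# 3.365 is the one-sided 1% critical value for 5 degrees of freedom.
print(responds_to_probe(idle_rate, probed_rate, critical_t=3.365))  # True
```

Because the test only compares a device against its own idle behavior, nothing in it depends on knowing the device model in advance, which matches the article's point about previously unseen devices.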

So far, LeakyPick, which gets its name from its mission to pick up the audio leakage of network-connected devices, has uncovered 89 non-wake words that can trigger Alexa into sending audio to Amazon. With more use, LeakyPick is likely to find additional words in Alexa and other voice assistants. The researchers have already found several false positives in Google Home.

Besides detecting inadvertent audio transmissions, the device will spot virtually any activation of a voice assistant, including those that are malicious. In an attack demonstrated last year, shining lasers at Alexa, Google Home, and Apple Siri devices caused them to unlock doors and start cars when they were connected to a smart home. Sadeghi said LeakyPick would easily detect such a hack.

The prototype hardware consists of a Raspberry Pi 3B connected by Ethernet to the local network. It’s also connected through its headphone jack to a PAM8403 amplifier board, which in turn connects to a single generic 3W speaker. The device captures network traffic using a TP-LINK TL-WN722N USB Wi-Fi dongle, which creates a wireless access point using hostapd, with dnsmasq as the DHCP server. All wireless IoT devices in the vicinity will then connect to that access point.

To give LeakyPick Internet access, the researchers activated packet forwarding between the Ethernet (connected to the network gateway) and wireless network interfaces. The researchers wrote LeakyPick in Python. They used tcpdump to record packets and Google’s text-to-speech engine to generate the audio played by the probing device.
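
As a rough illustration of how a Python program might drive tcpdump for one device, the sketch below builds a capture command restricted to a single hardware (MAC) address. It is not taken from LeakyPick itself: the function name `build_capture_command`, the interface name `wlan0`, the output file name, and the MAC address are assumptions for the example. The tcpdump options used (`-i`, `-n`, `-w`) and the `ether src` filter are standard.

```python
import re

MAC_PATTERN = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def build_capture_command(interface, pcap_path, device_mac=None):
    """Build a tcpdump argument list that records packets to `pcap_path`.

    If `device_mac` is given, only frames sent by that hardware address are
    captured, so the capture reflects one device's outgoing traffic.
    """
    command = ["tcpdump", "-i", interface, "-n", "-w", pcap_path]
    if device_mac is not None:
        mac = device_mac.lower()
        if not MAC_PATTERN.match(mac):
            raise ValueError(f"not a MAC address: {device_mac!r}")
        command.append(f"ether src {mac}")
    return command


# Hypothetical values: the access-point interface and one IoT device.
print(build_capture_command("wlan0", "probe.pcap", "AA:BB:CC:DD:EE:FF"))
```

The resulting list could be passed to `subprocess.Popen` (tcpdump normally needs root privileges), started just before an audio probe is played and stopped shortly afterward.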