ZJU_SLM
Lurker
We are researchers working on speech privacy. Our team has found that smartphones are vulnerable to eavesdropping through their built-in, zero-permission inertial sensors. For more information, please refer to our paper submitted to IEEE INFOCOM 2022.
Introduction
Did you know that your smartphone may still be exposed to eavesdropping even if you grant no privacy-related permissions to apps or websites? The culprit is the built-in inertial measurement units (IMUs) on your smartphone.
Unlike microphones, cameras, and GPS, which are regarded as sensitive and are subject to strict permission control, the IMUs on a smartphone can be accessed with little or no permission at both the OS and user levels. With a high enough sampling rate, IMUs can pick up speech signals from the on-board loudspeakers of the same smartphone. Several reported attacks exploit this vulnerability to stealthily collect IMU data and eavesdrop on private speech. State-of-the-art attacks achieve alarming accuracies of 81% for speech recognition and 78% for speaker identification, which puts phone calls, audio media, and voice-assistant responses (which may mention locations and daily schedules) under serious privacy threat.
To minimize the risk of private speech leakage, Google has restricted the IMU sampling rate to 200 Hz, which is commonly believed to be an effective countermeasure. But does this restriction work? Unfortunately, our experiments show that it offers little protection.
Threat
We observe that IMUs still leak speech because of aliasing. Taking the Huawei P40 as an example, its accelerometer responds to audio signals of up to 6 kHz. As a result, part of the high-frequency content of speech played on the speakers folds into low-frequency bands, which makes it possible for us to recover the speech from IMU readings sampled at no more than 200 Hz.
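To make the aliasing argument concrete, here is a minimal sketch of ideal, unfiltered sampling: a tone above the Nyquist limit (100 Hz for a 200 Hz sampling rate) reappears at a folded frequency below it. It assumes an idealized sampler; real IMUs have their own filters and frequency responses. The tone frequencies are illustrative rather than taken from our experiments, and the helper names `alias_frequency` and `sampled_cosine` are hypothetical.

```python
import math


def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a pure tone after ideal, unfiltered sampling."""
    # Wrap into [0, f_sample), then fold into the first Nyquist zone [0, f_sample / 2].
    f = f_signal % f_sample
    return min(f, f_sample - f)


def sampled_cosine(freq: float, f_sample: float, n: int) -> list:
    """Sample cos(2*pi*freq*t) at t = k / f_sample for k = 0..n-1."""
    return [math.cos(2 * math.pi * freq * k / f_sample) for k in range(n)]


# A 1030 Hz tone sampled at 200 Hz yields the same samples as a 30 Hz tone.
high = sampled_cosine(1030.0, 200.0, 16)
low = sampled_cosine(alias_frequency(1030.0, 200.0), 200.0, 16)
assert all(abs(a - b) < 1e-9 for a, b in zip(high, low))
```

At a 40 Hz cap, the same 1030 Hz tone folds to 10 Hz (`alias_frequency(1030.0, 40.0)`). Lowering the rate moves the leaked component to a different frequency but does not remove it.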
Despite sampling rate limitations!
With the IMU sampling rate strictly limited to 200 Hz, our spy app collects IMU readings in the background and achieves a spoken-digit recognition accuracy of 78.8%. This shows that Google's restriction on the IMU sampling rate cannot prevent this kind of eavesdropping on smartphones. To study how well a restricted sampling rate defends against IMU eavesdropping, we also run our attack at lower sampling rates. We were shocked to find that even a 40 Hz limit remains at risk: our attack still achieves a top-1 accuracy of 49.2% and a top-5 accuracy exceeding 90%.
Device-independent attacks!
We train an ML-based model on data from one device and test it on other devices (12 different smartphones from mainstream brands such as Huawei, Samsung, Apple, Google, Vivo, and Oppo). By applying a range of data processing techniques to eliminate the influence of hardware diversity across devices, our attack achieves an average cross-device accuracy of 33.1%, with a peak of 49.8%. This reveals that a high risk of privacy leakage remains even when adversaries cannot identify the victim's device model.
Despite volume settings
We further extend our attack to both the top and bottom speakers of smartphones and study the impact of volume settings. The results show that our attack distinguishes fewer digits on both speakers as the volume decreases. But even under the worst condition, the lowest volume (20%), half of the digits are still recognized correctly, and top-5 accuracy stays at 89% or higher.
Motion robustness
To evaluate the robustness of our attack against motion, we collect IMU data in various real-world scenarios. When the model is trained and tested on data collected with the phone held in hand, digit recognition accuracy drops only slightly, to 72.9%, which is still a threatening result. In an end-to-end attack case, we assume that a victim, while sitting or walking, asks a remote caller for a password; our attack recognizes over 60% of the password digits in both scenarios. Moreover, with top-3 accuracy exceeding 80%, it significantly reduces the key space in practical password-eavesdropping attacks.
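The key-space reduction can be made concrete with simple arithmetic. This is a minimal sketch under the following assumptions. The password is a hypothetical 6-digit numeric PIN. The attacker keeps only the top-3 candidates for each digit. Per-digit outcomes are independent, which is a simplification and not something our experiments claim. The function names are hypothetical.

```python
def full_key_space(num_digits: int) -> int:
    """Number of possible numeric passwords of the given length."""
    return 10 ** num_digits


def reduced_key_space(num_digits: int, top_k: int) -> int:
    """Candidates left if the attacker keeps only the top-k guesses per digit."""
    return top_k ** num_digits


def prob_all_digits_in_top_k(num_digits: int, top_k_accuracy: float) -> float:
    """Chance that every true digit is among its top-k guesses.

    Simplifying assumption: per-digit outcomes are independent.
    """
    return top_k_accuracy ** num_digits
```

For a 6-digit PIN, the candidate space shrinks from 10^6 = 1,000,000 to 3^6 = 729. With a per-digit top-3 accuracy of 0.8, all six digits fall inside the shortlist with probability 0.8^6 ≈ 0.26 under the independence assumption.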
Countermeasure
Our attack demonstrates that IMU-based zero-permission eavesdropping is practical and scalable in the real world. We urge people to take the necessary countermeasures against this threat. Below, we summarize existing defenses and then propose a practical method that requires no hardware modification and causes no inconvenience to users.
Sampling rate limitation and secure filters
As mentioned above, limiting the IMU sampling rate offers poor protection for speech privacy, and aliasing and insecure filters are to blame. A plausible solution is to use a secure analog (anti-aliasing) filter and to enforce access control on IMUs. However, the former requires modifying the filter circuit in hardware. The latter, a low sampling rate combined with additional access control, hinders the convenient and efficient sensing that IMUs provide.
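The sketch below simulates why an anti-aliasing filter ahead of the sampler matters. It models the principle only, not IMU hardware. An 8 kHz sample grid stands in for the continuous analog signal, and a simple block average (boxcar) stands in for the secure analog filter mentioned above. A real anti-aliasing filter would have a much sharper cutoff. The rates, tone frequencies, and helper names are illustrative assumptions.

```python
import math

FS_ANALOG = 8000.0  # assumption: fine grid standing in for the analog signal
FS_IMU = 200.0      # capped IMU output rate
D = int(FS_ANALOG // FS_IMU)  # decimation factor (40)


def tone(freq, duration=0.5, fs=FS_ANALOG):
    """A unit-amplitude cosine sampled on the fine grid."""
    n = int(duration * fs)
    return [math.cos(2 * math.pi * freq * i / fs) for i in range(n)]


def sample_naive(x, d=D):
    # Keep every d-th sample with no filtering, so out-of-band energy aliases.
    return x[::d]


def sample_with_boxcar(x, d=D):
    # Average each block of d samples before keeping one value.
    # The boxcar is a crude low-pass filter, used here only for illustration.
    return [sum(x[i:i + d]) / d for i in range(0, len(x) - d + 1, d)]


def rms(y):
    return math.sqrt(sum(v * v for v in y) / len(y))
```

For a 1030 Hz speech-band tone, `rms(sample_naive(tone(1030.0)))` stays near full strength (it aliases to 30 Hz). The boxcar version is attenuated to roughly 3% of that. A genuine 10 Hz motion signal passes the boxcar almost unchanged.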
Damping and isolating
Another idea is to shield built-in IMUs from speech signals, for example by isolating them physically or surrounding them with acoustic dampening materials. However, these methods are impractical, particularly for mobile devices, because they require additional modification, space, and cost.
Our suggestion: Resonant noise
Accelerometers and gyroscopes are sensitive to acoustic noise at their resonant frequencies. For instance, the accelerometer in the Samsung Galaxy S8 resonates at frequencies centered around 6.5 kHz. Users can therefore proactively play resonant noise through the on-board speakers at a low volume to jam the IMUs during speech. This resonant sound injects significant noise into multiple axes simultaneously and effectively confuses recognition, while causing only minimal audible interference for humans.
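Here is a minimal sketch of the resonant-noise idea using only Python's standard library (`math`, `struct`, `wave`). It synthesizes a low-amplitude sine at the 6.5 kHz resonance reported above for the Galaxy S8 accelerometer and saves it as 16-bit PCM WAV. Several details are illustrative assumptions: the helper names, the 48 kHz output rate, and the 5% amplitude. A pure tone is also a simplification. The resonant frequency differs across devices and would need to be measured per model, and this sketch does not by itself demonstrate jamming effectiveness.

```python
import math
import struct
import wave


def tone_pcm16(freq_hz: float = 6500.0, seconds: float = 2.0,
               sample_rate: int = 48000, amplitude: float = 0.05) -> bytes:
    """Little-endian 16-bit mono PCM of a sine at freq_hz (hypothetical helper).

    amplitude is a fraction of full scale; a small value keeps playback quiet.
    """
    if not 0 < freq_hz < sample_rate / 2:
        raise ValueError("freq_hz must lie below the Nyquist frequency")
    n_samples = int(round(seconds * sample_rate))
    peak = int(amplitude * 32767)
    samples = (int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
               for i in range(n_samples))
    return b"".join(struct.pack("<h", s) for s in samples)


def save_wav(path: str, pcm: bytes, sample_rate: int = 48000) -> None:
    """Write mono 16-bit PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

For example, `save_wav("jammer_6500hz.wav", tone_pcm16())` writes a 2-second clip. That clip could then be looped through the on-board speaker while speech is playing.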