Are you in a bad relationship with your smart speaker?

Are you in a bad relationship with your smart speaker or virtual assistant? Maybe you have already given up on conversations with Alexa or Siri, and communication has been reduced to shouting a series of nouns and verbs. Well, here is some good news: new data from vocalize.ai shows that it’s probably not your fault. It could be that your AI-powered virtual assistant just can’t hear you.

vocalize.ai is developing a test suite that evaluates the hearing and understanding capabilities of voice-first services such as Google’s Assistant, Amazon’s Alexa and Apple’s Siri. Our key differentiator: the test suite is based on audiology and cognitive training procedures commonly used to evaluate human performance. This gives fast and accurate results that can be mapped to a well-understood human benchmark. For example, our smart speaker Speech In Noise (SIN) test results are categorized in levels of human hearing loss: normal, mild, moderate, severe.

Even simple spoken requests can be very frustrating when conversing with a smart speaker that exhibits signs of hearing loss. However, the recent example with Amazon’s Alexa shows that the stakes can be much higher when a virtual assistant records your conversation and sends it to someone in your contact list… without your knowledge! According to WIRED, this simple case of mishear, misinterpret, mishap has prompted two US senators tasked with investigating consumer privacy to send a letter to Amazon CEO Jeff Bezos demanding answers.

Speech In Noise (SIN)

Why is SIN so important? Think about the smart speakers around your home. Each room can present a unique background noise challenge: Netflix in the living room, the dishwasher in the kitchen, music in the bedroom, or maybe just the typical air conditioner hum throughout the house. SIN enables us to quantify how well an automatic speech recognition system handles these noisy conditions. It is worth noting that the results point to clear winners and losers on the key performance metric of Speech In Noise (SIN).

The test consists of a series of sentences presented with varying levels of background noise. The virtual assistant is scored on how many key words it correctly recognizes in each sentence. The test starts with a low background noise level, and the noise level is increased on each consecutive sentence. At its maximum, the level of background noise is equal to the level of the desired speech. Results for some very popular smart speakers are shown below. A lower score, closer to 0 dB SNR Loss, represents a capability equal to normal human hearing. A higher score, of 7 dB SNR Loss or greater, represents a device with moderate to severe hearing loss.
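To make the scoring idea concrete, here is a minimal Python sketch of key-word scoring and of mapping an SNR Loss value to a hearing-loss category. It is an illustration only, not the vocalize.ai implementation: the function names and the example sentence are hypothetical, and the 3 dB and 15 dB category boundaries are assumptions borrowed from commonly cited clinical speech-in-noise conventions. The text above only states that about 0 dB is normal and that 7 dB or more is moderate to severe.

```python
import re


def count_key_words(key_words, transcript):
    """Count how many key words appear in the recognizer's transcript."""
    heard = set(re.findall(r"[a-z']+", transcript.lower()))
    return sum(1 for word in key_words if word.lower() in heard)


def classify_snr_loss(snr_loss_db, mild=3.0, moderate=7.0, severe=15.0):
    """Map an SNR Loss value in dB to a hearing-loss category.

    The default boundaries are assumptions for illustration, not values
    taken from the vocalize.ai test suite.
    """
    if snr_loss_db < mild:
        return "normal"
    if snr_loss_db < moderate:
        return "mild"
    if snr_loss_db < severe:
        return "moderate"
    return "severe"


# Hypothetical trial: one sentence, two transcripts from a device under test.
key_words = ["play", "jazz", "music", "kitchen"]
quiet = count_key_words(key_words, "Play some jazz music in the kitchen")
noisy = count_key_words(key_words, "play some music")
print(quiet, "of", len(key_words), "key words in low noise")
print(noisy, "of", len(key_words), "key words in high noise")
print(classify_snr_loss(8.5))
```

Counting whole key words rather than comparing full transcripts means small differences in filler words do not affect the score.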

For our initial tests we used two types of noise: four-talker babble and pink noise. However, any noise type can be supported, for example road noise when driving in a car.
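As a rough illustration of how a noise recording can be combined with speech at a chosen signal-to-noise ratio, the Python sketch below scales the noise so that the speech-to-noise power ratio matches a target in dB. At 0 dB the noise power equals the speech power, which corresponds to the maximum noise level described for the test. The function names and sample values are hypothetical, and this is a sketch of the standard SNR formula, not the vocalize.ai test harness.

```python
import math


def mean_power(samples):
    """Average power of a signal given as a list of float samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise is silent and cannot be scaled")
    # SNR(dB) = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for real recordings (hypothetical values).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=0)
residual = [m - s for m, s in zip(mixed, speech)]
print(mean_power(speech), mean_power(residual))
```

Any noise type, such as babble, pink noise or road noise, can be passed in as the `noise` list, since only its average power is used for scaling.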

What’s Next?

Tests for Speech In Noise (SIN) and Speech Recognition Threshold (SRT) are complete, and evaluation services are immediately available for interested partners. Moving forward, we are adding cognitive training tests for Rapid Speech (RS) and Competing Speech (CS). In parallel, we are creating new datasets with accented speech (e.g. English with a Mandarin accent). The end goal is to create a comprehensive AI audiogram that efficiently provides a hearing and understanding performance assessment across gender, age and accent: a new capability that enables us to quantify performance and to detect and correct for bias in the system.

Words In Quiet

In April, vocalize.ai released its first report, which included the results of our smart speaker evaluations. Those tests focused on a well-known audiology procedure: Speech Recognition Threshold (SRT). That test gives insight into a system’s ability to hear isolated words in an optimal environment with extremely low background noise. It can be thought of as “Words In Quiet”: for example, asking Alexa for the time in a quiet bedroom. The complete report and results are available on request.
