The appeal of an intuitive, simple user interface drives the growing demand for voice control, which either complements or replaces keyboards, touchscreens and other traditional controls.
The multiplicity of solutions available for Voice Activity Detection (VAD), combined with the absence of vendor-independent benchmarks, makes it hard for any purchaser to select the best solution. Today, neither unified performance criteria nor a benchmark exists to define the quality of a Voice Activity Detector, leaving OEMs and fabless developers without the means to assess performance.
Dolphin Integration fills this benchmarking gap by making MIWOK™ publicly available: a vendor-independent benchmark based on objective criteria to assess the performance of any VAD solution.
For those who are not satisfied by the subjectivity of a mere demonstration, MIWOK™ gives companies that embed, use or develop a VAD a benchmark to statistically assess and specify the key voice activity detection performance metrics.
What is MIWOK™?
MIWOK™ is a public and open-source benchmark, under the “Creative Commons ShareAlike” license. It contains a set of words, representative of language characteristics, and a set of background noises, representative of multiple near-field and far-field environments, combined by a script.
What can be measured with MIWOK™?
With the MIWOK™ benchmark, the following performance metrics can be measured:
- Detection Latency (DL): it must be short
- Noise detected as Voice (NDV): it must be low
- Voice detected as Voice (VDV): it must be high
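As an illustration of how these three figures could be derived from a test run, here is a minimal Python sketch. It is not the MIWOK™ tooling: the `Trial` structure, the `summarize` function and the exact metric definitions (NDV and VDV as percentages of clips, DL as the mean delay between word onset and wake-up) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    """One test clip. Hypothetical structure, not the MIWOK file format."""
    voice_onset_s: Optional[float]  # time the word starts; None for a noise-only clip
    trigger_s: Optional[float]      # time the VAD raised its wake-up; None if it never did


def summarize(trials: List[Trial]) -> dict:
    """Compute VDV, NDV and mean DL under the assumed definitions above."""
    voice = [t for t in trials if t.voice_onset_s is not None]
    noise = [t for t in trials if t.voice_onset_s is None]

    # A voice clip counts as detected only if the trigger comes at or after the onset.
    hits = [
        t for t in voice
        if t.trigger_s is not None and t.trigger_s >= t.voice_onset_s
    ]
    # Any trigger on a noise-only clip is noise detected as voice.
    false_alarms = [t for t in noise if t.trigger_s is not None]
    latencies = [t.trigger_s - t.voice_onset_s for t in hits]

    return {
        "VDV_percent": 100.0 * len(hits) / len(voice) if voice else None,
        "NDV_percent": 100.0 * len(false_alarms) / len(noise) if noise else None,
        "DL_mean_s": sum(latencies) / len(latencies) if latencies else None,
    }
```

Calling `summarize` on a list of `Trial` records, one per played clip, returns the three figures in a dictionary. A real evaluation would also need a rule for triggers that fire before the word onset in a voice clip; this sketch simply does not count them as detections.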
On which type of VAD implementation can MIWOK™ be applied?
MIWOK™ is applicable to any VAD implementation:
- Hardware implementation with high-level language models described in Matlab, Verilog, Verilog-A,…
- Software code (C/C++…) for DSP or Application Processor
- Final product with an analog, I2S or PDM interface and featuring an access to the wake-up interrupt signal
How does MIWOK™ guarantee the representation of the language in its entirety?
The MIWOK™ benchmark encompasses a set of words representing the common first phonemes used in a given language. These words have been recorded by both men and women to ensure a realistic spectral distribution across the audio bandwidth.
MIWOK™ is available in several languages (e.g. in Chinese, and soon in English) and comes with a set of both Near-Field and Far-Field background noises, which any developer can combine with the words using the provided script.
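To give an idea of what combining a word with a background noise can involve, the following Python sketch overlays a word recording on a noise recording at a chosen signal-to-noise ratio. It is not the script shipped with MIWOK™: the function names, the use of RMS levels, and the choice to scale the word rather than the noise are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(
    word: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
    offset: int = 0,
) -> List[float]:
    """Overlay `word` on `noise` starting at sample `offset`.

    The word is scaled so that word RMS / noise RMS equals `snr_db`
    (in dB). Hypothetical helper, not part of the MIWOK script.
    """
    if offset < 0 or offset + len(word) > len(noise):
        raise ValueError("word must fit inside the noise recording")
    word_rms, noise_rms = rms(word), rms(noise)
    if word_rms == 0 or noise_rms == 0:
        raise ValueError("word and noise must both be non-silent")

    # Gain that brings the word to the requested level above the noise.
    gain = noise_rms * 10 ** (snr_db / 20) / word_rms

    mixed = list(noise)
    for i, sample in enumerate(word):
        mixed[offset + i] += gain * sample
    return mixed
```

Because `offset` marks where the word begins in the mixed clip, it also gives the word onset time needed later to measure detection latency.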
This allows Dolphin Integration to launch its voice activity detector, WhisperTrigger™, for designing voice systems that both consume less power and deliver better VAD performance. WhisperTrigger™ is offered as a stand-alone hard block, for integration on silicon in a digital microphone, in an analog microphone or with an application processor, to wake up the system without loss of voice data.
Several WhisperTrigger™ variants are offered in numerous processes from 180 nm down to 16 nm. Thanks to their short detection latency, they are ideally suited for triggering “key word spotting” or “voice recognition” algorithms, enabling a fast wake-up of the voice subsystem.