Abstract
Voice communication is a growing technology for human interaction with smart devices. Keyword spotting (KWS) plays a crucial role in voice communication systems because it is the always-on component that must accurately detect specific keywords in order to trigger and wake the other parts of the system. In this paper, we investigate how the classification accuracy, memory footprint, and latency of a compact binary KWS architecture vary with the kernel channel size (depth) of its convolutional layers. The investigated architecture is a quantized, fully integer architecture with 8-bit input and output data. Based on our evaluations for kernel depths from 1 to 10, the memory footprint grows quadratically with depth, whereas the accuracy saturates for depths greater than 4.
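As a rough illustration of why the memory footprint can grow quadratically with depth, the sketch below counts the weights of a stack of convolutional layers whose input and output channel counts both equal the depth. The kernel size, number of layers, and one byte per weight are assumptions made for illustration only; the abstract does not state the actual layer configuration, and `conv_weight_bytes` is a hypothetical helper, not code from the paper.

```python
def conv_weight_bytes(depth, kernel_size=3, num_layers=4, bytes_per_weight=1):
    """Weight storage (in bytes) for a stack of conv layers.

    Assumes every layer has `depth` input channels and `depth` output
    channels, so each layer holds kernel_size^2 * depth^2 weights.
    All default values are hypothetical, not taken from the paper.
    """
    weights_per_layer = kernel_size * kernel_size * depth * depth
    return num_layers * weights_per_layer * bytes_per_weight


# Footprint for the depth range evaluated in the abstract (1 to 10).
footprint = {d: conv_weight_bytes(d) for d in range(1, 11)}

for d, size in footprint.items():
    print(f"depth={d:2d}  weight bytes={size}")
```

Under these assumptions, doubling the depth multiplies the weight storage by four, which is the quadratic trend the abstract reports. Biases and activation buffers are ignored here, so real footprints would differ.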
Original language | English |
---|---|
Title of host publication | HORA 2024 - 6th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, Proceedings |
ISBN (Electronic) | 9798350394634 |
State | Published - 2024 |
Event | 6th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2024 - Istanbul, Turkey. Duration: May 23 2024 → May 25 2024 |
Publication series
Name | HORA 2024 - 6th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, Proceedings |
---|
Conference
Conference | 6th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2024 |
---|---|
Country/Territory | Turkey |
City | Istanbul |
Period | 5/23/24 → 5/25/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Compact Deep Neural Networks
- Edge Computing
- Quantization
- Quantized Inference
- Real-Time Operation
- Short Time Fourier Transform
- Spotting
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Signal Processing
- Control and Optimization
- Human-Computer Interaction