Trainable windowing coefficients in DNN for raw audio classification
Date
2020
Publisher
Cloud computing, big data y emerging topics
Abstract
An artificial neural network for audio classification is proposed. The network includes the windowing operation on raw audio and the
calculation of the power spectrogram. A windowing layer is initialized
with a Hann window, and its weights are adapted during training. The
non-trainable weights of the spectrogram calculation are initialized with the
discrete Fourier transform coefficients. The tests are performed on the
Speech Commands dataset. Results show that adapting the windowing
coefficients produces a moderate accuracy improvement. It is concluded
that the gradient of the error function can be propagated through the
neural calculation of the power spectrum. It is also concluded that the
training of the windowing layer improves the model's ability to generalize.
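
The front end described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 16-sample frame length, the periodic form of the Hann window, and the pure-Python layout are all assumptions made for illustration. It shows a window vector initialized from a Hann window (the trainable part), fixed cosine and sine matrices holding the DFT coefficients (the non-trainable part), the resulting power spectrum, and the analytic gradient of that spectrum with respect to each window coefficient, which is the quantity a training procedure would need to adapt the window.

```python
import math


def hann_window(n):
    """Periodic Hann window; assumed initial value of the trainable window weights."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_weights(n):
    """Fixed (non-trainable) DFT coefficients for bins 0..n//2 as cos/sin matrices."""
    bins = n // 2 + 1
    cos_w = [[math.cos(2.0 * math.pi * k * i / n) for i in range(n)]
             for k in range(bins)]
    sin_w = [[-math.sin(2.0 * math.pi * k * i / n) for i in range(n)]
             for k in range(bins)]
    return cos_w, sin_w


def power_spectrum(frame, window, cos_w, sin_w):
    """Window one raw-audio frame, then return |DFT|^2 for each frequency bin."""
    windowed = [x * w for x, w in zip(frame, window)]
    power = []
    for c_row, s_row in zip(cos_w, sin_w):
        re = sum(c * v for c, v in zip(c_row, windowed))
        im = sum(s * v for s, v in zip(s_row, windowed))
        power.append(re * re + im * im)
    return power


def window_gradient(frame, window, cos_w, sin_w, upstream):
    """Gradient of sum_k upstream[k] * P[k] with respect to each window coefficient.

    Uses dP[k]/dw[i] = 2 * frame[i] * (re[k] * cos_w[k][i] + im[k] * sin_w[k][i]).
    """
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, window)]
    grad = [0.0] * n
    for k, (c_row, s_row) in enumerate(zip(cos_w, sin_w)):
        re = sum(c * v for c, v in zip(c_row, windowed))
        im = sum(s * v for s, v in zip(s_row, windowed))
        for i in range(n):
            grad[i] += upstream[k] * 2.0 * frame[i] * (re * c_row[i] + im * s_row[i])
    return grad


if __name__ == "__main__":
    n = 16  # hypothetical frame length, chosen only to keep the example small
    frame = [math.cos(2.0 * math.pi * 2 * i / n) for i in range(n)]  # tone at bin 2
    window = hann_window(n)
    cos_w, sin_w = dft_weights(n)
    power = power_spectrum(frame, window, cos_w, sin_w)
    grad = window_gradient(frame, window, cos_w, sin_w, [1.0] * len(power))
    print("strongest bin:", max(range(len(power)), key=power.__getitem__))
    print("gradient w.r.t. window:", [round(g, 3) for g in grad])
```

In a deep learning framework the same structure would typically be two layers, an element-wise multiplication with trainable weights followed by a fixed linear map, with the gradient obtained by automatic differentiation rather than written by hand as above.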
Keywords
Deep learning, Deep neural network, Speech recognition, Raw audio
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International