Trainable windowing coefficients in DNN for raw audio classification

No Thumbnail Available

Date

2020

Journal Title

Journal ISSN

Volume Title

Publisher

Cloud computing, big data y emerging topics

Abstract

An artificial neural network for audio classification is pro posed. This includes the windowing operation of raw audio and the calculation of the power spectrogram. A windowing layer is initialized with a hann window and its weights are adapted during training. The non-trainable weights of spectrogram calculation are initialized with the discrete Fourier transform coefficients. The tests are performed on the Speech Commands dataset. Results show that adapting the windowing coefficients produces a moderate accuracy improvement. It is concluded that the gradient of the error function can be propagated through the neural calculation of the power spectrum. It is also concluded that the training of the windowing layer improves the model’s ability to general iz

Description

Keywords

Deep learning, Deep neural network, Speech recognition, Raw audio

Citation

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license

Except where otherwised noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International