Trainable windowing coefficients in DNN for raw audio classification

García, Mario Alejandro; Destefanis, Eduardo; Rosset, Ana Lorena

Files

Primary Trainable Windowing Coefficients in DNN for Raw Audio Classification.pdf (484.28 KB)

Date

2020

Authors

García, Mario Alejandro

Destefanis, Eduardo

Rosset, Ana Lorena

Publisher

Cloud Computing, Big Data & Emerging Topics

Abstract

An artificial neural network for audio classification is proposed. The network includes the windowing operation on raw audio and the calculation of the power spectrogram. A windowing layer is initialized with a Hann window and its weights are adapted during training. The non-trainable weights of the spectrogram calculation are initialized with the discrete Fourier transform coefficients. The tests are performed on the Speech Commands dataset. Results show that adapting the windowing coefficients produces a moderate accuracy improvement. It is concluded that the gradient of the error function can be propagated through the neural calculation of the power spectrum. It is also concluded that training the windowing layer improves the model's ability to generalize.
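The pipeline described in the abstract (a trainable window, fixed DFT weights, a power spectrum, and a gradient flowing back to the window) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the symmetric form of the Hann window, the 16-sample frame, and the use of hand-written gradients instead of a deep learning framework are all choices made here for the example.

```python
import math

def hann(n):
    # Symmetric Hann window (assumption: the paper may use another variant).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_weights(n):
    # Fixed (non-trainable) cosine and sine weights for bins 0..n//2.
    bins = n // 2 + 1
    cos_w = [[math.cos(2 * math.pi * k * i / n) for i in range(n)]
             for k in range(bins)]
    sin_w = [[-math.sin(2 * math.pi * k * i / n) for i in range(n)]
             for k in range(bins)]
    return cos_w, sin_w

def power_spectrum(frame, window, cos_w, sin_w):
    # Windowing layer: element-wise product of the frame and the window.
    xw = [x * w for x, w in zip(frame, window)]
    power = []
    for c_row, s_row in zip(cos_w, sin_w):
        re = sum(c * v for c, v in zip(c_row, xw))
        im = sum(s * v for s, v in zip(s_row, xw))
        power.append(re * re + im * im)
    return power

def window_gradient(frame, window, cos_w, sin_w, grad_power):
    # Chain rule: dP_k/dw_i = 2 * (re_k * cos_ki + im_k * sin_ki) * x_i,
    # accumulated over bins with the upstream gradient grad_power[k].
    n = len(frame)
    xw = [x * w for x, w in zip(frame, window)]
    grad = [0.0] * n
    for k, (c_row, s_row) in enumerate(zip(cos_w, sin_w)):
        re = sum(c * v for c, v in zip(c_row, xw))
        im = sum(s * v for s, v in zip(s_row, xw))
        for i in range(n):
            grad[i] += grad_power[k] * 2.0 * (re * c_row[i] + im * s_row[i]) * frame[i]
    return grad

if __name__ == "__main__":
    n = 16  # hypothetical frame length, chosen only to keep the example small
    frame = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
    window = hann(n)
    cos_w, sin_w = dft_weights(n)
    # Toy loss: sum of all power bins, so the upstream gradient is all ones.
    grad = window_gradient(frame, window, cos_w, sin_w, [1.0] * (n // 2 + 1))
    # One plain gradient-descent step on the window coefficients.
    window = [w - 1e-3 * g for w, g in zip(window, grad)]
    print(window)
```

In a deep learning framework the window would instead be declared as a trainable parameter and the DFT matrices as frozen weights, with automatic differentiation producing the same gradient that `window_gradient` computes by hand.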

Keywords

Deep learning, Deep neural network, Speech recognition, Raw audio

URI

https://hdl.handle.net/20.500.12272/13952

Collections

UTN - FRC - Producción Académica de Investigación y Desarrollo - Artículos

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
