Recently, deep generative learning has been introduced to replace conventional mathematical models. In speech processing, vector quantization has been an effective compression method for reducing the amount of speech data before transmission. In this paper, we propose the Multilayer Perceptron Vector Quantized Variational Autoencoder (MLP-VQ-VAE), which flexibly and efficiently controls the number of z-latent vectors to quantize and the embedding space size. The MLP-VQ-VAE replaces the convolutional neural network (CNN) with a multilayer perceptron (MLP) in the encoder and decoder networks of the Vector Quantized Variational Autoencoder (VQ-VAE), so that the size of the z-latent vectors for quantization can be set effectively and dimensionality reduction becomes possible. In the experiments, the MLP-VQ-VAE is applied to quantize spectral envelope parameters from the 48 kHz high-quality vocoder WORLD. The MLP-VQ-VAE reduces the memory size of the z-latent representation (the number of vectors to quantize) and of the embedding space (the codebook) by about 1.6 times compared with conventional vector quantization and by about 21.4 times compared with the VQ-VAE. The proposed method also achieves a log spectral distortion about 1.1 dB lower than conventional VQ and about 2.5 dB lower than the VQ-VAE.
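To make the architectural change concrete, the following is a minimal NumPy sketch of the encode–quantize–decode path described above, with MLPs in place of CNNs around the vector-quantization bottleneck. All layer sizes, weight values, and function names here are illustrative assumptions, not the paper's actual configuration or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # Two-layer MLP: affine -> ReLU -> affine.
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# Illustrative sizes (assumed, not from the paper):
# input dim, hidden dim, z-latent dim, codebook size.
D_in, D_h, D_z, K = 60, 32, 8, 16

# Random weights stand in for trained encoder/decoder parameters.
We1, be1 = rng.standard_normal((D_in, D_h)) * 0.1, np.zeros(D_h)
We2, be2 = rng.standard_normal((D_h, D_z)) * 0.1, np.zeros(D_z)
Wd1, bd1 = rng.standard_normal((D_z, D_h)) * 0.1, np.zeros(D_h)
Wd2, bd2 = rng.standard_normal((D_h, D_in)) * 0.1, np.zeros(D_in)
codebook = rng.standard_normal((K, D_z))  # embedding space e_1 .. e_K

def quantize(z, codebook):
    # VQ step: replace each z-latent vector with its nearest codebook
    # entry (squared Euclidean distance), returning vectors and indices.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

x = rng.standard_normal((4, D_in))      # batch of spectral-envelope frames
z = mlp(x, We1, be1, We2, be2)          # MLP encoder: D_in -> D_z
z_q, codes = quantize(z, codebook)      # discrete bottleneck
x_hat = mlp(z_q, Wd1, bd1, Wd2, bd2)    # MLP decoder: D_z -> D_in

print(z.shape, z_q.shape, x_hat.shape)
```

Because the encoder is an MLP, the z-latent dimension `D_z` and codebook size `K` can be chosen directly, which is the knob the proposed method uses to trade off memory against distortion; only the `codes` indices need to be transmitted.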