Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time ...
Abstract: The increasing ability of deep learning models to produce realistic-sounding synthetic speech poses serious problems for privacy, public trust, and digital security. To counter this danger, ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results