
[STUDY] Efficient Neural Audio Coding with Psychoacoustical Calibration

It is common to leverage human perception of sound, or psychoacoustics, in conventional audio coding technology to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. When it comes to using deep neural networks for this compression task, however, the objective nature of the loss function tends to lead to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present psychoacoustic calibration schemes to re-define the loss functions of neural audio coding systems, so that the decoded signals are perceptually more similar to the original. Moreover, the proposed psychoacoustic optimization can also result in a more streamlined system. To this end, we derive novel loss functions from the empirically found global masking threshold. Experimental results show that the proposed psychoacoustic loss functions yield better performance than an ordinary autoencoding-based baseline codec, even though that baseline has up to 100% more parameters and consumes 36.5% more bits per second. The performance is comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.
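For anyone wondering what a loss "derived from the global masking threshold" could look like in practice, here is a rough sketch. It is not taken from the paper's code: the function names, the log-ratio weighting form, and the toy numbers are all assumptions for illustration, and the masking threshold is taken as given (a real system would compute it with a psychoacoustic model). The idea is just that frequency bins where the signal sits well above the mask get a larger share of the reconstruction error, while bins buried under the mask are nearly ignored.

```python
import math


def priority_weights(power_db, mask_db):
    """Hypothetical per-bin weights that grow with the signal-to-mask ratio.

    Both inputs are per-frequency-bin levels in dB. The log-ratio form below
    is an assumed weighting, chosen so that masked bins get weights near 0.
    """
    return [
        math.log10(10 ** (0.1 * p) / 10 ** (0.1 * m) + 1.0)
        for p, m in zip(power_db, mask_db)
    ]


def weighted_spectral_loss(ref_mag, dec_mag, weights):
    """Weighted squared error between reference and decoded magnitude spectra."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, ref_mag, dec_mag))


# Toy three-bin frame (made-up numbers): bin 0 is far above the mask,
# bin 1 sits right at the mask, bin 2 is far below it.
power_db = [60.0, 40.0, 20.0]
mask_db = [30.0, 40.0, 50.0]
weights = priority_weights(power_db, mask_db)

ref = [1.0, 0.5, 0.1]
dec = [0.9, 0.4, 0.0]  # same absolute error in every bin
loss = weighted_spectral_loss(ref, dec, weights)
```

With the same absolute error in every bin, almost all of the loss comes from the audible bin 0, which is the kind of behaviour a plain mean-squared error cannot express.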
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video content production.
Official git: