Hello there,
somewhat inspired by the arrival of "simple and somewhat good enough" audio codecs in the hobbyist space, I tinkered with my own contribution to the codec proliferation situation.
The result is the "TOY"-codec, available over at https://github.com/maikmerten/toy-audio-codec
This codec is an experimental learning device. The bitstream may change at any time. Do not use it to store audio unless you spin your own frozen version.
Features:
- a bare-bones MDCT approach with one single (somewhat short) MDCT size (currently 256)
- Huffman coding of quantized MDCT coefficients
- up to 256 channels
- mid-side stereo coding
- sample-accurate length of decoded audio
- aiming at a low decoder implementation complexity
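For readers who haven't met mid-side stereo coding before, here is a minimal sketch of the idea. It is not taken from the TOY sources: the class and method names are made up for illustration, and the real codec may well scale the channels differently or apply the transform to MDCT coefficients rather than to time-domain samples.

```java
import java.util.Arrays;

// Hypothetical illustration of mid-side coding, not code from the TOY repository.
final class MidSideSketch {

    // Forward transform: returns { mid, side } for one block of samples.
    static float[][] toMidSide(float[] left, float[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("channel lengths differ");
        }
        float[] mid = new float[left.length];
        float[] side = new float[left.length];
        for (int i = 0; i < left.length; i++) {
            mid[i] = 0.5f * (left[i] + right[i]);   // what both channels share
            side[i] = 0.5f * (left[i] - right[i]);  // what sets them apart
        }
        return new float[][] { mid, side };
    }

    // Inverse transform: returns { left, right }.
    static float[][] toLeftRight(float[] mid, float[] side) {
        if (mid.length != side.length) {
            throw new IllegalArgumentException("channel lengths differ");
        }
        float[] left = new float[mid.length];
        float[] right = new float[mid.length];
        for (int i = 0; i < mid.length; i++) {
            left[i] = mid[i] + side[i];
            right[i] = mid[i] - side[i];
        }
        return new float[][] { left, right };
    }

    public static void main(String[] args) {
        float[] left = { 1.0f, 0.5f, -0.25f };
        float[] right = { 1.0f, -0.5f, 0.75f };
        float[][] ms = toMidSide(left, right);
        float[][] lr = toLeftRight(ms[0], ms[1]);
        System.out.println("mid   = " + Arrays.toString(ms[0]));
        System.out.println("side  = " + Arrays.toString(ms[1]));
        System.out.println("left  = " + Arrays.toString(lr[0]));
        System.out.println("right = " + Arrays.toString(lr[1]));
    }
}
```

When left and right are similar, the side channel stays close to zero and is cheap to code, which is the usual reason for choosing this representation.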
The encoder is a mess and has no psychoacoustic model - it doesn't do any proper analysis of the incoming audio data. In ABR mode, it just increases the quantizers for all bands until the Huffman-coded bits fit into the budget. In VBR mode, it increases per-band quantizers until a noise target is reached - and the per-band noise calculation is not rooted in any proper model. For stereo streams, the encoder always chooses mid-side coding and fiddles with the bitrate/noise targets for mid and side without actually looking at the audio. There is no transient detection. Bad choices all around.
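The ABR behaviour described above ("coarsen the quantizers until the bits fit") can be sketched roughly as follows. This is an illustration under assumptions, not the actual encoder: it uses a single global step size instead of per-band quantizers, and since the TOY Huffman tables aren't reproduced here, the bit cost is a stand-in estimate (Exp-Golomb-style code length plus a sign bit). All names are hypothetical.

```java
// Hypothetical illustration of a "grow the quantizer until it fits" rate loop.
final class AbrSketch {

    // Uniform quantization of coefficients with one global step size.
    static int[] quantize(double[] coeffs, double step) {
        int[] q = new int[coeffs.length];
        for (int i = 0; i < coeffs.length; i++) {
            q[i] = (int) Math.round(coeffs[i] / step);
        }
        return q;
    }

    // Stand-in for the real Huffman cost (an assumption of this sketch):
    // Exp-Golomb-style length for the magnitude, plus one sign bit if non-zero.
    static int estimateBits(int[] quantized) {
        int bits = 0;
        for (int v : quantized) {
            int mag = Math.abs(v);
            int len = 32 - Integer.numberOfLeadingZeros(mag + 1); // bit length of mag + 1
            bits += 2 * len - 1;
            if (mag != 0) {
                bits += 1;
            }
        }
        return bits;
    }

    // Multiplies the step size by 'growth' until the estimated bits fit the budget.
    static double fitToBudget(double[] coeffs, int budgetBits, double startStep, double growth) {
        if (growth <= 1.0 || startStep <= 0.0) {
            throw new IllegalArgumentException("need startStep > 0 and growth > 1");
        }
        // With this cost model, even an all-zero block costs one bit per coefficient.
        if (budgetBits < coeffs.length) {
            throw new IllegalArgumentException("budget below minimum cost");
        }
        double step = startStep;
        while (estimateBits(quantize(coeffs, step)) > budgetBits) {
            step *= growth;
        }
        return step;
    }

    public static void main(String[] args) {
        double[] coeffs = { 100.0, -50.0, 25.0, 3.0, 0.0, 0.0, 0.0, 0.0 };
        double step = fitToBudget(coeffs, 20, 1.0, 1.5);
        System.out.println("step = " + step
                + ", bits = " + estimateBits(quantize(coeffs, step)));
    }
}
```

Note that nothing in this loop looks at where the quantization noise ends up perceptually; it only watches the bit count, which matches the "no psychoacoustic model" caveat.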
In summary, encoder and bitstream are really primitive, and the fact that the codec produces somewhat okay audio at ca. 200 kbps is a testament to the MDCT's robustness, not to the quality of this particular implementation.
If anybody is curious, this encodes a 16-bit 44.1 or 48 kHz WAV file to a TOY stream with default settings:
java -jar ToyCodec.jar -i music.wav -o music.toy
This decodes the TOY stream back to a WAV file:
java -jar ToyCodec.jar -d -i music.toy -o music-decoded.wav
Full list of options:
java -jar ToyCodec.jar
+--------------------------------------------+
| TOY-codec, enjoy (but don't seriously use) |
+--------------------------------------------+
usage: <OPTIONS>
 -d,--decode          decode compressed file
 -i,--input <arg>     input file to process
 -l,--lowpass <arg>   encoder lowpass in Hz
 -o,--output <arg>    write output to this file
 -q,--quality <arg>   encoding quality, VBR operation
 -r,--ratio <arg>     approx. compression ratio, ~ABR operation
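To get a feel for what a "--ratio" value means in bitrate terms, here is a small back-of-the-envelope helper. It assumes the ratio is measured against the raw PCM bitrate of the input (sample rate × bits per sample × channels); that reading of the option is an assumption, and since the help text calls the ratio approximate, actual file sizes will deviate. The class and method names are made up.

```java
import java.util.Locale;

// Hypothetical helper: rough target bitrate for a given compression ratio.
final class RatioSketch {

    // Raw PCM bitrate divided by the ratio, in kbps (1 kbps = 1000 bit/s).
    static double approxKbps(int sampleRateHz, int bitsPerSample, int channels, double ratio) {
        if (ratio <= 0.0) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        double pcmBitsPerSecond = (double) sampleRateHz * bitsPerSample * channels;
        return pcmBitsPerSecond / ratio / 1000.0;
    }

    public static void main(String[] args) {
        // 16-bit, 44.1 kHz stereo input is 1411.2 kbps as raw PCM.
        for (double ratio : new double[] { 2.0, 4.0, 8.0 }) {
            System.out.println(String.format(Locale.ROOT, "ratio %.0f -> ~%.1f kbps",
                    ratio, approxKbps(44100, 16, 2, ratio)));
        }
    }
}
```

Under that assumption, 16-bit 44.1 kHz stereo at ratio 2 comes out at about 705.6 kbps and at ratio 8 at about 176.4 kbps.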
"Usable" ranges for "--ratio" (ABR) are 2 (high quality) to 8 (low quality), "usable" quality settings for "--quality" (VBR) are 1 (high quality) to 50 (low quality). Currently, ABR mode provides better quality as my approach to computing noise in VBR mode is way off.
Enjoy tinkering! (But don't seriously use, really)