TOY, a lossy and experimental "do not use" codec

Hello there,

somewhat inspired by the arrival of "simple and somewhat good enough" audio codecs in the hobbyist space, I tinkered with my own contribution to the codec proliferation situation.

The result is the "TOY"-codec, available over at https://github.com/maikmerten/toy-audio-codec

This codec is an experimental learning device. The bitstream may change at any time. Do not use to store audio unless you spin your own frozen version.

Features:
  • A bare-bones MDCT-approach with one single (somewhat short) MDCT size (currently 256)
  • Huffman-coding of quantized MDCT coefficients
  • up to 256 channels
  • mid-side stereo coding
  • sample-accurate length of decoded audio
  • aiming at a low decoder implementation complexity

The encoder is a mess and has no psychoacoustic model - it doesn't really do a proper analysis of the incoming audio data. In ABR mode, it'll just increase quantizers for all bands until the Huffman-coded bits fit into the budget. In VBR mode, it'll increase per-band quantizers until a noise target is reached - and the per-band noise calculation is not rooted at all in some kind of proper model. For stereo streams, the encoder will always choose mid-side-coding and fiddle with bitrate/noise targets for mid and side, without actually looking at the audio. There is no kind of transient detection. Bad choices all around.
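To illustrate the ABR idea described above, here is a minimal sketch, not the actual TOY encoder: a single global quantizer step is raised until an estimated bit count fits the budget. The class name `AbrLoopSketch`, the 10 % step growth and the bit estimate (a crude stand-in for real Huffman table lookups) are assumptions made for illustration, and the real encoder adjusts per-band quantizers rather than one global step.

```java
final class AbrLoopSketch {

    /** Uniform quantization: divide by the step size and round to the nearest integer. */
    static int[] quantize(double[] coeffs, double step) {
        int[] q = new int[coeffs.length];
        for (int i = 0; i < coeffs.length; i++) {
            q[i] = (int) Math.round(coeffs[i] / step);
        }
        return q;
    }

    /** Hypothetical bit estimate, standing in for a real Huffman table lookup. */
    static int estimateBits(int[] q) {
        int bits = 0;
        for (int v : q) {
            int mag = Math.abs(v);
            int magBits = 32 - Integer.numberOfLeadingZeros(mag); // 0 when mag == 0
            // Assumed cost model: 1 bit for a zero, otherwise 2 * magBits + 1 bits.
            bits += (mag == 0) ? 1 : 2 * magBits + 1;
        }
        return bits;
    }

    /** Coarsens the step until the estimated size fits; assumes finite coefficients. */
    static double fitToBudget(double[] coeffs, int budgetBits) {
        if (budgetBits < coeffs.length) {
            // Even an all-zero block costs one bit per coefficient in this cost model.
            throw new IllegalArgumentException("budget too small for this cost model");
        }
        double step = 1.0;
        while (estimateBits(quantize(coeffs, step)) > budgetBits) {
            step *= 1.1; // coarser quantization, fewer bits, more noise
        }
        return step;
    }
}
```

Nothing in this loop looks at how audible the added noise is, which is the weakness the post points out.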

In summary, encoder and bitstream are really primitive and the fact that the codec produces somewhat okay audio at ca. 200 kbps is testament to the MDCT's robustness, not testament to the quality of this particular implementation.

If anybody is curious, this encodes a 16-bit 44.1 or 48 kHz WAV file to a TOY stream with default settings:

Code:
java -jar ToyCodec.jar -i music.wav -o music.toy

This decodes the TOY stream back to a WAV file:

Code:
java -jar ToyCodec.jar -d -i music.toy -o music-decoded.wav

Full list of options:

Code:
java -jar ToyCodec.jar
+--------------------------------------------+
| TOY-codec, enjoy (but don't seriously use) |
+--------------------------------------------+
usage: <OPTIONS>
 -d,--decode          decode compressed file
 -i,--input <arg>     input file to process
 -l,--lowpass <arg>   encoder lowpass in Hz
 -o,--output <arg>    write output to this file
 -q,--quality <arg>   encoding quality, VBR operation
 -r,--ratio <arg>     approx. compression ratio, ~ABR operation
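
As a rough guide to the `--ratio` option listed above, the following sketch converts a compression ratio into a nominal bitrate. It assumes the ratio is measured against the uncompressed PCM bitrate, which the thread does not spell out, and the class and method names are hypothetical. Since the mode is only approximately ABR, actual file sizes may differ.

```java
final class RatioBitrateSketch {

    /** Bitrate of uncompressed PCM in kbps (1 kbps = 1000 bits per second). */
    static double pcmKbps(int sampleRateHz, int bitsPerSample, int channels) {
        return (double) sampleRateHz * bitsPerSample * channels / 1000.0;
    }

    /** Nominal target bitrate, assuming the ratio is relative to the PCM bitrate. */
    static double targetKbps(double pcmKbps, double ratio) {
        if (ratio <= 0.0) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        return pcmKbps / ratio;
    }
}
```

Under that assumption, 16-bit 44.1 kHz stereo (1411.2 kbps) comes out at about 705.6 kbps for a ratio of 2 and about 176.4 kbps for a ratio of 8.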

"Usable" ranges for "--ratio" (ABR) are 2 (high quality) to 8 (low quality), "usable" quality settings for "--quality" (VBR) are 1 (high quality) to 50 (low quality). Currently, ABR mode provides better quality as my approach to computing noise in VBR mode is way off.

Enjoy tinkering! (But don't seriously use, really)

Re: TOY, a lossy and experimental "do not use" codec

Reply #1
Good to see that there are still people who make experimental things.

Re: TOY, a lossy and experimental "do not use" codec

Reply #2
Hello,

I wanted to ask what your goals (as the developer) are for this project, whether short-term or long-term.

I ask because of other members' expectations about the direction the project is taking. Understand, I get more than a little bummed out over unreasonable requests and suggestions (which serve individuals' self-interest more than they help the project along).

Happy New Year to all :)
"Something bothering you, Mister Spock?"

Re: TOY, a lossy and experimental "do not use" codec

Reply #3
Hi Destroid,

my goals are not well-defined or set in stone. I whipped up that code to mash very basic audio coding technologies into a "working" audio codec. I find that actually programming helps me understand things in greater detail than just keeping things in the realm of theory.

I'm also somewhat inspired by QOA, but feel that by making the leap into the frequency domain one can achieve improved coding efficiency with modest increase in complexity. However, the proof is in the pudding, so I needed an experimental vessel.

As for goals and non-goals:
  • TOY as a format won't compete with AAC-LC, Vorbis or Opus in terms of coding efficiency. I hope that TOY remains simple enough so a skillful person could write a decoder in an afternoon or two. I somewhat like the idea of "single-file" decoders that applications with audio-compression needs can just ingest.
  • I don't expect TOY to compete with AAC/Vorbis/Opus for music library storage. Perhaps it's useful in some "embedded" scenarios like video games. Apart from dealing with pre-echo, I've chosen a rather short transform width to keep algorithmic delay low-ish, so TOY may be useful in networked scenarios.
  • Of course, I'd love discussion on how to improve things where the current format and/or implementation just blunders. For instance, my current approach to compute "noise" for quantized coefficients during VBR encoding doesn't map to psychoacoustics at all. My approach to determine whether quantizers are "similar enough" to avoid transmitting quantizer information is pure madness. I'm also somewhat suspicious of my approach to entropy coding etc.
  • Ideally TOY should be somewhat well-documented and thus invite experimentation. Thus https://github.com/maikmerten/toy-audio-codec/blob/main/format.md

Re: TOY, a lossy and experimental "do not use" codec

Reply #4
I'm playing with this "TOY" right now. It seems to produce better quality than MPEG Audio Layer 2 (twolame encoder) at ~128-160 kbps.  :)

Re: TOY, a lossy and experimental "do not use" codec

Reply #5
problems found...  :-\
(i used ratio 7.7 for toy encoding, somewhat near to 160kbps)
btw, when there aren't problems, toy is comparable to mp3, though encoding is much slower.

Re: TOY, a lossy and experimental "do not use" codec

Reply #6
Quote
problems found...  :-\
(i used ratio 7.7 for toy encoding, somewhat near to 160kbps)
btw, when there aren't problems, toy is comparable to mp3, though encoding is much slower.

The problem of 4 appears to be exclusive to ratio mode (more obvious with higher (worse) ratio values). FYI, TOY is much worse than MP3 at lower bitrates (by its nature) (excluding ratio mode because of this bug, I missed this).

Also, these cutoff points (yes, including MP3's) are too high for most people.

Re: TOY, a lossy and experimental "do not use" codec

Reply #7
By the way, are the band points defined solely as real-world frequencies (same for all sampling rates) @maikmerten ? And, if yes, as lower sampling rates get fewer bands this way, does the space for other bands get wasted?

I also think that frequencies below 900Hz need more bands as there's not enough precision for them (to me), and using only one band for everything below 150Hz is not a good idea. At least, for music.

Re: TOY, a lossy and experimental "do not use" codec

Reply #8
@a.ok.in Well, thanks for testing - personally I guess the performance is more like MPEG-1 Layer II, not MP3 - simply because the encoder has no psychoacoustic model. Also, I guess the entropy coding is plain stupid. I completely expect MP3 to outperform TOY and don't trust TOY with bitrates below 200 kbps.

@Klymins The current encoder determines bands according to frequencies. Basically, for anything lower than 44.1 kHz, some bands would go unused. The current frequency layout is a vague approximation of the Opus band layout.

The band layout is transmitted in the TOY file and encoders can choose their own band layout. A proper, non-experimental codec would just go for a predetermined, fixed layout.
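
A small sketch of why bands go unused at lower sampling rates when the layout is fixed in Hz: any band whose lower edge sits at or above the Nyquist frequency has nothing to code. The band edges below are made-up illustration values, not TOY's actual layout, and the class name is hypothetical.

```java
final class BandUsageSketch {

    /** Hypothetical band edges in Hz, chosen only for illustration. */
    static final int[] EXAMPLE_EDGES_HZ = {0, 200, 400, 800, 1600, 3200, 6400, 12800, 20000};

    /** Counts bands whose lower edge lies below the Nyquist frequency. */
    static int usableBands(int[] bandEdgesHz, int sampleRateHz) {
        double nyquist = sampleRateHz / 2.0;
        int usable = 0;
        // Band i spans bandEdgesHz[i] up to bandEdgesHz[i + 1].
        for (int i = 0; i + 1 < bandEdgesHz.length; i++) {
            if (bandEdgesHz[i] < nyquist) {
                usable++;
            }
        }
        return usable;
    }
}
```

With these example edges, all 8 bands are usable at 44.1 kHz, 7 at 16 kHz and 6 at 8 kHz; the remaining bands would be transmitted as part of the layout but carry no coefficients.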

Re: TOY, a lossy and experimental "do not use" codec

Reply #9
I think [0, 40, 60, 90, 135, 200, 300, 450, 675, 1020, 1530, 2295, 3825, 6375, 10625, 15000] would be better for 29400Hz and above sampling rates. For 24kHz and below, simply multiplying these values (to reduce them) should be okay while not being the best choice.

I didn't try this because I couldn't compile the Java files (it says "invalid flag"). And this happened after I manually navigated to the folder which has the compiler, as Windows didn't even see it in the path.

Re: TOY, a lossy and experimental "do not use" codec

Reply #10
I now think that [0, 40, 80, 120, 180, 270, 405, 608, 912, 1368, 2052, 3078, 4617, 6924, 10386, 15579] will probably do better. I wish I could edit my previous post.

Note: Nuances in the previous post still apply, and these band points are intended for a longer block length like 2304 samples.

Also,
Quote
A proper, non-experimental codec would just go for a predetermined, fixed layout
I can't fully agree with this.

Re: TOY, a lossy and experimental "do not use" codec

Reply #11
Keep in mind that the MDCT-width plays into the band layout. TOY currently uses a single 256-width MDCT. This means that for a 44.1 kHz input signal, each MDCT line represents ca. 86 Hz of audio bandwidth.
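
To make the interaction concrete, this sketch maps band edges in Hz to MDCT line indices and counts bands that end up with no line at all. It assumes each of the lines covers an equal share of the range up to Nyquist and that edges are rounded down to a line index; how the real encoder rounds is not stated in this thread, and the names are hypothetical.

```java
final class BinMappingSketch {

    /** Bandwidth per MDCT line: the range up to Nyquist divided by the line count. */
    static double binWidthHz(int sampleRateHz, int mdctLines) {
        return (sampleRateHz / 2.0) / mdctLines;
    }

    /** Maps each band edge to a line index by rounding down (an assumed convention). */
    static int[] edgeBins(int[] edgesHz, int sampleRateHz, int mdctLines) {
        double width = binWidthHz(sampleRateHz, mdctLines);
        int[] bins = new int[edgesHz.length];
        for (int i = 0; i < edgesHz.length; i++) {
            bins[i] = Math.min(mdctLines, (int) Math.floor(edgesHz[i] / width));
        }
        return bins;
    }

    /** Counts bands whose start and end fall on the same line, i.e. empty bands. */
    static int emptyBands(int[] bins) {
        int empty = 0;
        for (int i = 0; i + 1 < bins.length; i++) {
            if (bins[i + 1] == bins[i]) {
                empty++;
            }
        }
        return empty;
    }
}
```

Fed with the band points from Reply #9 at 44.1 kHz, a width of 256 leaves 3 of the 15 bands empty under this rounding, while a width of 2304 (about 9.6 Hz per line) leaves none.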

Re: TOY, a lossy and experimental "do not use" codec

Reply #12
Quote
Keep in mind that the MDCT-width plays into the band layout. TOY currently uses a single 256-width MDCT. This means that for a 44.1 kHz input signal, each MDCT line represents ca. 86 Hz of audio bandwidth.

I know, and this is why I wrote that note. Also, according to your documentation, TOY doesn't force an MDCT width, but the encoder does.

Do you know how I can compile the Java files?

Re: TOY, a lossy and experimental "do not use" codec

Reply #13
You need a somewhat recent (not much older than 10 years) Java Development Kit and the Maven build system.

Then a simple

Code:
mvn clean compile assembly:single

will yield an executable jar archive in the "target" directory.

It's true that current TOY has no fixed MDCT-width, but it also has no block-size switching mechanism. If you increase frequency resolution by making the MDCT wider, you'll lose time resolution.

Re: TOY, a lossy and experimental "do not use" codec

Reply #14
I did it, and I don't think there's a perceptible loss of time resolution. Ratio mode is still buggy (maybe even more so), but it sounds pretty good at ~130 kbps for an encoder with no psychoacoustic model. The problems are that I can't lower the bitrate further (I could in the original version) and that encoding takes too long (it wasn't fast enough in the original version either). Do you know the reasons?

I attached the test samples (not very short, unfortunately: 22 seconds): Sonic Heroes by Crush 40. Out of absent-mindedness, though, the original PCM was generated from a 128 kbps MP3 which was itself generated from a LAME V0 MP3. I know this is strongly frowned upon, but the MP3-to-MP3 transcode was my only safer option (for another situation), and while the last one is not, as I already use low quality in TOY, I don't think this creates a problem.

Re: TOY, a lossy and experimental "do not use" codec

Reply #15
Now I tried 4000 as the MDCT width and there is a background noise. It exists in the 2304 version too but I noticed it just now. I don't think this is normal for MDCT. Vorbis for example never makes that sound. Do you know the cause?

Re: TOY, a lossy and experimental "do not use" codec

Reply #16
Much of the speed decrease comes from how the MDCT is implemented in TOY: It's using the naive O(n^2) base-algorithm. Basically, every time you double the width, the MDCT will take four times as long.

An optimized MDCT approach can bring this down to O(n * log(n)), but those implementations usually (always?) require fixed MDCT widths.
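
For reference, a direct O(n^2) forward MDCT can be sketched as below. This uses the textbook definition without windowing or scaling; whether TOY's code uses exactly this phase and scaling convention is not stated in this thread, and the class name is hypothetical. The nested loops are where the quadratic cost comes from.

```java
final class NaiveMdctSketch {

    /** Forward MDCT: 2N input samples in, N coefficients out, computed directly. */
    static double[] mdct(double[] x) {
        if (x.length == 0 || x.length % 2 != 0) {
            throw new IllegalArgumentException("input length must be even and non-zero");
        }
        int n = x.length / 2;
        double[] out = new double[n];
        for (int k = 0; k < n; k++) {           // one pass per output coefficient
            double sum = 0.0;
            for (int t = 0; t < 2 * n; t++) {   // each pass touches all 2N inputs
                sum += x[t] * Math.cos(Math.PI / n * (t + 0.5 + n / 2.0) * (k + 0.5));
            }
            out[k] = sum;
        }
        return out;
    }

    /** Multiply-add count of the loops above: N outputs times 2N inputs. */
    static long multiplyAdds(int n) {
        return 2L * n * n;
    }
}
```

Going from a width of 256 to 512 raises the count from 131072 to 524288 multiply-adds per block, the factor of four mentioned above.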

As for the noise, I think that's a result of the reduced time resolution. With a width of 4000, your frames are nearly 100 ms long. Any quantization noise within the frames will spread out within its frame. Any percussive sound will get its energy smeared across a ca. 100 ms time frame, and I guess that's what you're hearing.
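
The frame lengths behind that estimate are simple arithmetic. This sketch assumes a frame advances by as many samples as the MDCT width; the class name is hypothetical.

```java
final class FrameDurationSketch {

    /** Duration of one frame in milliseconds, assuming the frame advances by `width` samples. */
    static double frameMillis(int width, int sampleRateHz) {
        return 1000.0 * width / sampleRateHz;
    }
}
```

At 44.1 kHz this gives roughly 5.8 ms for a width of 256, 52.2 ms for 2304 and 90.7 ms for 4000. With the usual 50 % overlap of MDCT windows, each window would span twice that.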