Topic: Quite OK Audio (QOA)... anyone ? (Read 9832 times)

Quite OK Audio (QOA)... anyone ?

Just discovered this new "fast, lossy audio compression" format that claims:
Quote
QOA is fast. It decodes audio 3x faster than Ogg-Vorbis, while offering better quality and compression (278 kbits/s for 44khz stereo) than ADPCM.

QOA is simple. The reference en-/decoder fits in about 400 lines of C. The file format specification is… not yet released.

They provide online samples to evaluate it: https://qoaformat.org/samples/

Official blog: https://phoboslab.org/

Official website: https://qoaformat.org/

Official GIT: https://github.com/phoboslab/qoa
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video contents production.
Official git: https://www.forart.it/HyMPS/

 

Re: Quite OK Audio (QOA)... anyone ?

Reply #1
I'd wonder how this compares to, say, AptX. or the elephant in the room, mp3?
(And FWIW, given that an ancient chip like, say, dual core ARMv4 at 100MHz can decode ogg vorbis at multiple times realtime, I'm not entirely sure if there's a use case for this if its only benefit is "It's fast" )

Re: Quite OK Audio (QOA)... anyone ?

Reply #2
If anyone wishes to give this a try: https://www.rarewares.org/files/QOA.zip

This contains qoaconv.exe, the encoder, and qoaplay.exe, the player. These are Windows x64 compiles and the input to the encoder is only compiled for .wav files.

Command line to encode: qoaconv in.wav out.qoa

and, to play: qoaplay file.qoa

Tested on a couple of tracks and I have to say I have heard a lot worse!! ;)


Re: Quite OK Audio (QOA)... anyone ?

Reply #4
I'd be interested to see someone better at this and more patient than me ABX this.

If this is doing what I assume this is, when it isn't transparent it should be less annoying than something like mp3 getting it wrong, but how often do modern lossy codecs get it annoyingly wrong in the general vicinity of 256 kbps?


Re: Quite OK Audio (QOA)... anyone ?

Reply #6
Quote
I'd wonder how this compares to, say, AptX. or the elephant in the room, mp3?
(And FWIW, given that an ancient chip like, say, dual core ARMv4 at 100MHz can decode ogg vorbis at multiple times realtime, I'm not entirely sure if there's a use case for this if its only benefit is "It's fast" )

Intended use cases seem to include audio in games, including music, sound effects, where ADPCM formats have been used, and other applications where the computation savings would count, I guess.
Doesn't seem to be meant to compete with more traditional lossy codecs for applications where only one or just a few concurrent streams are meant to be used.
https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression

Re: Quite OK Audio (QOA)... anyone ?

Reply #7
The same guy invented QOI, a simple lossless image codec. In that case he was competitive with PNG on size and much quicker; a lot of that is thanks to PNG being archaic. QOA is likely uncompetitive with complex audio codecs, but has a fighting chance of being competitive with quick codecs. It'll be interesting to see how they fare creating a lossy codec.

From the source:
Code:
/* The Least Mean Squares Filter is the heart of QOA. It predicts the next
sample based on the previous 4 reconstructed samples. It does so by continuously
adjusting 4 weights based on the residual of the previous prediction.
The next sample is predicted as the sum of (weight[i] * history[i]).
The adjustment of the weights is done with a "Sign-Sign-LMS" that adds or
subtracts the residual to each weight, based on the corresponding sample from
the history. This, surprisingly, is sufficient to get worthwhile predictions.
This is all done with fixed point integers. Hence the right-shifts when updating
the weights and calculating the prediction. */
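
A minimal C sketch of the predictor that comment describes, for readers who want to see the shape of it. The struct, the function names and the two shift amounts are illustrative assumptions here, not a copy of the reference source.

```c
#include <assert.h>
#include <stddef.h>

#define LMS_LEN 4
/* Both shift amounts are assumptions chosen for illustration. */
#define PREDICT_SHIFT 13
#define UPDATE_SHIFT 4

typedef struct {
    int history[LMS_LEN]; /* previous reconstructed samples, oldest first */
    int weights[LMS_LEN]; /* fixed-point filter weights */
} lms_t;

/* Predict the next sample as sum(weight[i] * history[i]), scaled back down.
   Assumes the usual arithmetic right shift for negative sums. */
static int lms_predict(const lms_t *lms) {
    assert(lms != NULL);
    int sum = 0;
    for (int i = 0; i < LMS_LEN; i++) {
        sum += lms->weights[i] * lms->history[i];
    }
    return sum >> PREDICT_SHIFT;
}

/* Sign-sign update: every weight moves by the same scaled residual, added or
   subtracted depending on the sign of the matching history sample. The new
   reconstructed sample is then pushed into the history. */
static void lms_update(lms_t *lms, int sample, int residual) {
    assert(lms != NULL);
    int delta = residual >> UPDATE_SHIFT;
    for (int i = 0; i < LMS_LEN; i++) {
        lms->weights[i] += (lms->history[i] < 0) ? -delta : delta;
    }
    for (int i = 0; i < LMS_LEN - 1; i++) {
        lms->history[i] = lms->history[i + 1];
    }
    lms->history[LMS_LEN - 1] = sample;
}
```

A decoder would call lms_predict(), add the dequantized residual to get the reconstructed sample, and then call lms_update() with both, so encoder and decoder keep identical state.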

Re: Quite OK Audio (QOA)... anyone ?

Reply #8
QOA specification is still not frozen last time I checked.

Re: Quite OK Audio (QOA)... anyone ?

Reply #9
Out of curiosity, I compared this against a 32kHz-downsampled, FLAC compliant "FSLAC -2" encoding (using this preliminary 1.3.4 binary), which results in a similar bitrate. The reason for the comparison against FLAC is that, as ktf demonstrated in his lossless codec analysis, the FLAC reference software is extremely fast at low and medium presets as well.

Due to an apparent lack of both psychoacoustic noise shaping in QOA (the quantization noise is spectrally almost white) and high coding efficiency (due to the extremely simple compression algorithm), FSLAC sounds quite a bit better to my ears, and so does LossyWAV+FLAC, I would assume. Especially on samples such as "Triangle", see the FSLAC thread here.

Is there any other feature in QOA that F(S)LAC doesn't offer?

Chris
If I don't reply to your reply, it means I agree with you.

Re: Quite OK Audio (QOA)... anyone ?

Reply #10
Other than up to 255 channels and guarantees about footprint and consistency, no. Flac is very fast but qoa is so simple that it should be an order of magnitude faster when optimised, if IO allows. The reference qoa decoder processes one sample at a time which can probably be improved without using SIMD and there may also be other speedups from where it stands.

Re: Quite OK Audio (QOA)... anyone ?

Reply #11
Quote
Flac is very fast but qoa is so simple that it should be an order of magnitude faster when optimised, if IO allows.
I fail to see how this algorithm is much simpler than FLAC's. I haven't looked at this in detail, but having weights being updated each sample is usually something detrimental to SIMD optimizations.
Music: sounds arranged such that they construct feelings.

Re: Quite OK Audio (QOA)... anyone ?

Reply #12
I'm noodling trying to do multiple sample decodes at once (not full SIMD but packing into uint64_t), the residual is easy to unpack 4 at a time like that but haven't figured out the predictor yet. You may be right that the predictor cannot really be SIMD per channel, it definitely could be by decoding a single sample from every channel at once ("subchannels" are interleaved) but that involves more spread out memory access which may need twiddling and defeat the purpose, and limited benefit as most input is likely 2 channel. There's no stereo decorrelation which helps. FWIW the weights for a channel fit in a uint64_t, so does the history (which the ref updates separately to the output, but it looks like the output could be used directly which may or may not be a benefit).
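
A rough C sketch of the two packing ideas mentioned above. It assumes a 64-bit slice laid out as a 4-bit scale factor followed by twenty 3-bit residual indices, most significant bits first, which is how the draft specification reads to me; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_LEN 20 /* residuals per 64-bit slice (assumed layout) */

/* Split one slice into its scale factor (returned) and 20 residual indices. */
static unsigned unpack_slice(uint64_t slice, uint8_t idx[SLICE_LEN]) {
    assert(idx != NULL);
    for (int i = 0; i < SLICE_LEN; i++) {
        int shift = 57 - 3 * i; /* index 0 sits right below the scale factor */
        idx[i] = (uint8_t)((slice >> shift) & 0x7u);
    }
    return (unsigned)(slice >> 60);
}

/* Pack four 16-bit values (e.g. one channel's weights or history) into one
   64-bit word, lane 0 in the lowest bits. */
static uint64_t pack4_i16(const int16_t v[4]) {
    assert(v != NULL);
    uint64_t packed = 0;
    for (int i = 0; i < 4; i++) {
        packed |= (uint64_t)(uint16_t)v[i] << (16 * i);
    }
    return packed;
}

/* Read one lane back out as a signed 16-bit value. */
static int16_t unpack_i16(uint64_t packed, int lane) {
    assert(lane >= 0 && lane < 4);
    return (int16_t)(uint16_t)(packed >> (16 * lane));
}
```

This only shows the data layout; it does nothing to make the predictor itself parallel, which is the harder part.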

What is a lot simpler are the memory accesses, they're fixed and so is the structure of the data. If we're really lucky a few common channel counts could be auto-vectorised but I don't have much faith in that. Order of magnitude may be pushing it, currently the ref takes half the user time to decode as flac -8 no md5 which admittedly may not be a fair fight.

Re: Quite OK Audio (QOA)... anyone ?

Reply #13
Quote
I haven't looked at this in detail, but having weights being updated each sample is usually something detrimental to SIMD optimizations.
Updating weights each frame, not each sample, sounds promising.

Re: Quite OK Audio (QOA)... anyone ?

Reply #14
Possible improvements include:
  • Adding noise shaping
  • Use vector quantization, like E8 lattice, or PVQ to minimize the average root-mean-square error.
  • Use QMF (Quadrature mirror filter) to split the input into subbands, and encode each subband separately, as aptX does.

Re: Quite OK Audio (QOA)... anyone ?

Reply #15
Author of QOA here. Cool to see that it transpired to this forum - and that it's not met with immediate disgust!

To address some questions/remarks:

SIMD: yes, the algorithm doesn't seem to be very friendly to SIMD optimizations. I tried writing some intrinsics and only made it slower than my compiler's -O3. The problem is mainly that the prediction needs a horizontal accumulation of all 4 weights * history and these are bog slow on x86. Updating the LMS state every sample in itself isn't too bad. On my aging i7-6700k decoding sits at around 3500x realtime (one thread). I'm still looking for ways to make it faster.

Noise shaping: there's an experimental branch where I implemented some very simplistic noise shaping. code here. I made a comparison page with all test samples with and without this noise shaping: https://phoboslab.org/files/qoa-samples/noiseshaping.html – The difference in 32_triangles-triangle_roll_stereo is night and day. Though I have a hunch that this noise shaping hurts some other samples. E.g. the voice in julien_baker_sprained_ankle. But I'm not sure if I'm not making this up. My ears (and/or my equipment) are not the best. Feedback welcome!
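
For readers unfamiliar with the idea, here is a generic first-order error-feedback quantizer in C. This is a textbook sketch and not the code from the experimental branch; the uniform step size and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round v to the nearest multiple of step (ties away from zero). */
static int quantize_nearest(int v, int step) {
    int q = (v >= 0) ? (v + step / 2) / step : -((-v + step / 2) / step);
    return q * step;
}

/* Quantize n samples with first-order error feedback: the error made on the
   previous sample is subtracted from the current input before quantizing.
   The output noise becomes e[n] - e[n-1], i.e. it is pushed towards high
   frequencies instead of staying spectrally flat. */
static void quantize_shaped(const int *in, int *out, size_t n, int step) {
    assert(in != NULL && out != NULL && step > 0);
    int err = 0;
    for (size_t i = 0; i < n; i++) {
        int v = in[i] - err;
        out[i] = quantize_nearest(v, step);
        err = out[i] - v;
    }
}
```

With a constant input of 5 and a step of 16, plain rounding outputs only zeros, while the shaped version outputs an occasional 16 so that the running sum still tracks the input. Whether the shifted noise is less audible depends on the material, which fits the mixed impressions described above.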

QMF: I actually tried that and it didn't make a difference, but made the code much more complex. So, not terribly excited about that.

E8 lattice, or PVQ: I guess I have some reading to do...

Re: Quite OK Audio (QOA)... anyone ?

Reply #16
Very interesting improvements. I will follow this thread for sure!

Re: Quite OK Audio (QOA)... anyone ?

Reply #17
I have been interested in time-domain lossy audio compression for a long time, and this is a very cool idea and implementation! As mentioned in the blog there are two obvious competitors. On the simpler side there is 4.0 bps ADPCM and on the more complex side there is 2.5 bps WavPack lossy. I have done experimentation in the past and found that those two are roughly equivalent quality-wise, and I have successfully used both of them in embedded canned audio applications. I suspected from the blog description that this codec, at 3.2 bps, would fit right in between.
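
To put those bits-per-sample figures into more familiar units, a small C helper can convert them to a payload bitrate. It is only a back-of-the-envelope sketch: the function name is made up, the rate is passed in tenths of a bit to stay in integer arithmetic, and container or frame header overhead is ignored, so the results will not exactly match figures quoted elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bitrate in bits per second.
   tenths_bps is bits per sample times ten, e.g. 32 for 3.2 bits per sample. */
static uint32_t payload_bitrate(uint32_t tenths_bps, uint32_t sample_rate,
                                uint32_t channels) {
    assert(channels > 0 && sample_rate > 0);
    return (uint32_t)((uint64_t)tenths_bps * sample_rate * channels / 10);
}
```

For 44.1 kHz stereo this gives 352800 bit/s at 4.0 bps, 282240 bit/s at 3.2 bps and 220500 bit/s at 2.5 bps.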

I ran some quick tests using a 44.1 kHz stereo music sample and a 16 kHz mono voice sample. For ADPCM I used my ADPCM-XQ encoder at the highest quality setting my patience would allow and for WavPack lossy I used the default mode with -x6. Both my ADPCM encoder and WavPack use dynamic noise shaping, so to make the comparison valid I turned that off (and verified that all three codecs generated flat noise).

In short, the results were exactly as I expected with all three codecs generating similar noise levels, within a dB or so. Of course, the encoding speed of ADPCM and WavPack were much slower than QOA, but that's irrelevant for canned audio. The decoding speed was too fast to measure with these samples and setup, but I imagine that they would line up according to complexity on embedded systems. Not sure where FLAC would fit in, but probably close to QOA.

In addition to noise shaping, which has been discussed, there is one other low-hanging quality improvement to consider: mid-side encoding (sometimes called joint stereo). The advantage that this can bring to this kind of lossy encoding is not obvious nor easy to measure, but in cases where there is a significant amount of centered audio (e.g., a lead vocal) then by using mid-side encoding most of the quantization noise will also be centered spatially, which makes it more easily masked by the source. Of course in cases where the left and right channels are completely different, then left-right encoding is better, so there has to be some sort of heuristic to choose. This is obviously impossible with ADPCM without breaking existing decoders, but WavPack lossy uses this by default, and in all the -x modes switches it in and out dynamically.
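
As an illustration of the channel transform itself, here is a reversible integer mid-side conversion in C. It is a generic sketch, not WavPack's code; the function names are made up, and the heuristic for switching between left-right and mid-side is left out.

```c
#include <assert.h>
#include <stddef.h>

/* floor(x / 2) for any sign, without relying on shifting negative values. */
static int floor_half(int x) {
    return (x - (x & 1)) / 2;
}

/* mid is the rounded-down average, side is the difference. */
static void ms_encode(int left, int right, int *mid, int *side) {
    assert(mid != NULL && side != NULL);
    *mid = floor_half(left + right);
    *side = left - right;
}

/* left + right has the same parity as left - right, so the bit dropped by
   floor_half() can be recovered from the lowest bit of side. */
static void ms_decode(int mid, int side, int *left, int *right) {
    assert(left != NULL && right != NULL);
    int sum = 2 * mid + (side & 1);
    *left = (sum + side) / 2;
    *right = (sum - side) / 2;
}
```

In a lossy codec mid and side would each be quantized after this step. An error in mid lands equally in both output channels, which is what places the noise in the center when the side signal is small.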

Re: Quite OK Audio (QOA)... anyone ?

Reply #18
Having a viable alternative to proprietary ADPCM codecs seems like a worthy goal. The initial QOA blog post mentions video game applications, but as far as I can tell, QOA is missing one key feature for this purpose -- looping. As an example, CRI's ADX has three playback modes:

  1. No loop (play the whole file once)
  2. Loop all (upon reaching the final sample, continue playback from the first sample)
  3. Loop specific (upon reaching end sample y, continue playback from first sample x)

Method 2 requires you to cut the audio's start and end points ahead of time to ensure a seamless loop. Method 3 is required if the track has an introduction that is not repeated within the loop. In order to implement this in QOA, you'd need to allocate some space in the header for a loop type flag (0-2), as well as to store the starting and ending sample/block values.
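
A hypothetical C sketch of what such loop metadata and the matching playback logic could look like. None of these fields exist in the QOA draft; the struct, the names and the PLAYBACK_DONE sentinel are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three playback modes, stored as a loop type flag (0-2). */
typedef enum { LOOP_NONE = 0, LOOP_ALL = 1, LOOP_RANGE = 2 } loop_mode_t;

typedef struct {
    loop_mode_t mode;
    uint32_t total_samples; /* samples per channel in the file */
    uint32_t loop_start;    /* first sample of the loop (x) */
    uint32_t loop_end;      /* last sample of the loop (y), inclusive */
} loop_info_t;

#define PLAYBACK_DONE UINT32_MAX

/* Return the sample position that follows pos, or PLAYBACK_DONE. */
static uint32_t next_position(const loop_info_t *li, uint32_t pos) {
    assert(li != NULL && pos < li->total_samples);
    if (li->mode == LOOP_RANGE) {
        assert(li->loop_start <= li->loop_end);
        assert(li->loop_end < li->total_samples);
        if (pos == li->loop_end) {
            return li->loop_start; /* jump back, skipping the intro */
        }
    }
    if (pos + 1 == li->total_samples) {
        return (li->mode == LOOP_ALL) ? 0 : PLAYBACK_DONE;
    }
    return pos + 1;
}
```

A real decoder would work on frames or slices, not single samples, and would also need the predictor state at the loop start, which this sketch ignores.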

ADX generally ignores the end loop position and treats the end of the file as the loop end, and then only uses the loop start position for the beginning of the loop. Since ADX was designed for CD-based games, it also requires that the loop start position lie on a CD sector boundary, i.e. you can only loop back to a sample that lies at the beginning of a 2048-byte CD sector. You probably don't need to replicate such a restriction in QOA, since most games no longer use optical media, but it might be useful if people are using QOA for homebrew games on older platforms. Built-in looping support would be a big selling point for using QOA in games, though.

Re: Quite OK Audio (QOA)... anyone ?

Reply #19
Thanks bryant! Your "this is a very cool idea" comment means a lot to me :]

Mid-side encoding: it somehow never occurred to me that this could improve quality. I always thought of it as a way to allow quantizing one channel even more (which, now that I write this, makes it obvious that the quality would increase if you don't quantize more). Would be cool to allow that on a per frame basis, but I'm not sure if the added complexity would be worth it for the intended use cases. I'm trying to keep it really simple.

Looping: maybe I'm not understanding the issue, but I fail to see why the file format needs to support this. Ultimately it's the application's choice how and where to loop a file; and this info should imho be provided out of band.

I also just finished a new draft of the file format specification:
https://qoaformat.org/qoa-specification-draft-02.pdf


Re: Quite OK Audio (QOA)... anyone ?

Reply #21
That plugin is not mine. I sadly had to stop using foobar when I switched to linux a few years ago.

The foobar plugin is being developed by pfusik, here: https://github.com/pfusik/qoa-ci - maybe open an issue there. Is there any documentation of how (if at all) the plugin API changed for v2? If there are only minor changes, it may not be all too difficult.

Re: Quite OK Audio (QOA)... anyone ?

Reply #22
Quote
Looping: maybe I'm not understanding the issue, but I fail to see why the file format needs to support this. Ultimately it's the application's choice how and where to loop a file; and this info should imho be provided out of band.
The problem is that the loop start position will be different for every track, so unless the file format has a place to store the start sample number as metadata, you'd have to hardcode a table containing the start sample number for each track somewhere in an external file, which makes less sense than storing the metadata as part of the file itself, so that changes to the music do not require updating a separate file.

If QOA supports tags like ID3 or Vorbis comments, then you could just as easily store the start sample in a tag field, instead of the file header. I didn't see any mention of which (if any) tag format QOA uses, though. ADX just uses the file header, since it doesn't officially support any kind of tagging.

Re: Quite OK Audio (QOA)... anyone ?

Reply #23
Quote
That plugin is not mine. I sadly had to stop using foobar when I switched to linux a few years ago.

The foobar plugin is being developed by pfusik, here: https://github.com/pfusik/qoa-ci - maybe open an issue there. Is there any documentation of how (if at all) the plugin API changed for v2? If there are only minor changes, it may not be all too difficult.

Will make a request over there then. Thnx

Re: Quite OK Audio (QOA)... anyone ?

Reply #24
Quote
I also just finished a new draft of the file format specification:
https://qoaformat.org/qoa-specification-draft-02.pdf
I see that you follow FLAC's channel order and allocation, where 4 resp. 5 channels are 4.0 and 5.0, and not 3.1 resp. 4.1.

I don't know what is actually the best, but you should think twice. The WAVEFORMATEXTENSIBLE channel order has LFE as channel four. And BL as channel five, so deciding that "an N channel signal MUST BE the first N in WAVEFORMATEXTENSIBLE" is not appropriate, at least not unless you can code one as "not present" for each block.

At https://xiph.org/flac/format.html there has "since forever" been a loose mention of SMPTE/ITU-R recommendations that aren't referenced properly - and also, those are superseded over time. Revision 9 of ITU-R BS.2159 is here: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2159-9-2022-PDF-E.pdf . You see channel orders.
I have a hunch that no "standard" ever did prescribe FLAC's allocation - only the order among the channels that are actually included. At least it seems to be that way by now. Apparently, DVD-Audio can accommodate four channels as FL FR + any among the following four: (FC LFE), (LFE BC), (FC BC) or (BL BR).

So ... careful. Which means you might want to consider whether
uint8_t num_channels; // no. of channels
should be something else.
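
One possible "something else", sketched in C, is a speaker bitmask in the style of WAVEFORMATEXTENSIBLE's dwChannelMask. This is purely a hypothetical illustration and not part of any QOA draft; the bit values follow the WAVEFORMATEXTENSIBLE speaker positions as far as I recall them, and the names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Speaker position bits, in WAVEFORMATEXTENSIBLE order. */
enum {
    SPK_FRONT_LEFT    = 0x01,
    SPK_FRONT_RIGHT   = 0x02,
    SPK_FRONT_CENTER  = 0x04,
    SPK_LOW_FREQUENCY = 0x08,
    SPK_BACK_LEFT     = 0x10,
    SPK_BACK_RIGHT    = 0x20
};

/* The number of channels is the number of set bits in the mask. */
static unsigned channel_count(uint32_t mask) {
    unsigned n = 0;
    while (mask != 0) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/* Position of a speaker within the interleaved channels, or -1 if absent.
   Channels are stored in ascending bit order. */
static int channel_index(uint32_t mask, uint32_t speaker) {
    assert(speaker != 0 && (speaker & (speaker - 1)) == 0);
    if ((mask & speaker) == 0) {
        return -1;
    }
    return (int)channel_count(mask & (speaker - 1));
}
```

With a mask, a 4-channel file can say whether it is 3.1 (0x0F) or 4.0 (0x33) without the format having to pick one fixed allocation per channel count.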