HydrogenAudio

Hydrogenaudio Forum => General Audio => Topic started by: bernhold on 2013-03-23 18:12:47

Title: Nine different codecs 100-pass recompression test
Post by: bernhold on 2013-03-23 18:12:47
Hi everyone!

I recently discovered this forum and have enjoyed reading the listening tests, so I decided to run a listening test myself. Have you ever wondered how different codecs are affected by re-encoding / re-compressing? Of course, recompressing audio is a bad idea, but sometimes it can't be avoided. To clear things up, I did a test with the following encoders:

Quality settings: Low (~96 kbps) and high (~256 kbps)
Bitrate modes: CBR, ABR and VBR

I encoded the original sample with the respective encoder, decoded it back to WAV and encoded it again, repeating this for 100 passes. Then I listened to the results to determine which encoder produced the best results.
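The procedure above can be sketched as a simple loop. This is only an illustration: `toy_encode` and `toy_decode` are hypothetical stand-ins (a slight attenuation followed by 8-bit rounding), not any of the tested encoders, and the set of kept pass counts is an assumption chosen for the example.

```python
import math

def recompress(samples, encode, decode, passes=100, keep=(10, 25, 50, 100)):
    """Run encode -> decode `passes` times and keep selected generations."""
    kept = {}
    current = list(samples)
    for n in range(1, passes + 1):
        # One generation: lossy encode, then decode back to plain samples.
        current = decode(encode(current))
        if n in keep:
            kept[n] = list(current)
    return kept

# Hypothetical toy codec, NOT a real encoder: attenuate a little, round to 8 bits.
def toy_encode(samples):
    return [round(s * 0.99 * 127) for s in samples]

def toy_decode(codes):
    return [c / 127 for c in codes]

source = [math.sin(2 * math.pi * i / 32) for i in range(64)]
generations = recompress(source, toy_encode, toy_decode)
for n in sorted(generations):
    print(n, "passes: peak", round(max(abs(s) for s in generations[n]), 3))
```

In a real run, the two callables would wrap whatever encoder and decoder is under test, with a WAV file as the intermediate format.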

RESULTS

AAC is the clear winner by far. It is virtually unaffected by the number of passes. With all other codecs, sound quality degraded progressively with the number of encoding passes, especially at low bitrates.

At low bitrates, AAC was the only codec providing satisfactory results. All other encoders fall way behind and produce audible compression artifacts such as cracking noises, muffled sound and hissing. At high bitrates, LAME and Musepack can compete with AAC, but all other encoders fall way behind.

It's interesting to see how much encoders profit from an increased bitrate when recompressing many times. For AAC, as the clear winner, it didn't matter. That being said, Musepack placed 9th with low bitrate settings, but at high bitrate, it was almost as good as AAC and placed 4th. This is similar to LAME, which produced loud cracking noises at low bitrates and placed 8th, but sounded almost perfect at high bitrates and placed 3rd.

Other codecs were mainly unaffected by bitrate, such as WMA, the Fraunhofer MP3s encoder, Opus and OGG Vorbis. These codecs were mainly affected by the number of recompression passes.

In general, WMA and the Fraunhofer MP3s codec were the most disappointing. WMA produced loud hissing and cracking noises, while the Fraunhofer encoder sounded bland and muffled, discarding brilliance and detail. The only reason Fraunhofer placed decently is that it doesn't produce loud cracking or hissing noises, which to my ears are even worse than merely muffled or dull sound. Of course, that's purely subjective.

Some encoders not only degraded sound quality but also had other quirks. For example, the LAME encoder lowers the volume with every encoding pass. The 100th pass was virtually inaudible; I had to normalize the audio to hear anything at all. Other encoders produced erroneous files and garbage. The Fraunhofer encoder added silence to the beginning and end of each file and repeated parts of the sample at the end. After 100 passes, it created a 12-second file (the original file was 7 seconds). Winamp and Foobar2000 even reported a length of 1:02 minutes for the Fraunhofer file, although playback ended after 12 seconds. The Vorbis encoder did a similar thing, which resulted in a reported length of 2 seconds, while playback ended at 7 seconds. I can't really say whether I did something fundamentally wrong or whether it's the encoders' fault, but in the end, the Fraunhofer and Vorbis encoders produced corrupted files. For the listening test, I tried to fix all errors like added silence or corrupted files, since I wanted to judge the sound quality only.
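If an encoder scales the signal by a constant factor on every pass, the loss compounds. A minimal sketch of that arithmetic follows; the per-pass factor of 0.98 is purely an assumed figure for illustration, not a measured property of LAME or any other encoder.

```python
import math

def cumulative_gain_db(scale_per_pass: float, passes: int) -> float:
    """Total level change in dB after applying the same scale factor on every pass."""
    return 20.0 * math.log10(scale_per_pass) * passes

# Assumed, hypothetical per-pass scale factor.
assumed_scale = 0.98
for passes in (1, 10, 100):
    print(passes, "passes:", round(cumulative_gain_db(assumed_scale, passes), 2), "dB")
```

A factor of exactly 1.0 gives 0 dB no matter how many passes are run, which is why pinning the encoder's scaling matters for a test like this.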

You can view the complete test on my homepage. There, I have also attached the test audio samples so you can hear them in your browser. I also visualized the waveform of each sample, which is very interesting to see.

http://bernholdtech.blogspot.de/2013/03/Nine-different-audio-encoders-100-pass-recompression-test.html

For example, this is the original file:

(https://sites.google.com/site/bernhold2k/comptest-png/sgirl.png)

This is after 100 re-encodings with Nero AAC:

(https://sites.google.com/site/bernhold2k/comptest-png/nero-vbr-low-100.png)

And this is after 100 re-encodings with OGG Vorbis:

(https://sites.google.com/site/bernhold2k/comptest-png/vorbis-vbr-low-100.png)

This is after 100 re-encodings with WMA (Windows Media Audio):

(https://sites.google.com/site/bernhold2k/comptest-png/wma-vbr-low-100.png)
Title: Nine different codecs 100-pass recompression test
Post by: zerowalker on 2013-03-23 18:35:19
Interesting.
Though I am surprised that Vorbis did so badly.

Have you tried with aoTuVb6.03?
Because it should be more resilient than LibVorbis.
Title: Nine different codecs 100-pass recompression test
Post by: bernhold on 2013-03-23 18:37:50
I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the original Vorbis codec. So I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.
Title: Nine different codecs 100-pass recompression test
Post by: saratoga on 2013-03-23 18:42:00
You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.
Title: Nine different codecs 100-pass recompression test
Post by: me7 on 2013-03-23 18:46:07
Wow, you uploaded all files playable within the browser. Thank you for sharing this with us.
Title: Nine different codecs 100-pass recompression test
Post by: itisljar on 2013-03-23 19:17:17
I was always wondering about this, but never had enough spare time to do it. Thank you.
Title: Nine different codecs 100-pass recompression test
Post by: bernhold on 2013-03-23 19:20:08
You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.


Thank you, I will try that. Do you think this affected the sound quality of LAME? Regarding Vorbis, the length hasn't actually changed, it's just somehow reported wrong in the audio players I used. It shows as 0:02 in the playlist, but when I actually play it, it's perfectly normal (7 seconds). When I decode it back to WAV, the length is also correct. So I didn't bother much, it shouldn't make a difference regarding sound quality anyway.
Title: Nine different codecs 100-pass recompression test
Post by: alter4 on 2013-03-23 19:21:41
I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the original Vorbis codec. So I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.


Yes, that is true. Vanilla Vorbis has merged only the beta 2 code, but the most recent aoTuV is beta 6. So it could be worth testing beta 6.
Thanks for the interesting test.
Title: Nine different codecs 100-pass recompression test
Post by: zima on 2013-03-23 20:42:27
Hm, reencoding 100 times seems like a bit of an overkill, and generally not a very realistic test? (in that sense, it doesn't clear things up much)

I would assume more practical results would come from one or two passes, starting 1. from a lossless source and 2. from a lossy high-bitrate encode (comparable to typical stuff bought from iTunes or Amazon), both with a low bitrate as the target (say, for portable use), and comparing the two resulting files.
Title: Nine different codecs 100-pass recompression test
Post by: bernhold on 2013-03-23 20:53:50
Yes, 100 times is not practical

But there's a reason for it. I encoded 100 times because it's much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes. And I assume that a codec which sounds better than another codec after 100 re-encodes will also sound better after 1 or 2 re-encodes. However, for the listening test, I only used the results after 100 re-encodes.

I also added results for 10, 25 and 50 passes to my test; they are available in the "detailed results" section on the web page (scroll down). These results are less extreme, as you might expect.
Title: Nine different codecs 100-pass recompression test
Post by: romor on 2013-03-23 21:04:36
I ran your files through Python (from yesterday's waveform thread), as it also colors the waveform by spectral intensity.

Here is the result: http://db.tt/q9gXzysF
Title: Nine different codecs 100-pass recompression test
Post by: zima on 2013-03-23 21:10:53
I encoded 100 times because it's much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes. And I assume that a codec which sounds better than another codec after 100 re-encodes will also sound better after 1 or 2 re-encodes.

Careful there, your assumption looks like it might go against certain rules here...

Not being able to hear any differences after a few re-encodes is also a perfectly valid result (and a much more useful one than a somewhat artificial overkill scenario).

All that said, thank you for the effort (particularly for the "detailed results" section, 10 passes ;P ) - I also always wanted to do a similar test, but never got to it.
Title: Nine different codecs 100-pass recompression test
Post by: dgauze on 2013-03-23 21:56:52
It seems as though this test would be more suited to comparing different versions of the same codec, if anything at all.

As it stands, you are using codecs with different encoding techniques on one particular sample for a few of the codecs tested, but not others.

In that sense, this test doesn't tell us much of anything as it stands.
Title: Nine different codecs 100-pass recompression test
Post by: saratoga on 2013-03-24 01:19:54
After 100 passes, rounding error probably starts to be a problem.  I wonder what effect the intermediate formats used by the decoder/encoder have on quality.  Software that can output/read float probably has an advantage here over 16 bit (or even 24 bit) PCM.
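A rough way to put numbers on the intermediate-format question is sketched below. It assumes each pass adds an independent, uniformly distributed rounding error when the decoder writes fixed-point PCM. That is an idealisation: real rounding errors may be correlated, and the codec's own loss will usually be far larger than this.

```python
import math

def pcm_snr_after_passes(bits: int, passes: int) -> float:
    """Estimated SNR in dB of a full-scale sine after `passes` requantisations.

    Assumes every pass adds independent rounding noise of power LSB^2 / 12,
    so the total noise power grows linearly with the number of passes.
    """
    single_pass = 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)
    return single_pass - 10.0 * math.log10(passes)

for bits in (16, 24):
    print(bits, "bit:",
          round(pcm_snr_after_passes(bits, 1), 1), "dB after 1 pass,",
          round(pcm_snr_after_passes(bits, 100), 1), "dB after 100 passes")
```

Under these assumptions, 100 passes cost 20 dB of rounding headroom regardless of bit depth, which leaves far less margin with 16-bit intermediates than with 24-bit or float ones.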
Title: Nine different codecs 100-pass recompression test
Post by: dhromed on 2013-03-24 02:16:43
This test is very comprehensive! Good job.

Here is the result: http://db.tt/q9gXzysF


Excellent, but your results are sorted per passcount per codec, and I think it's more interesting to see the progress of the decay for each codec and setting. Perhaps the data reaches some kind of plateau after a certain number of transcode cycles, or instead accelerates toward Shannon's oblivion.

I've aligned the Vorbis-low images in Photoshop to 0, 10, 25, 50 and 100 measuring points, but there was little I could see because of the gap between 50 and 100.
Title: Nine different codecs 100-pass recompression test
Post by: kjoonlee on 2013-03-24 03:16:40
Does this belong under "Listening Tests"?

I don't think so...
Title: Nine different codecs 100-pass recompression test
Post by: greynol on 2013-03-24 05:16:54
No this does not belong in listening tests and will be moved shortly.

There have already been complaints that this discussion is not in keeping with TOS8 and I have a hard time disagreeing.

While I understand that this took time and effort, I do not agree that the results are particularly meaningful, let alone useful. It's a lot easier to push a few buttons and let the computer chug away than it is to actually conduct double blind tests.

This is a far cry from the level of analysis that members of this forum are capable of presenting.

Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-24 06:48:12
While perhaps it doesn't belong in listening tests, I don't believe it should be binned. I found the results quite interesting, particularly how some codecs manage to keep some semblance of the source file while others destroy it almost beyond recognition. Of course nobody is going to encode a file 100 times, but it's an interesting test nonetheless.
Title: Nine different codecs 100-pass recompression test
Post by: greynol on 2013-03-24 10:16:01
Read my post again.  You will not see any mention of binning the discussion.
Title: Nine different codecs 100-pass recompression test
Post by: [JAZ] on 2013-03-24 11:07:53
@greynol: Could you clarify why this infringes the TOS #8 and from what point of view is this useless?

Concretely, it is a test of codec regression and I don't even need to listen to the samples from Ogg Vorbis and WMA to know that they will sound notably different, just by looking at those waveforms above. (Edit: Ok, probably the final table classification would need an abc-hr result to back it up)

You will probably also remember some tests made some years ago, that studied transcoding from one codec to another (http://www.hydrogenaudio.org/forums/index.php?showtopic=32440) , and that in that case, Musepack seemed to be the best source to transcode to mp3.
That test required a listening test because it was a single pass, not 100, and because it was testing inter-codec transcoding, instead of transcoding to self.


Concretely, this test can answer several things:

If a user is going to transcode some files, and the origin and destination formats are known, then there's an empirical way to know whether quality will degrade fast (making the decision to transcode less desirable).

Whether there is a codec that, given the need to transcode, adds the least amount of artifacts and/or is the most stable in doing so.



@bernhold: Like saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is that? (And which versions of the other codecs, and which tools, were used?)
Title: Nine different codecs 100-pass recompression test
Post by: IgorC on 2013-03-24 14:53:23
Don't get me wrong.

It's clear that there is a variety of methods for testing audio codecs, and everybody is free to adopt and defend any of them.

But as one may notice, there are no comments here from the people who are usually involved in listening tests.
Either everything is perfect and there is nothing to say, or everything is plain wrong and there is nothing to say.
Take a guess.
Title: Nine different codecs 100-pass recompression test
Post by: greynol on 2013-03-24 14:58:44
Sound quality of lossy codecs is determined through DBT, full stop.
Title: Nine different codecs 100-pass recompression test
Post by: romor on 2013-03-24 15:57:49

@bernhold: Like saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is that? (And which versions of the other codecs, and which tools, were used?)

This would be sensible, and perhaps also a lossless version of your 8s sample.
Title: Nine different codecs 100-pass recompression test
Post by: db1989 on 2013-03-24 17:34:59
What would be interesting IMO and, I think, much more useful to actual users, would be a test with repeated iterations of re-encoding material from various uncompressed and lossy settings to various other lossy settings, with DBTs after each, aiming to determine when degradation becomes audible and perhaps its extent compared to other workflows. Then again, I have a hunch that effects would become audible well before 100 passes, which I agree is a number so improbable in reality that it’s not useful in any concrete sense and is purely an abstract ‘what if’. The workload in the test I suggested would come much less from the number of passes and much more from the need to choose various source and destination encoders/settings and determine how to assess their effects and the resulting relative quality. Anyway… pure speculation.
Title: Nine different codecs 100-pass recompression test
Post by: romor on 2013-03-24 17:48:16
Maybe also something similar to the transcoding test linked by [JAZ], but as a cross-referenced table for a selected 30 s sample and a selected bitrate: pass the original signal through every encoder, then pass each result through every other encoder. The table would look interesting to me and I might as well do that out of curiosity, but to be published such a test would definitely need an ABX report, which is not needed here (for the test in this thread).
Title: Nine different codecs 100-pass recompression test
Post by: Arnold B. Krueger on 2013-03-25 13:42:24
Interesting.
Though I am surprised that Vorbis did so badly.

Have you tried with aoTuVb6.03?
Because it should be more resilient than LibVorbis.


That AAC did so well is no surprise to me, as JJ said that this sort of thing was one of their design goals. For all I know AAC may have code that recognizes files that are processed by it.
Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-26 06:09:13
Interesting.
Though I am surprised that Vorbis did so badly.

Have you tried with aoTuVb6.03?
Because it should be more resilient than LibVorbis.


Shouldn't make a difference. aoTuV's betas do not stray from the libvorbis spec, they only are more efficient. IE quality level for encode at same setting is identical, only file size is different. Unless I stand to be corrected?
Title: Nine different codecs 100-pass recompression test
Post by: eahm on 2013-03-26 16:17:47
Shouldn't make a difference. aoTuV's betas do not stray from the libvorbis spec, they only are more efficient. IE quality level for encode at same setting is identical, only file size is different. Unless I stand to be corrected?

I would love to know this as well, from a Vorbis developer.

Is Ogg Vorbis still being improved/developed? Is all your attention on Opus now? Thanks.
Title: Nine different codecs 100-pass recompression test
Post by: lvqcl on 2013-03-26 16:40:27
IE quality level for encode at same setting is identical

No, it's not.
Title: Nine different codecs 100-pass recompression test
Post by: 2Bdecided on 2013-03-26 17:22:23
Anyone else think there's an error in here?

e.g. compare...
RESULTS BY CODEC (100 PASSES, FROM BEST TO WORST): VBR, HIGH QUALITY (~256 KBPS) 1 (tie) MP3 (LAME)
...with...
DETAILED RESULTS: lame, vbr high quality

The latter is worse at 10 passes than the former is at 100. In the latter section, the 100-pass sample sounds far worse than in the first set of samples (also supposedly after 100 passes).

Apologies if I've misunderstood or missed something.

Cheers,
David.
Title: Nine different codecs 100-pass recompression test
Post by: 2Bdecided on 2013-03-26 17:25:32
Sound quality of lossy codecs is determined through DBT, full stop.
Agree 100%.

This test is an interesting insight into what codecs do when pushed beyond their limits, and shows you what a specific unlikely transcoding scenario will produce. However, the codec that performs best over 100 iterations is not necessarily the one that's best in a single iteration. e.g. one might do all the damage in the first iteration, and then make no change in the other 99.
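The point about ranking can be illustrated with two toy "codecs". Both are hypothetical stand-ins invented for this sketch, not models of any real encoder: one does all its damage on the first pass and then changes nothing, the other loses a little on every pass.

```python
def coarse_quantiser(samples, step=0.1):
    """Toy codec A: snaps every sample to a coarse grid; repeating it changes nothing."""
    return [round(s / step) * step for s in samples]

def slight_attenuator(samples, gain=0.99):
    """Toy codec B: loses 1% of the amplitude on every pass."""
    return [s * gain for s in samples]

def run_passes(samples, codec, passes):
    """Apply the same codec function `passes` times in a row."""
    for _ in range(passes):
        samples = codec(samples)
    return samples

def max_error(original, processed):
    """Largest absolute sample difference between two signals."""
    return max(abs(o - p) for o, p in zip(original, processed))

source = [0.33, -0.72, 0.58, 0.04]
for passes in (1, 100):
    err_a = max_error(source, run_passes(source, coarse_quantiser, passes))
    err_b = max_error(source, run_passes(source, slight_attenuator, passes))
    print(passes, "passes: A", round(err_a, 3), "B", round(err_b, 3))
```

In this sketch A has the larger error after one pass and B has the larger error after 100, so the 100-pass ranking says nothing reliable about the single-pass ranking.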

It is interesting and worthwhile, but it's not the last word (and maybe not even the first word) in choosing a codec for a given application.

Cheers,
David.

P.S. reminds me of this...
http://www.youtube.com/watch?v=mES3CHEnVyI
Title: Nine different codecs 100-pass recompression test
Post by: Porcus on 2013-03-26 22:44:03
e.g. one might do all the damage in the first iteration, and then make no change in the other 99


And that is more than just theory. LossyWAV.
Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-27 04:39:50
IE quality level for encode at same setting is identical

No, it's not.

Care to explain? My understanding of the betas is that quality level 2 is quality level 2 regardless, and that the tunings only reduce filesize, not change in sound quality.
Title: Nine different codecs 100-pass recompression test
Post by: hankwang on 2013-03-27 11:22:44
About the noise in the Vorbis sample: I experienced that "Vorbis exhibits an analog noise-like failure mode" (phrasing from Wikipedia). I wonder whether this noise is really an artifact of the quantization in the codec, or is deliberately added by the decoder using a pseudorandom generator in order to mask other encoding artifacts. In a normal low-bitrate Vorbis sample with noise-like artifacts, I find that less disturbing than the warbling sounds in MP3. It would make sense to mask artifacts with noise and it would explain the huge noise after 100 re-encodes.

Anyone who knows the internals of Vorbis who could chime in?
Title: Nine different codecs 100-pass recompression test
Post by: Primius on 2013-03-27 13:49:17
If codec A was better than codec B after 100 iterations, wouldn't it also be better on the first iteration?
Is lossyWAV a "realistic" counterexample? How would lossyWAV perform if a random time shift were introduced between the compression iterations?
(To be fair, this would also be applied to the other codecs in the test.)

Would optimizing an existing encoder to perform well in this test inevitably result in regressions in the first encode iteration?

Could the reason why Opus ranked low be that "it has no psychoacoustic model"?

Could the high-frequency noise caused by 100 iterations of Vorbis come from the same underlying problem that caused the "HF noise boost" complaints in the past, which I read about in the wiki?
Title: Nine different codecs 100-pass recompression test
Post by: greynol on 2013-03-27 14:00:31
If codec A was better than codec B after 100 iterations, wouldn't it also be better on the first iteration?

Not necessarily.

At the end of the day you have to rely on DBT for any particular codec/setting/sample/iteration/etc. so I don't see the point in such a lazy end-around.
Title: Nine different codecs 100-pass recompression test
Post by: db1989 on 2013-03-27 14:01:47
If codec A was better than codec B after 100 iterations, wouldn't it also be better on the first iteration?
Maybe you missed the discussion about the potential for codecs to recognise that the input signal had previously been processed by that format and act accordingly. It’s not been verified AFAIK, but it’s a very real possibility, so you can’t just generalise like this. There are plenty of reasons that such simple rules may not be true and are generally a bad idea.

Anyway, in case it hasn’t already been said enough, DBT of properly encoded first-generation files is the only way to judge codecs’ performance in the normal use cases for which they’re designed. Any extrapolation from 100 passes is pointless at best, dangerously misleading at worst.
Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-27 15:40:39
I agree 100% with db1989 and greynol. greynol, I hadn't meant to imply that you intended to bin the discussion; I was simply suggesting to all mods that, while the test is not particularly useful on a practical level and no conclusions about ANY codec should be drawn from the results (and all such claims SHOULD be binned), I find the tests and results interesting on a casual academic level. Indeed, after a casual listen to the samples, I am a bit embarrassed to say I might not be able to ABX the 100-pass AAC vs. the original. Along the lines of what Arnie was saying, is it possible the AAC encoder can detect what has already been processed? After one pass, does it simply spit the same file out 99 times? Can we use file size or some other measurement to find out?
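One measurement that would answer this is to compare the decoded audio of successive generations and see when it stops changing; equal file sizes alone would not prove equal content. A minimal sketch follows, where `codec_pass` is a hypothetical callable standing for one full encode/decode cycle on raw PCM bytes, and the two toy functions are stand-ins rather than real codecs.

```python
import hashlib

def passes_until_stable(pcm, codec_pass, max_passes=100):
    """Return how many passes it takes until another pass changes nothing.

    Returns None if the output is still changing after `max_passes`.
    """
    previous = hashlib.sha256(pcm).hexdigest()
    current = pcm
    for n in range(1, max_passes + 1):
        current = codec_pass(current)
        digest = hashlib.sha256(current).hexdigest()
        if digest == previous:
            return n - 1  # pass n changed nothing, so the audio was stable after n - 1
        previous = digest
    return None

# Hypothetical toy passes operating on 8-bit samples.
def drop_low_bits(pcm):
    return bytes(b & 0xF0 for b in pcm)

def halve(pcm):
    return bytes(b // 2 for b in pcm)

print(passes_until_stable(bytes([0x12, 0xAB, 0xFF]), drop_low_bits))
print(passes_until_stable(bytes([255, 3]), halve))
```

A result of 1 would mean the encoder reproduces its own output exactly from the second pass on; a result of None would mean the audio is still drifting at the pass limit.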
Title: Nine different codecs 100-pass recompression test
Post by: lvqcl on 2013-03-27 16:07:45
Care to explain? My understanding of the betas is that quality level 2 is quality level 2 regardless, and that the tunings only reduce filesize, not change in sound quality.

http://en.wikipedia.org/wiki/Vorbis#Tuned_versions
Quote
Various tuned versions of the encoder (Garf, aoTuV or MegaMix) attempt to provide better sound at a specified quality setting, usually by dealing with certain problematic waveforms by temporarily increasing the bitrate.
Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-27 16:24:24
I see the word "attempt" in there, but no evidence that anything audible was actually accomplished or tested. In fact, since I can't ABX libvorbis at -q2 or higher, it stands to reason that those tunings offer no improvements at settings higher than that, including those used in this experiment.
Title: Nine different codecs 100-pass recompression test
Post by: Nick.C on 2013-03-27 18:11:52
In fact, since I can't ABX libvorbis at -q2 or higher, it stands to reason that those tunings offer no improvements at settings higher than that, including those used in this experiment.
[my emphasis]
So, on the basis of one failed ABX result, you contend that no improvements can be made? Which material did you use? Were any of the samples known problem samples for Vorbis?

On the topic of recursive lossyWAV processing - at the same quality settings, lossyWAV stops changing the audio at about the fourth iteration.
Title: Nine different codecs 100-pass recompression test
Post by: saratoga on 2013-03-27 18:47:06
In fact, since I can't ABX libvorbis at -q2 or higher, it stands to reason that those tunings offer no improvements at settings higher than that, including those used in this experiment.


Tuning in this context usually means improving transparency on rare problem files. It's no surprise you don't notice a difference; at those bitrates most codecs are generally transparent except for the sorts of problem files tuning is meant to help with.
Title: Nine different codecs 100-pass recompression test
Post by: Mach-X on 2013-03-28 01:14:34
Precisely the point I was getting at. At the bitrates *used* in this experiment, on the sample *used*, there is no evidence to suggest that using a 'tuner' fork of vorbis would produce results any different than already presented. *I* didn't put forth a claim, somebody else did. Still waiting on the abx test of the 100 pass libvorbis vs the 100 pass tuner.
Title: Nine different codecs 100-pass recompression test
Post by: Spikey on 2013-04-09 18:39:39
Quote
Anyway, in case it hasn’t already been said enough, DBT of properly encoded first-generation files is the only way to judge codecs’ performance in the normal use cases for which they’re designed. Any extrapolation from 100 passes is pointless at best, dangerously misleading at worst.

I think in addition to this, it misses the obvious question: after, say, 3 re-encodes instead of 100, is the 'loser' from the 100-pass experiment ABX-able from the 'winner'? Or any codec versus any other, for that matter. So while after 100 passes things might be really obvious (or really confusing), after just 1-3 re-encodes all may be non-ABX-able from one another (although of course, the test still needs to be done!).

Interesting thread, although I think it's confusing/oversimplifying a good topic rather than clarifying it. (Scary to see some oldtimers relying on a wave graph with obvious limitations rather than their own ears/logic!)