Short re-encoding blind listening test
2005-03-17 07:59:15
Re-encoding from a lossy source (sometimes called transcoding) is not an optimal technique (the output quality is necessarily lower than that of an encoding made directly from the original source), but it is often used for greater convenience. Are some lossy formats better than others for re-encoding? Which one could be considered the best source? We have little empirical evidence to answer these questions. Some tests were published in the past; the purpose of mine is to add a few more elements (but only a few).

1/ Samples

The following test is very limited. I’ve only used four samples (you know, time and motivation…). It will be impossible to draw any strong conclusions from such a limited set, but maybe some interesting leads will appear. The four samples are those selected by ff123 for his 128 kbps listening test: http://ff123.net/samples.html

2/ Bitrate

Always the most disputed point… I must explain my choice. First, I had to select the bitrate of both input and output. For the output, the choice was easy: MP3 as format, ABR 128 as setting. It’s probably one of the most universal settings. For the input, the choice of bitrate was harder. On one hand, we have perceptual encoders (mp3, mpc, vorbis, aac), which can reach transparency at 170…190 kbps for most people. On the other hand, there are hybrid encoders, which need a much higher bitrate (300 kbps) to be fully transparent, but which are reputed to be a better source for re-encoding. I therefore had two reasonable choices:
- to set all formats to 300 kbps. It might be interesting, but few people use --quality 10 for mpc, -q 9.5 for ogg vorbis, or CBR 320 for MP3 or AAC. Therefore, I discarded this solution.
- to make a compromise, and use 256 kbps as the average bitrate. This bitrate is much more common than 300 kbps. It corresponds to --preset extreme (mp3) and to -q8 (vorbis), and is near --insane (mpc). These settings are of course not really popular, but they are not rare either.
On the other hand, modern hybrid encoders have progressed recently (DualStream can encode decently at 230 kbps, and WavPack 4 lossy allows 196 kbps). 256 kbps is probably still not optimal for a hybrid format, but it should be more than acceptable, and maybe even more so for re-encoding.

3/ Input challengers

I’ve decided on the most common formats: AAC, MP3, MPC, Vorbis & WavPack lossy. I took the average bitrate of the WV4 encodings (261 kbps) as the reference, and tried to obtain the same with the other formats.
• MPC: mppenc 1.15u and --quality 7.5 (--insane is ~230 kbps and --braindead ~270 kbps).
• Vorbis: I preferred aoTuV beta3 to the official 1.1 encoder. -q 8.3 matches 261 kbps.
• AAC: the choice was more problematic. I first tried Nero AAC VBR (with the ‘fast’ encoder), but no preset corresponds to the targeted bitrate. Therefore, I opted for CBR 256. Instead of Nero AAC, I used iTunes AAC. I have few elements to justify this choice, but: 1/ this encoder was superior to Nero CBR at CBR 128 in the last two collective tests organized by Roberto; 2/ the newer Nero AAC encoder (invoked by the ‘fast’ mode) was still considered unfinished by JohnV in the recent past; 3/ iTunes AAC has fewer pre-echo issues at high bitrate; 4/ not related to quality, but iTunes AAC is twice as fast and runs on a second platform.
• MP3: I preferred lame 3.97 alpha 8 to any other ‘stable’ version. I did more than 800 blind comparisons with lame 3.97 (from alpha 5 to alpha ), and in my “double-blind constructed” opinion this encoder is in no way inferior to older versions. Unfortunately, the highest VBR preset (-V0 or --preset extreme) can’t match the targeted bitrate (242 kbps instead of 261). I hesitated for a long time between ABR 256 and -V 0, but after deliberation I opted for VBR. I’ve been reading HA.org for more than three years, and people using --preset extreme far outnumber those using ABR/CBR at this bitrate.
As a consequence, my setting was: -V0 --vbr-new (it performed slightly better in my recent tests, at least with lower VBR settings: -V4, -V3 & -V2).
• WavPack 4: I hesitated for a moment between -hb256 and -hb256x, but the encoding speed of the -x optimisation decided for me (a 3 GHz computer is probably needed to encode at x2… mine reached ½ real time!).

4/ ADDITIONAL NOTES

• I performed two separate listening tests for two of the samples (rawhide.wav and dogies.wav). Each test corresponds to a different part of the sample.
• As reference, I didn’t use an uncompressed file, but simply an optimal MP3 encoding (i.e. one encoded directly from the original source).
• iTunes encoding offsets were removed by schnofler’s ABC/HR tool; gain was systematically applied to avoid any volume difference between files (it wasn’t really necessary).

5/ RESULTS

ABX log files are here.

iTunes AAC

It suffers three times: with dogies_1, rawhide_1 and wayitis. Each time I noticed an additional artifact:
• “but some drums have an ugly coloration” (dogies.wav (piano & drums))
• “audible distortions on voice” (rawhide.wav)
• “piano notes are excessively distorted (coloring)” (wayitis.wav)
With the other files, quality was identical to the reference to my ears (and even slightly better [i.e. less aggressive], I’d say, with cymbals on rawhide_2).

LAME MP3 --preset extreme

Clearly the worst challenger. Extracts from the ABX log files:
• “form of ringing: sound is very fluctuating. Flabby.” (dogies.wav (piano & drums))
• “cymbals are very unstable” (dogies.wav (cymbals))
• “drums are distorted, unstable” (fossiles.wav)
• “cymbals are much more distorted.” (rawhide.wav (cymbals))
• “horrible fluctuating/unstable noise” (wayitis.wav)
Each time, there was the same kind of distortion. It’s a form of ringing, very typical of lossy encoding, which ruins the quality of background noise or ambiance.
I was often amazed by the huge difference between the encoded file and the re-encoded one. I didn’t imagine that re-encoding could have such an impact on quality… I should also point out that the bitrate of this source was the lowest. But I don’t think that this slightly lower bitrate explains such bad performance. The wayitis.wav sample, for example, has a higher bitrate as an MP3 source than as a Vorbis source, but:
MP3 (260 kbps) -> MP3 (128): rating = 1.5
OGG (252 kbps) -> MP3 (128): rating = 5.0
Despite the higher bitrate, quality was really worse…

MUSEPACK --quality 7.5

One of the best sources according to this small test. Transparent three times, and best once. Nevertheless, I noticed problems on cymbals, which were slightly more distorted with mpc as source.
• “additional distortions on cymbals” (dogies.wav (cymbals))
• “cymbals are distorted” (rawhide.wav (cymbals))

Ogg VORBIS aoTuV -q 8.3

Best source, together with Musepack: transparent three times, and best once. I mainly noticed one specific problem: a ‘drooling sound’ (in other words, imprecise edges). It’s something similar to smearing, but with something else I can’t really describe.
• “excessively drooling: smearing is audible, and sound isn't very stable” (dogies.wav (piano & drums))
• “but with additional degradation (smearing, 'drooling' sound)” (rawhide.wav (cymbals))

WavPack 4 lossy -hb256

As expected, there was audible noise, and it handicaps the format. But there are two important things I’d like to point out:
- first, noise wasn’t always audible (not ABXable at least). I honestly expected more audible problems at this sub-optimal bitrate (for a hybrid format). It’s a very good point.
- second, audible problems don’t necessarily consist of additional noise; there are artifacts which don’t differ from those triggered by perceptual encoders used as source for re-encoding. I noticed this with dogies.wav, and less clearly with cymbals on rawhide.wav.
• “Very noisy. I wouldn't say that this noise isn't disturbing.
Anyway, there's an annoying artifact in the middle of the tested part” (dogies.wav (piano & drums))
• “noise is sometimes noticeable; drums are slightly aggressive (noise)” (fossiles.wav)
• “distorted cymbals. A bit aggressive” (rawhide.wav (cymbals))
• “noise (I can't locate it... it's a very strange one)” (wayitis.wav)

6/ GENERAL CONCLUSIONS

It’s hard to draw general conclusions from only four samples. But we can note some interesting points which clearly differ from common claims and/or suppositions:
• when re-encoding from one lossy format to another, keeping the same format doesn’t necessarily help to maintain quality. LAME high-bitrate encodings are (here) the worst source for LAME output… All other lossy encodings are much better inputs.
• the use of hybrid formats doesn’t necessarily keep re-encoding free of additional artifacts. Hybrid encoders are probably artifact-free (at least if we don’t consider noise an artifact, which is debatable), but this additional noise could trigger extra artifacts when re-encoding!
• subband encoders (such as mpc) aren’t necessarily a better source for lossy re-encoding.
• the quality degradation isn’t constant: some parts don’t suffer from the re-encoding process, and some others (dogies_1; rawhide_2) are much more sensitive.
Therefore, I would be very careful before claiming that such and such a technique is better for re-encoding.

7/ APPENDIX: statistical analysis

• ANOVA analysis:
OGG is better than MP3
MPC is better than MP3
WV4 is better than MP3
AAC is better than MP3
• FRIEDMAN analysis:
OGG is better than MP3
MPC is better than MP3
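
For readers who want to see what a Friedman analysis actually computes, here is a minimal Python sketch. It ranks the sources within each tested part and compares the rank sums. Everything in it is an assumption for illustration: the function names are hypothetical, the ratings table is a made-up toy example (not the scores from this test), and the statistic is the plain form without tie correction. It is not necessarily the variant or the tool used for the results above, and the pairwise "X is better than Y" statements need a post-hoc comparison that is not shown here.

```python
def rank_with_ties(values):
    """Return ranks (1 = lowest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square (no tie correction).

    table: one row per tested part, one column per source format.
    Returns (chi2, rank_sums); compare chi2 with a chi-square
    distribution with (columns - 1) degrees of freedom.
    """
    n = len(table)        # number of tested parts (blocks)
    k = len(table[0])     # number of source formats (treatments)
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(rank_with_ties(row)):
            rank_sums[col] += rank
    sum_of_squares = sum(s * s for s in rank_sums)
    chi2 = 12.0 * sum_of_squares / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return chi2, rank_sums


# Toy ratings, NOT data from this test: 3 parts x 3 sources.
toy = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]
chi2, rank_sums = friedman_statistic(toy)
print(chi2, rank_sums)  # 6.0 [3.0, 6.0, 9.0]
```

With real data, each row would hold the ratings given to the re-encoded files for one tested part, in a fixed column order. A larger statistic means the sources are ranked more consistently differently across parts.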