Multi-Codec Listening Test: 96-128-192-256Kbps

Reply #25 – 2009-03-20 16:02:01

These are interesting tests, but I don't think any serious conclusions can be drawn because the samples are only a second or two long. In the public HA listening tests the first two seconds of the encoded samples have always been cut off because the lossy codecs may first need to adapt to the content. I don't know how severe the problem can be or which codecs & settings are most affected, but that has been the accepted practice.

In addition, as sauvage78 stated, anyone who interprets the results must remember that they are valid only for these specific samples, which amount to a total of 8 seconds of quite unusual sound clips. I'd recommend playing through the original lossless samples once before drawing any conclusions.

EDIT: fixed a typo (adopt > adapt)

Reply #26 – 2009-03-20 16:06:55

Quote from: Alex B on 2009-03-20 16:02:01

In the public HA listening tests the first two seconds of the encoded samples have always been cut off because the lossy codecs may first need to adopt to the content.

Lossy audio codecs don't "adopt" to content over time (n-pass video coding is different). In fact, they don't even have any memory about the past surviving the current frame boundary (except maybe a bit reservoir for bitrate constraints).

Reply #27 – 2009-03-20 16:11:37

Quote from: sauvage78 on 2009-03-20 16:00:38

Edit: Can I do it with the QuickTime installed by iTunes? A few years ago QuickTime wasn't freeware, if I recall correctly.

You can just use my samples or create your own with QuickTime Player's "export" function (and see that they are identical to mine). But there is no batch interface on Windows yet, so better save your time for the actual testing if you don't have a Mac available.

Reply #28 – 2009-03-20 16:35:56

Alex B:
I can only agree with you, as some people seem to only look at the colored table & say WOW ... that is not the right way to do things, you have to test for yourself to put it in perspective.
That said, the Harlem sample is applause, so every live CD is potentially affected by applause.
In the same way, killer samples are very common in electronic music; songs from NIN, Ministry, Marilyn Manson, Fear Factory ... are full of effects that can be very similar to the Kraftwerk/Rush/Autechre samples.

So overall, even if it's only 10 sec in the ocean of music ... I think you can draw some conclusions from my test both for live music & for electronic music.
But I agree it definitely needs more samples for other genres.

I can only tell you that this test is very serious & very honest ... I didn't spend two days to do a cheap test. I have better things to do in life, especially as I don't use lossy !!!
Unless you're a real sadomasochist, you don't listen to Autechre 30min in a row for fun ... that I can tell you ...

Reply #29 – 2009-03-20 16:57:59

Quote from: rpp3po on 2009-03-20 16:06:55

Lossy audio codecs don't "adopt" to content over time (n-pass video coding is different). In fact, they don't even have any memory about the past surviving the current frame boundary (except maybe a bit reservoir for bitrate constraints).

I just spent half an hour trying to find the original reason for adding 1000-2000 ms of additional offset in the public listening tests, but unfortunately my searches didn't find the correct threads/posts. It may be related to the bit reservoir behavior or something else, but if I recall correctly it has something to do with audio quality in the very beginning of the encoded samples.

Reply #30 – 2009-03-20 17:12:32

The test is really interesting (ABXing Vorbis at -q8 is not common) but the cross-comparison of different audio coders could be misleading. I'll just quote the original poster's words:

Quote

I searched back on the forum & on various samples databases for the worst problem samples I could find for vorbis (…)

Quote

but that doesn't mean Vorbis is bad, because I selected problem samples specific to vorbis & then tested it on Nero AAC, so it is unfair for vorbis.

Maybe a big red warning at the top of the first message would avoid future confusion.

Anyway, thank you for your test (and welcome to the club of people disgusted by the ABX procedure).

Reply #31 – 2009-03-20 17:45:27

I finally found at least one thread in which Gabriel explains why "additional offset" should be used.

Quote from: Gabriel on 2004-11-21 13:05:01

I would like to suggest a little change to the recommended practices in listening tests.

Most modern codecs work based on the recent context. They usually have a way to adapt the bitrate to the content that takes into consideration the recent past bitrate (a window). Many encoders also have a psychoacoustic model that takes into consideration the previous psychoacoustic parameters/results.

Right now, when listening to samples, we usually encode a short sample with the encoder and listen to the result.
But the encoder needs some time to adapt its models (bitrate and psychoacoustic), and of course will not be able to properly adapt at the very beginning of the sample. If the sample had not been extracted from the full track, the encoder would have had some time to adapt its models. It means that encoding a short sample is not totally representative of how this portion would be encoded in a "real" encode.

That is why I am proposing the following:
When encoding a short sample, allow a 1 second margin at the beginning and at the end of the sample so the encoder can adapt its models. This should not be 1s of silence, but a real 1s of content.
For ease of use, this could even be taken into consideration by the testing tools.

For video, the VQEG already has a similar recommendation: 1s at the beginning and 1s at the end should not be considered for tests, in order to let encoders stabilize themselves.

And:

Quote from: Gabriel on 2004-11-22 14:33:58

Well, I suggested 1s because we have to find a reasonable value.
I do not know about WMA encoders, but even 1s is not optimal for Lame, as the ATH adjustment might need more than 1s to stabilise.
But 1s is still way better than nothing and does not reduce the sample that much.

Regarding the tests themselves, I think that it would be very nice to have the tools automatically restrict the default time by 1s at both ends.
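
A minimal sketch of the margin idea in Python, assuming positions are given in seconds. The function name `padded_range` and its parameters are hypothetical and do not come from any testing tool mentioned in this thread. It computes the range to cut from the full track so that the encoder sees real content before and after the part under test, plus how much to skip at each end of the decoded result.

```python
def padded_range(start_s, end_s, track_len_s, margin_s=1.0):
    """Return (extract_start, extract_end, lead_in, lead_out) in seconds.

    extract_start/extract_end: range to cut from the full track and encode.
    lead_in/lead_out: how much to skip at each end of the decoded result
    so that only the original [start_s, end_s] range is auditioned.
    """
    if not 0 <= start_s < end_s <= track_len_s:
        raise ValueError("need 0 <= start < end <= track length")
    # Clamp to the track: there may be less than margin_s of real content.
    extract_start = max(0.0, start_s - margin_s)
    extract_end = min(track_len_s, end_s + margin_s)
    return (extract_start, extract_end,
            start_s - extract_start, extract_end - end_s)


print(padded_range(10.0, 12.0, 240.0))  # (9.0, 13.0, 1.0, 1.0)
print(padded_range(0.5, 2.5, 3.0))      # (0.0, 3.0, 0.5, 0.5)
```

The second call shows the clamped case: when the interesting part sits near the start or end of the track, the margin is simply whatever real content is available.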

Reply #32 – 2009-03-20 18:03:09

It may be true in theory but my experience shows that it is not true in practice, at least for Vorbis. For a very simple reason ... before I had my "artefact only" 2 sec samples ... I had to test 10 to 30 sec samples to actually find the artefact ... it never happened that there was a difference in what I was hearing between the 1-2 sec sample & the 10-30 sec sample ... This is true for Vorbis at -q2, which is the codec/setting I used to find my artefacts/samples ... I don't know about other codecs. I doubt it, especially as the argument comes from Gabriel & Lame MP3 competes very well; if it were a real problem Lame wouldn't compete so well.

Edit: When I have more time I will re-encode both the long & short samples at Lame V7, then decode both to wav & cut the long wav to match the short sample. If what you say affects audio quality, I should be able to ABX a difference. I think I won't find any, but I prefer to be 100% sure.
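
The cut-and-compare step of that plan could be sketched in Python with only the standard library. This assumes both decodes are already 16-bit PCM WAV files with the same sample rate and channel count; the helper names and the file names in the usage comment are hypothetical. Encoder delay or padding can shift the decoded audio by some samples, and this sketch does not compensate for that.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Return (sample_rate, channels, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(w.readframes(w.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return w.getframerate(), w.getnchannels(), samples


def cut(samples, rate, channels, offset_s, length):
    """Cut `length` interleaved samples starting offset_s seconds in."""
    first = int(round(offset_s * rate)) * channels
    return samples[first:first + length]


def max_abs_diff(a, b):
    """Largest per-sample difference between two equally long sample runs."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


# Usage (hypothetical file names and offset):
# rate, ch, long_dec = read_pcm16("long_decoded.wav")
# _, _, short_dec = read_pcm16("short_decoded.wav")
# part = cut(long_dec, rate, ch, offset_s=12.0, length=len(short_dec))
# print(max_abs_diff(part, short_dec))
```

A nonzero difference only says the two encodes are not bit-identical after decoding; whether it is audible is still a question for ABX.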

Reply #33 – 2009-03-20 20:20:44

Making statements that codec A is better than B based on ABX results is nonsense. ABC/HR is required.

Reply #34 – 2009-03-20 20:57:29

Well, it is an A/B vs. C/D test using the foobar2000 ABX component, so the reference was hidden; I didn't know which file was my reference. I conducted the test in 2 parts: first I determined which one I considered to be the lossless file between A & B, then I determined which of C & D was the closer. It is not an "X selected reference vs. A/B random" test. What it really lacks IMHO is statistical validity & a larger sample database ... with time I can fix the first problem but I cannot force others to test ... any quality claims are to be taken carefully.

But if each time someone tests codecs everyone jumps in & says it is not valid for reason XYZ ... it is not surprising that tests like this don't show up more often. Not only is it boring as hell, but the whole world suddenly disagrees with you. The only thing you can do is test as transparently & as scientifically as possible & then give your opinion so that others can disagree openly. I have nothing against criticism. All the files are here & my ABX logs too, so the test can be re-run forever by others until it is proven scientifically valid. There is a small part of truth in this test. Readers should just be aware that it is not THE truth. If such a test didn't give a hint/a clue/an orientation, it wouldn't even be worth testing audio for yourself.

I ran the test for myself, and I am very confident of the result for myself. But in the same way that I don't blindly trust tests made by others, I don't expect others to blindly trust me. It is not a problem for me if you disagree; it's a problem for me if I made a mistake ... like not using the optimal settings for iTunes AAC. (But my test is valid within the setting I used, which is the iTunes for Windows default import setting. I will edit the table to make it clearer for Mac OS users.)

For me, it is nonsense to make quality claims within the same class of flaw. I cannot honestly tell which is better between two orange/medium or two yellow/light artefacts. But if you tell me that I cannot say that a sample I marked as red sounds worse than one I marked as green, that is so evidently false that I can only disagree. I didn't rate the samples from 1 to 5 because I consider that too wide a scale to be honest, so I rated them 1/2/3. The fact that my scale is smaller means that the difference between ratings is larger. Trust me, red is awful & yellow shouldn't be ABXable for anyone without ABXing experience.

I am pretty confident that I can make some quality claims because I am very confident that I was able to find the right anchor in the first place. I agree this is very unscientific & personal ... but I cannot disagree with myself, I am not yet schizophrenic. It's your job to disagree!

Reply #35 – 2009-03-20 21:12:12

@Alex B: Calling all lossy codecs oblivious was maybe too general. MP3 does in fact know frame interdependencies. So frame 3 could be depending on frame 2's content. I doubt that more than a quarter of a second would make a difference, but I don't know the actual implementation. Anything concerning rate control is rather messy compared to AAC anyway, in my opinion. AAC doesn't know frame interdependencies, so you wouldn't need leading silence for this kind of test. I don't know how this is handled in Vorbis.

Reply #36 – 2009-03-20 21:22:58

Quote from: rpp3po on 2009-03-20 21:12:12

@Alex B: Calling all lossy codecs oblivious was maybe too general. MP3 does in fact know frame interdependencies. So frame 3 could be depending on frame 2's content. I doubt that more than a quarter of a second would make a difference, but I don't know the actual implementation. Anything concerning rate control is rather messy compared to AAC anyway, in my opinion. AAC doesn't know frame interdependencies, so you wouldn't need leading silence for this kind of test. I don't know how this is handled in Vorbis.

You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none), but nothing is stopping an encoder from keeping track of, and using, a lot of past (or even future) data, as long as the bitstream conforms.

Reply #37 – 2009-03-20 21:28:30

Quote from: menno on 2009-03-20 21:22:58

You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none)...

You must know. Just out of curiosity, which would that be?

Reply #38 – 2009-03-20 21:40:55

Quote from: rpp3po on 2009-03-20 21:28:30

Quote from: menno on 2009-03-20 21:22:58
You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none)...

You must know. Just out of curiosity, which would that be?

For LC it is only overlap and add in the filterbank, but this has no influence as long as the frames are presented to the decoder in the same order as the encoder output them. Then there is inter-frame prediction for the MAIN profile (who uses that?). And for SBR there is a header with some configuration data emitted only once every so many frames, as well as a lot of influence on parameters from previous frames.
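
To illustrate why overlap and add makes each output block depend on the previous frame, here is a toy Python sketch. It is not an AAC decoder and not an MDCT: it assumes plain windowed frames with 50% overlap and a sine-squared window, chosen only because overlapping halves of that window sum to exactly 1. All function names are hypothetical.

```python
import math


def sine_squared_window(n):
    """Window whose halves, overlapped by n // 2, sum to 1."""
    return [math.sin(math.pi * (i + 0.5) / n) ** 2 for i in range(n)]


def split_frames(signal, n):
    """Cut the signal into windowed frames of length n with 50% overlap."""
    hop = n // 2
    w = sine_squared_window(n)
    return [[signal[start + i] * w[i] for i in range(n)]
            for start in range(0, len(signal) - n + 1, hop)]


def overlap_add(frames, n):
    """Sum the frames back together at hop = n // 2."""
    hop = n // 2
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


signal = [float(i % 7) for i in range(32)]
frames = split_frames(signal, 8)

full = overlap_add(frames, 8)
# Every sample covered by two frames is rebuilt exactly.
print(max(abs(full[i] - signal[i]) for i in range(4, 28)))

# Drop the first frame: the next block is missing its overlap partner.
partial = overlap_add(frames[1:], 8)
print(partial[0], "instead of", signal[4])
```

With all frames present in order, the interior of the signal is reconstructed; the block right after a missing frame is not, which is the only dependency described for LC above.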

Reply #39 – 2009-03-20 22:43:41

I am sad to report that the Kraftwerk sample just miserably failed my ABX for Quicktime AAC in all versions up to 274 kbit/s (256kbit/s constrained VBR/iTunes Plus), so even the ones with corrected bitrate:

foo_abx 1.3.3 report
foobar2000 v0.9.6.2
2009/03/20 23:29:36

File A: Y:\Downloads\01- DCT Killer Samples (Lossless)\01- Artefact+Context\QT7.6_VBR_Target-Bitrate\04- Kraftwerk (Artefact+Context) QT7.6_256kbs_VBR_constrained.m4a
File B: Y:\Downloads\01- DCT Killer Samples (Lossless)-1\01- Artefact+Context\04- Kraftwerk (Artefact+Context) Lossless.flac

23:29:36 : Test started.
23:30:33 : 01/01  50.0%
23:30:41 : 02/02  25.0%
23:30:59 : 03/03  12.5%
23:31:12 : 03/04  31.3%
23:31:30 : 04/05  18.8%
23:31:38 : 05/06  10.9%
23:31:58 : 06/07  6.3%
23:32:13 : 07/08  3.5%
23:32:21 : 08/09  2.0%
23:32:39 : 09/10  1.1%
23:32:51 : 10/11  0.6%
23:33:08 : 11/12  0.3%
23:33:20 : 12/13  0.2%
23:33:29 : 13/14  0.1%
23:33:31 : Test finished.

 ---------- 
Total: 13/14 (0.1%)

It's instantly noticeable, just try it yourself. The synth sound in the middle part (from 00:01) is completely muffled.

And I can also completely reproduce sauvage78's finding that Nero is already transparent at q 0.4:

foo_abx 1.3.3 report
foobar2000 v0.9.6.2
2009/03/20 23:57:41

File A: Y:\Downloads\03- Nero AAC 1.3.3.0\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Lossless.flac
File B: Y:\Downloads\03- Nero AAC 1.3.3.0\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Nero AAC 1.3.3.0 Q0.40.mp4

23:57:41 : Test started.
23:58:58 : 00/01  100.0%
23:59:23 : 01/02  75.0%
23:59:47 : 01/03  87.5%
00:00:00 : 01/04  93.8%
00:00:14 : 01/05  96.9%
00:00:28 : 01/06  98.4%
00:00:40 : 02/07  93.8%
00:00:51 : 03/08  85.5%
00:01:01 : 03/09  91.0%
00:01:12 : 04/10  82.8%
00:01:20 : 04/11  88.7%
00:01:27 : 05/12  80.6%
00:01:35 : 05/13  86.7%
00:01:39 : Test finished.

 ---------- 
Total: 5/13 (86.7%)
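
The percentage foo_abx prints after each trial appears to be the chance of scoring at least that well by pure guessing, i.e. a one-sided binomial tail with p = 0.5. That reading is an assumption, not taken from the foo_abx source, but it reproduces the totals of both logs above. A minimal Python sketch (the function name is hypothetical; `math.comb` needs Python 3.8 or later):

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` of `trials` right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


for correct, trials in [(13, 14), (5, 13)]:
    print(f"{correct}/{trials}: {abx_p_value(correct, trials):.1%}")
# 13/14: 0.1%
# 5/13: 86.7%
```

So 13/14 would happen by luck about once in a thousand runs, while 5/13 is entirely consistent with guessing.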

Reply #40 – 2009-03-20 23:40:31

My two cents on the behavior of a codec during the first 1 or 2 seconds of audio, in short form:

At the beginning of an encoding, the bit reservoir is full (since previously, there was no audio)
=> The bit reservoir can be "drained" more aggressively than in the middle of an encoding
=> more than the targeted average bits per time can be spent
=> the first few frames (or tenths of a second) most likely sound better than later audio parts

Of course, this only applies to CBR coding. A VBR codec doesn't have to enforce a very strict bit rate per second (or per x frames), so there is no, or a very lenient, bit reservoir.
=> For VBR, quality should be the same for the first few frames and later frames.
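
The chain of arguments above can be played through with a toy model in Python. This is only a sketch under stated assumptions: the reservoir starts full, as the post assumes, every frame earns the same `mean_bits`, and the encoder always wants the same `demand`. The function name and all numbers are made up, and real encoders handle their reservoir differently.

```python
def simulate_cbr(demands, mean_bits, capacity, start_full=True):
    """Return the bits actually spent per frame under a simple reservoir."""
    reservoir = capacity if start_full else 0
    spent = []
    for demand in demands:
        # Each frame may use its own share plus whatever was saved up.
        available = mean_bits + reservoir
        used = min(demand, available)
        # Unused bits are saved, up to the reservoir capacity.
        reservoir = min(capacity, available - used)
        spent.append(used)
    return spent


demanding = [3000] * 8  # a passage that always wants more than the mean
print(simulate_cbr(demanding, mean_bits=2000, capacity=4000))
# [3000, 3000, 3000, 3000, 2000, 2000, 2000, 2000]
print(simulate_cbr(demanding, mean_bits=2000, capacity=4000, start_full=False))
# [2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000]
```

In this model the first four frames get everything they ask for, and after that the encoder is held to the mean rate. That is the sense in which the very beginning of a CBR encode may sound better than the same passage would in the middle of a track.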

Still, I recommend using samples longer than 1 or 2 seconds because, as said, an encoder might need a few frames to adjust to the input, and because our hearing also needs some time to get accustomed to the stimulus (especially if it's something noisy and transient like sauvage78's test set and there is a distinct click/pop when looping a test item).

sauvage78, which items did you use for ABXing? The artefact+context, or the artefact-only? The former ones are long enough, the latter ones not, in my opinion.

Reply #41 – 2009-03-21 00:53:50

Wow, I didn't know that Ogg aoTuV can have some really serious pre-echo problems.

foo_abx 1.3.3 report
foobar2000 v0.9.6.3
2009/03/21 00:36:40

File A: C:\Downloads\02__aoTuV_Beta5.7\02- aoTuV Beta5.7\04- Kraftwerk\04- Kraftwerk (Artefact Only) aoTuV Beta5.7 256Kbps.ogg
File B: C:\Downloads\02__aoTuV_Beta5.7\02- aoTuV Beta5.7\04- Kraftwerk\04- Kraftwerk (Artefact Only) Lossless.flac

00:36:40 : Test started.
00:36:56 : 01/01  50.0%
00:37:00 : 02/02  25.0%
00:37:18 : 02/03  50.0%
00:37:21 : 03/04  31.3%
00:37:25 : 04/05  18.8%
00:37:30 : 05/06  10.9%
00:37:35 : 06/07  6.3%
00:37:39 : 07/08  3.5%
00:37:42 : 08/09  2.0%
00:37:47 : 09/10  1.1%
00:37:50 : 10/11  0.6%
00:37:55 : 11/12  0.3%
00:37:59 : 12/13  0.2%
00:38:10 : 13/14  0.1%
00:38:13 : 14/15  0.0%
00:38:18 : 15/16  0.0%
00:38:22 : 16/17  0.0%
00:38:27 : 17/18  0.0%
00:38:30 : 18/19  0.0%
00:38:33 : 19/20  0.0%
00:38:38 : 20/21  0.0%
00:38:44 : Test finished.

 ---------- 
Total: 20/21 (0.0%)

Pre-echo all the way through the synth, causing smearing and making it sound muffled. Pretty bad for a 358kbps file.

I can also confirm that iTunes AAC has the same problem as well.

foo_abx 1.3.3 report
foobar2000 v0.9.6.3
2009/03/21 00:30:22

File A: C:\Downloads\04__iTunes_AAC_8.1.0.52\04- iTunes AAC 8.1.0.52\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) iTunes 8.1.0.52, QuickTime 7.6 256Kbps VBR.m4a
File B: C:\Downloads\04__iTunes_AAC_8.1.0.52\04- iTunes AAC 8.1.0.52\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Lossless.flac

00:30:22 : Test started.
00:30:30 : 01/01  50.0%
00:30:34 : 02/02  25.0%
00:30:42 : 02/03  50.0%
00:31:05 : 03/04  31.3%
00:31:09 : 04/05  18.8%
00:31:12 : 04/06  34.4%
00:31:15 : 05/07  22.7%
00:31:19 : 06/08  14.5%
00:31:25 : 07/09  9.0%
00:31:32 : 08/10  5.5%
00:31:36 : 09/11  3.3%
00:31:39 : 10/12  1.9%
00:31:44 : 11/13  1.1%
00:31:50 : 12/14  0.6%
00:31:55 : 13/15  0.4%
00:31:59 : 14/16  0.2%
00:32:03 : 15/17  0.1%
00:32:07 : 16/18  0.1%
00:32:12 : 17/19  0.0%
00:32:15 : 18/20  0.0%
00:32:20 : 19/21  0.0%
00:32:26 : 20/22  0.0%
00:32:29 : 21/23  0.0%
00:32:40 : 22/24  0.0%
00:32:46 : 23/25  0.0%
00:32:52 : 24/26  0.0%
00:32:58 : Test finished.

 ---------- 
Total: 24/26 (0.0%)

Same problem.

Reply #42 – 2009-03-21 12:50:51

rpp3po:
I have re-installed iTunes in order to see if I did anything wrong. Here is the encoder/setting that I used:
Preferences/Import Settings/Custom, then I only changed the bitrate.

C.R.Helmrich:
I used the short version, as marked within the ABX logs. But if this is a real problem I will find it, give me some time. I need to know, as I intend to find even more short samples with artefacts to make the ABXing time shorter.

Reply #43 – 2009-03-21 13:16:44

I have already cleared this up here. In iTunes you have the choice between VBR constrained and ABR. Maybe the bitrate confusion comes from the fact that your listed bitrates are for artifact-only encodes and mine for artifact+context? A too-short sample shouldn't really worsen a codec's ability to prevent artifacts, but you can't really compare bitrates for such ultra-short clips.

When you want to try the QT pro version just for this test and not for anything else you may google for pablo/nop.

Reply #44 – 2009-03-21 13:55:24

Thanks a lot, I get what you meant with your googling thing, but it is not my intention to test codecs which are not free for us mortal Windows users. I will re-focus on aoTuV vs. Nero AAC because this is where my interest really lies. I will edit VBR to VBR constrained in the table.

Reply #45 – 2009-03-22 01:20:20

What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.

Reply #46 – 2009-03-22 02:12:48

Quote from: DigitalDictator on 2009-03-22 01:20:20

What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.

You need to read all the posts.

If you have a Mac, you can actually get all the options in a very convenient way for free.

Some folks seem to have real problems with the idea of using a Mac.

Some of those ideas might be rational,

some perhaps a bit less than rational....

Reply #47 – 2009-03-22 13:10:11

I was playing around with Vorbis today, and I found another sample which you might find interesting. It is lossless, directly rendered from the trial of FL Studio. The problem is that, when encoded with Vorbis (aoTuV, latest beta) at quality 2, you can hear distortion in the foreground synth.

Edit: typo

Reply #48 – 2009-03-22 13:53:03

I tried to catch something but I failed ... can you tell me more exactly when & what to listen for? I focused on the synth at the beginning, middle & end, but found nothing.

Don't worry, this happens often. This morning, after discovering the eig sample from Mo0zOoH, I tried to ABX his two other samples:

Autechre — [Gantz Graf EP #01] Gantz Graf [3:58];
Nine Inch Nails — [Quake OST #01] Quake Theme [5:08].

from this thread:
Mo0zOoH's problem samples

I cannot ABX the first one at all & I can ABX the second one but only at Lame V7, so it wasn't worth it.

That's why it's hard to put such a test together: first you must waste time listening to things that others can hear while you cannot... It's boring & frustrating.
Thanks anyway.

Edit:
In the future, I will split the table in two, as I will add new samples to this test for sure (Ministry & Abfahrt Hinwil are planned; I already know that Lame MP3 will have 2 medium artefacts at V2 on these). I do not plan to test the new samples on iTunes & Musepack; for various reasons (not only audio quality) I don't think these codecs are worth spending time on. I think the same of Lame MP3, but Lame MP3 is actually a good anchor & is useful for identifying the artefacts, so I decided to keep it even if I will never use it personally. I will focus on Vorbis/Nero/Lame & lately Lossy|Flac. Also, I want people to be able to test for themselves, so I want them to be able to download the samples, but there is an upload limit that I am already close to reaching. So when I add new samples I will re-organize my files to remove duplicate lossless files in the archive & gain some space. I want this thread to be heavily oriented toward Vorbis, so that, maybe one day, its flaws get fixed.

Reply #49 – 2009-03-22 14:23:30

Quote from: knucklehead on 2009-03-22 02:12:48

Quote from: DigitalDictator on 2009-03-22 01:20:20
What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.

You need to read all the posts.

Sure I read all the posts. I just think he confuses codecs with applications and platforms. It's still the same codec AFAIK. Not important anyway.
