Pre-Test thread
Reply #21 – 2003-08-23 08:38:59
[quote]I don't think it really makes sense to do this, because it doesn't mirror a real world usage scenario. People are not going to use a different -q setting per sample to reach a set bitrate every single time they encode a different file. Instead, they pick a quality setting and stick with it. It may turn out that on this sample set Vorbis averages a little low, but on another sample set it will be the opposite. Given this, and the fact that -q0 is widely recognized as giving "64 kbps" (even the Xiph guys seem to support this on a wide scale), this is the setting that IMO should be used.[/quote]

This sounds reasonable to me. The weak point I see is the "real world usage scenario" itself. There are several theoretically conceivable ways to measure how the tested VBR codecs behave, bitrate-wise, under average real world conditions: e.g. take sales statistics for records of different genres, encode huge numbers of samples, and calculate an average bitrate weighted by those statistics (a toy sketch of that idea follows below). But every approach that comes to my mind here is simply too much effort. So two possibilities remain:

1) Take "Ogg Vorbis -q0 averages 64 kbps" as the best available assumption, because "it's widely recognized as true", OR
2) Change the overall -q setting for the test, as I've suggested.

Both have their problems:

1) We are setting up a test to extract hard, comparable figures from subjective human perception by double blind testing, statistical analysis etc., yet we would choose codec settings based on an assumption that is nothing more than "widely recognized as true". That could introduce an unquantifiable uncertainty into the results we get.
2) We "adjust" average bitrates, but we don't know how closely they mirror a real world scenario either (it's very hard to define "real world scenario" anyway), which leads to a similar uncertainty.
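Since none of those statistics actually exist, here is a purely illustrative sketch (Python, with made-up genre shares and per-genre bitrates) of what the "weighted by statistics" calculation would look like:

[code]
# Purely illustrative: estimate a "real world" average bitrate for one codec
# by weighting per-genre bitrate averages with genre market shares. All the
# numbers are invented; the real statistics are exactly what we don't have.
genre_share = {"pop": 0.40, "classical": 0.15, "latin": 0.25, "metal": 0.20}
measured_kbps = {"pop": 62.1, "classical": 58.4, "latin": 66.9, "metal": 71.3}

weighted_avg = sum(genre_share[g] * measured_kbps[g] for g in genre_share)
print(f"estimated real-world average: {weighted_avg:.1f} kbps")  # ~64.6 kbps
[/code]

Getting trustworthy values for genre_share and measured_kbps is the part that is too much effort, not the arithmetic.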
[quote]As Roberto pointed out earlier, and I agree, I think that adjusting the settings away from the common incarnations, just to "set" the bitrate on this test, calls into question its credibility.[/quote]

As I tried to explain above, both possibilities have similar problems with what you call credibility here.

[quote]IMO, it's one thing to adjust settings downward to try to reach a set bitrate (as was done with MPC in the previous test), since this should only really have the effect of worsening the results, but it's entirely different to adjust the settings upward to compensate for a lack of accuracy in encoding. If Vorbis, or any of these other codecs, happens to use too few bits per sample in its VBR mode in any given case, that points to a possible flaw in the encoding scheme, and any adjustment around this just goes to hide the very issues that we are trying to discern in the first place.[/quote]

I don't understand why you refer to Vorbis needing fewer bits on some types of music as a "lack of accuracy in encoding". One could just as well say "Vorbis is very good at encoding this type of music, because it reaches a certain quality level with fewer bits than it needs for other music". Isn't this what VBR is about? So *not* adjusting Vorbis' bitrate could be seen as punishing it for good performance.

[quote]This test is about quality, and the test subjects are VBR coders. The point of the test is to measure fidelity at a given quality mode, with bitrate being used only as a rough guideline and mode of classification (not implicit comparison). People should realize that very important fact and accept the implications that come along with it (possible VBR pitfalls) in the context of this test.[/quote]

Unfortunately, the relationship between VBR bitrate and measured quality (1-5) isn't defined mathematically; there is, for example, no linear correlation. In situations like this, a test can only deliver comparable results if one parameter is measured while the others are fixed at the same level. (You can't compare how much fuel different cars need per 100 miles by letting each one drive at a different speed.) Because of this, the bitrates have to be as close as possible; otherwise it would be useless to measure quality. (A rough sketch of how such a bitrate-matched -q setting could be found on a given sample set follows at the end of this post.)

And finally, in the test results we should be interested in the representation of, and significance for, real world usage scenarios rather than in technicalities beyond the concerns of the majority of readers (something like using -q0.x vs. -q0). As it's very hard to get a widely accepted definition of "real world scenario" (mine, for example, would consist of > 50% latin music), and even harder to get overall averaged figures for the bitrate-wise behaviour of VBR codecs, we should take what we know for sure and treat our set of test samples as a mirror of the "real world", IMHO.

Both possibilities have their pros and cons, and I can understand and will accept (do I have a choice? B) ) if rjamorim decides to stick with -q0.
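To make possibility 2) concrete, here is a minimal sketch of how one could search for the -q value at which the average bitrate over our own sample set lands near 64 kbps. oggenc's -q and -o switches are real; the sample list, the output naming and the bisection bounds are placeholders, not a tested recipe:

[code]
# Sketch of possibility 2): find the -q value at which the average bitrate
# over our own sample set hits a target (64 kbps here). Assumes oggenc is on
# the PATH and the samples are plain WAV files.
import os
import subprocess
import wave

SAMPLES = ["sample01.wav", "sample02.wav", "sample03.wav"]  # placeholder set
TARGET_KBPS = 64.0

def avg_bitrate(q: float) -> float:
    """Encode every sample at quality q, return the overall bitrate in kbps."""
    total_bits = 0.0
    total_seconds = 0.0
    for wav_name in SAMPLES:
        ogg_name = f"{wav_name}.q{q:.2f}.ogg"
        subprocess.run(["oggenc", "-q", str(q), "-o", ogg_name, wav_name],
                       check=True)
        total_bits += os.path.getsize(ogg_name) * 8
        with wave.open(wav_name) as f:
            total_seconds += f.getnframes() / f.getframerate()
    return total_bits / total_seconds / 1000.0

# Bitrate rises with q, so bisecting between two bracketing quality values
# converges on the setting whose set-wide average matches the target.
lo, hi = -1.0, 2.0
for _ in range(8):
    mid = (lo + hi) / 2
    if avg_bitrate(mid) < TARGET_KBPS:
        lo = mid
    else:
        hi = mid
print(f"use -q {(lo + hi) / 2:.2f} for ~{TARGET_KBPS:.0f} kbps on this set")
[/code]

Of course this only fixes the average on this particular sample set, which is exactly the objection discussed above.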