
Upcoming ABR vs VBR listening test

It is well known that at low and medium bitrates up to 160kbps, there is no VBR setting that is better than the ABR setting at an equivalent bitrate, especially with the ABR --alt-presets available since Lame v3.90. Before Lame v3.90, --r3mix was generally accepted as the way to achieve the two main goals of VBR:

1) To obtain the best quality possible at the bitrate it encodes at
2) To be able to flex the bitrate according to the difficulty of the track being encoded

From v3.89 to v3.90, --r3mix has not improved. However, ABR took a big step ahead with the --alt-presets in v3.90. Is it possible that ABR quality has surpassed --r3mix quality at the same bitrate for a given track? From a quick and short listening test I did with a couple of good ears, this seems likely. Therefore, the first hypothesis to be tested by the listening test is:

1. --r3mix gives inferior quality to an --alt-preset ABR encode at the same bitrate. If the test concludes that this is true, it means that --r3mix fails to satisfy the first goal of VBR stated above.

Even if the first hypothesis is shown to be true by the test, we cannot replace --r3mix, because it is not practical to guess at the right --alt-preset <bitrate> when a user is encoding an album. For practical use, the encoding command line has to be a fixed --alt-preset <bitrate>, and in order to replace --r3mix, this encode has to perform as well as or better than --r3mix for all materials to be encoded, including the harder ones which normally bring --r3mix over 200kbps. This might be possible because while ABR doesn't flex its bitrate as much as a true VBR command line does, it does flex a little. The bitrate chosen will be open for discussion, but for the time being, I will go with "--alt-preset 176", which happens to be very close to the nominal --r3mix target bitrate of 175, and halfway between 160 and 192. So the second hypothesis to test is:

2. "--alt-preset 176" gives equal or better quality than "--r3mix" across various types of music.

To test these two hypotheses, a grouped blind listening test will be conducted, similar to ff123's 128kbps tests 1 and 2 and r3mix's AQ test. I will be using 3-5 different samples for testing, with 4 different encoding methods to test (example command lines follow the list):

1) --alt-preset cbr 128 (the placebo)
2) --r3mix
3) --alt-preset <variable>  (adjusted to give the same filesize as --r3mix)
4) --alt-preset 176
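
To be concrete, the encodes would be produced with command lines along these lines (a sketch only; the filenames are placeholders, and the <variable> bitrate in 3) is tuned per sample to match the --r3mix file size):

    lame --alt-preset cbr 128 sample.wav sample_cbr128.mp3
    lame --r3mix sample.wav sample_r3mix.mp3
    lame --alt-preset <variable> sample.wav sample_abrvar.mp3
    lame --alt-preset 176 sample.wav sample_abr176.mp3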


If the results of this first ABR vs VBR test warrant it, I am planning to follow it up with a second test to pit ABR against the --alt-preset standard VBR preset.

I am currently preparing the test. If you have any questions, suggestions or comments about the upcoming test, please feel free to voice them.

Thanks.


Reply #1
Seems to me that the choice of samples will be all-important in this test.  You'll probably want to choose an equal number of low bitrate and high bitrate samples (as r3mix encodes them).  r3mix will probably have some trouble with the low bitrate samples and abr 176 will have trouble with the high bitrate stuff.  At least that is the most sensible guess of what will happen.

Minor quibble:  cbr 128 is an "anchor," not a "placebo."  It's there to remind people what "bad" sounds like.  Hopefully, it is bad enough in all cases.

You'll probably want the best ears you can find for this type of test, since I'll wager the differences between encodes will be small in most cases.

It would be nice if ABC/HR could be used.  Ivan's waverate can be configured as ABC/HR, but then people would have to download WAV files, and it would limit the test to Windows users.

ff123


Reply #2
A couple of comments:

Quote
Originally posted by tangent
Even if the first hypothesis is shown to be true by the test, we cannot replace --r3mix, because it is not practical to guess at the right --alt-preset <bitrate> when a user is encoding an album. For practical use, the encoding command line has to be a fixed --alt-preset <bitrate>, and in order to replace --r3mix, this encode has to perform as well as or better than --r3mix for all materials to be encoded, including the harder ones which normally bring --r3mix over 200kbps. This might be possible because while ABR doesn't flex its bitrate as much as a true VBR command line does, it does flex a little.


I feel that it would certainly be possible to "fix" this small issue if ABR is found to be nominally better than r3mix.  In a manner similar to the way the alt-preset vbr settings work, I've been planning to update the --alt-presets to "flex" more on difficult signals (it would also be possible to disable this extra flexibility via an option).  It really isn't too difficult a task to include this functionality.  In fact, depending on the timeline of the test, it may even be possible to implement some of this in the abr modes as another variant to test...


Reply #3
Quote
Originally posted by ff123
Seems to me that the choice of samples will be all-important in this test.  You'll probably want to choose an equal number of low bitrate and high bitrate samples (as r3mix encodes them).  r3mix will probably have some trouble with the low bitrate samples and abr 176 will have trouble with the high bitrate stuff.  At least that is the most sensible guess of what will happen.


I very much agree with all of this.  Choosing the correct samples is critical here.

Quote
It would be nice if ABC/HR could be used.  Ivan's waverate can be configured as ABC/HR, but then people would have to download WAV files, and it would limit the test to Windows users.


The "BlindAudio" program Sphoid and I are working on should be released by the end of the week, so it should certainly be possible to use a cross platform ABC/HR implementation for this test.  Right now, all that is left before the beta is released is to implement the test result layout and encryption, which is for the most part a simple task.


Reply #4
Before I forget, you should probably append --scale 1 to the presets which need it (e.g., --alt-preset cbr 128 --scale 1); I believe you'll have to do the same thing for --r3mix as well.
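
In full, that would look something like the following (a sketch; filenames are placeholders):

    lame --alt-preset cbr 128 --scale 1 sample.wav sample.mp3
    lame --r3mix --scale 1 sample.wav sample.mp3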

On another note, in the sample you gave me to listen to (gekkou), I believe it probably exhibits that problem mp3 has with the first couple of seconds or so (although I didn't verify).  It should be fair in one way, because both samples are given an equal opportunity, so to speak.  In another way, as a measure of how good each setting really is in real-world conditions, maybe it's not so fair to judge the quality of a setting based on the badness one hears in the first couple of seconds.

Just some thoughts.

ff123


Reply #5
Oh, that reminds me of clipping as well:  If any files clip, I would probably want to apply 1.5 dB attenuation via mp3gain (or whatever it takes to prevent them from clipping).
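
For example, a sketch assuming mp3gain's -g option, where each step equals 1.5 dB and the change is applied losslessly to the mp3's global gain fields:

    mp3gain -g -1 sample.mp3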

ff123


Reply #6
--r3mix is obsolete and should be considered deprecated. With the new --alt-preset system, there is little reason to keep using --r3mix. I always considered --r3mix to be a hokey preset that advocated the arbitrary use of certain switches (like -Z, which is not something you should always use).


Reply #7
Thanks for the useful advice, ff123. I will definitely be checking for clipping with mp3gain, and applying a constant attenuation across all the samples if any of them are found to clip.

Selecting the samples is going to be quite a challenge, I guess. Looks like I will have to use 5 samples rather than 3 due to the nature of this test.

If BlindAudio can be ready by the end of the week, that's probably when I will start the test. Any chance flexi-ABR will be ready at the same time? Just kidding.


Reply #8
Here's an analysis trade-off to consider:

The more settings you have to compare, the harder it becomes to show significant differences between means, because you start taking a big hit by adjusting the p-value for multiple comparisons.
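
For example, comparing 5 settings means 5*4/2 = 10 pairwise comparisons; a simple Bonferroni adjustment would then require p < 0.05/10 = 0.005 for each individual comparison to keep the overall error rate at 0.05, whereas comparing 3 settings requires only p < 0.05/3, or about 0.017.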

However, the fewer settings you have to compare, the harder it becomes to apply post-screening or the correlation technique.

The best situation would be to use a group of experienced and sensitive listeners you think will correlate highly with each other, and to try to limit the settings you compare.

ff123


Reply #9
Quote
Originally posted by tangent
Selecting the samples is going to be quite a challenge, I guess. Looks like I will have to use 5 samples rather than 3 due to the nature of this test.


Do you have a list in mind?  I might be able to suggest a few myself depending on what you are going for already.

Quote
If BlindAudio can be ready by the end of the week, that's probably when I will start the test.


The initial version will be out by then for sure.  There will be a follow-up version shortly, though.  I'd suggest that if you want to use this program, you wait a few days after release to make sure everything is going smoothly... give some people a chance to mess with the interface and stuff like that.

There's also a few other issues to consider when using BlindAudio, and I'm not quite sure how to handle them yet myself.  Initially I wasn't going to include client-side encoding and decoding of samples (where, based on one reference sample, the program creates the encoded samples to listen to), but I decided to try and add this in for the first release.  The way it will work is that there will be a test "description" written by the test giver, which will lay out which formats will be used in the test, which command line options to use per sample, and things like that.  Upon loading, the program will detect the user's OS and download the appropriate encoder and decoder binaries for their system.

Now, the problem is compiling all of these binaries for the different operating systems and putting them all in one place.  Right now I have it hardcoded to check this website in a specific directory, but I will probably change that to be specifiable in the test description.  In addition, I may need some help to assemble the reference set of encoders and decoders for each OS.  In this case we would only need LAME binaries for the different OS's... which initially will only be Windows, Linux, Mac and BSD.  I can get the ones for Windows, and Linux if I have to, though if someone could provide a Linux binary that would be more helpful... but the Mac and BSD binaries I'll need help with.  The same thing would go for other formats used in future tests.  Basically, we need some people to be set up as compile maintainers for this program.
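
As a rough illustration only (this format is invented here for the sake of example; nothing is final), a test description might look something like:

    test:      ABR vs VBR listening test
    codec:     lame
    binaries:  <url-of-per-OS-binary-directory>
    fallback:  flac
    sample gekkou.wav:
        --alt-preset cbr 128 --scale 1
        --r3mix --scale 1
        --alt-preset 176 --scale 1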

If a particular encoder/decoder/OS combination does not exist, the program will instead download a set of losslessly compressed files for that format (FLAC will be the format supported for now, since LPAC is not supported on PPC).

Basically, it'll take some extra work to set up a test with this program, that is if you want to use its full potential, but the end result is that it will be accessible to more users on different platforms, and download times should be reduced even with the JRE for Java factored in... which would pay off even more in subsequent tests.

Quote
Any chance flexi-ABR will be ready at the same time? Just kidding


LOL... not in a week, sorry.  At some point after this program is released and fixed up to include all of the stuff originally envisioned (including server-side result organization and graph generation), I'll probably focus on LAME and ABR again... there's still a lot of work left in the other Hydrogen Tools as well, though...


Reply #10
Quote
Originally posted by ff123
On another note, in the sample you gave me to listen to (gekkou), I believe it probably exhibits that problem mp3 has with the first couple of seconds or so (although I didn't verify).
ff123
Hmm, about the gekkou intro: alt-preset standard doesn't have the strong flutter effect during the second guitar note's echo, which --r3mix has. The flutter problem of r3mix is not caused by the first couple of seconds; it also happens if the section is doubled (copied to the end).
However, there is a (very short) "pubble" artifact on the first guitar note (right at the start) that does seem largely related to the fact that the beginning of the file is hard to encode.
Juha Laaksonheimo


Reply #11
On a sort of general note, I'm a bit wary of drawing conclusions from tests using 5 hard-to-encode samples.  All it "proves" is that a certain codec/setting is better at those 5 particular hard-to-encode samples; this could mean that the codecs are still indistinguishable in 99.99% of the cases (which is often the case), or it could even be possible that the "worse" codec on the hard samples actually performs slightly better on the majority of the cases, or some other combination.

Also, using trained listeners is good to get statistically significant results I suppose, but doesn't really interest me personally, because I'm interested in which codecs/settings are better for the average person to encode music in, and the average person is not a trained listener (so I want to know what an average person will find annoying, not what a trained listener who is used to listening for a particular set of artifacts will find annoying).


Reply #12
Quote
Originally posted by Delirium
On a sort of general note, I'm a bit wary of drawing conclusions from tests using 5 hard-to-encode samples.  All it "proves" is that a certain codec/setting is better at those 5 particular hard-to-encode samples;
Yeah, I agree. 5 samples is of course not enough to draw any final conclusions. However, if you take 4-5 different pre-echo cases, you can say with very high certainty that a certain setting has better overall pre-echo handling.
Similarly, we could take 5 non-pre-echo samples which are carefully selected, non-extreme clips and see how the results turn out with those.

Sure, 10-15 samples can't show everything about quality level, but it can certainly show quite a lot, if the samples are selected wisely.

To get even a small glimpse of true overall quality with 5 samples, I would use 1 good pre-echo test case (like castanets) and 4 non-pre-echo cases which are carefully selected (not extreme samples like them, spahm, etc.)
Juha Laaksonheimo


Reply #13
Quote
On a sort of general note, I'm a bit wary of drawing conclusions from tests using 5 hard-to-encode samples. All it "proves" is that a certain codec/setting is better at those 5 particular hard-to-encode samples; this could mean that the codecs are still indistinguishable in 99.99% of the cases (which is often the case), or it could even be possible that the "worse" codec on the hard samples actually performs slightly better on the majority of the cases, or some other combination.


There is truth in this.  For example, many of the best ears like FhG's mp3enc/Alternate at 128 kbit/s on most music compared with other mp3 codecs, but on hard samples, it can really screw up badly.  The bad parts are given less weight by these people in judging the codec quality because the other codecs sound worse to their ears the majority of the time.

On the other side of the coin, at higher bitrates with good quality codecs, the "bad" things might only happen occasionally even to people with sensitive ears, so there is merit in trying to single out these cases to compare codecs.

The ideal thing to do might be to take dozens of samples chosen at random from within various tracks of music.  That will never happen, though.

Quote
Also, using trained listeners is good to get statistically significant results I suppose, but doesn't really interest me personally, because I'm interested in which codecs/settings are better for the average person to encode music in, and the average person is not a trained listener (so I want to know what an average person will find annoying, not what a trained listener who is used to listening for a particular set of artifacts will find annoying).


I believe that the effect of adding untrained listeners is just to decrease the sensitivity of the test.  I think you're speculating that a group of untrained listeners might have a different set of preferences from the trained group.  So for instance, the trained group might like one setting, while the untrained group might like another.  But the data so far in these various blind tests seems to indicate that apart from the listeners who correlate well with each other, there is no other "signal," just noise.  Presumably that signal is coming from the trained listeners, who are more likely to report reliable (i.e., repeatable) results.

Sean Olive, in his studies of listeners for loudspeaker tests, also indicates that training is a necessity for obtaining reliable results.

http://www.revelspeakers.com/i/listening_lab.pdf

ff123


Reply #14
Quote
Originally posted by ff123
I believe that the effect of adding untrained listeners is just to decrease the sensitivity of the test.  I think you're speculating that a group of untrained listeners might have a different set of preferences from the trained group.  So for instance, the trained group might like one setting, while the untrained group might like another.  But the data so far in these various blind tests seems to indicate that apart from the listeners who correlate well with each other, there is no other "signal," just noise.  Presumably that signal is coming from the trained listeners, who are more likely to report reliable (i.e., repeatable) results.

Sean Olive, in his studies of listeners for loudspeaker tests, also indicates that training is a necessity for obtaining reliable results.

http://www.revelspeakers.com/i/listening_lab.pdf


I also have to say that I very strongly agree with this.  I believe (and have so far not seen data otherwise) that in general an artifact that is offensive to one group of people, will usually be offensive to another group of people even if they are less trained.  The only difference perhaps is the degree to which this is a factor.  If we were testing at which point the threshold of "highly offensive" is reached for the "common listener" then perhaps highly sensitive listeners might not be so appropriate, but in this case we are instead trying to determine relative quality so the more significant the results, the better.

I'd have to support the idea of using sensitive listeners then, in addition to something like ABC/HR, all to end up making the test results even more significant.


Reply #15
Quote
Originally posted by Dibrom
I also have to say that I very strongly agree with this.  I believe (and have so far not seen data otherwise) that in general an artifact that is offensive to one group of people, will usually be offensive to another group of people even if they are less trained.  The only difference perhaps is the degree to which this is a factor. 
Well, I was speculating that being trained to detect certain types of artifacts might make trained listeners more likely to hear certain classes of problems (namely the particular ones they were trained to detect), which might not necessarily be all the possible problems that could arise from perceptual audio coding.  So you'd in effect have untrained listeners vs. partially-trained listeners, meaning the "trained" listeners would tend to give more weight to certain classes of defects (the ones they know about) than the untrained listeners, who would tend to weight them all equally (since they have no particular experience with any of them).

Also, there's a possibility that training, even if it includes all the relevant types of artifacts, might skew their weights from their "natural" weighting - i.e. training might double a person's ability to perceive stereo separation problems but quadruple their ability to perceive pre-echo problems, resulting in the trained person effectively weighting pre-echo twice as heavily relative to stereo separation as the untrained person.

But I have no evidence to suggest that any of this is indeed the case; just mentioning it as a possibility.  ff123 has read far more on codec testing than I have, so I trust his judgement. =]


Reply #16
Hi. As you probably suspect, the test has been delayed for a few reasons. Firstly, I want to use the BlindAudio tool for the users to grade the test samples. Secondly, Dibrom promised to have a test version of his custom-tweaked ABR ready soon. Supposedly it provides the ability for ABR to flex more than usual by applying the tweaks which he is currently using for "--alt-preset standard". Thirdly, upon some testing, I found some interesting results when trying to encode samples with --r3mix and --alt-preset 176, mainly a few samples where ap176 actually flexes more than --r3mix does, and some other unexpected results. This means that instead of the 3-5 samples expected, there may now be 5-7 samples.

I have identified 3 of the samples I want to use for the test, with more coming. The test will now consist of 5 encodes:
1) -b 128
2) --r3mix
3) --alt-preset <var>  (tweaked to give the exact same bitrate as --r3mix)
4) --alt-preset 176
5) --alt-preset 176 + Dibrom's tweaks


Reply #17
Quote
Originally posted by tangent
Secondly, Dibrom promised to have a test version of his custom-tweaked ABR ready soon.


To add a bit to this, the new ABR modes are actually working now in a build I'm using, and they have been for a few days.  The only problem is that things aren't quite optimal with lower bitrate encoding (128kbps) on some clips, so I'm still working on that aspect.  I think I'll have a test build to release for public verification by this weekend, maybe a bit sooner depending on how it goes.

To give an example of overall behavior now, --alt-preset <bitrate> behaves similar to something like "Vorbis -b" to where on most "normal" samples, you will get a bitrate very close to what was specified, but on difficult samples, the bitrate increases.  For example, encoding fatboy with --alt-preset 128 yields a final bitrate of ~200kbps, but on a typical pop song it might be ~130kbps.


Reply #18
Sorry if yesterday, and in the future, there were/will be some "silly/funny" remarks by me.

Since last week I have not had the time to read much on the internet.
So please excuse my asking where Dibrom's pictures are, as the links were posted in this same thread, 7 posts above.....

So I am reading over some posts very quickly.

And sometimes I give my opinion where I think I can say something relevant, but I am aware that my thoughts may already have been considered by other smart people around here!


So, my opinion on testing:

Average listeners should have the possibility to take part in tests. Only then will we sometimes find new golden-eared persons.

As I have seen in recent test results, the average listeners can be identified easily, as they usually rate everything the same....

So for relevant test results, the uncritical data has to be sorted out.


Reply #19
Quote
Originally posted by user

Average listeners should have the possibility to take part in tests. Only then will we sometimes find new golden-eared persons.
As I have seen in recent test results, the average listeners can be identified easily, as they usually rate everything the same....
So for relevant test results, the uncritical data has to be sorted out.

Good points. As it happens, ff123 has been working on this and has a good way of achieving it. I think he will explain it in a moment.


Reply #20
Well, as it turns out, I think average listeners (as in average hearing) are fine too, as long as they have had some practice listening to artifacts and performing the particular test protocol.  But maybe then they aren't so average any more.  Yes, it's possible to post-screen, but it is far better not to include noisy listeners in the first place.

I still think it would be interesting to create some sort of screening application or website which tests the ability to detect obvious artifacts.  I'll have to think about what it would take to code up the PHP for that.

ff123


Reply #21
Hmm,

Actually the screening/training application is right here staring me in the face.  It's the Blind Audio tool, of course.

ff123


Reply #22
Quote
Originally posted by Dibrom
To give an example of overall behavior now, --alt-preset <bitrate> behaves similar to something like "Vorbis -b" to where on most "normal" samples, you will get a bitrate very close to what was specified, but on difficult samples, the bitrate increases.  For example, encoding fatboy with --alt-preset 128 yields a final bitrate of ~200kbps, but on a typical pop song it might be ~130kbps.

Doesn't that kind of move it away from generally defined ABR (Average Bitrate) behaviour and more into 'regular VBR' territory? 

IMHO, an ABR setting ought to maintain the specified bitrate average, even if quality suffers -- that's the main reason why people choose to use ABR instead of VBR settings.  If an ABR setting can skyrocket in bitrate for any reason, is it still "ABR" by definition?

Dibrom, if you go ahead with this, how about creating a new switch and mapping the old 'inflexible' ABR behaviour to it:

--alt-preset abr <xxx>

Cheers...


Reply #23
Quote
Originally posted by ff123
Hmm,

Actually the screening/training application is right here staring me in the face.  It's the Blind Audio tool, of course.

ff123


For that matter, there's a new version sphoid has been working on lately (along with lots of other things) that should probably be released in a day or two, fixing most of the bugs present in the former version and improving efficiency somewhat.


Reply #24
Quote
Originally posted by fewtch
Doesn't that kind of move it away from generally defined ABR (Average Bitrate) behaviour and more into 'regular VBR' territory? 

IMHO, an ABR setting ought to maintain the specified bitrate average, even if quality suffers -- that's the main reason why people choose to use ABR instead of VBR settings.  If an ABR setting can skyrocket in bitrate for any reason, is it still "ABR" by definition?

Dibrom, if you go ahead with this, how about creating a new switch and mapping the old 'inflexible' ABR behaviour to it:

--alt-preset abr <xxx>

Cheers...


I won't remap it to something like that, but there will instead be a way to disable it, something like:

--alt-preset noflex abr

It will be done like the "fast" options in the vbr modes.
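
So the invocation would be something like this (a hypothetical example by analogy with --alt-preset fast standard; the exact syntax isn't final):

    lame --alt-preset noflex abr 176 sample.wav sample.mp3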