Topic: Pre-Test thread

  • phong
Reply #25
I think a 64kbps test is going to be less tiring than 128kbps per sample because it is much easier to distinguish them.  But seven or eight versions to listen to may still be too many.

As for the bitrate thing - we could do the same thing that was done last time - encode tons of CDs at each of the quality levels around -q0 and see what REALLY equates to 64kbps.
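A rough sketch of that bitrate check, in Python: given the encoded file size and duration of each track, compute the overall average bitrate per quality setting and pick the setting that lands closest to 64kbps. The function names and setting labels are made up for illustration, and "kbps" is assumed to mean 1000 bits per second.

```python
def average_bitrate_kbps(files):
    """Overall average bitrate of a set of encoded tracks.

    files: iterable of (size_in_bytes, duration_in_seconds) pairs.
    Weighted by duration, so long tracks count more than short ones.
    """
    files = list(files)
    total_bits = sum(size * 8 for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    return total_bits / total_seconds / 1000.0


def closest_setting(results, target_kbps=64.0):
    """Pick the quality setting whose average bitrate is nearest the target.

    results: dict mapping a setting label (hypothetical, e.g. "-q 0")
    to its list of (size_in_bytes, duration_in_seconds) pairs.
    """
    return min(
        results,
        key=lambda setting: abs(average_bitrate_kbps(results[setting]) - target_kbps),
    )
```

The more CDs go into each list, the less a single unusual album skews the average.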

Oh, and I've been working on writing an ABC/HR clone for Linux.  It may be good to go in time for the test.  Are you interested in having that available or would it be too much trouble to put together and test two packages?  It's written in Python with wxPython for the GUI and pygame (SDL) for the audio so it should be very portable (in case there are any Mac users out there that want to use it).
I am *expanding!*  It is so much *squishy* to *smell* you!  *Campers* are the best!  I have *anticipation* and then what?  Better parties in *the middle* for sure.
http://www.phong.org/

  • Dologan
  • Members (Donating)
Reply #26
Hmm... I frankly don't understand the ABR/CBR/VBR nitpicking that keeps arising again and again. Unless it raises compatibility issues, most people (myself included) don't give a damn whether the sample has a 56.2, 74.7 or 64.0 kbps avg. bitrate as long as it is within a certain tolerable range.
I think all this could be avoided simply by changing the test name from "64 kbps test" to something more like "64kbps-range test" or "low bitrate test". Some people just take the "64 kbps" part too much to heart.  OK, I know this is raised due to "fairness" issues, but these have been discussed at length, in favour of letting the codecs do what they are good at and not crippling them to a constrained, unnatural setting, since that would be unfair, too.
A suggestion to better design the number of codecs/samples for the test: Make a poll about how much time you would be willing to spend on the test (for those who would consider participating) and then choose a combination of codecs/samples that best suits the results. IMHO it would be better to have a comparison of few codecs but with small error bars that allow reliable conclusions, than to obtain an entire battery of codecs nobody in this forum uses with error bars so large that barely any significant conclusions can be reached. Besides, making a large test would probably bias the results in favour of the listening preferences of patient people with lots of time, which may or may not be different for other kinds of people. (OK, this might sound crazy, but who knows about the influence of personality on annoyance thresholds?)

Regards,
~Dologan

  • rjamorim
Reply #27
Quote
A suggestion to better design the number of codecs/samples for the test: Make a poll about how much time you would be willing to spend on the test (for those who would consider participating) and then choose a combination of codecs/samples that best suits the results.

Problem is, you can't even possibly imagine how much time someone will spend on the test. In the AAC test, I had Garf's results 2 hours(!) after I officially started the test. And JohnV submitted his last results a few hours before the test closed.

What's the amount of time someone spends testing a sample? 2 minutes? 30 minutes? Besides, if it's a problem sample at low bitrate, the person will surely spend less time than if it's an easy sample at high bitrates.

Quote
I think all this could be avoided simply by changing the test name from "64 kbps test" to something more like "64kbps-range test" or "low bitrate test". Some people just take the "64 kbps" part too much to heart.


That makes sense, indeed, but most of the people that are criticizing the test aren't doing this because the bitrate deviates, but because, due to the bitrate deviation, some codecs might end up more "favoured" than others. In that case, even changing the test name wouldn't appease them.

Regards;

Roberto.
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org

  • guruboolez
  • Members (Donating)
Reply #28
Testing eight different files for each sample doesn't annoy me at such low bitrates. But other people may well get bored. So,

isn't it possible to create two kinds of archives:
- essential encoders (wma, ogg, he-aac, mp3pro)
- additional encoders, for curious people (wma9 pro, real, aac)

People who want to participate in the test would have to send results for the first pack, and if they want to investigate further, they can evaluate the encodings included in the second package. By doing that, you won't annoy or frighten people with too many encodings, and you will get some interesting results for additional codecs, without starting another test. It seems to be a good compromise between completeness and respect for the testers.
  • Last Edit: 23 August, 2003, 08:16:57 PM by guruboolez

  • rjamorim
Reply #29
That really sounds like a great idea to me.

(Although I didn't think about it hard enough to pick up possible flaws in it)

Of course, the "official" 64kbps test results will be the ones featuring the "essential" codecs, and somewhere in the official page there'll be a link to a "subtest" featuring the essential codecs + additional ones.

That separation is needed because the tests can't be merged together if the number of listeners isn't the same for each sample. To start with, the ANOVA error margin would be different for each case.

Besides, ABC/HR is limited to 8 sliders. You are already suggesting 7 codecs, not counting the anchors. Do you have any idea how to circumvent that issue? I don't think doing two separate test setups would be the right way, but that needs to be discussed. I would personally think the right way would be one test setup = essential codecs and the other = essential + additional codecs, and not one setup = essential and the other = additional. (I don't know if I'm making myself clear...)


Heh, that would make it harder for me to process the results, but I'm inclined to oblige and see how things turn out.

Comments? Ideas?

Regards;

Roberto.
  • Last Edit: 23 August, 2003, 08:42:00 PM by rjamorim

  • guruboolez
  • Members (Donating)
Reply #30
Is it worth putting two different anchors in this test?
For the 128 kbps listening test, an anchor was needed to protect lame mp3 from an exaggerated rating. Here, there is no (known) encoder to protect from (known) stronger competitors. Why not remove this one?
On the other hand, I'd like to see mp3@128 as bottom anchor. This anchor is more than a "dead file": it's a popular reference, and at the end of the test, we can draw some conclusions about the relation between this file and the other competitors. It's very important to give a point of comparison: some people are obsessed with the idea of maintaining 128 kbps quality at half the bitrate, and this test, with mp3@128 included, is the occasion to give strong answers (superior to pseudo-scientific waveform comparison...) to these people (and it would be good advertising for HA.org).
Honestly, is a 3.5 kHz lowpassed wav file really needed here?
  • Last Edit: 23 August, 2003, 09:03:14 PM by guruboolez

  • rjamorim
Reply #31
Quote
Is it worth putting two different anchors in this test?

Well, I don't know. That's open to debate still. Unfortunately, the biggest authority I know of in listening tests is somewhere in Thailand :B

Quote
For the 128 kbps listening test, an anchor was needed to protect lame mp3 from an exaggerated rating. Here, there is no (known) encoder to protect from (known) stronger competitors. Why not remove this one?


Well, as already explained somewhere, the Anchor isn't there only to protect rankings, but also to put things into perspective across the entire sample suite.

Quote
On the other hand, I'd like to see mp3@128 as bottom anchor. This anchor is more than a "dead file": it's a popular reference, and at the end of the test, we can draw some conclusions about the relation between this file and the other competitors. It's very important to give a point of comparison: some people are obsessed with the idea of maintaining 128 kbps quality at half the bitrate, and this test, with mp3@128 included, is the occasion to give strong answers (superior to pseudo-scientific waveform comparison...) to these people (and it would be good advertising for HA.org).


Oh, sure, MP3 is definitely in.

Quote
Honestly, is a 3.5 kHz lowpassed wav file really needed here?


Well, indeed, maybe not.

I'm just trying to figure out how to sort results, given that some of them will contain the essential codecs, others will contain the essential + additional. It can surely be done by hand, but given I expect this to be my biggest test to date, it'll be a PITA. And then you guys can't expect results delivered a few hours after the test closure. :B

Any idea?

R.

  • ErikS
Reply #32
One idea how to add more codecs to the test would be to make three different test suites. One where wma pro is in, another where real is in but not wma pro, and the third where aac would be in but none of the above. Then when a person downloads the suite, the server randomly gives one of the three packages. This way everybody will test the core codecs but you will still have some results for the additional ones.
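A minimal sketch of that server-side idea, in Python. The codec labels follow the essential/additional split suggested earlier in the thread; the function names are hypothetical and nothing here reflects how the actual test packages were built.

```python
import random

# Core codecs every listener rates, plus one extra codec per package.
CORE = ["he-aac", "mp3pro", "vorbis", "wma"]
EXTRAS = ["wma9pro", "real", "aac"]


def build_packages(core, extras):
    """Return one package per extra codec: the core set plus that extra."""
    return [list(core) + [extra] for extra in extras]


def pick_package(packages, rng=random):
    """Hand out one package uniformly at random for a download request."""
    return rng.choice(packages)
```

Every package contains the full core set, so the core results can still be pooled across all listeners, while each extra codec only gets roughly a third of them.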

  • tigre
Reply #33
Would it be an option to use the 8 sliders for tested codecs only and the higher anchor (lame @128kbps) while the lower anchor (no matter if lowpassed or a crappy encoding) can be provided separately so people can listen to it without using ABC/HR? It should be so obvious what's wrong compared to the original that ABXing this one isn't necessary - and it'll get a fixed rating (= 1 ?) anyway (if I get the idea of anchors right).
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello

  • S_O
Reply #34
Quote
That's not a real possibility because, of all the 12 samples, I only have one of them in its entirety. It would require that people send me the entire songs for each sample. And then I would be accountable for piracy. You get the problem? :B
What if they don't upload it here to HA for the public, and just send the song to you? That's not piracy; in Germany it's allowed to make a private copy for relatives and friends (this could have changed with the new copyright law).
Another idea is that they encode the samples themselves: you send them the exact settings (a batch file for the CLI encoders), then they decode the result, cut the decoded files and send them to you.
I think that's very important for real-life testing; there has to be a way to make it possible.

For the codecs I think these should be tested:
HE-AAC
mp3pro
Ogg Vorbis
WMA
Real Gecko
Lame mp3 --preset 128
(atrac3plus?)
That makes 6 (7) codecs, everybody can test that.

  • bond
Reply #35
I don't think people will get bored testing 64kbps-quality files. Hey, a 128kbps test where you can't hear any differences is more boring, IMHO.
I know, that I know nothing (Socrates)

  • rjamorim
Reply #36
Quote
One idea how to add more codecs to the test would be to make three different test suites. One where wma pro is in, another where real is in but not wma pro, and the third where aac would be in but none of the above. Then when a person downloads the suite, the server randomly gives one of the three packages. This way everybody will test the core codecs but you will still have some results for the additional ones.

Man, if you could only imagine the mess it'll be to process the result files... :B

At the time being, I use a very useful tool created by ff123. It takes the results file in a text list and sorts them in a table that is usable in his Friedman tool.

If we go with adding a different codec for each sample package, I would have to edit ALL packages by hand. You'd have to expect the results a week after the test is over.

In this aspect, Guru's suggestion would be easier to implement.

Quote
Would it be an option to use the 8 sliders for tested codecs only and the higher anchor (lame @128kbps) while the lower anchor (no matter if lowpassed or a crappy encoding) can be provided separately so people can listen to it without using ABC/HR? It should be so obvious what's wrong compared to the original that ABXing this one isn't necessary - and it'll get a fixed rating (= 1 ?) anyway (if I get the idea of anchors right).


Not really a fixed rating. As you noticed on the 128kbps test, Blade was an anchor, and still it got scores higher than 1.

So, anyway, I'll probably just ditch the bottom anchor.

  • rjamorim
Reply #37
Quote
What if they don't upload it here to HA for the public, and just send the song to you? That's not piracy; in Germany it's allowed to make a private copy for relatives and friends (this could have changed with the new copyright law).

It would be illegal in nearly the entire rest of the World, including Brazil. :-/

Quote
Another idea is that they encode the samples themselves: you send them the exact settings (a batch file for the CLI encoders), then they decode the result, cut the decoded files and send them to you.


Well, are HE AAC, WMA, and Real even "cuttable"?

And, when cutting MP3pro, is there no risk of the SBR part getting b0rked?

Quote
I think that's very important for real-life testing; there has to be a way to make it possible.


Well, it is possible. That's what they do in formal listening tests. But I don't have the resources to conduct a formal listening test (which usually costs 4-digit dollars)

Quote
For the codecs I think these should be tested:
HE-AAC
mp3pro
Ogg Vorbis
WMA
Real Gecko
Lame mp3 --preset 128
(atrac3plus?)
That makes 6 (7) codecs, everybody can test that.


Real Gecko? 

Yes, I agree completely with the first 6 codecs. And I'm not too fond of featuring atrac3plus. First, because I don't see it getting as mainstream as the others, mostly due to Sony's (understandable) paranoia on security and so on (DRM, etc.)

Quote
I don't think people will get bored testing 64kbps-quality files. Hey, a 128kbps test where you can't hear any differences is more boring, IMHO.


Indeed. IMO, anything above 7 codecs is too much for "every participant". That's why I would maybe go with Guru's idea of offering a superset of samples using non-essential codecs.

Regards;

Roberto.
  • Last Edit: 24 August, 2003, 03:10:58 PM by rjamorim

  • Gecko
Reply #38
I think dividing the test into an essential and a non essential part is an excellent idea. Guruboolez' proposed division into essential/additional makes sense to me.

What about the rating scale of abc/hr? I believe this issue was brought up in the aftermath of the last test, but I don't remember the answer. Personally, I am fine with the wording and I wouldn't be able to come up with better alternatives, but I believe that many people have trouble with the scale. The reason behind this may be the nonlinearity of the scale and the use without context. (I guess that's what the anchors are for.) I'm not sure if people rate the samples against the original wav or if they rate them in the context of using ~64k samples ("Actually, this sample sounds like crap, but hey, it's only 64k"). Maybe this should be made more clear. Maybe someone should write a small text how to do proper rating, give examples. This could yield more accurate results.

Another issue is the number scale. People value numbers too much. Perhaps they should be removed from the interface and only be output to the result file. This way people would focus more on the describing words and their meaning than on the numbers.

  • S_O
Reply #39
Quote
Well, are HE AAC, WMA, and Real even "cuttable"?

And, when cutting MP3pro, is there no risk of the SBR part getting b0rked?

Real is cuttable (there is rmeditor, a CLI application from Helix). WMA should be cuttable, too, but I don't know a good tool for that. AAC is cuttable with BeSplit, but I don't know if it also copies the ancillary data correctly. For mp3pro there is a problem (also for mp3) because of the bit reservoir. But that should only affect the first frame, so no real problem. And if a cutting tool like mp3directcut doesn't work correctly with SBR, a simple hex editor should work. Does someone know if there is SBR in all frames, or only in some? It could be essential for decoding that there is SBR in the first frame, otherwise SBR isn't detected.
Quote
It would be illegal in nearly the entire rest of the World, including Brazil. :-/
F*cking laws! But I noticed something illegal in your old test: You distributed binaries of faad, lame and blade. This is not legal in some countries (like the USA), either.
Quote
Well, it is possible. That's what they do in formal listening tests. But I don't have the resources to conduct a formal listening test (which usually costs 4-digit dollars)
Even if you bought all these discs it would cost less than 100€/$ (of course even that would be too much). If the sample owners encoded/cut the samples themselves, it would be the easiest and a legal way. If you don't trust them or they are not able to do it alone, you could do it yourself on their PCs using NetMeeting remote control.
Quote
Real Gecko?
The codec is named "Gecko" in the Real papers, because "Real Audio" can be any codec used by Real (Sipro Voice Codec, DolbyNet, Atrac3 etc.). Because its FourCC is "cook" (this comes from the name of the codec developer, "Ken Cooke"), it is also often called that.

  • rjamorim
Reply #40
Quote
But I noticed something illegal in your old test: You distributed binaries of faad, lame and blade.

Well, there I was breaking patents that people don't give much of a damn anyway. It's not nearly as bad as breaking copyright. :-/


Quote
Even if you bought all these discs it would cost less than 100€/$ (of course even that would be too much).


Haha, even if I had all that money (I don't), most of the music that will be featured can't be found in this hellhole of a country

Quote
If the sample owners encoded/cut the samples themselves, it would be the easiest and a legal way. If you don't trust them or they are not able to do it alone, you could do it yourself on their PCs using NetMeeting remote control.


Well, I believe few would allow a stranger to remotely control their PCs :B

And the issue remains: Can we cut MP3pro, AAC+SBR and WMA?

Problem here is that we can't use that process for some samples and not for others, that would make the test biased from the start.

Another issue: If these people were to cut the samples themselves, they would need to own at least Adobe Audition (mp3pro) and Nero 6 (HE AAC). And not all of them own it, and some aren't as morally unrestrained as me as to go and get a ju4r3z version :B

Quote
The codec is named "Gecko" in the Real papers, because "Real Audio" can be any codec used by Real (Sipro Voice Codec, DolbyNet, Atrac3 etc.). Because its FourCC is "cook" (this comes from the name of the codec developer, "Ken Cooke"), it is also often called that.


Yeah, I call it Cook myself. Whatever floats your boat...

Regards;

Roberto.

  • S_O
Reply #41
Quote
Well, I believe few would allow a stranger to remotely control their PCs :B
With NetMeeting you can see everything the other person does and you can always terminate the remote control just by pressing one key.
Quote
Well, there I was breaking patents that people don't give much of a damn anyway. It's not nearly as bad as breaking copyright. :-/
Since the sample is only for you, and the program you offer is for everybody, I think this is different: breaking patent law x-thousand times is much worse than breaking copyright law 12 times. Also, it doesn't matter if you delete the uncut samples afterwards.
Quote
And the issue remains: Can we cut MP3pro, AAC+SBR and WMA?

mp3pro: possible. I just cut a file in a hex editor somewhere (at a frame beginning) and it was decodable correctly with SBR. Since AAC is cuttable, and mp3pro is cuttable, HE-AAC should also be cuttable.
Quote
Another issue: If these people were to cut the samples themselves, they would need to own at least Adobe Audition (mp3pro) and Nero 6 (HE AAC). And not all of them own it, and some aren't as morally unrestrained as me as to go and get a ju4r3z version :B
You don't care about ju4r3z, but you care about copyright???
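On the hex-editor cut at a frame beginning mentioned above: a rough Python sketch of how candidate frame starts could be located in a plain MPEG audio (mp3) stream. It relies only on the standard frame header layout (11 sync bits, then version, layer, bitrate and sample-rate fields with reserved values). The function names are made up, and this is not what any of the tools named in the thread do. It says nothing about SBR data or the bit reservoir, and the same byte pattern can turn up by chance inside audio data, so the offsets are only candidates.

```python
def looks_like_frame_header(b0, b1, b2):
    """Check the first three header bytes against the MPEG audio layout."""
    # 11 sync bits: all of byte 0 plus the top three bits of byte 1.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return False
    version = (b1 >> 3) & 0x03            # 0b01 is reserved
    layer = (b1 >> 1) & 0x03              # 0b00 is reserved
    bitrate_index = (b2 >> 4) & 0x0F      # 0b1111 is not allowed
    sample_rate_index = (b2 >> 2) & 0x03  # 0b11 is reserved
    return (
        version != 0x01
        and layer != 0x00
        and bitrate_index != 0x0F
        and sample_rate_index != 0x03
    )


def candidate_frame_offsets(data):
    """Return byte offsets in `data` where a frame header may start."""
    return [
        i
        for i in range(len(data) - 2)
        if looks_like_frame_header(data[i], data[i + 1], data[i + 2])
    ]
```

A real cutter would also read the frame length from each header and confirm that the next header follows where expected before trusting an offset.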

  • rjamorim
Reply #42
Quote
You don't care about ju4r3z, but you care about copyright???

I don't care about anything. :B

I'm just saying that I doubt people would want to publicly go around breaking copyrights to send their tracks to me. I don't care, but they might care.

The problem with cutting HE AAC is that it's inside an MP4 container, and you can't just go around chopping the container; you must take the MP4 headers, etc. into consideration.

  • Dologan
  • Members (Donating)
Reply #43
Roberto, are you sure you can't merge the results of the core+extra codecs together and still do valid statistics?
Unfortunately, I have forgotten most of the statistics lessons I took over a year ago, so I would have to dig into a book to refresh my asleep neurons; but IIRC it wasn't that big of a deal if during an experiment one of the mice in one of the test groups died; so I suppose it must be analogous if some codecs don't have the same number of testers...

~Dologan

  • rjamorim
Reply #44
Well, the biggest problem I see here is using ff123's friedman tool to perform the statistical analysis.

It can accept text files in this format:

Code:
mp3    aac    vorbis    wma    mpc
2.5    4.2    4.0    3.6    4.5
3.1    4.0    4.3    4.0    5.0
4.0    5.0    4.2    4.5    5.0


But not in this one:

Code:
mp3    aac    vorbis    wma    mpc    real    vqf
2.5    4.2    4.0    3.6    4.5    
3.1    4.0    4.3    4.0    5.0    3.5
2.0    3.0    3.2    2.5    3.5  5.0


I can't fill the columns with some null value, because there isn't one.

And I wouldn't even know where to start calculating the statistics, so friedman.exe is a must-have.

Besides, I would need some proof that mixing together the results won't bias the test, so that I can show something to potential critics.
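One way around the missing-value problem, sketched in Python: split the ragged table into two complete ones, a core table with every listener and a full table with only the listeners who rated everything. This assumes the results are tab-separated with empty cells for missing ratings, which is a guess about the input; the function names are hypothetical, and the only thing assumed about the Friedman tool is the header-plus-rows layout shown above. It sidesteps the null-value issue but does not answer whether the two tables can be compared fairly.

```python
def parse_results(text):
    """Parse tab-separated ratings; missing ratings are empty cells."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.strip() for name in lines[0].split("\t")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("\t")]
        cells += [""] * (len(header) - len(cells))  # pad short rows
        rows.append(cells[:len(header)])
    return header, rows


def split_tables(header, rows, core):
    """Return (core table, full table), each with complete rows only."""
    core_idx = [header.index(name) for name in core]
    core_rows = [
        [row[i] for i in core_idx]
        for row in rows
        if all(row[i] for i in core_idx)
    ]
    full_rows = [row for row in rows if all(row)]
    return (list(core), core_rows), (header, full_rows)


def format_table(header, rows):
    """Write a table back out as a header line plus one line per listener."""
    return "\n".join("\t".join(line) for line in [header] + rows)
```

The two tables end up with different listener counts, so their error margins differ, which is the concern raised earlier in the thread about merging the tests.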
  • Last Edit: 25 August, 2003, 12:49:27 AM by rjamorim

  • Dologan
  • Members (Donating)
Reply #45
Hmm... I see...
When are you planning to start the test? I don't want to promise anything, but if I have time the next few days, I guess I could freshen my memory with some statistics books and see if it would be possible for us to perform a statistical analysis with unequal groups that doesn't bias the test in some way. The analysis then would not be as simple as running a little program, but would certainly be more complete imo. What do you say?

~Dologan

  • Gabriel
  • Developer
Reply #46
*I think that Guru's idea of 2 sets is interesting but unfortunately probably bad in our case. I am afraid that only experienced listeners would pick the second group. As we know that those listeners give lower ratings (as demonstrated in the 128kbps test), the ratings of both groups would probably not be comparable.

*I am not sure if Atrac-3 is really useful, considering both the user base and the fact that the new portable players from Sony are now able to use other formats.

*Perhaps plain AAC should be considered, as it is decodable right now by some hardware players, while HE-AAC is not.

*If a lower anchor has to be used, why not Lame --preset 64? (mp3 is still widely used at low bitrates for Shout/Ice streaming)

*I think that the high number of codecs is not such an issue, compared to the 128 test. In this case it will be easier for a broad range of listeners.

  • rjamorim
Reply #47
Quote
When are you planning to start the test?

September 3rd. Of course, I would need that information a little before.

Quote
What do you say?


Well, if it doesn't turn out terribly difficult (i.e., I won't take a week to sort out results), fine, I can go for it.

  • rjamorim
Reply #48
Quote
*I think that Guru's idea of 2 sets is interesting but unfortunately probably bad in our case. I am afraid that only experienced listeners would pick the second group. As we know that those listeners give lower ratings (as demonstrated in the 128kbps test), the ratings of both groups would probably not be comparable.


Good point.

Quote
*I am not sure if Atrac-3 is really useful, considering both the user base and the fact that the new portable players from Sony are now able to use other formats.


Indeed, Atrac3 is nearly out of the test.

Quote
*Perhaps plain AAC should be considered, as it is decodable right now by some hardware players, while HE-AAC is not.


Maybe, but HE AAC is definitely in. It's actually the main reason for this test, since the other codecs (Vorbis, MP3pro, WMA std) haven't changed much since ff123's test.

Quote
*If a lower anchor has to be used, why not Lame --preset 64? (mp3 is still widely used at low bitrates for Shout/Ice streaming)


Maybe, but I'm more inclined to leave the bottom anchor out.

Quote
*I think that the high number of codecs is not such an issue, compared to the 128 test. In this case it will be easier for a broad range of listeners.


Well, that's OK for me, I'm not the one that is going to take the test. :B

Thanks for your thoughts.

Regards;

Roberto.

  • ErikS
Reply #49
* If someone modifies ff123's tool you use for the analysis to accept null values, would you consider including the additional codecs?

* I don't like the idea of not using a lower anchor and I don't like the idea of using lame encoded files as anchors. Why? Because anchors should be fixed. If someone wants to redo this test next year to see how much the codecs have improved, he needs to be able to use the exact same anchors as in this test. Lowpass is very fine in this regard, and BladeEnc is also pretty safe since it hasn't changed in the last five years or so. But lame is still evolving slowly, so lame anchors should be avoided if possible. Or if you really want to use lame as an anchor, you should save the exact version and the settings you used together with the test results. The lower anchor is needed to put things in perspective IMO. Without it, I think the scale of ratings would vary more than if it was included.