Public Listening Test [2010]

Reply #25 – 2009-12-29 02:39:54

Quote from: Polar on 2009-12-28 08:20:03

Fwiw, as I promised last year, I am willing to host the test samples once more. Plenty of bandwidth available.

Thank you very much. We will need it.

Reply #26 – 2009-12-29 04:25:18

Quote from: /mnt on 2009-12-27 23:25:15

I would like to see QuickTime's new true VBR encoder in the test, even if it's a Mac OS X exclusive. Having FAAC in the test as a low anchor would also be interesting.

I am on the fence about this. Part of me would like to see a test using QuickTime's true VBR encoder, while another part of me would rather see just iTunes AAC (i.e. QuickTime VBR_constrained) since it can be accessed by everyone. The procedure for QuickTime true VBR AAC encoding on Windows is ridiculous, and I am surprised someone hasn't come out with a solution, as there are plenty of other programs that offer true VBR encoding under Mac OS X. I am a Windows user, so I would want to just see iTunes AAC thrown in the mix with the latest release from Nero. Part of me is still curious about QuickTime's true VBR performance, so I wouldn't mind seeing it in the test either. Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (i.e. fhg).

Reply #27 – 2009-12-29 21:12:28

I propose including only the iTunes constrained VBR mode, as ~90% of OSs are Windows.
Let's keep a practical approach.

Reply #28 – 2009-12-29 22:10:05

I still say quicktime true vbr. I think if the results are tempting enough, it'll light a fire under somebody to create a more reasonable way to encode on windows. And if we don't, I think there will always be room for, "well, it wasn't quicktime's best setting in that test..." And people will talk about what might have been.

Reply #29 – 2009-12-30 00:08:10

All right, there is a poll (true vs constrained VBR).

I found that true VBR is enabled only at 128 kbps in the Windows version of QT.
Has anybody else noticed the same?

Quote from: kornchild2002 on 2009-12-29 04:25:18

Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (i.e. fhg).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in terms of bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good to organize a group of members so that decisions are not concentrated in one person (especially when choosing competitors and samples).
Well-known members such as Guru, Sebastian, /mnt or others are welcome.

Reply #30 – 2009-12-30 01:02:33

I am not sure how the bitrates of true VBR and VBR constrained compare as I run Windows only. I used to have an iMac but ended up selling it to someone who really needed it for college. That was alright as I would usually just boot camp into Windows XP anyway. I guess someone running Mac OS X could always run some bitrate tests for you to see how they compare. I believe that iTunes, when using VBR constrained, will go with Quicktime's normal quality setting. The iTunes Plus setting will use the high quality setting in Quicktime. That would also have to be something taken into consideration. Will the high quality mode be used for true VBR encoding, normal, low, or something else?

I saw the poll and I voted for VBR constrained simply because that is what I have easy access to. I would hope that someone will release a Windows solution for QuickTime true VBR AAC encoding after a public listening test. That being said, people have been asking for this ever since true VBR AAC encoding was introduced in QuickTime, and nothing has come of it on the Windows front.

Reply #31 – 2009-12-30 17:21:37

Quote from: IgorC on 2009-12-30 00:08:10

All right, there is a poll (true vs constrained VBR).

I found that true VBR is enabled only at 128 kbps in the Windows version of QT.
Has anybody else noticed the same?

Quote from: kornchild2002 on 2009-12-29 04:25:18
Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (i.e. fhg).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in terms of bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good to organize a group of members so that decisions are not concentrated in one person (especially when choosing competitors and samples).
Well-known members such as Guru, Sebastian, /mnt or others are welcome.

Well, these 5 kbps would definitely make it better. This means more bits for the actual audio. Now, these extra bits might not make a difference if the compression model changes between CVBR and TVBR, which may not be the case.
I think the first thing would be to encode the same song in both modes and then compare the output file sizes.
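The file-size comparison suggested here can be turned into an average bitrate with a few lines of Python. This is only a rough sketch: the file sizes and the track length below are made-up numbers, the function name is hypothetical, and the result includes container overhead, so it will read slightly higher than the pure audio bitrate.

```python
def average_bitrate_kbps(file_size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kbps (1 kbps = 1000 bit/s), container overhead included."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes * 8 / duration_seconds / 1000


# Hypothetical sizes for one 240-second track encoded in both modes.
cvbr_size = 3_000_000  # bytes, made-up number
tvbr_size = 2_850_000  # bytes, made-up number
duration = 240.0       # seconds, made-up number

cvbr = average_bitrate_kbps(cvbr_size, duration)
tvbr = average_bitrate_kbps(tvbr_size, duration)
print(f"CVBR {cvbr:.1f} kbps, TVBR {tvbr:.1f} kbps, difference {cvbr - tvbr:.1f} kbps")
```

A single song says little about a VBR encoder, so the same calculation would have to be averaged over many tracks before comparing the two modes.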

Reply #32 – 2009-12-30 19:37:09

Quote from: IgorC on 2009-12-29 02:25:50

Quote from: C.R.Helmrich on 2009-12-28 11:29:13
- High anchor: Lossless original. Edit: no further high anchors to minimize listening time.

Speaking of the lossless original as high anchor, did you mean:
1. There won't be a high anchor at all
or
2. There will be a supposedly lossy file that is actually the lossless reference.

Basically both. I meant that there should be a single hidden lossless reference, and no anchor a la MP3@200kbps because the codecs under test are close enough to transparency.

Quote

How about stricter rules?

Remove all listeners from analysis who
a) graded the high anchor lower than 4.8-4.9, or even 5 if the high anchor is lossless.
b) graded the low anchor higher than any competitor. The low anchor will by definition be clearly inferior to every competitor.

Both are possible but dangerous.

a) Certainly possible if you have only highly experienced listeners in the test, but if not, you might end up having to post-screen a lot of people, even such with consistent and useful grading (but showing a lack of concentration on one single test item out of, say, 15).
b) Yes, on average the low anchor will be inferior to all other codecs, but this does not guarantee that it will be inferior for every possible test item. The decision whether to apply this rule could be made after the test is finished.

Chris

Reply #33 – 2009-12-30 21:37:08

After reading Roberto's (fast) manual about listening tests (section Dealing with ranked references) http://www.rarewares.org/rja/ListeningTest.pdf
I see that your point a) is fair.

Quote from: C.R.Helmrich on 2009-12-28 11:29:13

- Post-screening rules: Remove all listeners from analysis who
a) graded the high anchor lower than 4.5

In previous tests the low anchor (LC-AAC at 48 kbps) was sufficiently inferior to all competitors on all samples, which is why I think it would be logical to remove from the analysis all listeners who rated the low anchor higher than any of the competitors. I will ask Sebastian, as he has dealt with the results personally.

Reply #34 – 2009-12-30 22:17:04

Quote from: birdy25 on 2009-12-30 17:21:37

Quote from: IgorC on 2009-12-30 00:08:10
All right, there is a poll (true vs constrained VBR).

I found that true VBR is enabled only at 128 kbps in the Windows version of QT.
Has anybody else noticed the same?

Quote from: kornchild2002 on 2009-12-29 04:25:18
Ideally I would like to see VBR_constrained and true VBR while dumping a few other encoders that can't be easily downloaded (i.e. fhg).

How do the constrained VBR (CVBR) and true VBR (TVBR) modes compare in terms of bitrate?
For previous versions of iTunes:
CVBR ~100 kbps
TVBR ~95 kbps
Those 5 kbps can influence the final results.

It would be good to organize a group of members so that decisions are not concentrated in one person (especially when choosing competitors and samples).
Well-known members such as Guru, Sebastian, /mnt or others are welcome.

Well, these 5 kbps would definitely make it better. This means more bits for the actual audio. Now, these extra bits might not make a difference if the compression model changes between CVBR and TVBR, which may not be the case.
I think the first thing would be to encode the same song in both modes and then compare the output file sizes.

Those numbers are based on something like a hundred files.

Reply #35 – 2010-01-03 13:35:25

Quote from: C.R.Helmrich on 2009-12-28 11:29:13

- Post-screening rules: Remove all listeners from analysis who
a) graded the high anchor lower than 4.5,
b) graded the low anchor higher than the high anchor.

Wouldn't this introduce biases? I mean, this listener is "wrong", but among listeners who are not able to tell the differences at all, this post-screening would reveal only those who by chance grade low anchor too high, not those who by chance grade high better than low.

I am not a statistician -- and certainly not an applied one -- so it might very well be that this is a fair compromise to eliminate at least some random guessers and also those who are able to tell differences but actually prefer the compression artifacts.
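The concern about bias can be illustrated with a small simulation. This is a sketch under strong assumptions that are not from the thread: every listener is a pure random guesser, grades are drawn uniformly from the 1.0 to 5.0 scale, and only rule b) is applied (rule a) would of course remove many of these guessers as well).

```python
import random


def simulate(num_guessers: int = 10_000, seed: int = 0):
    """Grade high and low anchor at random, then apply rule b) only."""
    rng = random.Random(seed)
    raw_diffs = []   # high minus low, all guessers
    kept_diffs = []  # high minus low, guessers surviving rule b)
    for _ in range(num_guessers):
        high = rng.uniform(1.0, 5.0)
        low = rng.uniform(1.0, 5.0)
        diff = high - low
        raw_diffs.append(diff)
        if low <= high:  # rule b) removes listeners who graded low above high
            kept_diffs.append(diff)
    raw_mean = sum(raw_diffs) / len(raw_diffs)
    kept_mean = sum(kept_diffs) / len(kept_diffs)
    return raw_mean, kept_mean, len(kept_diffs) / num_guessers


raw_mean, kept_mean, kept_fraction = simulate()
print(f"mean(high - low), all guessers:      {raw_mean:+.2f}")
print(f"mean(high - low), after rule b):     {kept_mean:+.2f}")
print(f"fraction of guessers still included: {kept_fraction:.2f}")
```

Before screening the guessers show no preference on average; after screening, the roughly half that remain appear to rate the high anchor clearly above the low anchor, although none of them heard anything.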

By the way, has one considered BitTorrent for distribution?

Reply #36 – 2010-01-03 15:25:42

This is Hydrogenaudio, so why not use LossyWAV for the high anchor? Let's get the word out!

With regard to QuickTime true VBR vs constrained VBR, the discussion should wait until the bitrate discussion is at an advanced enough stage. The poll may suggest a setting that cannot match the desired bitrate.

Reply #37 – 2010-01-03 16:34:23

Quote from: Porcus on 2010-01-03 13:35:25

I am not a statistician -- and certainly not an applied one -- so it might very well be that this is a fair compromise to eliminate at least some random guessers and also those who are able to tell differences but actually prefer the compression artifacts.

Yes, that's the basic idea. If a listener grades consistently with the majority of non-post-screened listeners, there is no way to tell from the results whether (s)he guessed or actually heard a difference. So you have to include that listener.

Quote from: jido

This is Hydrogenaudio, so why not use LossyWAV for the high anchor? Let's get the word out!

Since people have been almost flooding this thread with proposals for encoders to be tested in a single test, it's time for me to give my 2 cents.

The question is what we want. Of course you could put all codecs of interest (LossyWav, iTunes CVBR and TVBR, nero 1.3.3 and 1.5.3, CT/Winamp, LAME, WMA, etc.) into one test. But trust me, if you do that, you will get mostly inconclusive results, i.e. waste a lot of listening effort, due to listener overload, as I already explained.

LossyWav's objective is to be transparent. AAC at 96 kbps usually is not transparent, so its objective is to be near-transparent, i.e. "as good as possible". If you want to check whether LossyWav is transparent, do a separate ABX test against the unprocessed original (or maybe an ABX-HR test including AAC at 256 kbps or so, if you want.) Then people can focus on whether the codecs under test really are transparent, without being distracted by at the same time having to evaluate the quality of lower-bit-rate codecs.

If you want to check whether there's a statistically significant improvement of iTunes TVBR over CVBR, propose or conduct a separate public ABX-HR or MUSHRA test for those two encoders. Then people can focus on sonic differences (if any) between those two encoders.

The same applies to nero 1.5.3 vs. 1.3.3.

Then, once you finished those last two tests, you can take the "winners" of those tests and conduct the test which we are promoting (under the title "AAC test") in this thread. Yes, it's a lot of work, but it's the only way to get meaningful results.

If you don't want to do those last two tests, just choose one encoder from each test based on certain non-quality considerations (e.g. nero 1.5.3 because it's a newer release, iTunes CVBR because it's also available for Windows). This should be fine since most likely, there are only minor sonic differences at 96 kbps between nero 1.3.3 and 1.5.3 and between iTunes CVBR and TVBR. You can always do said tests later, of course also with the exact same test material.

Chris

Reply #38 – 2010-01-03 21:05:59

Regarding the results, I would personally discard all results where the low-anchor is not rated.

Reply #39 – 2010-01-03 21:09:53

Quote from: Sebastian Mares on 2010-01-03 21:05:59

Regarding the results, I would personally discard all results where the low-anchor is not rated.

And what if the low anchor is rated higher than any of the competitors or the high anchor?

Reply #40 – 2010-01-03 21:14:57

Quote from: IgorC on 2010-01-03 21:09:53

Quote from: Sebastian Mares on 2010-01-03 21:05:59
Regarding the results, I would personally discard all results where the low-anchor is not rated.

And what if the low anchor is rated higher than any of the competitors or the high anchor?

Post processing actually comes down to how wisely the low anchor was selected. It could be possible that for certain problem samples that target a special encoder, the low anchor sounds better than a contender.

Regarding TVBR vs. CVBR for Apple, this also comes down to the goal of your test. If you want to test encoders in general, a poll is fine, if you want to test only encoders that are easily accessible to users, CVBR is probably best because it's available in iTunes directly and Windows has a higher market share than OS X.

Reply #41 – 2010-01-03 21:43:10

Quote from: Sebastian Mares on 2010-01-03 21:14:57

Post processing actually comes down to how wisely the low anchor was selected. It could be possible that for certain problem samples that target a special encoder, the low anchor sounds better than a contender.

After some testing and reading of previous results, I think it would be better to adopt less restrictive rules:

Quote

Remove all listeners from analysis who
a) graded the high anchor lower than 4.5,
b) graded the low anchor higher than the high anchor.
c) didn't grade the low anchor.

Since many people agree that there won't be a high anchor, then:

Quote

Remove all listeners from analysis who
a) graded the high anchor lower than 4.5,
b) graded the low anchor higher than all competitors.
c) didn't grade the low anchor.
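The second rule set above could be applied to one result sheet roughly as follows. This is only a sketch, not how the results were actually processed: the item names, the dictionary layout, and the choice to treat the hidden lossless reference as the "high anchor" of rule a) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Item name -> grade on the 1.0-5.0 scale, or None if the item was not graded.
Grades = Dict[str, Optional[float]]


def passes_post_screening(grades: Grades,
                          competitors: List[str],
                          low_anchor: str = "low_anchor",
                          reference: str = "hidden_reference",
                          min_reference_grade: float = 4.5) -> bool:
    """Return True if this result sheet survives rules a), b) and c)."""
    low = grades.get(low_anchor)
    if low is None:
        return False  # rule c): low anchor not graded
    ref = grades.get(reference)
    if ref is not None and ref < min_reference_grade:
        return False  # rule a): hidden reference graded below the threshold
    graded = [grades[c] for c in competitors if grades.get(c) is not None]
    if graded and low > max(graded):
        return False  # rule b): low anchor graded higher than all competitors
    return True


# Hypothetical result sheets for one sample.
codecs = ["nero", "itunes_cvbr"]
ok_sheet = {"hidden_reference": 5.0, "low_anchor": 1.5, "nero": 4.2, "itunes_cvbr": 4.0}
bad_sheet = {"hidden_reference": 5.0, "low_anchor": 4.8, "nero": 4.2, "itunes_cvbr": 4.0}
print(passes_post_screening(ok_sheet, codecs))   # True
print(passes_post_screening(bad_sheet, codecs))  # False
```

Whether a failed sheet should remove only that one result or the listener's results for all samples is a separate decision that the sketch leaves open.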

Quote from: Sebastian Mares on 2010-01-03 21:14:57

Regarding TVBR vs. CVBR for Apple, this also comes down to the goal of your test. If you want to test encoders in general, a poll is fine, if you want to test only encoders that are easily accessible to users, CVBR is probably best because it's available in iTunes directly and Windows has a higher market share than OS X.

As the poll indicates, more people are interested in TVBR.
However, I find that TVBR isn't enabled at 96 kbps, even in the Windows version of QT. If somebody can confirm that, then the inclusion of TVBR is very questionable.

Reply #42 – 2010-01-04 04:56:37

As the high anchor will be dropped, we can add one more competitor.

4 competitors + 1 low anchor

Proposal list:
1. Nero 1.5.1
2. Apple iTunes CVBR. (TVBR isn't enabled at 96 kbps in the Windows version of QT)
3/4. Two winners from internal pre-test between Divx, Fraunhofer (FH) and CT.

Low anchor:
iTunes LC-AAC CBR 48 kbps.

VBR vs CBR:
Nero, Apple and Divx have a (good) VBR mode, while CT is CBR only.
I don't mind CT, but based on previous experience it's not convenient to include CBR encoders. I don't want to reinvent the wheel. So far we have been discussing items that were already discussed before, and realizing that the final decision is the same as last time. So my proposal is VBR encoders only.

C.R.Helmrich, will new Fraunhofer encoder have VBR mode?

Settings
Best settings for final quality. No compromises for speed.

Reply #43 – 2010-01-04 10:42:21

Are you sure 48 kbps LC is not too exaggerated? Do you expect any contender to be worse or as bad as 64 kbps?

Reply #44 – 2010-01-04 13:06:30

IgorC: Yes, Fraunhofer's AAC encoder supports VBR coding. We don't distinguish between TVBR and CVBR, though, so that VBR mode is most likely a CVBR mode.

Sebastian: Maybe we should think about using a generic low anchor, as done in MUSHRA tests. How about a 7-kHz lowpass filtered version of the reference? Sonically, that should be quite similar to 48-kbps AAC LC. And we wouldn't have to worry about which encoder to choose for generating the anchor.

As mentioned in the AES Journal paper referenced here, the goal is to span the entire range of the grading scale with the codecs and anchors, in order to minimize bias, i.e. reduce the width of the confidence intervals. A 7-kHz anchor (and 48-kbps LC) should be about mid-way between "very bad quality" and the quality of the 96-kbps LC encoders. To define the lower end, we could throw in 8-kHz 8-bit a-Law PCM of the reference as "64-kbps phone quality low anchor". That will take very little listening effort to identify, so it doesn't make the test more difficult, but gives us the advantage of stabilizing the test results.
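For reference, the "64-kbps" figure follows directly from the sample rate and word length, and the a-Law companding curve itself is simple to write down. The sketch below assumes a mono signal and uses the continuous A-law formula with the standard parameter A = 87.6; it is not the bit-exact segmented G.711 encoder, and the function names are made up for illustration.

```python
import math

A = 87.6  # standard A-law compression parameter


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def alaw_compress(x: float) -> float:
    """Continuous A-law curve for a sample x in [-1.0, 1.0]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def quantize_8bit(y: float) -> int:
    """Map a companded value in [-1.0, 1.0] to a signed 8-bit integer."""
    return max(-128, min(127, round(y * 127)))


print(pcm_bitrate_kbps(8000, 8))  # 8 kHz, 8 bit, mono
for sample in (0.01, 0.1, 0.5, 1.0):
    print(sample, quantize_8bit(alaw_compress(sample)))
```

The curve spends most of the 8-bit range on quiet samples, which is why it works for speech but will be easy to pick out as a low anchor on music.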

Chris

Reply #45 – 2010-01-04 14:40:19

Wasn't the primary goal to test the latest Nero (many people have been waiting a long time for its release) vs. Apple's true VBR (which was hyped up at its release, and I don't mean that negatively)? I wouldn't even know where to get those competing AAC encoders...

The scientific method demands an analysis of the single most separable part.

Maybe the public listening test needs more than one instance...

Well, I have to admit, though: I probably won't be a tester anyway, and I do not practice blind tests, but I follow Hydrogenaudio's recommendations.

Reply #46 – 2010-01-04 17:10:55

Though many people want to see how QuickTime's true VBR AAC encoder holds up, it isn't readily available. I have always thought that the underlying purpose of public listening tests is to test encoders that are readily available. Apple has made TVBR AAC encoding on Windows so awkward that most people don't do it. The tags are lost, sometimes it doesn't work, and you have to go through the process twice (once to encode to a MOV file, and then again to take the AAC audio out of the MOV container and put it in an MPEG-4 container). It takes about 5 minutes to encode one file. Things are a lot easier on Mac OS X. So I feel that any QuickTime TVBR AAC results will benefit Mac OS X users only, since most Windows users aren't going to go out of their way just to encode a few TVBR tracks.

Sure, we can start including obscure encoders and settings but what would be the point if no one would use them regardless of how well the encoders perform? Additionally, at this bitrate, it looks as if QuickTime can't encode a TVBR AAC file. I can't confirm the results as I have access to QuickTime Pro only at work. Lastly, given the described performance of the FhG AAC encoder, it sounds like CVBR AAC might be best for testing QuickTime/iTunes.

Reply #47 – 2010-01-05 16:14:20

For Windows users, I made a tiny tool to access the QuickTime AAC encoder from the command-line.
qtaacenc

Reply #48 – 2010-01-05 16:53:37

Performance isn't all that great with foobar2000 v0.9.6.9 and Windows 7. It took about 4 minutes to encode a 5 minute song using foobar2000 at -q100 (fb2k was reporting an encoding speed of 15x, but that would drop to 0x after about 25 seconds when the progress bar was filled). Your exe file would consume about 35-45% of my processor (a simple 1.66GHz Atom, but Lame.exe and Nero.exe consume about 5% or less) and foobar2000 jumped to about 115MB of RAM. The output file also throws dBpowerAMP for a loop. I encoded the test file at the 100 level and dBpowerAMP reports the bitrate as being 0kbps. iTunes sees it as being 224kbps though. Are things smoother on Windows XP or maybe when using the latest beta of fb2k?

This seems like a viable option except for the near-1X encoding speed. It would take me over 20 straight days of encoding my lossless archive using this tool. So I think you are onto something especially since Windows users generally get the shaft when it comes to true VBR encoding with QuickTime AAC.

Reply #49 – 2010-01-05 17:38:12

As far as I've tested with fb2k 0.9.6.9 and WinXP, the conversion of about 10 minutes of FLAC files finishes in 30 sec. Maybe there is a problem with Vista or 7. (I can't test that right now, though.)

The behavior of the progress bar is expected. qtaacenc begins conversion after the progress bar is filled (due to the lack of the pipe encoding feature).