Topic: 128kbps Extension Test - FINISHED

128kbps Extension Test - FINISHED

Reply #150
Quote
Quote
and I think you meant normal WMA9, as the Pro codec can't be used below 128 AFAIK


Oh, yeah? I honestly don't know.

Can someone with Windows Media Encoder 9 check whether you can get a 64kbps two-pass VBR encode out of it in WMA Pro?

Nope. The CBR minimum bitrate is 128kbit. The only way to get below that is the Quality VBR mode: Q10 and Q25 give bitrates below 128kbit, maybe even Q50, but the bitrates these modes produce depend on the input you feed the codec.

128kbps Extension Test - FINISHED

Reply #151
This is an interesting test and I'm glad it was (and will continue to be) done, but as a scientist I have to ask: what makes this a double-blind test? Shouldn't the samples be compiled on the fly for each person from a large bank of random music, so as to eliminate subconscious prejudice from those selecting the samples?

After all, the principle of the double-blind is that neither the subject nor the experimenter knows what they are being subjected to until the results are in... blind tests have demonstrated that homeopathy works, whereas double-blind ones then dismissed the claim...

EDIT: not that I think it would make much of a difference in this case, as the samples are supposed to be deliberately selected not to highlight any particular flaw...
Hip-hop looks like it's having more fun than you are - Chuck D

128kbps Extension Test - FINISHED

Reply #152
Quote
This is an interesting test and I'm glad it was (and will continue to be) done, but as a scientist I have to ask: what makes this a double-blind test? Shouldn't the samples be compiled on the fly for each person from a large bank of random music, so as to eliminate subconscious prejudice from those selecting the samples?

It is double-blind in the sense that neither the administrator nor the listener knows which codec is being listened to at any given time (unless there is only one codec being compared).
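
In practice the "administrator" here is effectively a computer. As a minimal, hypothetical sketch (the file names, label scheme, and key file are illustrative only, not the actual software used for this test), the codec-to-label mapping can be shuffled per sample and kept sealed until all ratings are in:

Code:
# Hypothetical blinding step for one sample: shuffle codecs behind anonymous
# labels so neither the test organizer nor the listener knows the mapping.
import json
import random

codecs = ["aac", "lame", "mpc", "vorbis", "wmapro", "blade"]

def blind_sample(sample_name):
    """Assign codecs to labels A, B, C, ... at random for one sample."""
    shuffled = random.sample(codecs, k=len(codecs))
    labels = [chr(ord("A") + i) for i in range(len(shuffled))]
    key = dict(zip(labels, shuffled))
    # The key is written out "sealed" and only opened after all ratings are in.
    with open(f"{sample_name}_key.json", "w") as f:
        json.dump(key, f)
    # The listener only ever sees the anonymised file names.
    return {label: f"{sample_name}_{label}.wav" for label in key}

print(blind_sample("41_30sec"))  # e.g. {'A': '41_30sec_A.wav', ...}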

If random samples were chosen each time, there would be no way to make a group comparison, since preferences vary by sample as well as by person.

However, you have a point about how the samples are selected in the first place. I selected the original group of samples for the 64 kbit/s test, after calling for people to send in short clips of music that they liked to listen to. This process was definitely not random. I culled according to my own judgment. The idea was to obtain a mix of genres, with both male and female vocals and a variety of acoustic instruments. I chose what I personally thought was interesting-sounding music (e.g., I chose not to include several Japanese pop-music selections).

For the 128 kbit/s tests, Roberto substituted in a couple of "problem" samples, with lots of transients, known to give codecs trouble.

How this might have biased the test is unknown, but the caveat is clearly made that the results of this test are valid for this particular mix of music and the particular group of people who listened to it.

Quote
After all, the principle of the double-blind is that neither the subject nor the experimenter knows what they are being subjected to until the results are in... blind tests have demonstrated that homeopathy works, whereas double-blind ones then dismissed the claim...


This is not so different from a drug test, in which the drug under test is known, but whether the drug or the placebo is being administered to any given person is not known.

ff123

128kbps Extension Test - FINISHED

Reply #153
Well, as I said, I don't think it's too important in this case, as the testing is very specifically targeted and the results are just as meaningful... what I'm suggesting would probably be a great deal more difficult to implement, for only marginal gain.

After all, using targeted samples has its place too.

I'm just suggesting a broader, possibly more meaningful test criterion, in that the purpose of the double-blind test is to erase any prejudice on the part of the tester as well as the testee. Randomising the selection process, of course, is an integral part of this. Otherwise it's more like the less thorough, Pepsi Challenge style of blind test.

Re: the drug test example. I feel that providing truly random audio samples would be like the tester not knowing who gets which drug in a double-blind test, rather than matching the patient to the test drug or the placebo. True, the test is blind, but if the testing is unknowingly tailored for the "best" result, then the trial is inevitably tainted.


Meanwhile, any one aberrant result caused by the random sampling should be overruled by the mean result, and the process is at least more truly random.
Hip-hop looks like it's having more fun than you are - Chuck D

128kbps Extension Test - FINISHED

Reply #154
The test IS double-blind - you don't know which codec produced the sample you're listening to; you can only discern between encoded and not encoded.
Double-blind testing was invented to prevent nonverbal cues from revealing what the sample is.
There is no such problem with computers.
ruxvilti'a

128kbps Extension Test - FINISHED

Reply #155
First of all, I'd like to apologise for the way I've put things in my last posts, as I think I'm coming across as a jerk. But hey, that's the 'net: it's easy to offend unless you always agree with everyone...

AstralStorm, my main point is that the interaction between the testers and the subjects is in the audio selection. If the music they encode is randomly chosen on a user-by-user basis (say a random 30 seconds out of an hour, chosen and encoded on demand), the results might be different.

I don't think they would be, but they might.

Imagine: "hey, our test says J-pop in 24kbps MP3 sounds the same as raw!"
Hip-hop looks like it's having more fun than you are - Chuck D

128kbps Extension Test - FINISHED

Reply #156
I wasn't annoyed by your post. I just tried to dispel some FUD. Oh well...

Lots of people would find hardly any artifacts in a normal (easy) track encoded by a recent codec at this bitrate.
Even the best listeners would have problems.
This would diminish the differences between the codecs.
ruxvilti'a

128kbps Extension Test - FINISHED

Reply #157
rjamorim, great work! Three thumbs up!

I think some people take these tests a bit too seriously. The main point came out clearly: MP3 is outdated at 128 kbps. Which of the winners to use comes down to your software and hardware choices.

Hopefully all the mobile device developers also read your test and move on from MP3. Can't wait for the 64 kbps test results.

Best wishes,
Jore
"There's nothing as pathetic as an aging hipster."

128kbps Extension Test - FINISHED

Reply #158
Quote
WMA in all its confusing glory:

Basically, every WMA flavour that is not Pro, Lossless or Voice (even WMA v9 Standard) can be played by all WMA codecs, including the first v2 codec.

To play Pro, Lossless and Voice, install the WMA v9 codecs (which will play everything).

So if I have a Nomad Jukebox that supports WMA, it will NOT play files encoded (lossy or lossless) with the Pro encoder?

Also, what is the real-world scenario that the 64kbps test is evaluating? I would think that with all the "golden ears" here, no one would use that low a bitrate. Is it to evaluate codecs for the purpose of streaming content?

128kbps Extension Test - FINISHED

Reply #159
Quote
So if I have a Nomad Jukebox that supports WMA, it will NOT play files encoded (lossy or lossless) with the Pro encoder?

Right, at least until Creative releases a firmware update (if ever).

Quote
Also, what is the real-world scenario that the 64kbps test is evaluating? I would think that with all the "golden ears" here, no one would use that low a bitrate. Is it to evaluate codecs for the purpose of streaming content?


That, and flash players, and probably because the other bitrate ranges either have already been tested (128) or are untestable (160+)

128kbps Extension Test - FINISHED

Reply #160
For the WMA codecs on portables, the source code comes from Microsoft themselves. I have heard they have yet to release a Pro or Lossless version for portables to the various manufacturers.

128kbps Extension Test - FINISHED

Reply #161
Quote
and probably because the other bitrate ranges either have already been tested (128) or are untestable (160+)

Is this because at 160 or above some codecs are mostly transparent? I would think that the presence of HD-based players for MP3, WMA (though not Pro), Ogg (Rio Karma), AAC, and the whispers of the possibility of MPC, would make more people interested in seeing the results at 160 and 192. I, for example, have been using lossless while waiting to make a decision on my lossy codec choice. The relative performance of these codecs at higher bitrates would really help me with that decision, as well as with the decision of what bitrate to encode files at for portable use. It would help even if some of them tied due to transparency. For example, if three codecs get 5s at 160, I can quit worrying, choose whichever portable I like, and encode at 160. Am I missing something?

And thanks guys for the wma clarification.

128kbps Extension Test - FINISHED

Reply #162
Great test Roberto.

From my point of view, I'd prefer an encoder that doesn't trip up badly very often, even if its average score were a little lower.

Now, WMA Pro tripped up badly once. Perhaps it was bad luck, and with other samples another codec would have tripped up, so the statistical information isn't perfect.

However, I tabulated the mean scores (read from your graphs) and estimated the standard deviation.

Assuming all test samples are similarly distributed in terms of encoder variability, and assuming a "normal" or Gaussian distribution, the average minus one sigma and the average minus two sigma give a guide to the worst behaviour we're likely to see:

Code:
Track     AAC    Lame   MPC    Vorbis WMAPro Blade
41_30sec  4.36   3.3    4.33   4.2    3.97   1.4
ATrain    4.41   3.78   4.37   4.17   4.48   3.05
Bachpsic  4.5    3.41   4.66   4.51   4.8    2.9
Blackwat  4.62   3.92   4.71   4.38   4.56   2.18
death2    4.35   3.62   4.67   4.18   2.7    1.27
flooress  4.08   3.68   4.52   4.57   4.25   1.7
layla     4.15   3.59   4.4    4.24   4.45   1.83
macabre   4.59   4.06   4.55   4.54   4.86   3.16
midnight  4.56   3.42   4.43   4.26   4.38   2.39
thear1    4.69   4.16   4.48   4.11   4.44   2.41
thesourc  4.61   4.33   4.62   4.43   4.87   2.36
waiting   4.13   2.71   4.35   3.78   3.88   1.99

AvgScore  4.42   3.67   4.51   4.28   4.30   2.22
Std.Dev   0.21   0.44   0.13   0.22   0.59   0.62

-1 sigma  4.21   3.23   4.37   4.06   3.71   1.60
-2 sigma  4.00   2.79   4.24   3.84   3.11   0.99
-3 sigma  3.79   2.35   4.10   3.61   2.52   0.37

-1 sigma pt: 84.13% chance a new sample scores above this value
-2 sigma pt: 97.72% chance a new sample scores above this value
-3 sigma pt: 99.87% chance a new sample scores above this value

sd/sqrt12 0.06   0.13   0.04   0.06   0.17   0.18
errorbar  0.12   0.25   0.08   0.13   0.34   0.36


The percentages at the end come from the normal distribution and give a handle on the chance of getting a value worse than the -1 sigma point (and so on) if you chose a new sample at random and had the same listeners test it: 84.13% of new samples would be expected to score above the -1 sigma point (i.e. a 15.87% chance of doing worse), and similarly for the -2 and -3 sigma points.

This is the result at the average minus 2-sigma point:

[graph: codecs ordered by average score minus two standard deviations]

The errorbar line is twice the estimated standard error of the mean score, i.e. 2*(Std Dev / Sqrt(12)), which I'd use to find the best-rated codec overall on a mean-score basis.
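
For anyone who wants to reproduce the numbers, here is a minimal Python sketch of the calculations behind the table (not the spreadsheet actually used; only the AAC and WMAPro columns are reproduced for brevity). It assumes the per-track means read from the graphs, sample standard deviations, and a normal distribution:

Code:
# Reproduce the AvgScore, Std.Dev, mean-minus-k-sigma, sd/sqrt12 and errorbar
# rows of the table above for two of the codecs.
import statistics
from statistics import NormalDist

scores = {
    "AAC":    [4.36, 4.41, 4.50, 4.62, 4.35, 4.08, 4.15, 4.59, 4.56, 4.69, 4.61, 4.13],
    "WMAPro": [3.97, 4.48, 4.80, 4.56, 2.70, 4.25, 4.45, 4.86, 4.38, 4.44, 4.87, 3.88],
}

for codec, vals in scores.items():
    n = len(vals)                              # 12 tracks
    mean = statistics.mean(vals)               # AvgScore row
    sd = statistics.stdev(vals)                # Std.Dev row (sample std dev)
    sem = sd / n ** 0.5                        # sd/sqrt12 row
    print(f"{codec}: mean={mean:.2f}, sd={sd:.2f}, SEM={sem:.2f}, errorbar={2 * sem:.2f}")
    for k in (1, 2, 3):
        print(f"  mean - {k} sigma = {mean - k * sd:.2f}")

# Under the normal assumption, NormalDist().cdf(k) is the fraction of new
# random samples expected to score above the mean-minus-k-sigma point:
# 84.13% (k=1), 97.72% (k=2), 99.87% (k=3), as quoted in the table.
for k in (1, 2, 3):
    print(f"P(new sample above mean - {k} sigma) = {NormalDist().cdf(k):.2%}")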


Just my thoughts. Many thanks to those who tested (I didn't have the time, or probably the artifact training, to join in).

By my criterion of not failing badly, MPC wins over AAC, Vorbis, WMAPro, LAME and Blade.

(Edit: Note, I posted the wrong image originally, so please refresh if the top graph doesn't match the scores or this order)

DickD

P.S. Hmm, I wonder if WMAPro did badly only because it was using 2 passes to aim at 128 kbps for the specific short sample tested. Perhaps it's fairer to use it in a one-pass mode that averages at 128 kbps over many albums.

128kbps Extension Test - FINISHED

Reply #163
Quote
The errorbar line is twice the estimated standard error of the mean score, i.e. 2*(Std Dev / Sqrt(12)), which I'd use to find the best-rated codec overall on a mean-score basis.

The graph showing the standard error of the mean is an interesting one, as it shows the variability of quality across the codecs.  I think that's a useful graph which should be included in future test analyses.

Edit:  your graph shows twice the standard error of the mean; I think a more conventional graph would show just the standard error.  Still, it's a good graph.
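
For illustration, a minimal sketch (assuming matplotlib is available; this is not the script used to produce the graphs in this thread) of such a chart, using the AvgScore row for the bar heights and the sd/sqrt12 row for one-standard-error bars:

Code:
# Bar chart of mean scores with error bars of one standard error of the mean.
import matplotlib.pyplot as plt

codecs = ["AAC", "Lame", "MPC", "Vorbis", "WMAPro", "Blade"]
means  = [4.42, 3.67, 4.51, 4.28, 4.30, 2.22]   # AvgScore row from the table
sems   = [0.06, 0.13, 0.04, 0.06, 0.17, 0.18]   # sd/sqrt12 row (std error of mean)

plt.bar(codecs, means, yerr=sems, capsize=4)
plt.ylabel("Mean listening-test score (1-5)")
plt.ylim(0, 5)
plt.title("128 kbps extension test: mean scores +/- 1 SEM")
plt.show()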

Quote
P.S. Hmm, I wonder if WMAPro did badly only because it was using 2 passes to aim at 128 kbps for the specific short sample tested. Perhaps it's fairer to use it in a one-pass mode that averages at 128 kbps over many albums.


1-pass VBR was way too variable in average bitrates across albums.  See the bitrate thread for this test.  So we chucked it in favor of 2-pass VBR.

ff123

 

128kbps Extension Test - FINISHED

Reply #164
Quote
Is this because at 160 or above some codecs are mostly transparent? I would think that the presence of HD-based players for MP3, WMA (though not Pro), Ogg (Rio Karma), AAC, and the whispers of the possibility of MPC, would make more people interested in seeing the results at 160 and 192. I, for example, have been using lossless while waiting to make a decision on my lossy codec choice. The relative performance of these codecs at higher bitrates would really help me with that decision, as well as with the decision of what bitrate to encode files at for portable use. It would help even if some of them tied due to transparency. For example, if three codecs get 5s at 160, I can quit worrying, choose whichever portable I like, and encode at 160. Am I missing something?

Well, you can try to extrapolate the results of the 128kbps test. At 128kbps, MPC, AAC, WMA and Ogg ended up with an average of around 4.35 points. At 160, I'm pretty sure all of them would get very close to 5, maybe with some results from golden ears pulling the scores down a little.

Unless I used very problematic samples, but then it wouldn't really be representative of several musical styles.

Finally, there's the point that it would be a VBR test, and I only have the courage to conduct one more VBR test (the 64kbps test).