
Public Listening Test [2010], Item Selection

Reply #25
OK, guys, I need help here. The number of critical items I can dig up is overwhelming. So, in order of appearance:

halb27, post #3:

Your harpsichord 40 item is accepted for testing. Other items you proposed are from "mp3 times" and might not be AAC-critical. They are:
-  trumpet_My Prince
-  lead-voice
-  trumpet
-  Là Ou Je Suis Née
-  keys_1644ds
-  herding_calls

/mnt, post #5:

Over the last few months you proposed a nice long list of AAC-critical items. Linchpin is already accepted for testing. Your remaining proposals are:
-  Kraftwerk remasters (the Zip file with excerpts you uploaded around Christmas?),
-  Show Me Your Spine (any 15-second passage you favor)
-  Human Disease
-  Hexonxonx
-  smothered_hope

IgorC, post #11:

Many of the items you proposed to me are actually already in Fraunhofer's internal test set, so they are well known to me. The ones which are not are:
-  Creuza
-  Spill the blood
-  Aquatisme from 48 kbps AAC test
-  Descending Darkness
-  Girl
-  Erase_replace

To all above members:

Could you please ABX-HR your respective item lists using the newest Nero 1.5.4 at -q 0.41 and QT True VBR Q60 (see the command sketch below), and then report to me via personal message:

-  which item is easiest to ABX vs. the lossless reference, regardless of coder (or in other words, averaged over the two coders)
-  how large the differences are for each item between the two coders, without naming the coders (e.g. "huge differences", "both sound bad", etc.)
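
For reference, a rough sketch of the encoding step, assuming the neroAacEnc command-line encoder and the qaac front end for the QuickTime AAC encoder are installed (file names are placeholders):

Code:
# Sketch only: encode one item with both coders under test.
import subprocess

item = "sample.wav"  # placeholder input file

# Nero AAC 1.5.4 at -q 0.41
subprocess.run(["neroAacEnc", "-q", "0.41", "-if", item, "-of", "sample_nero.m4a"], check=True)

# QuickTime AAC, True VBR quality 60 (here driven via qaac)
subprocess.run(["qaac", "--tvbr", "60", item, "-o", "sample_qt.m4a"], check=True)

The resulting .m4a files are what you would ABX against the lossless reference.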

That would be a great help! Thanks,

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #26
It seems to me this is more a test of how different encoders react to various killer samples - which is interesting, but far less useful than a general audio quality test. You should test a wide selection of musical genres, not statistical anomalies (which all codecs will have) that have no bearing on overall sound quality whatsoever. It seems like an enormous waste of effort; have I missed something crucial here?

Public Listening Test [2010], Item Selection

Reply #27
Quote
have I missed something crucial here?


This post and below.

Public Listening Test [2010], Item Selection

Reply #28
For me the idea that codecs have an "overall general quality" is a joke: either you can tell that there is a difference & describe how it sounds different to you, or you can't.

Before I did some ABXing for myself & wasted several days understanding how lossy music sounds different from lossless music, I used to speak in terms like this "overall general quality" ... nowadays I think that this way of speaking is only due to a lack of understanding of what you are talking about: for me, in the mouth of a newbie, "overall general quality" = placebo.

The only "overall general quality" of a codec I know is mix between:
1: how many killer samples affect the codec.
2: how bad is the distortion that you can hear within the killer samples that affect the codec.

What is a killer sample? It is simply a sample that you can ABX. There is no "overall general quality" outside of killer samples, because you cannot evaluate the quality of sound that you cannot even ABX.

This is especially true at medium/high bitrates, because there generic (unselected) music will be transparent 99% of the time ... evaluating the quality of transparent samples is nonsense.

I agree that you can speak of the "overall general quality" of a codec at low bitrates (strictly below 128 kbps for modern encoders), because at say 96 kbps even random (non-killer) music is likely not to be transparent. Then yes, "overall general quality" can have some meaning, but IMHO that is a special case.

I know that the idea that such a thing as an "overall general quality" exists comes from people speaking of codecs in a very generic way, like "this codec sounds metallic"; with modern codecs this is definitely an abuse of language & an over-generalization.

Don't be fooled by the language: if a codec passes the killer samples you can rest in peace, generic music will be a walk in the park for the codec.

Edit: I disagree with guruboolez's above opinion, but that's also why I value personal tests that I can reproduce, like those of /mnt, more than public listening tests. /mnt's killer samples are often gold to me.

Public Listening Test [2010], Item Selection

Reply #29
Edit: I disagree with guruboolez's above opinion, but that's also why I value personal tests that I can reproduce, like those of /mnt, more than public listening tests. /mnt's killer samples are often gold to me.

Which is why I like to include some of /mnt's samples in this public test. By the way, do you have some AAC-critical samples of your own to add to this list?

And: which of guruboolez' posts are you referring to?

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #30
1: No, I don't have personal samples for AAC, the simple reason being that I actually don't use lossy at all. I like Nero AAC LC's quality (especially 0.55), but I have a personal problem with AAC not being natively gapless (per the MPEG specification). Stealing other ABXers' samples is the reason why I read this topic. I keep an eye on AAC quality discussions because I might use it one day with video, but it is unlikely that I will ever use it for music. I always buy new HDDs to keep using lossless.

2: Those linked within lvqcl's post.

 

Public Listening Test [2010], Item Selection

Reply #31
1: I like Nero AAC LC's quality (especially 0.55), but I have a personal problem with AAC not being natively gapless (per the MPEG specification).

True, the lack of gapless playback standardization is shocking even for me. But there might be some progress on this subject soon.

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #32
Since all conceivable gapless implementations need the same metadata (encoder delay, actual number of samples), I wouldn't worry too much about it. Apple's implementation is the de-facto standard right now. And if there is ever any future formal specification, it will be trivial to copy the existing iTunes metadata into a tag compatible with the new scheme.
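
For illustration, a minimal sketch of reading that metadata, assuming the commonly reported layout of the iTunes iTunSMPB freeform tag (space-separated hex fields: a reserved field, the encoder delay, the end padding, and the original sample count); the value used below is a made-up example, not taken from a real file:

Code:
# Sketch only: decode an iTunSMPB-style value (layout as commonly reported,
# not an official specification).
def parse_itunsmpb(value: str):
    fields = value.split()
    encoder_delay = int(fields[1], 16)  # priming samples at the start
    end_padding = int(fields[2], 16)    # padding samples at the end
    length = int(fields[3], 16)         # original PCM sample count
    return encoder_delay, end_padding, length

# Made-up example: 2112 samples delay, 456 samples padding, 10 s at 44.1 kHz
example = "00000000 00000840 000001C8 000000000006BAA8" + " 00000000" * 8
print(parse_itunsmpb(example))  # -> (2112, 456, 441000)

Whatever a future formal scheme looks like, it would carry the same three numbers, which is why converting would be trivial.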

Public Listening Test [2010], Item Selection

Reply #33
Well, the following is about gaps; I know it is off-topic here, but I don't know where else to answer:

I did a very short & very simple test after googlebot's post: I encoded my lossless Pink Floyd - 1973 - The Dark Side Of The Moon rip to Nero AAC V1.5.4.0 at quality 0.55 with F2K V1.01, & I listened to the transition between tracks 5 & 6 to see if I could hear anything bad, either added silence or a glitch.

Code:
  TRACK 05 AUDIO
    TITLE "Money"
    PERFORMER "Pink Floyd"
    INDEX 01 19:24:35
  TRACK 06 AUDIO
    TITLE "Us And Them"
    PERFORMER "Pink Floyd"
    INDEX 01 25:56:35


With the tags created during the encoding I couldn't hear any glitch.

Now I deleted the tags with Mp3tag v0.45a & re-listened to the transition between track 5 & track 6, & guess what ... without the tags there is an audible glitch ... I don't know if this is an issue with Mp3tag, but all I know is that this very simple test (it takes less than 5 min) shows that losing the gapless playback metadata by mistake is actually very easy with the current Nero trick ... so as long as a simple tag edit can end in the loss of gapless playback, I personally will not use AAC for music. (I may use it with video, as gaps are not an issue there.)

Even if there is a standard for this one day, & even if it is a trivial task to convert the current metadata trick to that future standard ... it is currently so easy to lose this information that you may have lost it before a more robust standard exists.

My dream lossy codec is an ISO MPEG standard codec that achieves the quality of Nero AAC, with native gapless support as good as Vorbis/Musepack (gapless directly in the specification). The current tag trick is not satisfying for me. In the future this issue may lead me back to Vorbis, even though I know from my listening tests that Nero AAC beats Vorbis quality-wise. For now I only use lossless in order to avoid choosing between plague & cholera ...

Public Listening Test [2010], Item Selection

Reply #34
I don't understand the issue. Proper gapless tags were saved. You removed them with a tagging tool that didn't honor them, and the gapless info was lost. The situation would be no different if the tags were written in some yet-to-be-proposed ISO format, as long as the tool you are using doesn't honor them.

PS I just realized: sorry for the ongoing off-topic debate. Feel free to remove it from the thread.

Public Listening Test [2010], Item Selection

Reply #35
Well, indeed I obviously deleted the metadata tags manually, but the issue is that with other codecs that have native gapless support you cannot delete the gapless info that easily with a simple tag editor. So on the one hand many people (incl. newbies) edit their tags; on the other hand very few people (advanced users) are aware of the gap problem until they suddenly hear a glitch. This disproportion leads to the conclusion that many unaware AAC users might sooner or later lose their gapless metadata, which is "volatile". Indeed it is not really an issue for people like us who know which mistake not to make in order to keep the metadata, but it is not natural for beginners to think that editing tags can hurt the playback of their files. This tricky tag solution is just not friendly to newbies & not fully satisfying IMHO. Feel free to disagree; that's just my opinion.

Edit:
Even if, from a technical point of view, the info wouldn't be very different if it were in the specification, the metadata would be buried deeper in the files: embedded inside instead of wrapped around, undeletable & supported by default by all decoders. For me gapless playback is not something "optional"; it should be at the heart of any modern codec. Despite its great audio quality, with regard to gaps AAC is not really a modern codec. It seems to me that nothing evolved for gaps between mp3 & AAC, which means that so far all audio codecs designed by MPEG are thought out for video users & not audiophiles.

Public Listening Test [2010], Item Selection

Reply #36
Did you report this bug to Mp3tag? (I assume this doesn't occur when editing tags with iTunes.) I also have some issues with the iTunes AAC tag format, including the lack of support for multiple items in a tag (such as multiple artists), but it is the standard now, and I doubt another (lossy) format will succeed it. (Had it arisen now, the tagging format chosen would probably have been XML.)



Public Listening Test [2010], Item Selection

Reply #37
1: No, I don't have personal samples for AAC, the simple reason being that I actually don't use lossy at all.

But here (very helpful test, by the way!), you mentioned a Ginnungagap item. Is that AAC-critical? If yes, can you point us to that one, or upload it here?

Thanks,

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #38
Ginnungagap (a sample from a Therion song) is a very noisy critical item for lossywav. I haven't seriously tested it with AAC, but I doubt it would be as critical, as the encoding techniques used are very different. I honestly don't recall testing it with a classic lossy encoder; it is likely that I quickly did, found nothing, & that that's why I gave up on the idea of cross-testing lossywav killer samples with classic lossy encoders ... but at the same time, testing plenty of DCT killer samples on lossywav, I found that the Abfahrt Hinwil & Fool's Garden samples (which were originally found with classic lossy encoders) were also critical for lossywav ... so there is definitely the possibility that lossywav killer samples affect DCT codecs. But for Ginnungagap I honestly don't know; it was found by Martel on lossywav & has remained a lossywav-specific test sample. It could be tested, but as with all listening tests it takes time. I am not sure it is really worth it, because even if I think that this sample might be hard to encode at 96 kbps, it has very little chance of being as interesting as Harlem & Autechre at 128 kbps.

Public Listening Test [2010], Item Selection

Reply #39
TechVsLife:
I just tested the broken gapless playback problem with Mp3tag V2.46 in case it had already been fixed; sadly the problem is still there, so I sent a PM with a link to post #34 to Florian.

Edit: For fun I tested with aoTuVb5.7 ... with or without tags, indeed no glitch. It shows that audio quality/compression is not everything; features are very important too.

Public Listening Test [2010], Item Selection

Reply #40
@sauvage78: thanks. Unlike the MS/Apple bureaucracy, Florian does correct bugs quickly (OTOH, MS/Apple have infinite wealth and life).

Public Listening Test [2010], Item Selection

Reply #41
For public (and personal) reference, some more potential items from LAME 3.96 tuning days (some have already been mentioned here):

http://www.hydrogenaudio.org/forums/index....showtopic=19882

I also re-uploaded the abovementioned Mandylion item in our accompanying upload thread.

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #42
It seems to me this is more a test of how different encoders react to various killer samples - which is interesting, but far less useful than a general audio quality test. You should test a wide selection of musical genres, not statistical anomalies (which all codecs will have) that have no bearing on overall sound quality whatsoever. It seems like an enormous waste of effort; have I missed something crucial here?


If the "killer samples" are from readily available music, they are valid.

If the "killer samples" are very rare one-off's like some guy playing three notes on a violin in his bathroom, they are invalid.

Unless I misunderstood and this test was not about music?

Public Listening Test [2010], Item Selection

Reply #43
I don't want to repeat myself so much, so please start with this (posts #195-200).

Who says that killer samples must stem from readily available music? They must be readily available (CD or download), but not necessarily what we consider music.

Plus, some musical pieces (e.g. Jazz) have isolated instruments in them, so it's not that far off. And guess why we have these items in our list - because solo instruments can reveal artifacts that spectrally complex music can't reveal.

Plus plus, only half of the samples used in MPEG tests are musical pieces. The rest are instruments from the SQAM CD, speech recordings, etc.

Chris
If I don't reply to your reply, it means I agree with you.

Public Listening Test [2010], Item Selection

Reply #44
2. Samples.
Different styles of music, different levels of difficulty, pointing issues etc....?  To be discussed here or in separate topic.

Chris,
I was working from the above, so I apologize for missing that the sample selection had migrated away from a broad range.  While I agree that 128kbps in modern codecs is substantially better than it was 5 years ago, I don't agree that 128kbps is too high for a valid test based on a broad range of samples including mainstream music.  I'll have to go along with guruboolez that with a test based on rare extreme samples, my interest is nil.

Public Listening Test [2010], Item Selection

Reply #45
As far as I understand, the result of a "broad range" 128 kbit/s test can be known in advance: "mostly transparent with few exceptions". That's niler than nil.

Public Listening Test [2010], Item Selection

Reply #46
Who says that killer samples must stem from readily available music? They must be readily available (CD or download), but not necessarily what we consider music.

I'm a layman, but FWIW I'm somewhere in the middle of the two camps: I think there is value in going for killer or extreme samples, because some people are using lossy formats as their only formats, i.e. as formats for both casual portable and more demanding at-home listening, so there's a need to test the possibility of glaring artifacts (since they have no pure archive format as a backup). (Also, that might affect individual decisions about one's need for lossless formats.)

However, I don't see a purpose in picking non-music or super-artificial samples EXCEPT and INSOFAR as they help developers fine-tune their encoders for non-artificial music samples.

Lossy compression is always lossy with a purpose; otherwise, you couldn't decide which bits you can afford to lose or not.  I don't think lossy music compression should be construed as SOUND compression, i.e. faithful reproduction of any sound or noise, because I assume that would defeat some of the techniques used which depend on harmonics and because the reason that people use mp3/m4a compression is for the appreciation of music and songs, and not for copying arbitrary sounds.  Even speech compression is better served by speech-specific codecs. 

To me, extreme metal and synthetic music start to enter a non-music world; that may be taste only, and others might want a codec geared precisely to reproduce that genre very faithfully. But if there has to be a tradeoff between faithfully reproducing "natural" music and, say, the most extreme amusical/synthetic samples, I'd say there's great justification for gearing it to the faithful reproduction of music (or, if you will, traditional music). --Of course, if there's no tradeoff, there's no problem here, but I assume that there's some correlation between the character of the killer samples we see and the difficulty of encoding them. Also, a caveat: there's some traditional music that is difficult to encode (harpsichords), so it is somehow "unnatural" from the point of view of the encoder (--if the encoder had a musical taste).

I assume the lossy encoders can to some extent "know" what genre they are facing (speech, heavy metal, etc.) and switch techniques accordingly, though apparently none of them are good enough to use all the bits they need for the killer samples, perhaps because of the constraints imposed by some common lossy techniques that work so well for 99% of music.

Public Listening Test [2010], Item Selection

Reply #47
I assume the lossy encoders can to some extent "know" what genre they are facing (speech, heavy metal, etc.) and switch techniques accordingly, though apparently none of them are good enough to use all the bits they need for the killer samples, perhaps because of the constraints imposed by some common lossy techniques that work so well for 99% of music.


That's usually not happening. Every sample faces the same psy model. It will have different states as a function of the sample's context, but that's all. There is usually no higher-level mode-switching or genre detection going on. Local context is all the encoder needs to make optimal decisions. 2-pass encoding is an exception, but that doesn't do anything that could be called "genre detection", either.
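
To make this concrete, a purely conceptual sketch (not actual encoder code): every frame goes through the same rules, and bit allocation follows only local difficulty and the bit reservoir, with no genre branch anywhere:

Code:
# Conceptual sketch only: the same rules for every frame; the "difficulty"
# function stands in for the psy model's estimate of the local context.
def encode(frames, difficulty, target_bits=3000):
    reservoir = 0
    allocations = []
    for frame in frames:
        budget = target_bits + reservoir
        wanted = int(target_bits * difficulty(frame))  # harder frames ask for more bits
        used = min(wanted, budget)                     # but only what is available
        reservoir = budget - used                      # unused bits carry over
        allocations.append(used)
    return allocations

# Toy usage: easy frames free up bits that a later hard frame can spend.
print(encode(range(5), lambda f: [0.5, 0.5, 0.5, 2.0, 1.0][f]))  # [1500, 1500, 1500, 6000, 3000]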

And it's not a bug, it's a feature: the code would otherwise be a mess to maintain, and tuning a generalized solution makes much more sense. It would be different if, for example, one encoder for both guitar and avant-garde electronic music were a contradictory design goal, but it's not. The design goal is exploiting your ear and auditory system, whatever one throws at it.

I would agree not to include any synthetic samples that were created especially to fuck a specific implementation. With enough knowledge about a piece of code, there will probably always be some way to exploit it. But that's not true for any of the submitted samples, as far as I can see. Some of them may sound strange to your ears, but it is the encoder's job to fool them anyway and be transparent.

And LAME, Fraunhofer, Nero, and Apple have become so good that I really don't see much sense in yet another listening test that turns out transparent for items 1-14 and almost transparent for items 14-15. We know that already, and items 1-14 will be a major pain to test, which would frustrate many potential listeners. So why not go for the hard nuts only, and see which encoder can crack the most?

Public Listening Test [2010], Item Selection

Reply #48
Local context is all the encoder needs to make optimal decisions. 2-pass encoding is an exception, but that doesn't do anything that could be called "genre detection", either.

Thank you; I don't know how the encoders do their magic. At least with the local context, it looks like they still "know" to throw more bits at certain things and not at others, which ends up corresponding to higher bitrates on e.g. heavy metal music. OK, not genre detection at all, but a detection of local difficulty/hardness that, if sustained over the piece, can perhaps correspond in some way to genre. I also didn't know the same rules were used, switching to more bits with the same rules depending on the local context.

It would be different if, for example, one encoder for both guitar and avant-garde electronic music were a contradictory design goal, but it's not. The design goal is exploiting your ear and auditory system, whatever one throws at it.

Here I'm not sure I agree. If it turns out that encoding efficiently for avant-garde electronic music (lowest bitrate, highest quality) produces worse quality than encoding efficiently for guitar music, then it seems there wouldn't in fact be one design goal (i.e. as a practical matter), regardless of intention. Now that, I take it, is not the case. But it could be the case, e.g. some sounds are not easily reduced to a score (sheet music)--"sheet music" encoding hasn't been designed for efficiently reproducing all sounds (likewise MIDI). Some compressors work better on text formats, some on binaries and jpegs. Even though we would prefer there to be one compressor that worked best for all, we are forced to make a choice (or the compression engine decides on the fly). It could be that certain kinds of sounds are more efficiently compressed by one kind of encoder than another.

Public Listening Test [2010], Item Selection

Reply #49
Some compressors work better on text formats, some on binaries and jpegs.  Even though we would prefer there to be one compressor that worked best for all, we are forced to make a choice (or the compression engine decides on the fly).


That's a completely different case from (perceptual) compression of audio. Binary (non-audio) data can have regularity (and also complexity) several orders of magnitude greater than typical audio data. Binary (non-audio) data can have repetitive patterns spanning larger ranges than what could be found by brute-force search. So it can make sense to tune a lossless data compressor for specifically structured patterns out of the infinite problem space. With the exception of looped electronic music, the same is not usually true for audio.

There are also perceptual encoders tuned for special use cases, like speech codecs. They have some modifications beyond what a general-purpose encoder would do on the fly. But their goal isn't necessarily transparency; it's sounding good enough at very low rates*. For general-purpose encoders, which target transparency, there is no "on the fly" mode switching. All the encoder does is look at how many bits it still has available at the current position and what its estimates for the audibility of each component in the current window are.

* Especially lower sample rates, for which usual block sizes are insufficient.