Nightwish - Angels Fall First
Reply #19 – 2012-09-06 18:59:04
I wanted to try out the VBR+ mode (-V n+) of halb27's lame3.99.5y version and compare it with the normal VBR mode. So, unlike psycho, I didn't just go for a higher-quality -V n VBR setting. I used foobar2000's ABX tool with a start time of 0.9 s and an end time of 1.9 s. Encoding with lame3.99.5y using the plain -V 5 option, which is not a high-bitrate setting, I found a slight wavering of the sustained right-panned guitar note. After relaxing a bit and comparing A and B, this became easy to spot, and I got 10/10 in ABX. If anything, I'd say the note seems to waver in pitch or loudness every time a picking sound is heard, e.g. on the other strings of the arpeggiated chord, so it recurs at regular time intervals.

This seems to tie in with halb27's notion about short blocks. When short blocks are triggered (e.g. by the picking-noise transients), maintaining frequency resolution for the simultaneous tonal signals requires an awful lot of bits to be thrown at those short blocks. For this reason he made the lame3.99.5y test version offer a + variant of the VBR modes. It keeps a large bit reservoir in reserve and, when short blocks are triggered, raises the bitrate as much as possible to code those blocks with maximum accuracy. I noticed the number of 320 kbps frames rise from 4/1154 with -V 5 to 141/1154 with -V 5+. That's without using mp3packer to tidy up the wasted bit reservoir filled with padding.

I then tried lame3.99.5y with -V 5+ on the command line and found it practically impossible to spot. I scored 6/10, thinking I could identify the -V5+ version by a subtle pitch difference, a sort of sharpness. It didn't hold up: after a reset I gave up at 2/4. I should mention that I'm not very good at spotting these artifacts and my listening environment isn't great. Still, I wouldn't imagine my cheap Philips Extra Bass earbuds matter much compared to the background noise.
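For a sense of scale, here's a rough sketch of the MPEG-1 Layer III frame arithmetic behind "maximum frame size plus maximum bit reservoir". I'm assuming a 44.1 kHz stereo source without CRC, since those values aren't stated above. The helper names (`layer3_frame_bytes`, `max_main_data_bytes`, `share_percent`) are my own and hypothetical. The 511-byte reservoir limit comes from the 9-bit main_data_begin field.

```python
# Sketch: MPEG-1 Layer III frame sizes and the share of 320 kbps frames.
# Assumes a 44.1 kHz stereo source and no CRC (assumptions, not values
# read from the encoded files).

SAMPLE_RATE = 44100          # Hz, assumed CD-audio source
HEADER_BYTES = 4             # frame header
SIDE_INFO_BYTES = 32         # MPEG-1 stereo side info
MAX_RESERVOIR_BYTES = 511    # main_data_begin is a 9-bit byte offset


def layer3_frame_bytes(bitrate_kbps: int, padding: bool = False) -> int:
    """Length of one 1152-sample MPEG-1 Layer III frame in bytes."""
    return 144 * bitrate_kbps * 1000 // SAMPLE_RATE + (1 if padding else 0)


def max_main_data_bytes(bitrate_kbps: int) -> int:
    """Upper bound on main data for one unpadded frame: its own payload plus a full reservoir."""
    own = layer3_frame_bytes(bitrate_kbps) - HEADER_BYTES - SIDE_INFO_BYTES
    return own + MAX_RESERVOIR_BYTES


def share_percent(count: int, total: int) -> float:
    """Percentage of frames that were coded at a given bitrate."""
    return 100.0 * count / total


if __name__ == "__main__":
    print(layer3_frame_bytes(320))    # 1044 bytes per unpadded 320 kbps frame
    print(max_main_data_bytes(320))   # 1519 bytes at most for one frame
    print(f"{share_percent(4, 1154):.2f}% of frames at 320 kbps with -V 5")
    print(f"{share_percent(141, 1154):.2f}% of frames at 320 kbps with -V 5+")
```

So even flat out, one short-block frame can use only about 1.5 times what a lone 320 kbps frame holds. That's why the reservoir has to be kept topped up beforehand.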
A = original, B = 3.99.5y -V5 (normal VBR mode): SUCCESSFUL ABX

foo_abx 1.3.4 report
foobar2000 v1.1.2
2012/09/06 17:22:13

File A: C:\Users\Dynamic\Music\Test signals\05___Angels_Fall_First_ringing.flac
File B: C:\Users\Dynamic\Music\Test signals\05___Angels_Fall_First_ringing.wav.v5normal.mp3

17:22:13 : Test started.
17:23:55 : Trial reset.
17:24:22 : 01/01 50.0%
17:24:33 : 02/02 25.0%
17:24:47 : 03/03 12.5%
17:24:56 : 04/04 6.3%
17:25:13 : 05/05 3.1%
17:25:25 : 06/06 1.6%
17:26:10 : 07/07 0.8%
17:26:28 : 08/08 0.4%
17:26:48 : 09/09 0.2%
17:27:05 : 10/10 0.1%
17:27:08 : Test finished.

----------
Total: 10/10 (0.1%)

A = original, B = 3.99.5y -V5+ (halb27's VBR-plus mode): FAILED ABX

foo_abx 1.3.4 report
foobar2000 v1.1.2
2012/09/06 17:28:36

File A: C:\Users\Dynamic\Music\Test signals\05___Angels_Fall_First_ringing.flac
File B: C:\Users\Dynamic\Music\Test signals\05___Angels_Fall_First_ringing.wav.v5plus.mp3

17:28:36 : Test started.
17:30:30 : 01/01 50.0%
17:30:44 : 01/02 75.0%
17:31:30 : 01/03 87.5%
17:33:28 : 02/04 68.8%
17:33:47 : 02/05 81.3%
17:44:01 : 03/06 65.6%
17:45:01 : 04/07 50.0%
17:45:21 : 05/08 36.3%
17:45:44 : 05/09 50.0%
17:45:56 : 06/10 37.7%
17:46:15 : Trial reset.
17:46:34 : 01/01 50.0%
17:46:53 : 01/02 75.0%
17:47:50 : 02/03 50.0%
17:48:49 : 02/04 68.8%
17:49:03 : Test finished.

----------
Total: 8/14 (39.5%)

In short, I don't think I'm good at spotting this sort of artifact, and I don't find it annoying in this case. Even so, I must commend Horst for his work on the -V n+ modes. I wasn't at all sure -V 5+ would be enough, and I thought it might need -V 0+ instead. It actually worked (for my ears) without raising the VBR setting, just by adding + to it.
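The running percentages in the foo_abx reports above line up with the one-sided binomial probability of scoring at least that well by pure guessing. Each trial is treated as a 50/50 coin flip. I'm inferring this from the numbers matching, not from foo_abx's source. `abx_p_value` is a hypothetical helper name.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by pure guessing.

    Each ABX trial is treated as a fair coin flip (p = 0.5), so this is the
    one-sided binomial tail probability.
    """
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


if __name__ == "__main__":
    # Scores taken from the two reports above.
    for correct, trials in [(10, 10), (6, 10), (2, 4), (8, 14)]:
        print(f"{correct:02d}/{trials:02d} {100 * abx_p_value(correct, trials):.1f}%")
```

This gives 0.1% for the 10/10 run against -V5 and 39.5% for the combined 8/14 against -V5+. In other words, the first result is very unlikely to be luck, while the second is indistinguishable from guessing.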
This -V n+ VBR mode seems to be a notable exception to a rule of thumb that is often reasonable for many codecs, not just MP3/LAME. The rule says that many artifacts improve only gradually with bitrate. That's because the extra bits are not applied exclusively to the right area: they tend to get spread thinly when the encoder's psymodel doesn't know what the right area is. 3.99.5y -V n+ seems to narrow down the area where the bits are needed a lot better than most. That's probably because most of LAME's other artifacts are already fixed, and have been for many years.

An analogy: think of a square tray of plant pots, where you want each seed buried under at least 1 cm of soil. Suppose a gust of wind from one side blows the soil off a single seed. If you know where that seed is, you can add a little extra soil (say 4 cm³) to just the right spot in the right pot. Now suppose you don't know where the seed is (e.g. you're blindfolded or there's no light), or you can only add soil over the whole tray (e.g. you only have access from a great height). Then you have to add a lot more soil (e.g. 1000 cm³) before the seed is sufficiently covered, and only some of it helps. To explain the analogy:

- exposed seeds = audible artifacts
- soil = bits or bitrate
- the ability to see where seeds are exposed = a psychoacoustic model that matches human hearing
- soil placement accuracy = limitations of the format (e.g. which short/long block features are allowed)

As I understand how -V n+ mode works (halb27 can correct me if I'm wrong), whenever the normal -V0 psymodel triggers a short block, -V0+ tries to make the maximum possible number of bits available for it. It does this by using the maximum bit reservoir and the maximum (320 kbps) frame size. The aim is to represent the short block as faithfully as MP3 allows: both reducing pre-echo hiss and improving tonal accuracy as much as the normal MP3 format permits.
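To make that description concrete, here's a toy simulation of the policy as I've described it. It is not halb27's code: the refill rule, the byte counts and all names (`Frame`, `encode`, `smallest_kbps`) are hypothetical. It assumes 44.1 kHz, treats the whole frame as main data (ignoring header and side info) and caps the reservoir at MPEG-1's 511 bytes. In normal mode each frame gets just enough. In "+" mode, long-block frames also try to refill the reservoir, and a short-block frame takes a 320 kbps frame plus everything in the reservoir.

```python
from dataclasses import dataclass

# Toy model of the "-V n+" idea, NOT LAME's implementation.
# Assumes 44.1 kHz; header/side info ignored (whole frame = main data).
SAMPLE_RATE = 44100
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
RESERVOIR_MAX = 511  # bytes (MPEG-1 9-bit main_data_begin)


def frame_bytes(kbps: int) -> int:
    return 144 * kbps * 1000 // SAMPLE_RATE


def smallest_kbps(need: int) -> int:
    """Smallest legal bitrate whose frame holds `need` bytes (320 if none does)."""
    for kbps in BITRATES_KBPS:
        if frame_bytes(kbps) >= need:
            return kbps
    return BITRATES_KBPS[-1]


@dataclass
class Frame:
    short_block: bool  # did the psymodel switch to short blocks here?
    wanted: int        # bytes the normal VBR quantizer would like to spend


def encode(frames, plus_mode=False):
    """Return (kbps, bytes_spent) per frame for normal VBR or the '+' variant."""
    reservoir = 0
    out = []
    for f in frames:
        if plus_mode and f.short_block:
            # Short block: largest frame, and drain the whole reservoir into it.
            kbps = 320
            available = frame_bytes(kbps) + reservoir
            spent = available
        else:
            # Cover what is wanted; in '+' mode also try to refill the
            # reservoir so it is full when the next short block arrives.
            keep = RESERVOIR_MAX if plus_mode else 0
            kbps = smallest_kbps(max(f.wanted + keep - reservoir, 0))
            available = frame_bytes(kbps) + reservoir
            spent = min(f.wanted, available)
        # Leftover beyond the reservoir cap is lost (padding in a real file).
        reservoir = min(available - spent, RESERVOIR_MAX)
        out.append((kbps, spent))
    return out


if __name__ == "__main__":
    clip = [Frame(False, 300), Frame(False, 300), Frame(True, 500)]
    print(encode(clip))                  # [(96, 300), (96, 300), (160, 500)]
    print(encode(clip, plus_mode=True))  # [(256, 300), (96, 300), (320, 1555)]
```

In this made-up clip the short block gets 500 bytes in normal mode but 1555 in "+" mode. The first "+" frame is oversized just to fill the reservoir, and some of its bytes overflow the cap, which is the kind of wasted padding mentioned above.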
I applaud Horst for coming up with a different approach that seems to work so well. Short blocks aren't used very often in most music, and it uses a sensible lowpass. For both reasons, it is one of the few techniques that really does apply most of the extra bits to the right areas.

To use my plant-pot analogy again, as I understand it, -V n+ is rather like knowing a statistical fact about the tray. Nearly all exposed seeds are in the windward half of the first row of pots, on the side the wind blew from (analogy: most artifacts are in short blocks). So you add extra soil only to those pots (= extra bits in short blocks) without wasting soil over the whole tray. You might use, say, 100 cm³ of extra soil. That is far better than the even-spreading approach, which uses ten times more. It is not quite as good as being able to see the exposed seeds and place soil accurately, which would let you deposit something more like 4 cm³ exactly where it's needed. The relative volumes are only for illustration. They do show how inefficiently extra bits normally deal with the problem, and how much better lame 3.99.5y -V n+ deals with it, if my understanding is roughly correct.

Perhaps the LAME psymodel could eventually be improved to detect situations that demand extra bitrate to satisfy two conflicting demands at once, for example strong tonal signals coinciding with transients in short blocks. If necessary, LAME could also step back a few frames and rearrange the data held in its buffer before it is written out. That would build up enough bit reservoir ahead of any point where the local bitrate needs to exceed 320 kbps, by as much as required or as much as possible.
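Here's a rough sketch of that look-ahead idea, purely hypothetical and not something any LAME release does, as far as I know. It assumes a buffer of per-frame byte demands, 44.1 kHz frame sizes and a 511-byte reservoir cap. It also assumes that every frame before the demanding one fits in a single 320 kbps frame. If the demanding frame needs more than a 320 kbps frame can hold, the planner (`plan_lookahead`, a made-up name) walks backwards and enlarges earlier frames so the reservoir is already full when that frame arrives.

```python
# Hypothetical look-ahead reservoir planner, not part of LAME.
# Assumes 44.1 kHz and ignores header/side info.
SAMPLE_RATE = 44100
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
RESERVOIR_MAX = 511  # bytes


def frame_bytes(kbps: int) -> int:
    return 144 * kbps * 1000 // SAMPLE_RATE


def smallest_kbps(need: int) -> int:
    """Smallest legal bitrate whose frame holds `need` bytes (320 if none does)."""
    for kbps in BITRATES_KBPS:
        if frame_bytes(kbps) >= need:
            return kbps
    return BITRATES_KBPS[-1]


def plan_lookahead(wanted, demand_index, max_lookback=4):
    """Return (kbps per frame, reservoir bytes still missing).

    Assumes each frame before `demand_index` wants at most one 320 kbps
    frame, so none of them borrows from the reservoir itself. Slack from
    rounding up to a legal bitrate is ignored, so the plan is conservative.
    """
    kbps = [smallest_kbps(w) for w in wanted]
    kbps[demand_index] = 320
    # Bytes needed beyond one 320 kbps frame, limited to what the reservoir can hold.
    missing = min(max(wanted[demand_index] - frame_bytes(320), 0), RESERVOIR_MAX)
    i = demand_index - 1
    while missing > 0 and i >= max(0, demand_index - max_lookback):
        extra = min(frame_bytes(320) - wanted[i], missing)  # spare room in frame i
        if extra > 0:
            kbps[i] = smallest_kbps(wanted[i] + extra)
            missing -= extra
        i -= 1
    return kbps, missing


if __name__ == "__main__":
    print(plan_lookahead([400, 900, 1400], demand_index=2))  # ([192, 320, 320], 0)
```

Here the 1400-byte frame needs 356 bytes from the reservoir. The planner banks 144 of them in the previous frame (raised to 320 kbps) and the remaining 212 in the frame before that (raised from 128 to 192 kbps). A real encoder would also have to respect its buffer size and the 511-byte limit at every step, which this sketch only approximates.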
The restrictions of the MP3 format definition clearly rule out clever, precise solutions like the one in Opus/CELT, where short or long blocks can be chosen per frequency band. There, a tonal signal and its main harmonics in a few bands can benefit from the frequency resolution of a long block, at the expense of time resolution. Meanwhile a transient, typically spread across many other bands, can benefit from the time resolution of a short block, at the expense of frequency resolution. Brute-force high bitrate allocation at selected instants seems to be the best approach MP3 permits. If a good method can be found for detecting where these problems arise, many normal short blocks that don't contain tonal signals can keep the normal bitrate. Within the limitations of MP3, that might be equivalent to using 20 cm³ of soil to hide each such artifact (seed), plus a little more to cover false positives mistaken for would-be artifacts. By contrast, an Opus/CELT VBR encoder with a really great psymodel might need only an additional 5 cm³, because it has the placement accuracy to put the bits in the best places. Anyway, enough analogies for today!
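One way such detection could look, as a hypothetical sketch: only frames where a transient forced short blocks and the spectrum is strongly tonal get the extra bits. I've swapped in spectral flatness (geometric mean over arithmetic mean of the power spectrum), a standard tonality measure, purely for illustration. It is not what LAME's psymodel uses, and the 0.2 threshold and function names are my own assumptions.

```python
import math

# Hypothetical detector, not LAME's psymodel: spectral flatness is used as a
# stand-in tonality measure and the 0.2 threshold is an arbitrary assumption.


def spectral_flatness(power):
    """Geometric mean / arithmetic mean of a power spectrum.

    Close to 1 for noise-like spectra, close to 0 for a few strong tones.
    """
    eps = 1e-12  # avoid log(0) on empty bins
    mean_log = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(mean_log) / (sum(power) / len(power) + eps)


def needs_boost(short_block, power, threshold=0.2):
    """Boost only when a transient forced short blocks AND the spectrum is tonal."""
    return short_block and spectral_flatness(power) < threshold


if __name__ == "__main__":
    noise_like = [1.0] * 8            # flat spectrum, e.g. a picking transient alone
    tonal = [100.0] + [0.01] * 7      # one strong partial, e.g. a sustained note
    print(needs_boost(True, noise_like))  # False: keep the normal bitrate
    print(needs_boost(True, tonal))       # True: spend extra bits here
    print(needs_boost(False, tonal))      # False: long blocks already resolve the tone
```

The point of the design is the AND: a short block alone doesn't trigger the boost, so transient-only frames stay cheap. False positives (the "little more soil") would come from the detector misjudging tonality.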