Making lame mp3 preserve surround sound info

2004-03-23 12:30:09

We've been here before, but it would be nice to get this issue solved.

The relevant threads are:

http://www.hydrogenaudio.org/forums/index....showtopic=12004
http://www.hydrogenaudio.org/forums/index....showtopic=17799
http://www.hydrogenaudio.org/forums/index....showtopic=18014

etc!

1. It is well known that listening to the "surround" channel in isolation (at its most basic this is just the difference channel, L-R), or (more commonly) just putting your ear close to the rear speakers doesn't sound very nice when replaying surround sound material encoded with lame --alt-preset standard.

2. However, it is assumed (but has never been tested) that lame --alt-preset standard (and, for that matter, MusePack -q5) are generally transparent, even when processed through a matrix surround decoder (e.g. Pro Logic, DPL II, DTS Neo:6 etc.) if you have your speakers (and yourself) positioned correctly, and your decoder configured correctly.
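
Point 1 can be made concrete. In the simplest passive matrix decoder the rear feed is just the scaled difference signal, (L-R)/sqrt(2), which is also essentially what a basic vocal-cut plug-in outputs. The sketch below is illustrative only: `passive_matrix_decode` is a hypothetical helper, and it leaves out the steering, surround delay and band-limiting that a real Pro Logic decoder adds. It shows why this channel is so exposed: anything common to L and R cancels, so whatever remains, including any coding error that differs between L and R, is heard on its own.

```python
import math

# Hypothetical helper (not part of lame or any real decoder): a bare
# passive matrix with no steering, surround delay or band-limiting.
def passive_matrix_decode(lt, rt):
    g = 1 / math.sqrt(2)  # ~0.707 keeps the sum/difference power-preserving
    center = [g * (l + r) for l, r in zip(lt, rt)]
    surround = [g * (l - r) for l, r in zip(lt, rt)]  # the difference channel
    return center, surround

# A centre-panned (mono) voice is identical in both channels...
voice = [0.0, 0.5, -0.25, 0.8]
center, surround = passive_matrix_decode(voice, voice)
print(surround)  # [0.0, 0.0, 0.0, 0.0]: it cancels completely at the rear

# ...whereas any error that is not identical in L and R ends up in the
# surround feed, with nothing left to mask it.
left_coded = [x + 0.01 for x in voice]
_, surround_err = passive_matrix_decode(left_coded, voice)
print(surround_err)
```
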

It would be nice to run a listening test to check assumption (2), or at least to probe it a little. There are anecdotal comments like "lame at 128kbps sounds fine with Dolby Pro-Logic" - but that's just not good enough - lame at 128kbps doesn't sound "fine" with stereo material, let alone Dolby Pro Logic! However, I think this is a difficult thing to test - not many of us have the equipment.

So it may be possible to solve issue (1) instead. You see, it may be that I know I'm going to use a vocal-cut plug-in on my mp3s, and I don't want them to sound bad. Or it may be that I like sitting next to my rear speakers, bathed in the surround signal. Or it may be that I want to use my mp3s with a different kind of decoder which hasn't been tested yet. Or it may just be that, until (2) is proven, I want a bit of insurance.

Current "solutions" are to force discrete stereo only, or to use a very low NSMSFIX value (1.0 or lower works well). However, these seem wasteful (especially on mono sections of the material, like dialogue in a movie!), because they're not really attacking the actual problem.

So, here's a suggestion (because I know the lame devs have absolutely nothing else to do at all!): Add a switch to lame that tells it that the surround information (i.e. the difference channel) is important. Currently, it's assumed that the difference channel itself doesn't matter, as long as M and S can be used to reconstruct L and R well enough. I'd suggest that this new switch should cause the masking model to be applied to the S (difference) channel too, so that, whenever M/S coding is chosen, the S channel is treated as if it were audible in isolation (which, of course, it is, given certain decoding). Rather than starving it of bits, the encoder should give it enough bits to encode this channel properly. Or switch to discrete stereo for this frame, if it turns out to be more efficient.

Further still, the switch could take a parameter, the value of which indicates the likely breakthrough from the front channels to the rear channels in the listening room. The masking calculation from the front channels (i.e. L+R) could be taken, reduced by this value, and added to the masking calculation for the rear channel (i.e. S: difference) so that sounds hidden by acoustic breakthrough from the front channels wouldn't be stored, making the process slightly more efficient.

This value could range from 0, meaning that masking from the front channels will completely swamp the rear channel, so it shouldn't be considered in isolation (as now), to -infinity, meaning that the front channels are not audible in the same room as the rear channel, i.e. that I intend to listen to the difference channel in isolation at some point (e.g. with a vocal-cut plug-in).
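
As a rough sketch of the switch described above, assuming per-band masking thresholds expressed as linear power and simply added together (an assumption: lame's psychoacoustic model combines masking in its own way, and no such switch exists in lame), the threshold the S channel would have to meet could be computed like this. `rear_threshold` and `breakthrough_db` are hypothetical names. An encoder using it would keep the S channel's quantization noise below this threshold whenever M/S is chosen, or fall back to L/R for the frame if that costs fewer bits.

```python
import math

# Hypothetical sketch of the proposed switch; nothing like this exists in lame.
# Thresholds are linear power values, one per scalefactor band (assumption).
def rear_threshold(thr_side, thr_front, breakthrough_db):
    """Masking threshold for the S (difference) channel, per band.

    breakthrough_db is the proposed parameter: 0 dB means the front
    channels are fully audible at the rear, -inf means the rear channel
    is heard in isolation (e.g. through a vocal-cut plug-in).
    """
    if breakthrough_db == -math.inf:
        gain = 0.0  # no help at all from front-channel masking
    else:
        gain = 10 ** (breakthrough_db / 10)  # dB -> power ratio
    # Front masking, reduced by the breakthrough value, is added to S's own masking.
    return [ts + gain * tf for ts, tf in zip(thr_side, thr_front)]

own = [1e-6, 4e-6, 2e-5]    # S channel's own masking threshold
front = [1e-4, 1e-4, 1e-4]  # masking threshold computed from the front channels
print(rear_threshold(own, front, -20.0))      # front masking reduced by 20 dB
print(rear_threshold(own, front, -math.inf))  # vocal-cut case: own masking only
```
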

Comments?

Cheers,
David.

Reply #1 – 2004-03-23 15:10:47

There's already one (two?) threads on this here:

<http://www.hydrogenaudio.org/forums/index.php?showtopic=20005&hl=>

Reply #2 – 2004-03-24 09:49:50

Quote

There's already one (two?) threads on this here:

<http://www.hydrogenaudio.org/forums/index.php?showtopic=20005&hl=>

No, I don't mean that!

That's a new extension to the mp3 format to allow them to store extra information to recover discrete surround channels from a stereo mp3 encoding. This requires a discrete 5.1 source, new encoders, and new mp3 decoders to recover the information.

I'm talking about adding functionality to lame so that the "rear channel" information in Dolby Stereo material is preserved as well as the left or right channel is with normal stereo material.

Cheers,
David.

Reply #3 – 2004-03-25 01:43:34

Quote

We've been here before, but it would be nice to get this issue solved.

Current "solutions" are to force discrete stereo only, or to use a very low NSMSFIX value (1.0 or lower works well).

Hi!

Everything you're talking about is great.
I have a small note about nsmsfix and stereo mode. Even if you add the -ms switch in VBR mode, the problem is still present. The artifacts are less noticeable, but they don't disappear; they change from "tonality" to "noise" (sorry, I don't know how to describe what I'm hearing in English).
If you use stereo mode instead of joint stereo in CBR or ABR, noise appears in the surround channel even at very high bitrates, whereas with joint stereo it does not. The solution is nsmsfix>=1. Of course I have tested this many times, and spent a lot of time (more than I wanted) before deciding how to preserve the surround channel. I found that ABR mode produces fewer artifacts than CBR at the "same" bitrate. Maybe that will be helpful. Now, to preserve surround, I use lame 3.93.1 with --alt-preset 256. Its bitrate is about 20% higher than aps and about 20% lower than 320. Statistically, ap256 uses more 320 kbps frames than aps. I understand that this is no proof that it sounds better; only ABX can give a conclusion. But I think it is close enough to trust, if lame is developed correctly, and I believe it is.

Reply #4 – 2004-03-25 06:57:15

Quote

Current "solutions" are to force discrete stereo only, or to use a very low NSMSFIX value (1.0 or lower works well). However, these seem wasteful (especially on mono sections of the material, like dialogue in a movie!), because they're not really attacking the actual problem.

As I PMed you, I used a low NSMSFix value of between 0.2 and 0.4, which fixed the problem but caused almost all frames to be encoded in stereo instead of M/S, making the file the same size as forcing stereo.

Reply #5 – 2004-03-25 12:06:33

Single -

Are you saying that even forcing stereo doesn’t solve the problem? There are very good theoretical reasons why what you say should be true, but no one has ever suggested this before, so I'm curious.

Can you explain this bit please...

Quote

If you use stereo mode instead of joint stereo in CBR or ABR, noise appears in the surround channel even at very high bitrates, whereas with joint stereo it does not. The solution is nsmsfix>=1.

nsmsfix at what value, with what other settings?

Can you tell us more about the testing you've done? What settings, samples, and decoder you've tried?

Do you have any particularly useful samples that you could upload?

btw, using high bitrate ABR does make some sense: whenever the psychoacoustic model fails (as it can with surround sound info, because it's not even "listening" to it), one sure way to make things better is to force the bitrate up. This ensures that at least some bits are used in each channel and/or each frequency range. This is in contrast to a confident VBR mode, which trusts the psychoacoustic model; such a mode will starve any channels and/or frequency ranges that the psychoacoustic model believes are unimportant.

Cheers,
David.

Reply #6 – 2004-04-26 11:56:32

plonk sent me some samples (too long to post here) that showed (yet again) that if you can hear the surround channel clearly (e.g. in isolation) then aps is a mess.

There was nothing special about the samples. It just reminded me about this thread, so I'm bumping it up to see if anyone has anything to add.

Maybe some talented programmer has time to try the ideas I suggest in lame?

Maybe Single can expand on what he said, if he's still around?

Cheers,
David.

Reply #7 – 2004-04-26 13:31:31

Quote

That's a new extension to the mp3 format to allow them to store extra information to recover discrete surround channels from a stereo mp3 encoding. This requires a discrete 5.1 source, new encoders, and new mp3 decoders to recover the information.

I'm talking about adding functionality to lame so that the "rear channel" information in Dolby Stereo material is preserved as well as the left or right channel is with normal stereo material.

Cheers,
David.

Basically, the new idea is to quantize & code spatial cues as an extra bitstream (up to 16 kb/s).

Spatial cues are basically various parameters such as interaural level difference, interaural time difference, cross-correlation, etc. The set of these parameters is chosen to be sufficient to reconstruct:

a) 5.1 channel information out of monaural bit stream (low bit rate mode)
b) 5.1 channel information out of stereo bit stream (high bit rate mode)
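
For illustration only, here is a simplified version of two of the cues mentioned, computed between a pair of channels rather than between the ears: the level difference and the normalised cross-correlation for one frame. This is not how any of the actual proposals define or quantize them; real parametric coders work per frequency band and also estimate time or phase differences. `spatial_cues` is a hypothetical helper.

```python
import math

def spatial_cues(left, right, eps=1e-12):
    """Broadband level difference (dB) and normalised cross-correlation for one frame.

    Simplified: real parametric stereo/surround coders compute such cues
    per frequency band and also estimate time/phase differences.
    """
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))
    level_diff_db = 10 * math.log10((e_l + eps) / (e_r + eps))
    correlation = cross / math.sqrt((e_l + eps) * (e_r + eps))
    return level_diff_db, correlation

sig = [math.sin(0.05 * n) for n in range(1000)]
print(spatial_cues(sig, [0.5 * x for x in sig]))  # ~6 dB louder on the left, fully correlated
print(spatial_cues(sig, [-x for x in sig]))       # equal level, anti-phase (pure "surround")
```
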

With HE-AAC, it is possible to achieve 48 or 64 kb/s for a) or b) respectively.

With MP3, it is more like 96 and 144 kb/s for these modes.

There are a couple of proposals for how to do this (of course, we are actively analysing all methods). I have heard some of the proposals, and they sound very cool, much better than the earlier Dolby Surround methods.

Reply #8 – 2004-04-26 14:47:14

Quote

Basically, the new idea is to quantize & code spatial cues as an extra bitstream (up to 16 kb/s).

Yes I know, and it's very interesting, but that's not what I was trying to start a thread about!

Maybe I should have chosen the thread title more carefully.

Cheers,
David.

Reply #9 – 2004-04-26 16:19:14

In regard to your second statement, I foresee that even if we do get around to testing it, the tests would be very hard to evaluate on the 1 to 5 scale we normally use. Some people will be listening exclusively to the surround channels, which isn't necessarily the right way to judge how they perform either; you need to listen to the whole sound environment and evaluate that.

Hmm, is surround info generated continuously at the same level? I thought the Dolby matrix stuff relies on channel steering, so when there's a particularly important effect that should be heard at the back, the trade-off for hearing a strong effect in the surround channel is that the sound at the front gets lower priority, or something like that. Maybe not. Can someone answer this?

Anyway, if this is the case, then maybe a super smart developer could write code that dynamically adjusts the parameter you've suggested, perhaps based on an in-loop DPL decoder. That way the user doesn't have to set it; it sets itself to whatever is optimal from a 3-channel (left, right, surround) psychoacoustic standpoint, rather than the 2-channel one I'm guessing lame already supports.
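
The "in-loop decoder" idea could start from the same kind of measure a steering decoder uses: how a frame's energy splits between L+R and L-R (front/back) and between L and R (left/right). The sketch below computes those two dominance values in dB. The function is hypothetical, real decoders smooth such measures over time, and the mapping from dominance to the proposed breakthrough parameter (e.g. lowering it when the difference channel dominates) is left open here because it would need listening tests.

```python
import math

def dominance_db(left, right, eps=1e-12):
    """Front/back and left/right dominance of one frame, in dB.

    front_back > 0: energy concentrated in L+R (front/centre).
    front_back < 0: energy concentrated in L-R (surround).
    """
    e_sum = sum((l + r) ** 2 for l, r in zip(left, right))
    e_diff = sum((l - r) ** 2 for l, r in zip(left, right))
    e_l = sum(l * l for l in left)
    e_r = sum(r * r for r in right)
    front_back = 10 * math.log10((e_sum + eps) / (e_diff + eps))
    left_right = 10 * math.log10((e_l + eps) / (e_r + eps))
    return front_back, left_right

fx = [math.sin(0.1 * n) for n in range(500)]
print(dominance_db(fx, [-x for x in fx]))  # strongly negative front_back: a rear effect
print(dominance_db(fx, fx))                # strongly positive front_back: a centre sound
```
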

Don't mind me; if what I say is complete crap, it's because I'm sleepy and "revising" for an exam.

Reply #10 – 2004-04-26 16:46:35

You could account for steering (the Pro-Logic decoding that increases the separation between channels), but to be honest, if you just ensure that the difference channel is encoded OK (rather than starved of bits, as it is at the moment) then it should take care of itself.

What's more, if you just worry about the difference channel, rather than tying it to a particular surround algorithm, then the encodes should be OK for any use, depending on source material (e.g. DPL II, Ambisonics, vocal cut! etc).

You're right though - if you just want to target Dolby Pro-Logic, then you can probably do things more efficiently. Harder to encode though!

Cheers,
David.

Reply #11 – 2004-04-26 19:43:42

As I'm very interested in this subject, and I cannot test on my own 5.1 system at the moment, I made some test samples available. I used a 5.1 AC3 test sequence that has a voice speaking in each of the channels in order, and used besweet 1.5b26 to make four conversions: AC3 decoded with azid to Pro Logic wav and Pro Logic II wav, and AC3 to MP3 using azid and lame 3.90.3 --aps, again for both Pro Logic and Pro Logic II. Please compare the wav-decoded samples with the aps samples and reply if you can tell a difference in 5.1. Samples and all applicable logfiles are available here.

Reply #12 – 2004-04-27 06:02:01

so we can't do like a 90sec clip?

Here are shortened ones, I guess... not sure how well I clipped them.

http://plonkmedia.com/.../jewel-intution1.mp3 (APS forced stereo)
http://plonkmedia.com/.../jewel-intution1-l&r.mp3 (APS forced stereo: what you'll hear in the l&r surrounds)
http://plonkmedia.com/.../jewel-intution1-l&r-js.mp3 (APS default joint stereo: what you'll hear in the l&r surrounds)
http://plonkmedia.com/.../jewel-intution2.mp3 (APS forced stereo)
http://plonkmedia.com/.../jewel-intution2-l&r.mp3 (APS forced stereo: what you'll hear in the l&r surrounds)
http://plonkmedia.com/.../jewel-intution2-l&r-js.mp3 (APS default joint stereo: what you'll hear in the l&r surrounds)

http://plonkmedia.com/.../ses-dreamscometrue.mp3 (APS forced stereo)
http://plonkmedia.com/.../ses-dreamscometrue-l&r.mp3 (APS forced stereo: what you'll hear in the l&r surrounds)
http://plonkmedia.com/.../ses-dreamscometrue-l&r-js.mp3 (APS default joint stereo: what you'll hear in the l&r surrounds)

http://plonkmedia.com/.../utada_hikaru-sakura_drops.mp3
http://plonkmedia.com/.../utada_hikaru-sakura_drops-l&r.mp3 (surrounds/forced stereo)

edit: intuition1 has one type of artifact; intuition2 has another type; dreams come true has both types of artifacts, but it is the best example I had lying around (already encoded) of how discretely the matrixing can isolate something. Pretty cool.

edit2: hiki! Just another example of how discretely it works. I didn't create an APS JS version of this one... I just used it to test out WMA's 5.1 capabilities...

edit3: crikey, had to fix urls

It's unfortunate I don't listen to more mainstream stuff that I could use as examples.

Reply #13 – 2004-05-19 05:12:54

I think this needs to stay at the top until those who keep improving LAME explain why the "surround" information has clear, audible artifacts and how this should be approached to correct it. If part of the sound can be decoded in a common way and it comes out garbled, then I think some adjustments are needed, and tests should be done at higher bitrates and with different M/S and stereo modes to determine whether the sound can be improved and whether simply tweaking the current VBR mode fixes it.

tlc

Reply #14 – 2004-05-19 05:20:17

plonk420 is not using surround source material, but some pseudo surround DSP (DPLII?).

Reply #15 – 2004-05-24 11:15:38

True, but DPLII is designed to enhance stereo sources too.

I know the argument about not post-processing mp3s at all, but decoding stereo information to some kind of surround is just going to become more and more popular.

More importantly, there is no fundamental reason why lame will not mess up genuine surround signals in just the same way.

What I've suggested isn't algorithmically difficult, though as a non-programmer, I don't know how tricky it would be to implement.

It seems like it's worth a try though.

Cheers,
David.

Reply #16 – 2004-05-24 12:57:03

I don't think that using "plain stereo" is anywhere near a good solution. In fact, M/S matrixing is used in AC3 to be backwards compatible with Dolby Prologic. M/S matrixing is used to preserve the surround channel's quality in the case of a 2-channel AC3 file that contains a Prologic signal.

I don't know what LAME does in this case, but I can agree with Dolby's proposal to do something like this:

1) M=0.7*(L+R) and S=0.7*(L-R)
2) Compute overall energy in all 4 channels (L,R,M,S)
3) Search for the channel with the lowest energy level
4) In case of L or R being that channel -> use L/R coding
5) In case of M or S being that channel -> use M/S coding
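
The five steps can be sketched per frame as below. Assumptions: energies are summed over the frame's time-domain samples (a real encoder would more likely use the MDCT spectrum, possibly band by band), the 0.7 factor is kept from the post (close to 1/sqrt(2)), and this is the proposal as described, not lame's actual mode decision. `choose_stereo_mode` is a hypothetical name.

```python
import math

def choose_stereo_mode(left, right):
    """Return "LR" or "MS" for one frame by the lowest-energy rule (steps 1-5)."""
    mid = [0.7 * (l + r) for l, r in zip(left, right)]   # step 1
    side = [0.7 * (l - r) for l, r in zip(left, right)]
    energy = {                                           # step 2
        "L": sum(x * x for x in left),
        "R": sum(x * x for x in right),
        "M": sum(x * x for x in mid),
        "S": sum(x * x for x in side),
    }
    quietest = min(energy, key=energy.get)               # step 3
    return "LR" if quietest in ("L", "R") else "MS"      # steps 4 and 5

centred = [math.sin(0.1 * n) for n in range(256)]
print(choose_stereo_mode(centred, centred))      # MS: the S channel is silent
print(choose_stereo_mode(centred, [0.0] * 256))  # LR: hard-panned, R is silent
```
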

The remaining thing would be to assign appropriate scalefactors to the S channel as it were a single channel encoding.

The problem with a pure L/R encoding is IMHO: the L/R channels are quantized independently, introducing mostly orthogonal noise which will be noticeable on all speakers after Prologic decoding, even if it's just a single centred voice that has to be encoded.
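
A toy simulation of this point. Assumptions: L/R coding adds statistically independent white noise of the same level to each coded channel (a crude model of the "mostly orthogonal noise" described above), while in M/S mode the all-zero S channel of a centred voice is coded exactly as zero. `surround_noise_power` is a hypothetical helper; the surround feed is the plain passive difference signal.

```python
import math
import random

def surround_noise_power(mode, n=20000, sigma=0.01, seed=0):
    """Mean power of the passive surround feed for a centred mono voice.

    Model assumptions: each coded channel gets independent white Gaussian
    noise with standard deviation sigma; in "MS" mode the S channel of a
    centred voice is all zeros and is assumed to be coded exactly.
    """
    rng = random.Random(seed)
    g = 1 / math.sqrt(2)  # orthonormal M/S matrix
    total = 0.0
    for i in range(n):
        v = math.sin(2 * math.pi * i / 100)  # the same voice in L and R
        if mode == "LR":
            l_hat = v + rng.gauss(0.0, sigma)  # L and R quantized independently
            r_hat = v + rng.gauss(0.0, sigma)
        elif mode == "MS":
            m_hat = g * (v + v) + rng.gauss(0.0, sigma)  # noise only on M
            s_hat = 0.0  # S = g * (v - v) = 0, coded exactly
            l_hat = g * (m_hat + s_hat)  # inverse M/S matrix
            r_hat = g * (m_hat - s_hat)
        else:
            raise ValueError("mode must be 'LR' or 'MS'")
        s_out = g * (l_hat - r_hat)  # passive surround (difference) feed
        total += s_out * s_out
    return total / n

print(surround_noise_power("LR"))  # roughly sigma**2 = 1e-4
print(surround_noise_power("MS"))  # 0.0: the noise stays in the centre
```

Under these assumptions the L/R case puts about sigma squared of noise power into the surround feed (the same level as the noise on each coded channel), while the M/S case leaves the surround feed silent.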

bye,
Sebi

edit: fixed typo

Reply #17 – 2004-05-26 03:53:44

It would be cool to have a "preset standard surround" which keeps the quality intact even after decoding with matrix methods. IMO, I would use it even if it produces bigger files.

tlc
