Topic: lame3100h, a functional extension (Read 28052 times)

  • SubV
lame3100h, a functional extension
Reply #25
I've compiled lame3100h with SSE2 support, which results in much faster encoding.

I've also included the VC++ 2010 DLLs and the latest version of mp3packer.

You can download it here.

Reply #26
Awesome! Thanks (spasibo), Sub!
FLAC -2 w/ lossyWAV 1.3.0i -q X -i

  • lock67ca
Reply #27
So there's such an improvement over 3.99.5 that you're really comfortable using an alpha version for archiving? That does sound promising, but I'm not sure I want to make the jump until there's an official release.

  • BFG
Reply #28
Quote from: lock67ca (Reply #27)

Perhaps I'll regret it later, but I've already ripped 80 CDs using 3.100h and have no regrets yet. It has been equal or superior to 3.99.5 in every listening test I've done so far, and it does notably reduce pre-echo.
  • Last Edit: 15 January, 2013, 12:01:44 AM by BFG

  • Kamedo2
Reply #29
I tried both halb27's original binary and SubV's SSE2-enabled compile and got slightly different results, in both the encoded MP3 and the decoded WAV.
Settings: -V2+, on a 5-minute excerpt of pop and jazz music.
Metric              halb27    SubV
File size [bytes]   8350652   8349508
Similarity          .049972   .049981

halb27's binary produces a 0.014% larger MP3 with slightly better accuracy, at least as measured on the waveform.
I don't worry about such subtle differences, but I'm reporting them anyway.
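The 0.014% figure follows directly from the two file sizes in the table. A minimal C check of that arithmetic (the helper name `pct_larger` is made up for illustration):

```c
#include <stdio.h>

/* Hypothetical helper: how much larger `a` is than `b`, in percent. */
static double pct_larger(long a, long b) {
    return 100.0 * (double)(a - b) / (double)b;
}

/* File sizes in bytes from Kamedo2's -V2+ comparison. */
static const long size_halb27 = 8350652L;
static const long size_subv = 8349508L;

/* Prints the absolute and relative size difference of the two MP3s. */
void print_size_difference(void) {
    printf("difference: %ld bytes, %.4f%%\n",
           size_halb27 - size_subv, pct_larger(size_halb27, size_subv));
}
```

Calling `print_size_difference()` prints `difference: 1144 bytes, 0.0137%`, which rounds to the 0.014% quoted above.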

  • halb27
Reply #30
From the Wikipedia article on SSE2, I see that differences are to be expected: the precision of SSE2 calculations can be reduced. Whether this has an audible impact is another question.
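One way to see why builds with different arithmetic settings can disagree is that floating-point addition is not associative. Summing the same numbers in a different order, as a vectorised loop using two SSE2 double lanes might, can change the result. This is a minimal sketch with made-up input values, not LAME code. Real x87-versus-SSE2 differences can also come from the x87 unit's 80-bit intermediate precision.

```c
#include <stddef.h>

/* Plain left-to-right sum, as straightforward scalar code computes it. */
double sum_sequential(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two interleaved partial sums combined at the end, mimicking how a
   vectorised loop with two double lanes might reorder the additions. */
double sum_two_lanes(const double *x, size_t n) {
    double lane0 = 0.0, lane1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n)            /* odd element left over */
        lane0 += x[i];
    return lane0 + lane1;
}
```

With the input `{1e17, 1.0, -1e17, 1.0}` on a typical x86-64 build, `sum_sequential` returns 1.0 because 1e17 + 1 rounds back to 1e17. `sum_two_lanes` returns the exact 2.0. The input is identical and only the order of operations differs.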

Personally, I want to provide only EXEs that can hopefully be used on any Windows system, especially since it's fast enough for me using foobar on a multi-core system (on my 6-core HT system, 12 tracks are encoded in parallel). But I welcome any variant like this one for those who want to use it.
  • Last Edit: 15 January, 2013, 11:36:39 AM by halb27
lame3995o -Q1

  • SubV
Reply #31
Quote from: Kamedo2 (Reply #29)

http://www.hydrogenaudio.org/forums/index....st&p=819634

Quote from: db1989
Binaries compiled with different settings/optimisations/etc. may produce insignificantly differing bitstreams. This is a non-issue in reality.


  • BFG
Reply #32
Would it be possible to build a single 3.100h binary that detects whether SSE2 is available on the user's machine and, if so, enables it by default? (You'd most likely want to add a flag to force-disable or force-enable it.) This would be better than having two different binaries.
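A single binary that chooses its code path at run time, with a user override, could look roughly like the following sketch. Everything here is hypothetical:

* The two `encode_*` functions stand in for the non-SSE2 and SSE2 builds of a hot loop.
* The `on`/`off`/auto mode and its parsing are invented and are not real LAME options.
* CPU detection uses GCC/Clang's `__builtin_cpu_supports` on x86 and falls back to the generic path elsewhere.

```c
#include <string.h>

/* Hypothetical stand-ins for the two builds of a hot loop. */
static const char *encode_generic(void) { return "generic"; }
static const char *encode_sse2(void)    { return "sse2"; }

typedef const char *(*encode_fn)(void);

typedef enum { SSE2_AUTO, SSE2_FORCE_OFF, SSE2_FORCE_ON } sse2_mode;

/* Runtime CPU check; compilers/platforms without the GCC/Clang builtin
   conservatively report "no SSE2". */
static int cpu_has_sse2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2");
#else
    return 0;
#endif
}

/* Maps a hypothetical command-line value: "on", "off", anything else = auto. */
static sse2_mode parse_sse2_mode(const char *arg) {
    if (arg != NULL && strcmp(arg, "on") == 0)  return SSE2_FORCE_ON;
    if (arg != NULL && strcmp(arg, "off") == 0) return SSE2_FORCE_OFF;
    return SSE2_AUTO;
}

/* Picks the implementation once at startup; the encoder then calls through the pointer. */
static encode_fn select_encoder(sse2_mode mode) {
    switch (mode) {
    case SSE2_FORCE_OFF: return encode_generic;
    case SSE2_FORCE_ON:  return encode_sse2;
    case SSE2_AUTO:
    default:             return cpu_has_sse2() ? encode_sse2 : encode_generic;
    }
}
```

On x86-64, SSE2 is part of the baseline instruction set, so the automatic check only matters for 32-bit x86 builds. Forcing the SSE2 path on a CPU without SSE2 would fail with an illegal-instruction error, so a real implementation would probably refuse or warn instead.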

  • halb27
Reply #33
As far as I can see, in the VC++ environment you only have the option to use the configuration 'release' or 'releaseSSE2'. So this is a static decision made while building the software.

  • [JAZ]
Reply #34
halb27 is correct (although the naming doesn't matter).

Most compilers compile for only one setting (i.e. either targeting a supported instruction set like SSE, or not).

In the old days, some software had different code paths because of hand-written code (i.e. assembly written by hand). LAME used to have an assembly path, but with 64-bit, "old" assembly doesn't even compile, so developers have to either use intrinsics or (again) maintain separate branches.

A semi-automatic approach would be to use DLLs (each built for one instruction set, with the needed one loaded at runtime), but again, that adds work for the developer.

There's also the "bundle" solution, like the "universal binary" in Apple's OS or the Process Explorer binary from Microsoft: two complete binaries, each built with one setting, wrapped inside a binary that knows when to use one or the other.
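For contrast, the compile-time choice described in this post usually amounts to the preprocessor keeping exactly one branch. SSE2 code written with intrinsics, instead of hand-written assembly, compiles for both 32- and 64-bit targets. Below is a minimal sketch that adds two arrays. The function names are made up, and the macro test (`__SSE2__` for GCC/Clang, `_M_X64` or `_M_IX86_FP >= 2` for MSVC) is the usual convention rather than anything taken from LAME's build.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define USE_SSE2 1
#else
#define USE_SSE2 0
#endif

/* Hypothetical example kernel: out[i] = a[i] + b[i]. */
void add_arrays(const double *a, const double *b, double *out, size_t n) {
    size_t i = 0;
#if USE_SSE2
    /* Two doubles per 128-bit register. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar path: the whole job without SSE2, or the leftover element with it. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Reports which branch the compiler kept for this build. */
int built_with_sse2(void) { return USE_SSE2; }
```

Elementwise addition gives the same result on either path. The choice is fixed when the file is compiled, which is why the thread ends up with two separate executables.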


  • halb27
Reply #35
It wasn't long ago that I wrote 'I have no plans for a new version'. Well, things can change fast. A few days ago I listened carefully to Camille's 'Là où je suis née', which includes a spot with a tiny issue (see this thread from the 3.98 beta days). IMO quality has improved with the newer LAME versions, but I was surprised that it's still a subtle issue. To be clear: I can't ABX it reliably, but I can when I'm in good form. I guess it wouldn't worry me if it were a spot in, say, a hard harpsichord sample, but this is 'just' a female voice. And worse for me: my impression is that lame3100h -V0+ is inferior to plain lame3.100.a2 -V0. Sorry for being vague; the issue is really subtle to me, and maybe I'm just nitpicking too much.

Nitpicking has its merits, though. Looking into the problem, I see that this is one of those cases where we're running out of channel and granule space. That made me think things over, and I found that my target bitrate control can be improved. At the moment target bitrate control works per frame, but I can also do it per granule or even per channel. This may or may not improve Camille's song, but I feel I should do it anyway. It is the more appropriate approach for frames of mixed block type (start/short, short/stop), because at the moment the start or stop granule gets too many bits in such a mixed-block situation. Improving this means improving short-block behavior a bit, so I will do it.
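The following toy model makes the frame-versus-granule distinction concrete. The bit numbers and the rule that short blocks get a boosted target are invented for illustration. They are not how lame3100h or 3.100i actually computes its targets.

An MP3 frame holds two granules, each with its own block type (per channel in the real format). The sketch compares two approaches:

* a per-frame target, split evenly across both granules
* a per-granule target, based on each granule's own block type

```c
/* MP3 block types as numbered in the Layer III side information. */
typedef enum { BLOCK_LONG = 0, BLOCK_START = 1, BLOCK_SHORT = 2, BLOCK_STOP = 3 } block_type;

/* Hypothetical per-granule targets in bits (invented numbers). */
static int granule_target(block_type t) {
    return t == BLOCK_SHORT ? 1500 : 1000;
}

/* Toy frame-level control: one target for the whole frame, boosted if any
   granule is short, then shared evenly, so a start/stop granule inherits
   part of the short-block boost. */
static void frame_level(const block_type gr[2], int out[2]) {
    int has_short = (gr[0] == BLOCK_SHORT || gr[1] == BLOCK_SHORT);
    int frame_target = has_short ? 2 * granule_target(BLOCK_SHORT)
                                 : 2 * granule_target(BLOCK_LONG);
    out[0] = out[1] = frame_target / 2;
}

/* Toy granule-level control: each granule gets the target for its own type. */
static void granule_level(const block_type gr[2], int out[2]) {
    out[0] = granule_target(gr[0]);
    out[1] = granule_target(gr[1]);
}
```

For a short/stop frame, the frame-level variant gives the stop granule 1500 bits only because its neighbour is short. The granule-level variant gives it 1000.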

Sorry to everybody who has started testing, especially Kamedo2. I suggest just continuing to test with lame3100h.
  • Last Edit: 23 January, 2013, 09:18:43 AM by halb27

  • Kamedo2
Reply #36
halb27, is the new version available? I'm considering discarding my current results (only 2) and testing the new one. More people will be interested if it's the latest encoder.

  • halb27
Reply #37
I'm working on the new version. I guess it will take me a week or so to develop and another week to test.
Testing will be a bit of a problem because the changes in the new version are pretty substantial. After some first tests of my own, I plan to publish a prerelease, so hopefully some members will contribute to testing.

  • halb27
Reply #38
Here comes version 3.100i for testing.
I've also included an SSE2 version, just as an experiment, but I am not happy with it: eig_essence sounds clearly worse to me than when encoded with the non-SSE2 version. So I probably won't include it in the fully tested version I'll publish in a separate thread.

Samples for testing can be downloaded from here. Of course you're welcome to do any tests you like.

Controlling quality per granule (half of a frame) instead of per frame, which this version does for the very first time, has proven to be essential. The main reason for the audible issue at ~3.0 s in eig with original LAME is frame #109, which contains granules of type 'short' and 'stop'. The 'stop' granule needs a lot of bits. With 3.100h it gets a lot of bits, but only because the frame has a short granule and target bitrate control works on a frame basis. I added a bit of silence to the beginning of the track so that this stop granule moves into frame #110 (now a stop/start frame). Now 3.100h doesn't help much, and at least with the lower-quality -Vn+ levels the issue is very audible. This modified version of eig_essence is included in the samples zip file mentioned above.
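The frame numbers can be related to the ~3.0 s position with some arithmetic. An MPEG-1 Layer III frame holds 1152 samples per channel, i.e. two granules of 576. The calculation assumes:

* CD audio at 44.1 kHz (the sample rate is not stated in the thread)
* zero-based frame numbering
* no encoder delay or padding

Under these assumptions, frame #109 starts at about 2.85 s, which is consistent with the 2.8-3.0 s region mentioned elsewhere in the thread. halb27 doesn't say how much silence he added. Padding by exactly one granule (576 samples) is simply the easiest case to follow: every granule shifts one slot later, so the second granule of frame #109 becomes the first granule of frame #110. The function names are illustrative.

```c
#define SAMPLES_PER_GRANULE 576
#define GRANULES_PER_FRAME  2
#define SAMPLES_PER_FRAME   (SAMPLES_PER_GRANULE * GRANULES_PER_FRAME)  /* 1152 */

/* Start time in seconds of a zero-based frame, ignoring encoder delay and padding. */
double frame_start_seconds(long frame, double sample_rate) {
    return (double)(frame * SAMPLES_PER_FRAME) / sample_rate;
}

/* Which frame and which granule slot (0 or 1) a global granule index falls into. */
void granule_position(long granule, long *frame, int *slot) {
    *frame = granule / GRANULES_PER_FRAME;
    *slot  = (int)(granule % GRANULES_PER_FRAME);
}

/* Global granule index after prepending `silence` samples, assuming the silence
   is a whole number of granules and block decisions otherwise stay the same. */
long granule_after_padding(long granule, long silence) {
    return granule + silence / SAMPLES_PER_GRANULE;
}
```

Granule 219 is slot 1 of frame 109. After 576 samples of padding it becomes granule 220, which is slot 0 of frame 110.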

As I wrote earlier, 3.100i means a lot of changes compared to 3.100h. I'd be very happy if some of you could contribute to testing.
  • Last Edit: 05 February, 2013, 11:25:20 AM by halb27

Reply #39
Quote from: halb27 (Reply #38)


Using the dBpoweramp batch converter, I'm re-creating my 120,000-track MP3 library from my source FLACs using -V0+. It will take a few days. I guess this is a test in its own right.


Sure, I'll probably have to repeat the process in a few weeks. I wasn't completely happy with my 3.99.5 -V0 files, so I don't mind allocating the processing power to the task even if it's temporary.

Reply #40
It's too bad the SSE-compiled version doesn't seem to produce the same results. The non-SSE version is slow as molasses in comparison.

  • halb27
Reply #41
Thanks for testing. But wouldn't it be a good idea to first test the tracks you were not totally happy with when using 3.99.5 -V0, before encoding a huge collection with this prerelease version?

The SSE2 result is necessarily different from the non-SSE2 one because of the different arithmetic. That's not bad per se, but as I wrote, my experience with it isn't a good one.
At least on my system, the non-SSE2 version is not much slower: something like ~145x (non-SSE2) versus ~190x (SSE2) according to foobar.
  • Last Edit: 05 February, 2013, 04:25:43 PM by halb27

Reply #42
Quote from: halb27 (Reply #41)


I did do some comparisons with tracks I'm very familiar with. Not scientific by any means, but I liked what I heard, or maybe thought I heard. My general observations don't really fit the scientific nature of this forum.


You're right, the difference between your 'i' versions isn't that large. I don't know which version I was comparing it to; I've been swapping in different lame.exe files left and right this morning. That said, a 33% increase in performance does add up when dealing with my volume of tracks.
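For scale, halb27's foobar figures (~145x realtime without SSE2, ~190x with it) work out to roughly a 31% speed-up, in the same ballpark as the 33% quoted above. The sketch applies those figures to a 120,000-track library. It assumes, purely for illustration, an average track length of 4 minutes and that the same overall throughput holds for the whole batch. Under those assumptions the difference is roughly 55 versus 42 hours.

```c
/* Speed-up of b over a, both given as "x realtime" figures. */
double speedup(double realtime_a, double realtime_b) {
    return realtime_b / realtime_a;
}

/* Wall-clock hours to encode a library, assuming a hypothetical average track
   length and a constant overall encoding speed in "x realtime". */
double batch_hours(long tracks, double avg_track_seconds, double realtime_factor) {
    double audio_seconds = (double)tracks * avg_track_seconds;
    return audio_seconds / realtime_factor / 3600.0;
}
```

`batch_hours(120000L, 240.0, 145.0)` gives about 55.2 hours, and the same call with 190.0 gives about 42.1 hours.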

  • IgorC
Reply #43
I tried the SSE2 and non-SSE2 versions at -V3+ and couldn't hear any difference between them on the eig_essence sample.

  • halb27
Reply #44
I heard the difference at around 3.0 s, using -V0+ or -V0.7+ (I don't remember exactly and can't redo the test at the moment because I would disturb my wife).

  • halb27
Reply #45
I just retried with my favorite setting, -V0.7+, and around 3.0 s the difference is very audible to me.

  • Kamedo2
Reply #46
I encoded all 25 of my samples and test albums with both the SSE2-enabled and the non-SSE2 3.100i encoders.
The non-SSE2 version produced a 0.017% more accurate equalized waveform while using 0.0005% more bitrate.
The sample with the largest accuracy difference was 15. davinci (speech), where the non-SSE2 version was 0.25% more accurate.
The sample with the largest bitrate difference was FloorEssence (techno), where the non-SSE2 output was 0.03% bigger.

I used -V2+. This is a very superficial test that measures only overall fidelity to the original and file size.
So even if the difference is only 0.25%, it can still have audible effects.


  • Kamedo2
Reply #47
halb27, could you please get rid of the audible difference between the standard and SSE2-enabled encoders?
It would be nice if the two produced almost the same results, because then two separate listening tests wouldn't be needed to assess quality.

  • halb27
Reply #48
Sorry, I have no idea how to do that. I'd rather say the SSE2 version was worth a try, but IMO it isn't worth using.

  • IgorC
Reply #49
You're right, something is wrong with SSE2 version.
There's a strong pre- or post-echo artifact at approx. 2.8-3.0 s.


Code:
ABC/HR Version 1.1 beta 2, 18 June 2004
Testname:

1R = D:\Audio\3100i\eis essense\eig_essence LAME V0.7+.wav
2L = D:\Audio\3100i\eis essense\eig_essence LAME V0.7+ SSE2.wav

---------------------------------------
General Comments:

---------------------------------------
1R File: D:\Audio\3100i\eis essense\eig_essence LAME V0.7+.wav
1R Rating: 4.0
1R Comment:
---------------------------------------
2L File: D:\Audio\3100i\eis essense\eig_essence LAME V0.7+ SSE2.wav
2L Rating: 3.0
2L Comment:
---------------------------------------
ABX Results:
D:\Audio\3100i\eis essense\eig_essence LAME V0.7+.wav vs D:\Audio\3100i\eis essense\eig_essence LAME V0.7+ SSE2.wav
   5 out of 5, pval = 0.031
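The pval in the log is what pure guessing would give. Under a one-sided binomial test, each ABX trial has a 1/2 chance of being right by luck, so 5 out of 5 has probability (1/2)^5 = 0.03125, shown as 0.031. A small sketch of that calculation follows. The function name is made up, and ABC/HR's own implementation may differ in details.

```c
/* One-sided ABX p-value: probability of getting at least `correct` of `trials`
   right by guessing, each trial being a fair coin flip. */
double abx_pvalue(int correct, int trials) {
    double total = 0.0;
    double binom = 1.0;                          /* C(trials, 0) */
    for (int i = 0; i <= trials; i++) {
        if (i >= correct)
            total += binom;                      /* add C(trials, i) */
        binom = binom * (trials - i) / (i + 1);  /* advance to C(trials, i + 1) */
    }
    for (int i = 0; i < trials; i++)
        total /= 2.0;                            /* divide by 2^trials */
    return total;
}
```

`abx_pvalue(5, 5)` returns 0.03125. A single miss, `abx_pvalue(4, 5)`, would already give 0.1875.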

In my opinion, people who use your encoder care about quality first and speed second. So if SSE2 causes some precision loss, it might be worth dropping this optimization. Still, it's quite strange that it causes such distortion, because until now the common belief has been that different compilers shouldn't produce noticeable audible differences.

I get a miserable 6x speed (non-SSE2 3.100i) on my Atom-based netbook with all 4 threads fully loaded, but it's not an issue at all.
  • Last Edit: 08 February, 2013, 10:05:23 PM by IgorC