Hey,
I compiled LAME 3.99.5 with the Intel libs... a little faster for me than John33's build.
Let me know how it works for you.
Download (http://www.mediafire.com/download/kyzcpgpv4m42119/Lame_3.99.5_Intel.zip)
I don't use MP3 but I performed a quick test. 8 Radiohead albums went from ~177x ~2.10 mins to ~300x ~1.30 mins. Nice build thanks.
RareWares used Intel 12.1 for LAME 3.99.5 and 14.0.1 for LAME 3.100a2, what did you use?
I used Intel 13.1.3
I also compiled the latest Lame 3.100a2 snapshot
Download (http://www.mediafire.com/download/h9em6c5brd04run/Lame_3.100-a2_Intel.zip)
Although the Intel compiler is extremely good, that's still a very impressive speed-up.
Have you looked at the vecreport? I wonder if it could be further improved on systems with SSE4 or AVX(2).
Hello everybody,
I have also done tests like these, but compiled the LAME 3.99.5 source with MinGW/GCC. Please also
see my post (index.php?act=findpost&pid=859779) for some tests with different systems during my oggenc trials.
The compiler options used are carefully modified for generic platforms with SSE2.
I could not reach notable improvements by adding newer instruction sets like SSE4.x or AVX.
Comparing these three builds (-V2 switch on a mobile Core i7-2720M), I got the following results:
~ 48,9x for the icl 12.1 RareWares build by John33
~ 53,2x for the icl 13.1 build by Fishman0919
~ 56,3x for the GCC 4.8.2 build by myself
Maybe someone could try my GCC (https://onedrive.live.com/redir?resid=B6FB8967FA31DDBE!108&authkey=!AAE08HTG6uzH2Do&ithint=file%2c.zip) version on other CPUs as well and see if it scales comparably.
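For anyone who wants to put the three figures above side by side, the relative differences can be worked out as in this small sketch. The dictionary labels and the `percent_faster` helper are made up for illustration; the numbers are the ones quoted in the post, with decimal commas written as points.

```python
# Play/CPU speed factors quoted above for -V2 on the i7-2720M.
speeds = {
    "icl 12.1 (RareWares)": 48.9,
    "icl 13.1 (Fishman0919)": 53.2,
    "GCC 4.8.2 (145dBSPL)": 56.3,
}


def percent_faster(new: float, old: float) -> float:
    """Return how much faster `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


baseline = speeds["icl 12.1 (RareWares)"]
for name, factor in speeds.items():
    gain = percent_faster(factor, baseline)
    print(f"{name}: {factor:.1f}x ({gain:+.1f}% vs. icl 12.1)")
```

On these numbers the icl 13.1 build comes out roughly 9% ahead of the icl 12.1 build, and the GCC build roughly 15% ahead.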
Intel i7-4770
Lame -V 0 -q 0 test.wav nul
Lame_3.100-a2_Intel - 55.468x
Lame_3.99.5_Intel - 62.740x
lame3.100.a2 - 46.574x
lame3.100.a2-64 - 58.871x
lame3.99.5 - 56.359x
lame3.99.5-64 - 62.946x
lame3.99.5-libsndfile - 57.060x
lame3.99.5-libsndfile-64 - 63.397x
lame_3.99.5_gcc482_145dBSPL.zip - 70.775x
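A list like the one above can be collected with a small script instead of by hand. This is only a sketch: `parse_play_cpu` and `benchmark` are hypothetical helper names, the command line mirrors the `-V 0 -q 0` call used above, and it assumes LAME writes its progress display to the console streams captured here.

```python
import os
import re
import subprocess

# Matches the "play/CPU" column of LAME's progress line, e.g. "| 67.462x|".
PLAY_CPU = re.compile(r"\|\s*(\d+(?:\.\d+)?)x\|")


def parse_play_cpu(text: str) -> float:
    """Return the last play/CPU factor found in LAME's console output."""
    matches = PLAY_CPU.findall(text)
    if not matches:
        raise ValueError("no play/CPU figure found in output")
    return float(matches[-1])


def benchmark(lame_exe: str, wav_path: str) -> float:
    """Encode wav_path with -V 0 -q 0, discard the MP3, return play/CPU."""
    result = subprocess.run(
        [lame_exe, "-V", "0", "-q", "0", wav_path, os.devnull],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the progress display goes to stderr; scan both to be safe.
    return parse_play_cpu(result.stderr + result.stdout)


if __name__ == "__main__":
    # A progress line in the format LAME prints at the end of an encode.
    sample = "10578/10578 (100%)| 0:04/ 0:04| 0:04/ 0:04| 67.462x| 0:00"
    print(parse_play_cpu(sample))
```

Calling `benchmark()` once per executable and averaging a few runs each would give figures comparable to the list, as long as the same WAV file is used for every build.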
RareWares - John33's Build
My Build
145dBSPL Build
(http://i467.photobucket.com/albums/rr37/fishman0919/lametest_zps2626c2a9.jpg) (http://s467.photobucket.com/user/fishman0919/media/lametest_zps2626c2a9.jpg.html)
AltiVec optimized build is the fastest one to me:
http://tmkk.undo.jp/lame/index_e.html (http://tmkk.undo.jp/lame/index_e.html)
But the max difference between the slowest build and AltiVec is about 5 seconds of encoding time for a 453 MB WAV input file, so really I can live with it.
145dBSPL = ~1.20 mins ~310x
AltiVec optimized build x86-32 = ~1.15 mins ~330x
AltiVec optimized build x86-64 = ~1.10 mins ~340x
Please note that the AltiVec optimized build relies on modified source code specially tuned for Intel systems (only?).
My approach was to set up a fast generic build that is not crippled by compiler limitations, as ICL builds are on
non-Intel systems. I'm glad to see that the GCC build is even faster on Intel CPUs.
Thank you all for testing.
gogo - no - coda 3.13
Core I5-2450M (!) -> 580x
ahaha
@145dBSPL:
Sorry for n00b question; just out of curiosity: What do I need the included DLL (libgcc_s_sjlj-1.dll) for?
Depends.exe doesn't reveal any dependencies to that library...
just experiment (not stable): Clang 3.5 (snapshot)/clang-cl & MSVC 2010 SP 1 + libmmt.lib (v12.0) +asmlib + libcmt.lib (MSVC 2008 Express/win2000 support)
http://www.mediafire.com/download/3rq4zcpf...intel_win32.zip (http://www.mediafire.com/download/3rq4zcpfcvz7vca/lame_3.99.5_clang_intel_win32.zip)
I suppose this file (libgcc_s_sjlj-1.dll), which is part of the MinGW package, does the C++ exception handling.
The MinGW64 page states this:
C++ Exception Model: use SEH when available, SJLJ otherwise and avoid Dwarf2:
SJLJ: slower but available for every architecture.
SEH: fastest but limited to 64-bit because of a patent.
Dwarf2: faster than SJLJ and usually on par with SEH but has known limitations and bugs; avoid unless you're aware of all of them.
I put it in the ZIP because some of the other things I compiled require it.
In case of lame there seems to be no dependency - you are right.
On an i7 3770k:
Rarewares 64 bit:
F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 64bits (http://lame.sf.net)
CPU features: SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10578/10578 (100%)| 0:04/ 0:04| 0:04/ 0:04| 67.462x| 0:00
32 [ 151] ***
40 [ 1] *
48 [ 2] %
56 [ 1] %
64 [ 1] %
80 [ 0]
96 [ 2] %
112 [ 1] %
128 [ 1] %
160 [ 5] %
192 [ 49] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4951] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2725] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
kbps LR MS % long switch short %
260.7 70.0 30.0 87.3 7.0 5.7
Writing LAME Tag...done
ReplayGain: 0.0dB
Intel 14.0.3 64 bit:
F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 64bits (http://lame.sf.net)
CPU features: SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10578/10578 (100%)| 0:03/ 0:03| 0:03/ 0:03| 71.531x| 0:00
32 [ 151] ***
40 [ 1] *
48 [ 2] %
56 [ 1] %
64 [ 1] %
80 [ 0]
96 [ 2] %
112 [ 1] %
128 [ 1] %
160 [ 5] %
192 [ 49] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4951] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2725] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
kbps LR MS % long switch short %
260.7 70.0 30.0 87.3 7.0 5.7
Writing LAME Tag...done
ReplayGain: 0.0dB
Rarewares 32 bit
F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10578/10578 (100%)| 0:04/ 0:04| 0:04/ 0:04| 62.758x| 0:00
32 [ 151] ***
40 [ 1] *
48 [ 2] %
56 [ 1] %
64 [ 1] %
80 [ 0]
96 [ 2] %
112 [ 0]
128 [ 3] %
160 [ 4] %
192 [ 44] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4959] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2722] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
kbps LR MS % long switch short %
260.7 70.0 30.0 87.3 7.0 5.7
Writing LAME Tag...done
ReplayGain: 0.0dB
Intel 14.0.3 32 bit:
F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
10578/10578 (100%)| 0:04/ 0:04| 0:04/ 0:04| 64.562x| 0:00
32 [ 151] ***
40 [ 1] *
48 [ 2] %
56 [ 1] %
64 [ 1] %
80 [ 0]
96 [ 2] %
112 [ 0]
128 [ 3] %
160 [ 4] %
192 [ 44] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4959] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2722] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
kbps LR MS % long switch short %
260.7 70.0 30.0 87.3 7.0 5.7
Writing LAME Tag...done
ReplayGain: 0.0dB
I'll post the new version at Rarewares.
@145dBSPL: Do you have an x64 build of your GCC version?
@Sparktank: No, I don't have an x86_64 build, sorry.
I wonder why the new RareWares ICL14 build still cannot compete with the GCC one,
neither on an i7 Sandy Bridge nor on an AMD Piledriver (which I expected).
On an i7-2620M:
~ 48,7 ICL 12 (RareWares)
~ 50,6 ICL 14 (RareWares)
~ 57,4 GCC 4.7.3 (My own SSE2)
As ICL usually optimizes very aggressively for Intel's own CPUs, what could be the reason?
The only thing I have done is find, modify and test the most important and safe CFLAGS,
alone and in combination, to ensure they speed up encoding without producing any negative effects.
I think it is right that speed doesn't really matter today, but such differences should be worth investigating.
In the very beginning I was happy when the encoding speed increased from 1,4x to 2,1x ...
For testing purposes: GCC 4.7.3 (https://onedrive.live.com/redir?resid=B6FB8967FA31DDBE%21110) SSE2 and AVX build (only slightly faster).
@Farch: Just for fun - Gogo 3.13a (https://onedrive.live.com/redir?resid=B6FB8967FA31DDBE%21109) has also seen a huge speed increase over the ICL 4.5 build.
The Rarewares win32 builds are generic without specific processor optimisations, apart from the nasm code, so they will run on most systems.
That was my objective too. The builds are not optimized for a specific platform and should run on any SSE2 capable processor.
I think you overlooked this post: http://www.hydrogenaud.io/forums/index.php...st&p=866291 (http://www.hydrogenaud.io/forums/index.php?showtopic=105832&view=findpost&p=866291)
You will see that the number of frames of each size used by the GCC build and the others is different.
This suggests that the two compiles use different floating-point rounding modes.
So do/did the Rarewares compared to ICL14 builds: http://www.hydrogenaud.io/forums/index.php...st&p=866356 (http://www.hydrogenaud.io/forums/index.php?s=&showtopic=105832&view=findpost&p=866356)
As far as I remember, this was discussed here a long time ago and is nothing to worry about. Each compiler shows slightly
different behaviour if some kind of fast math is used instead of strict math. The fast math flag is also set by default in the LAME
source code. This should not be audible by any means.
It has always been the case that GCC builds emit output that is not bit-identical to that of the Intel builds, and the same goes for 32 bit versus 64 bit builds. However, to the best of my knowledge, no one has yet been able to ABX or claim to hear any differences in the different encoded results.
@sundance: No, they do not. 32bit builds have different values than 64bit builds, but both 32, and both 64 have the same numbers.
@145dBSPL: I understand that, but since we're talking about speed differences, it is important to note that if the mathematical result is different, it is expected that the calculations are not the same.
That is the context in which I wanted to point out that the GCC build might be doing things differently.
@john33: Sure, I didn't want to imply that the differences would cause audible issues. As said above, the difference in speed might come from the different math used. I remember making a pretty small program (sort of a cheap compression algorithm) that "zipped" a file in 12 seconds with a GCC 32bit build, and required 20 seconds with an MSVC 32bit one.
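The point about frame counts can be checked mechanically by diffing the VBR histograms of two logs. The sketch below uses hypothetical helper names (`frame_histogram`, `histogram_diff`) and feeds them the rows that appear in the 64 bit and 32 bit RareWares logs posted earlier in the thread, with the bar characters left out.

```python
import re

# One histogram row looks like "192 [ 49] %": bitrate, then frame count.
ROW = re.compile(r"^\s*(\d+)\s*\[\s*(\d+)\]", re.MULTILINE)


def frame_histogram(log: str) -> dict[int, int]:
    """Map bitrate (kbps) -> number of frames from LAME's VBR histogram."""
    return {int(kbps): int(count) for kbps, count in ROW.findall(log)}


def histogram_diff(a: dict[int, int], b: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Return only the bitrates whose frame counts differ between a and b."""
    return {
        kbps: (a.get(kbps, 0), b.get(kbps, 0))
        for kbps in sorted(set(a) | set(b))
        if a.get(kbps, 0) != b.get(kbps, 0)
    }


# Rows copied from the 64 bit and 32 bit logs above (bars omitted).
log_64 = """
112 [ 1]
128 [ 1]
160 [ 5]
192 [ 49]
224 [ 2688]
256 [ 4951]
320 [ 2725]
"""
log_32 = """
112 [ 0]
128 [ 3]
160 [ 4]
192 [ 44]
224 [ 2688]
256 [ 4959]
320 [ 2722]
"""

for kbps, (n64, n32) in histogram_diff(
    frame_histogram(log_64), frame_histogram(log_32)
).items():
    print(f"{kbps} kbps: {n64} frames (64 bit) vs {n32} frames (32 bit)")
```

Running the same comparison on the two 64 bit logs (RareWares and Intel 14.0.3) gives an empty diff, which matches the observation that builds of the same bitness agree with each other.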
@[JAZ]: Different compilers are sometimes going to produce slightly different results... sometimes. But it's not always "different math". The results below are from my AMD machine... if I run them on my Intel PC, I get different results.
And like john33 said, no one has been able to ABX the difference between them, or even claimed to.
Note that the MSVS 2013 and GCC builds produce the same results... also the same as my Intel 13.1.3 build and john33's from my post above.
Not sure why my Intel 14.0.3 build is slower and gives different results.
Intel 14.0.3 / MSVS 2013
MinGW / GCC
(http://i467.photobucket.com/albums/rr37/fishman0919/Lametests_zps2598beac.jpg) (http://s467.photobucket.com/user/fishman0919/media/Lametests_zps2598beac.jpg.html)
my intel build
http://www.mediafire.com/download/n31ij9qd...intel_win32.zip (http://www.mediafire.com/download/n31ij9qddlkllu0/lame_3.99.5_intel_win32.zip)
intel 10.1 & MSVC 2008 Express + libc.lib (libirc.lib v9.1) + asmlib + libmmt.lib, libirc.lib, svml_disp.lib etc (v14)
So you use ICL 10.1? Why? What are its benefits?
Intel 10.1 is more compatible.
Here is another build (faster, as far as I can tell):
http://www.mediafire.com/download/7rhxvwri...tel14_win32.zip (http://www.mediafire.com/download/7rhxvwri1tawbn7/lame_3.99.5_intel14_win32.zip)
Here is LAME with the ICL 15 libs (not tested, should be faster):
http://www.mediafire.com/download/b4o65yxd...tel15_win32.zip (http://www.mediafire.com/download/b4o65yxdxla4h42/lame_3.99.5_intel15_win32.zip)
What about quality? Is the audio quality consistent?
Assuming it compiled ok, it should be the same.
Yes, ICL is not always good, i.e. it can generate broken code in takehiro.obj/psymodel.obj/vbrquantize.obj (if some flags are enabled), like the GCC 4.7.x bug.
Here is a "broken psymodel" LAME with the ICL 15 libs (with /QIfist to enable fast float-to-int conversion):
http://www.mediafire.com/download/e9u8m73d...l15_win32_2.zip (http://www.mediafire.com/download/e9u8m73dn0vq5a6/lame_3.99.5_intel15_win32_2.zip)
Is it necessary to install the Intel C++ compiler to get better results?
You need the Intel C++ compiler if you want to compile with the Intel C++ compiler. Otherwise, you don't need it.
Wow... I think something may have gone wrong with this build. I hope the output plays correctly.
(http://i467.photobucket.com/albums/rr37/fishman0919/FastMP3Encoding_zps2ce8b3e3.jpg) (http://s467.photobucket.com/user/fishman0919/media/FastMP3Encoding_zps2ce8b3e3.jpg.html)
Hi all,
I am using the libmp3lame.dll version 3.99.5 available at Rarewares on all my systems except my old laptop, which runs Windows XP and has no SSE2 registers.
In order to use the programs I created on this laptop, I had to download the lame_enc.dll version 3.99.3 from the Buanzo site (it has the same symbols as the one from Rarewares).
So now I have to decide at runtime which of the 2 DLLs to use.
I am not worried about deciding which DLL to use at runtime, but I am worried about using different versions of LAME.
Is there a chance that someone can compile version 3.99.5 with an old compiler that does not require SSE2 and can be used on Windows XP?
I have also downloaded version 3.100 from Buanzo's site, but I could not use it on either of my 2 computers, probably because it requires even more recent registers. Is there a chance that someone could compile it for the following 2 configs: 1) the same config as the DLL from Rarewares; and 2) a config where SSE2 registers are not available?
If no one can compile version 3.100 as requested, it would be great if, at least, someone compiled version 3.99.5 for a setting where SSE2 registers are not required...
best regards,
Hi,
Indeed, I have checked and my CPU does not have the SSE1 registers either... so, if someone compiles version 3.99.5 for me, I will have to ask them not to require SSE1.
best regards,
I'll put a 3.100 compile up later today that should work as required. :) I'll post again when it's uploaded to Rarewares.
Hi,
That would be really great! Many thanks!
Just remember that what I need is the lame_enc.dll with the same symbols as your libmp3lame.dll, for the 2 settings...
best regards,
I assume that you need the same dll that works with Audacity? If so, this one works with the latest version of Audacity and should work on XP. Hopefully it will work for you, but let me know! ;)
lame_enc.dll 3.100 for Audacity (http://www.rarewares.org/files/mp3/lame_enc-3.100-Audacity.zip)
The SSE registers launched 20 years ago this February. Do you really have a CPU that completely lacks them?
Hi,
Thanks for the dl, I will test it now.
About the SSE registers, the computer I am using to write this message lacks them; it was acquired in 2001! My other computer was acquired in 2007!
I have written a program to batch convert WAV and MP3 files (https://en.stateful-audio-converter.frdafonseca.com) and that is how I found the mentioned problem.
As I know several people who are still using old computers, I intend to make the program compatible with old CPUs...
best regards,
Hi,
I have looked at the symbols exported by the provided DLL and it lacks some that I need, like the hip_* ones, which are available in your libmp3lame.dll.
Could you please recompile? And could you provide one version for CPUs without SSE and one for CPUs with it (corresponding to the libmp3lame one you provide on your site)?
I intend to make my program compatible with old CPUs but, at the same time, as fast as possible on those CPUs... so I will be testing whether some registers exist and then loading the correct DLL.
best regards,
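The runtime choice described here could look roughly like the sketch below. It is written in Python purely for illustration (the thread does not say what language the program uses), the two DLL file names are hypothetical placeholders, and `cpu_features` / `choose_dll` are made-up helper names. The Windows call used is `IsProcessorFeaturePresent` from kernel32, with its documented feature IDs for SSE and SSE2.

```python
import ctypes
import sys

# Feature IDs for the Win32 IsProcessorFeaturePresent() call.
PF_XMMI_INSTRUCTIONS_AVAILABLE = 6     # SSE
PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10  # SSE2


def cpu_features() -> dict[str, bool]:
    """Report SSE/SSE2 support; off Windows this placeholder reports none."""
    if sys.platform != "win32":
        return {"sse": False, "sse2": False}
    present = ctypes.windll.kernel32.IsProcessorFeaturePresent
    return {
        "sse": bool(present(PF_XMMI_INSTRUCTIONS_AVAILABLE)),
        "sse2": bool(present(PF_XMMI64_INSTRUCTIONS_AVAILABLE)),
    }


def choose_dll(features: dict[str, bool]) -> str:
    """Pick a DLL file name; both names are hypothetical placeholders."""
    if features.get("sse2"):
        return "libmp3lame_sse2.dll"
    return "lame_enc_generic.dll"


if __name__ == "__main__":
    print(choose_dll(cpu_features()))
```

Keeping the decision in a pure function such as `choose_dll` makes it easy to test the fallback path on a modern machine by passing in a feature set with everything switched off.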
OK, I'll see what I can do. I'll get back here when I have something for you. :)
lame_enc.dll without CPU optimisation (http://www.rarewares.org/files/mp3/lame_enc-3.100-Audacity.zip)
lame_enc.dll with NASM optimisation (http://www.rarewares.org/files/mp3/lame_enc-3.100-NASM-Audacity.zip)
Both contain the library and exports and include all the symbols from the libmp3lame dll as well as the lame_enc symbols.
The first compile is 'plain vanilla' without any NASM or other CPU optimisations, the second is a standard NASM build. Both should run on XP and both work with the current Audacity. Let me know how it goes, please. :)
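Whether a given DLL really exports the needed entry points can be checked before relying on it. The following is a minimal sketch, again in Python for illustration only: `missing_symbols` and `check_dll` are hypothetical helpers, and `REQUIRED` lists just two functions from LAME's public API as examples (the full list of symbols the program needs is not given in the thread).

```python
import ctypes

# Example entry points; both are part of LAME's public API.
REQUIRED = ["lame_init", "hip_decode_init"]


def missing_symbols(lib, names):
    """Return the names that cannot be resolved as attributes of lib."""
    return [name for name in names if not hasattr(lib, name)]


def check_dll(path: str) -> list[str]:
    """Load the DLL at path and list any required exports it lacks."""
    return missing_symbols(ctypes.CDLL(path), REQUIRED)
```

For example, `check_dll("lame_enc.dll")` would return an empty list if both functions are exported under those plain names, and the names of the absent ones otherwise.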
Hi,
Many thanks! They both work on my old XP laptop.
I was expecting one of them not to work, however!
Can you tell me which registers each DLL uses, so that I can choose when to use one and when to use the other?
best regards,
Probably the operating system is swallowing the opcode-not-found exception (for the DLL with optimizations) and providing alternative code for it...
best regards,
The one compiled without NASM has no CPU optimisations at all. The one compiled with NASM identifies the CPU and implements optimisations as appropriate:
"assembler routines to detect CPU-features
;
; MMX / 3DNow! / SSE / SSE2"
So, the second one should work even if none of those CPU features is present, which kind of makes the first one redundant!
Once more, many thanks!
best regards,
You're welcome. ;)
It is a pity you do not provide ffmpeg compiles, because I am having the same problem with ffmpeg with regard to this program: https://en.slideshow-creator.frdafonseca.com .
That is, I have to use a very old version of ffmpeg for both Windows XP and Vista, and another for the remaining systems... I believe the problem, once more, has to do with SSE registers.
best regards,
There are some relatively recent ffmpeg compiles around that claim XP compatibility, but certainly one I looked at requires SSE. Unfortunately it looks like it is quite a lot of work to put together as apparently some of the included libraries no longer have XP compatibility. I'm not making any promises about this one, I'm afraid!! ;)
I was not expecting you to try, as your site only has compiles for audio... if you can do it, then it would be really great!
If you do it, then great. Otherwise, I will just keep my configuration.
Thanks in advance for the time taken!
best regards,