Topic: Lame 3.99.5 Intel Lib Compile

Lame 3.99.5 Intel Lib Compile

Hey,

I compiled Lame 3.99.5 with Intel Libs... a little faster for me than John33's build.

Let me know how it works for you.

Download

Reply #1
I don't use MP3 but I performed a quick test. 8 Radiohead albums went from ~177x ~2.10 mins to ~300x ~1.30 mins. Nice build, thanks.

RareWares used Intel 12.1 for LAME 3.99.5 and 14.0.1 for LAME 3.100a2, what did you use?

Reply #2
Quote
I don't use MP3 but I performed a quick test. 8 Radiohead albums went from ~177x ~2.10 mins to ~300x ~1.30 mins. Nice build, thanks.

RareWares used Intel 12.1 for LAME 3.99.5 and 14.0.1 for LAME 3.100a2, what did you use?


I used Intel 13.1.3

I also compiled the latest Lame 3.100a2 snapshot

Download

Reply #3
Although the Intel compiler is extremely good, that's still a very impressive speed-up.

Have you looked at the vecreport? I wonder if it could be further improved on systems with SSE4 or AVX(2).

Reply #4
Hello everybody,

I have also done tests like these, but compiled the LAME 3.99.5 source with MinGW/GCC. Please also
see my post (index.php?act=findpost&pid=859779) for some tests with different systems during my oggenc trials.

The compiler options used were carefully tuned for generic platforms with SSE2.
I could not achieve notable improvements by adding newer instruction sets like SSE4.x or AVX.

Comparing these three builds (-V2 switch on a Mobile Core i7-2720M), I got the following results:

~ 48.9x for the icl 12.1 RareWares build by John33
~ 53.2x for the icl 13.1 build by Fishman0919
~ 56.3x for the GCC 4.8.2 build by myself

Maybe someone could also try my GCC version on other CPUs and see whether it scales comparably.
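
For anyone who wants to compare numbers from another CPU, here is a minimal sketch that turns play/CPU factors into percentage gains over a baseline build. The three values are the ones from the post above; the dictionary labels and the `relative_gain` helper are made up for this illustration.

```python
# play/CPU factors reported above for -V2 on a Mobile Core i7-2720M.
# The labels are informal names chosen for this sketch.
speeds = {
    "icl 12.1": 48.9,
    "icl 13.1": 53.2,
    "GCC 4.8.2": 56.3,
}


def relative_gain(speeds, baseline):
    """Return each build's speed gain over the baseline build, in percent."""
    base = speeds[baseline]
    return {name: (value / base - 1.0) * 100.0 for name, value in speeds.items()}


gains = relative_gain(speeds, "icl 12.1")
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

Replacing the values with results from your own machine shows whether the ranking and the gaps hold up there.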

Reply #5
Intel i7-4770
Lame -V 0 -q 0 test.wav nul

Lame_3.100-a2_Intel - 55.468x
Lame_3.99.5_Intel - 62.740x
lame3.100.a2 - 46.574x
lame3.100.a2-64 - 58.871x
lame3.99.5 - 56.359x
lame3.99.5-64 - 62.946x
lame3.99.5-libsndfile - 57.060x
lame3.99.5-libsndfile-64 - 63.397x
lame_3.99.5_gcc482_145dBSPL.zip - 70.775x

Reply #6
RareWares - John33's Build

My Build

145dBSPL Build



Reply #7
The AltiVec optimized build is the fastest one for me:
http://tmkk.undo.jp/lame/index_e.html

But the max difference between the slowest build and AltiVec is about 5 seconds of encoding time for a 453 MB WAV input file, so really I can live with it.

Reply #8
145dBSPL = ~1.20 mins ~310x

AltiVec optimized build x86-32 = ~1.15 mins ~330x

AltiVec optimized build x86-64 = ~1.10 mins ~340x

Reply #9
Please mind that the AltiVec optimized build relies on modified source code specially tuned for Intel systems (only?).

My approach was to set up a fast generic build that is not crippled on non-Intel systems by compiler
limitations, the way ICL builds are. I'm glad to see that the GCC build is even faster on Intel CPUs.

Thank you all for testing.

Reply #10
gogo - no - coda 3.13
Core I5-2450M (!)  -> 580x
ahaha

Reply #11
@145dBSPL:
Sorry for n00b question; just out of curiosity: What do I need the included DLL (libgcc_s_sjlj-1.dll) for?
Depends.exe doesn't reveal any dependencies to that library...


Reply #13
Quote
@145dBSPL:
Sorry for n00b question; just out of curiosity: What do I need the included DLL (libgcc_s_sjlj-1.dll) for?
Depends.exe doesn't reveal any dependencies to that library...

I suppose this file, which is part of the MinGW package, does the C++ exception handling.

The MinGW64 page states this:
C++ Exception Model: use SEH when available, SJLJ otherwise and avoid Dwarf2:
    SJLJ: slower but available for every architecture.
    SEH: fastest but limited to 64-bit because of a patent.
    Dwarf2: faster than SJLJ and usually on par with SEH but has known limitations and bugs; avoid unless you're aware of all of them.

I put it in the ZIP because some of the other things I compiled require it.
In the case of LAME there seems to be no dependency - you are right.

Reply #14
On an i7 3770k:

Code:
Rarewares 64 bit:

F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 64bits (http://lame.sf.net)
CPU features: SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
10578/10578 (100%)|    0:04/    0:04|    0:04/    0:04|   67.462x|    0:00
32 [  151] ***
40 [    1] *
48 [    2] %
56 [    1] %
64 [    1] %
80 [    0]
96 [    2] %
112 [    1] %
128 [    1] %
160 [    5] %
192 [   49] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4951] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2725] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  260.7       70.0  30.0        87.3   7.0   5.7
Writing LAME Tag...done
ReplayGain: 0.0dB

Intel 14.0.3 64 bit:

F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 64bits (http://lame.sf.net)
CPU features: SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
10578/10578 (100%)|    0:03/    0:03|    0:03/    0:03|   71.531x|    0:00
32 [  151] ***
40 [    1] *
48 [    2] %
56 [    1] %
64 [    1] %
80 [    0]
96 [    2] %
112 [    1] %
128 [    1] %
160 [    5] %
192 [   49] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4951] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2725] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  260.7       70.0  30.0        87.3   7.0   5.7
Writing LAME Tag...done
ReplayGain: 0.0dB

Rarewares 32 bit

F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
10578/10578 (100%)|    0:04/    0:04|    0:04/    0:04|   62.758x|    0:00
32 [  151] ***
40 [    1] *
48 [    2] %
56 [    1] %
64 [    1] %
80 [    0]
96 [    2] %
112 [    0]
128 [    3] %
160 [    4] %
192 [   44] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4959] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2722] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  260.7       70.0  30.0        87.3   7.0   5.7
Writing LAME Tag...done
ReplayGain: 0.0dB

Intel 14.0.3 32 bit:

F:\Testdir>lame -V 0 12.wav 12.mp3
LAME 3.99.5 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
polyphase lowpass filter disabled
Encoding 12.wav to 12.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
10578/10578 (100%)|    0:04/    0:04|    0:04/    0:04|   64.562x|    0:00
32 [  151] ***
40 [    1] *
48 [    2] %
56 [    1] %
64 [    1] %
80 [    0]
96 [    2] %
112 [    0]
128 [    3] %
160 [    4] %
192 [   44] %
224 [ 2688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*******
256 [ 4959] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************
320 [ 2722] %%%%%%%%%%%%%%%%%%%%%%%%%************
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  260.7       70.0  30.0        87.3   7.0   5.7
Writing LAME Tag...done
ReplayGain: 0.0dB

I'll post the new version at Rarewares.
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/
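
When comparing many logs like the ones above, the play/CPU column can be pulled out of the final progress line automatically. The following is only a sketch: the regular expression and the `play_cpu_factor` helper are assumptions based on the log layout shown in this post, not anything provided by LAME itself.

```python
import re

# Matches the "play/CPU" column of the final progress line, e.g. "|   67.462x|".
# The pattern is inferred from the logs in this thread.
PLAY_CPU_RE = re.compile(r"\|\s*([0-9]+(?:\.[0-9]+)?)x\|")


def play_cpu_factor(progress_line):
    """Extract the play/CPU speed factor from one progress line, or None."""
    match = PLAY_CPU_RE.search(progress_line)
    return float(match.group(1)) if match else None


# Final progress lines copied from the two 64 bit logs above.
rarewares_64 = "10578/10578 (100%)|    0:04/    0:04|    0:04/    0:04|   67.462x|    0:00"
intel_64 = "10578/10578 (100%)|    0:03/    0:03|    0:03/    0:03|   71.531x|    0:00"

a = play_cpu_factor(rarewares_64)
b = play_cpu_factor(intel_64)
print(f"Intel 14.0.3 64 bit vs Rarewares 64 bit: {(b / a - 1) * 100:+.1f}%")
```

Single runs of a few seconds are noisy, so averaging several runs per build would give a fairer comparison.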


Reply #16
@ Sparktank: No, I don't have an x86_64 build. I am sorry.

I wonder why the new RareWares ICL14 build still cannot compete with the GCC one,
neither on an i7 Sandy Bridge nor on an AMD Piledriver (the latter I expected).

On an i7-2620M:
~ 48.7 ICL 12 (RareWares)
~ 50.6 ICL 14 (RareWares)
~ 57.4 GCC 4.7.3 (My own SSE2)

As the ICL usually optimizes very heavily for Intel's own CPUs, what could be the reason?
The only thing I have done is find, modify and test the most important and safe CFLAGS,
alone and in combination, to ensure they speed up encoding without producing any negative effects.

I agree that speed doesn't really matter today, but such differences should be worth investigating.
In the very beginning I was happy when the encoding speed increased from 1.4x to 2.1x ...

For testing purposes: GCC 4.7.3 SSE2 and AVX build (only slightly faster).

@ Farch: Just for fun - Gogo 3.13a has also seen a huge speed increase over the ICL 4.5 build.

Reply #17
The Rarewares win32 builds are generic without specific processor optimisations, apart from the nasm code, so they will run on most systems.
John

Reply #18
That was my objective too. The builds are not optimized for a specific platform and should run on any SSE2 capable processor.

Reply #19
Quote
I wonder why the new RareWares ICL14 build still can not compete with the GCC one.


I think you overlooked this post: http://www.hydrogenaud.io/forums/index.php...st&p=866291

You will see that the number of frames of each size used by the GCC build differs from the others.

This suggests that the two compiles use different floating point rounding modes.
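
The frame counts can be compared mechanically. The GCC log is only linked, not quoted in this thread, so this sketch uses rows from the Rarewares 64 bit and 32 bit logs posted earlier instead. `parse_histogram` and `diff_histograms` are hypothetical helpers written for this illustration, and the histogram bars are shortened.

```python
import re

# One histogram row looks like "192 [   49] %": bitrate in kbps, then frame count.
ROW_RE = re.compile(r"^\s*(\d+)\s*\[\s*(\d+)\]")


def parse_histogram(text):
    """Map bitrate (kbps) -> frame count for every histogram row in text."""
    counts = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def diff_histograms(a, b):
    """Return {bitrate: (count_a, count_b)} for bitrates whose counts differ."""
    return {
        kbps: (a.get(kbps, 0), b.get(kbps, 0))
        for kbps in sorted(set(a) | set(b))
        if a.get(kbps, 0) != b.get(kbps, 0)
    }


# Rows from the Rarewares 64 bit and 32 bit logs (bars shortened here).
build_64 = """
112 [    1] %
128 [    1] %
160 [    5] %
192 [   49] %
224 [ 2688] %%%
256 [ 4951] %%%
320 [ 2725] %%%
"""
build_32 = """
112 [    0]
128 [    3] %
160 [    4] %
192 [   44] %
224 [ 2688] %%%
256 [ 4959] %%%
320 [ 2722] %%%
"""

differences = diff_histograms(parse_histogram(build_64), parse_histogram(build_32))
for kbps, (n64, n32) in differences.items():
    print(f"{kbps} kbps: {n64} frames (64 bit) vs {n32} frames (32 bit)")
```

Only a handful of frames move between neighbouring bitrates, while the average bitrate reported in both logs stays at 260.7 kbps.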

Reply #20
Quote
You will see that the number of frames of each size used by the GCC build differs from the others.
This suggests that the two compiles use different floating point rounding modes.

So do/did the Rarewares builds compared to the ICL14 builds: http://www.hydrogenaud.io/forums/index.php...st&p=866356

Reply #21
As far as I remember, this was discussed here a long time ago and is nothing to worry about. Each compiler behaves
slightly differently if some kind of fast math is used instead of strict math. The fast math flag is also set by default
in the LAME source code. This should not be audible by any means.
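
A tiny illustration of why relaxed floating point math can change results without any bug being involved: floating point addition is not associative, so a compiler that is allowed to reorder a sum can end up with different last bits. This is a generic Python sketch, not LAME code, and which reorderings a given compiler actually performs is not something it shows.

```python
# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two results differ only in the last bits of the double.
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

If a value like this sits right at a decision threshold in an encoder, such a last-bit difference could plausibly tip an individual frame to a neighbouring bitrate.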

Reply #22
It has always been the case that GCC builds emit output that is not bit-identical to that of the Intel builds, and the same goes for 32 bit versus 64 bit builds. However, to the best of my knowledge, no one has yet been able to ABX, or even claimed to hear, any differences between the encoded results.
John

Reply #23
@sundance: No, they do not. 32bit builds have different values than 64bit builds, but the two 32bit builds match each other, and so do the two 64bit builds.

@145dBSPL: I understand that, but since we're talking about speed differences, it is important to note that if the mathematical result is different, the calculations cannot be the same.
That is the context in which I wanted to point out that the GCC build might be doing things differently.

@john33: Sure, I didn't want to imply that the differences would cause audible issues. As said above, the difference in speed might come from the different math used. I remember making a pretty small program (sort of a cheap compression algorithm) that "zipped" a file in 12 seconds with a GCC 32bit build, and required 20 seconds with an MSVC 32bit one.

Reply #24
Quote from [JAZ]:
@sundance: No, they do not. 32bit builds have different values than 64bit builds, but the two 32bit builds match each other, and so do the two 64bit builds.

@145dBSPL: I understand that, but since we're talking about speed differences, it is important to note that if the mathematical result is different, the calculations cannot be the same.
That is the context in which I wanted to point out that the GCC build might be doing things differently.

@john33: Sure, I didn't want to imply that the differences would cause audible issues. As said above, the difference in speed might come from the different math used. I remember making a pretty small program (sort of a cheap compression algorithm) that "zipped" a file in 12 seconds with a GCC 32bit build, and required 20 seconds with an MSVC 32bit one.

@[JAZ]: Different compilers are sometimes going to produce slightly different results, but it's not always "different math". The results below are from my AMD machine; if I run them on my Intel PC, I get different results.
And like john33 said, no one can ABX the difference between them, or even claims to.

As you can see, the MSVS 2013 and GCC builds produce the same results, which also match my Intel 13.1.3 build and john33's from my post above.
Not sure why my Intel 14.0.3 build is slower and produces different results.

Intel 14.0.3 / MSVS 2013

MinGW / GCC


 