LAME compiled executable slow!

Hi, I just compiled LAME v3.100 (from the official SourceForge release) with MinGW + MSYS on 32-bit Windows 7, and the build was clean (zero errors and warnings). The configuration I used was:

./configure --prefix=/mingw --enable-expopt=full

and an executable of approximately 500 KB was generated. It works perfectly, but when I compared it with the one from rarewares.org (LAME v3.100.1, Win32, 2020-09-08), the Rarewares build was MUCH FASTER at converting WAV to MP3: a conversion that takes about 35 seconds with my exe takes about 10 seconds with the Rarewares exe (same PC, same OS, same architecture, same test files).

Why this difference? I compiled with --enable-expopt=full (which supposedly enables various experimental optimizations), but the result is the same without this option. I also tried --disable-shared --enable-static, with no change.

Could it be a consequence of compiling with MinGW? Rarewares states that they used the "Intel 19 Compiler" and that "The internal mpglib decoding library has been replaced with the libmpg123 decoding library". Is it because of libmpg123? I have compiled the libmpg123 sources, but I don't know how to add the library to LAME (there is no such option as --enable-libmpg123).

Is there another configuration option that improves the conversion speed?

Re: LAME compiled executable slow!

Reply #1
The decoding library would make no difference whatsoever. Is your compile using the nasm code? If not, that would account for most of the difference.

Re: LAME compiled executable slow!

Reply #2
Quote: The decoding library would make no difference whatsoever. Is your compile using the nasm code? If not, that would account for most of the difference.
I have now compiled with NASM, and there was indeed an improvement (from 35 seconds down to 26, but still far from the 8 seconds of the Rarewares exe). Below is the output for the same WAV converted with my executable and with the one from Rarewares. You can see the big speed difference in "play/CPU" (5x vs 16x) and in "CPU time/estim":
Code: [Select]
C:\>lame -b 320 IN.wav OUT.mp3
LAME 3.100 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
Using polyphase lowpass filter, transition band: 20094 Hz - 20627 Hz
Encoding IN.wav to OUT.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (4.4x) 320 kbps qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  5624/5624  (100%)|    0:26/    0:26|    0:00/    0:00|   5.5449x|    0:00 h
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  320.0       93.1   6.9        91.0   5.2   3.8
Writing LAME Tag...done
ReplayGain: -3.2dB

C:\>rlame -b 320 IN.wav OUT.mp3
LAME 3.100.1 32bits (https://lame.sourceforge.io)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
Using polyphase lowpass filter, transition band: 20094 Hz - 20627 Hz
Encoding IN.wav to OUT.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (4.4x) 320 kbps qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  5624/5624  (100%)|    0:08/    0:08|    0:08/    0:08|   16.744x|    0:00
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  320.0       93.1   6.9        91.0   5.2   3.8
Writing LAME Tag...done
ReplayGain: -3.2dB
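
For reference, the play/CPU column can be cross-checked against the CPU time with a little arithmetic. The sketch below assumes 1152 samples per MPEG-1 Layer III frame and takes the frame count and the two play/CPU figures from the runs above; the variable names are only illustrative.

```shell
frames=5624              # frame count reported in both runs above
samples_per_frame=1152   # samples per MPEG-1 Layer III frame (assumption stated in the lead-in)
sample_rate=44100        # Hz, from "44.1 kHz" in the output

# Playing time of the encoded audio, in seconds.
duration=$(awk -v f="$frames" -v s="$samples_per_frame" -v r="$sample_rate" \
    'BEGIN { printf "%.1f", f * s / r }')

# CPU seconds implied by each play/CPU figure, and the ratio between the builds.
cpu_mine=$(awk -v d="$duration" 'BEGIN { printf "%.1f", d / 5.5449 }')
cpu_rarewares=$(awk -v d="$duration" 'BEGIN { printf "%.1f", d / 16.744 }')
ratio=$(awk 'BEGIN { printf "%.2f", 16.744 / 5.5449 }')

echo "duration: ${duration} s"
echo "my build: ${cpu_mine} s, Rarewares: ${cpu_rarewares} s, ratio: ${ratio}x"
```

The implied CPU times (roughly 26 s and 9 s) agree with the "CPU time" column, so the Rarewares build is about three times faster on this file.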

Re: LAME compiled executable slow!

Reply #3
The remaining difference will be down to the compiler. In my experience, 32-bit MinGW/MSYS compiles are relatively slow in execution. The Rarewares binary was compiled with the Intel 19 compiler, which is clearly faster in this case, though not in every case.

Re: LAME compiled executable slow!

Reply #4
When you say MinGW and MSYS, do you mean the old MinGW and the old MSYS? I ask because of https://www.msys2.org/

Re: LAME compiled executable slow!

Reply #5
Quote: When you say mingw and MSYS, you mean the old mingw and the old msys? I ask because of https://www.msys2.org/
I assumed so on the basis of the original post, but I have also found 32-bit compiles via MSYS2 to be on the slow side.

Re: LAME compiled executable slow!

Reply #6
What compiler flags were used?  Adding -O2 or -O3 will help a lot if it's not already included by default.  -march=native might help as well.
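
A minimal sketch of how such flags could be passed to LAME's configure script, assuming the usual Autoconf convention of giving CFLAGS on the command line. The thread does not say exactly how the flags were supplied, so treat the invocation as one plausible way rather than the poster's actual command; the prefix is the one from the original post, and OPT_FLAGS and CONFIGURE_ARGS are illustrative names.

```shell
# Optimization level suggested above; try -O3 or add -march=native to compare.
OPT_FLAGS="-O2"
# --enable-nasm asks LAME's configure to build the assembler routines (needs nasm installed).
CONFIGURE_ARGS="--prefix=/mingw --enable-nasm"

if [ -x ./configure ]; then
    # Run from the top of the LAME source tree.
    # Autoconf-style scripts accept VAR=value arguments on the command line.
    ./configure $CONFIGURE_ARGS CFLAGS="$OPT_FLAGS" && make
else
    echo "no ./configure here; would run: ./configure $CONFIGURE_ARGS CFLAGS=\"$OPT_FLAGS\""
fi
```

Checking the compiler command lines that make prints is a quick way to confirm that the flags actually reached the build.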

Re: LAME compiled executable slow!

Reply #7
I'm fairly new to building for Windows. I tested a build with MinGW (GCC 12.2) on 64-bit Debian, built for 32-bit as a static lib.

Re: LAME compiled executable slow!

Reply #8
Ignore the previous build: it's 64-bit, not 32-bit. These builds might be slightly faster anyway.

Tested on a 1.022 GB WAV file on Win7 x64 (VM):
Rarewares version: CPU time 1:08, play/CPU 90.450x
My build (W32):    CPU time 1:08, play/CPU 90.539x
My build (W64):    CPU time 1:05, play/CPU 95.403x

Re: LAME compiled executable slow!

Reply #9
Quote: What compiler flags were used? Adding -O2 or -O3 will help a lot if it's not already included by default. -march=native might help as well.
You hit the nail on the head. I compiled again with those flags and... success! The -O2 and -O3 optimization flags increase the encoding speed A LOT (almost as fast as the Rarewares binary). I tested them separately, and this is what I got:
* -O2 and -O3 increased the speed from 5x (see my earlier post) to 12x! I found no difference between the two flags; both binaries reached approximately 12.4x.
* -march=native increased the speed only slightly (to about 13.7x, with either -O2 or -O3). Not significant, but any improvement counts.
* All of this with NASM enabled, of course.
Thanks to all for the help.
Code: [Select]
C:\>lameO3N -b 320 IN.wav OUT.mp3
LAME 3.100 32bits (http://lame.sf.net)
CPU features: MMX (ASM used), SSE (ASM used), SSE2
Using polyphase lowpass filter, transition band: 20094 Hz - 20627 Hz
Encoding IN.wav to OUT.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (4.4x) 320 kbps qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  5624/5624  (100%)|    0:10/    0:10|    0:10/    0:10|   13.756x|    0:00
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  320.0       93.1   6.9        91.0   5.2   3.8
Writing LAME Tag...done
ReplayGain: -3.2dB
P.S.: I couldn't test your binaries. The first one was indeed not 32-bit, and of the most recent ones, the 32-bit version (I use 32-bit Win7 on an old PC) gives me the classic ".exe stopped working".

Re: LAME compiled executable slow!

Reply #10
Mind that when you compile with -march=native, the binary is optimized for all the instruction sets *your* (compiling) machine supports.
It therefore might not run on other machines that support fewer instruction sets (ones lacking AVX2, for example).

Re: LAME compiled executable slow!

Reply #11
Quote: Mind that when you compile with -march=native, the binary is optimized for all the instruction sets *your* (compiling) machine supports. It therefore might not run on other machines that support fewer instruction sets (ones lacking AVX2, for example).
Good point. So in this case it is not worth turning on, because the performance gain is very small.

Re: LAME compiled executable slow!

Reply #12
I wouldn't worry about using -march=native if you're building on an older machine. Your build should run on anything newer, as long as 32-bit is still supported. In my previous post, my build used -march=haswell (4th-gen Core series), which is probably why you couldn't run it. I also forgot to enable nasm, so with haswell I was getting performance similar to the Rarewares build. I tested another build using -march=nehalem (1st-gen Core series) with nasm enabled, and it slightly edges out the Rarewares build on my machine.

You can always see which CPU flags are enabled by GCC or MinGW with this command. (I'm using 64-bit Linux, so your command might differ slightly.)
Code: [Select]
i686-w64-mingw32-gcc -Q --help=target -march=native | grep "enabled"

Re: LAME compiled executable slow!

Reply #13
Using -march=native is usually most advantageous when the binary only runs on the machine that compiled it.

Current CPUs support a lot of "special" instructions and instruction sets that older ones do not, not to mention the differences between AMD and Intel architectures. So using -march=native is something of a gamble if the binary should run across several different platforms.

The lowest common denominator would rather be -march=x86-64(-v2|-v3|-v4), depending on the target system. Alternatively, you might use the -mtune settings. The GCC manual details the different options.
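
As a rough sketch of that trade-off, the helper below maps a distribution goal to a set of GCC flags. The function name arch_flags and its goal names are hypothetical (they are not part of GCC or LAME); the -march and -mtune values are real GCC options, with the x86-64-v2 level assumed to need GCC 11 or newer. These levels describe 64-bit targets, so for a 32-bit build a named CPU such as the nehalem mentioned in Reply #12 would be the analogous choice.

```shell
# Hypothetical helper: choose architecture flags by how widely the binary must run.
arch_flags() {
    case "$1" in
        this-machine)  echo "-march=native" ;;                    # may fail on other CPUs
        any-x86-64)    echo "-march=x86-64 -mtune=generic" ;;     # baseline 64-bit instruction set
        recent-x86-64) echo "-march=x86-64-v2 -mtune=generic" ;;  # additionally requires SSE4.2-class CPUs
        *) echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Combine with an optimization level and hand the result to the build.
CFLAGS="-O2 $(arch_flags any-x86-64)"
echo "$CFLAGS"
```

Here -march sets which instructions the compiler may emit, while -mtune only affects scheduling, so a generic -mtune does not restrict where the binary runs.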

For personal use, I have also been compiling the LAME binary for several years now. The aim was always to get the fastest possible binary that runs on every CPU of, let's say, the past decade. I also use GCC, as it provides solid performance and does not favor Intel over AMD as ICL did (at least until last year). In my experience and tests, it is even able to outperform ICL and MSVC builds in speed.