Topic: I just optimized Musepack for OS X

I am lazy, so the code will only run on a G4 or higher.

http://s91752438.onlinehome.us/public/mppenc.zip

Please do some extensive testing against the officially released build using Java ABC/HR. If no serious sound issues are revealed, I will release the (relatively minor) changes for mass consumption.

Reply #1
Can you give some details on the kinds of optimizations you've made? Btw, I haven't tried it yet, but if it works well, it'll be very nice and very welcome.

Reply #2
Quote
If no serious sound issues are revealed, I will release the (relatively minor) changes for mass consumption.


The (L)GPL zealots will come after you.

Reply #3
Quote
Quote
If no serious sound issues are revealed, I will release the (relatively minor) changes for mass consumption.


The (L)GPL zealots will come after you.


Yeah, I know.

All I did was use a square-root estimate (which is very close, and MUUUUCH faster), enable the built-in fast math routines, change the FFT code to Apple's vBigDSP, and change the compiler options to "-fast -mcpu=7400 -mtune=7400 -ftracer --param max-gcse-passes=8 -fsingle-precision-constant".

In fact, it should be me complaining about (L)GPL violations! The source code that is available to the public compiles into a completely different build than the binaries released on www.musepack.net. They apparently have some more optimization tricks up their sleeves that are not enabled in, or available to, the public source. For example, compiling the public source code with ONLY the compiler optimizations, I got around 2.0x performance. The official build got around 2.5x. The interesting thing to me was this:
Building from the publicly available SOURCE, the biggest performance hit was the use of sqrt(). It was over 25% of execution time. However, in the public BUILDS, it appeared that they never even USED the system's sqrt(), using some magical inlined sqrt function instead. Not to mention that they disabled debug symbols, so I can't even tell which functions are slow in the official build without looking at the assembler.

I'd like to know who made the OS X builds so I can add their "hidden" optimizations to my code!

Anyway, I believe this thread should be about testing for quality regressions, and not bickering over whether there is source or not. It's not like I'm selling it, and if you really REALLY want the source, I'll talk to you in private. The optimizations are nothing special.

Here are some of my tests.

Code:
official build:
   %|avg.bitrate| speed|play time (proc/tot)| CPU time (proc/tot)| ETA
100.0  179.7 kbps  2.81x     2:00.6    2:00.6     0:42.9    0:42.9  
optimized:
   %|avg.bitrate| speed|play time (proc/tot)| CPU time (proc/tot)| ETA
100.0  179.3 kbps  3.60x     2:00.6    2:00.6     0:33.5    0:33.5


There is a 0.4 kbps drop in bitrate, which is most likely caused by the combination of reduced sqrt precision and Apple's FFT. The speed gain is quite substantial: 3.60x versus 2.81x, or about 28% faster.

I can't ABX the difference between the two files encoded at --standard --xlevel. Of course, I can't ABX them against the original either...

Reply #4
Quote
I'd like to know who made the OS X builds so I can add their "hidden" optimizations to my code!


If the OSX build available there is the same one in RareWares, then Dibrom did it, even before MPC was open-sourced (so, he didn't have to adhere to the LGPL back then)

Reply #5
Quote
Quote
I'd like to know who made the OS X builds so I can add their "hidden" optimizations to my code!


If the OSX build available there is the same one in RareWares, then Dibrom did it, even before MPC was open-sourced (so, he didn't have to adhere to the LGPL back then)


I have no idea if these are the same binaries I built, but if they are, I simply built them with XLC instead of GCC, which would explain the differences.  I didn't make any actual code optimizations.

Reply #6
Quote
(...snip...)
If the OSX build available there is the same one in RareWares (...snip...)

The Mac OS X build offered at musepack.net seems to be identical to the one at rarewares.org (when comparing files).
"ONLY THOSE WHO ATTEMPT THE IMPOSSIBLE WILL ACHIEVE THE ABSURD"
        - Oceania Association of Autonomous Astronauts

Reply #7
Quote
I have no idea if these are the same binaries I built, but if they are, I simply built them with XLC instead of GCC, which would explain the differences.  I didn't make any actual code optimizations.


How is xlc smart enough to optimize calls to sqrt completely out of the program? If there were no modifications, then a recompilation of this code with xlc would probably yield even better results than my compilation with gcc.

Too bad you have to pay for xlc.

Reply #8
Quote
Quote
I have no idea if these are the same binaries I built, but if they are, I simply built them with XLC instead of GCC, which would explain the differences.  I didn't make any actual code optimizations.


How is xlc smart enough to optimize calls to sqrt completely out of the program? If there were no modifications, then a recompilation of this code with xlc would probably yield even better results than my compilation with gcc.

Too bad you have to pay for xlc.


I don't know, but from what I have gathered XLC is known to produce faster code than GCC in many cases.  This is especially so for the G5, although the benefits obviously have shown through on the G4 as well.

There's a 60 day trial version (http://www14.software.ibm.com/webapp/download/search.jsp?go=y&rs=xlc), which is what I used.

Reply #9
Okay, I just recompiled with XLC, and it is _SLOWER_.

It only gets around 3.0x, while gcc gets about 3.6x. What's the deal?

Reply #10
Quote
Okay, I just recompiled with XLC, and it is _SLOWER_.

It only gets around 3.0x, while gcc gets about 3.6x. What's the deal?


I don't know.  Maybe you didn't use the same compiler flags that I did.  Unfortunately, I did the original compiles long enough ago that I don't remember much about it.

And this is all assuming they are even the same compiles. I haven't been following any of this for a while, but I would be a bit surprised if nobody else has released something more "official."

Did the XLC compile you made show the same sqrt behavior that the "official" compiles did?  If not, then it's probably a good bet that indeed you aren't using the same flags somewhere.

Edit: And just to be sure, these are the same release versions of mppenc being compiled, right?

 

Reply #11
Quote
Okay, I just recompiled with XLC, and it is _SLOWER_.

It only gets around 3.0x, while gcc gets about 3.6x. What's the deal?

I tried to do an optimised build of LAME yesterday, and I got a 10-15% speedup on --preset medium and --preset standard. I used "-qaltivec -O5 -qnocommon" as CFLAGS. To do profile-directed optimisation, I appended "-qpdf1 -qipa=pdfname=/tmp/lame-pdf.tmp", ran the resulting binary over a lot of files with the two presets, and rebuilt with "-qpdf2 -qipa=pdfname=/tmp/lame-pdf.tmp" appended instead.