HydrogenAudio

Lossy Audio Compression => Ogg Vorbis => Ogg Vorbis - Tech => Topic started by: AshenTech on 2010-01-06 00:41:37

Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: AshenTech on 2010-01-06 00:41:37
http://www.agner.org/optimize/blog/read.php?i=49 (http://www.agner.org/optimize/blog/read.php?i=49)

As this article explains, Intel's compiler automatically generates multiple optimized code paths, but its CPU dispatcher sends any CPU it detects as non-Intel down a far slower path. This means that, most likely, if this guide is followed
http://www.agner.org/optimize/#manual_cpp (http://www.agner.org/optimize/#manual_cpp)
the compiled software would end up being a lot faster on non-Intel CPUs such as AMD and VIA.
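To make the mechanism concrete, here is a minimal sketch of vendor-string based dispatch in C, assuming a GCC/Clang environment with <cpuid.h>. This is not Intel's actual dispatcher code, and the encode_block_* functions are hypothetical placeholders; it just shows the pattern where the fast path is taken only when the vendor string is "GenuineIntel", regardless of which instruction sets the CPU supports.

Code
/* Minimal sketch of vendor-string based dispatch (not Intel's actual
 * dispatcher code): the fast path is taken only when the vendor string is
 * "GenuineIntel", regardless of which instruction sets the CPU supports. */
#include <cpuid.h>    /* GCC/Clang CPUID helpers */
#include <stdio.h>
#include <string.h>

static int vendor_is_intel(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(vendor + 0, &ebx, 4);   /* vendor string comes back in EBX:EDX:ECX */
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    return strcmp(vendor, "GenuineIntel") == 0;
}

/* Hypothetical stand-ins for an encoder's fast and slow code paths. */
static void encode_block_sse2(void)    { puts("fast SSE2 path"); }
static void encode_block_generic(void) { puts("slow generic path"); }

int main(void)
{
    /* Vendor-based dispatch: an AMD or VIA CPU with full SSE2 support
     * still lands on the slow generic path. */
    if (vendor_is_intel())
        encode_block_sse2();
    else
        encode_block_generic();
    return 0;
}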

Was hoping somebody with the skill (john33, if he's got the time) could compile a version without this CPU bias, so we can see how much this extremely biased dispatcher is affecting non-Intel users.

BTW, in PCMark the difference on a VIA Nano was a 47.4% performance boost from making the program see it as an Intel CPU rather than a VIA; that's huge.
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: lvqcl on 2010-01-06 01:17:56
Obviously you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php (http://www.rarewares.org/ogg-oggenc.php)

You can also test the aoTuV 5.7 P4 compile vs. P3 vs. generic...  Usually an ICC compile is faster than the generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always).
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: AshenTech on 2010-01-06 01:54:56
Quote
Obviously you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php (http://www.rarewares.org/ogg-oggenc.php)

You can also test the aoTuV 5.7 P4 compile vs. P3 vs. generic...  Usually an ICC compile is faster than the generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always).


Yes, but my point is that Intel's compiler sends AMD chips, and any other non-Intel chip, down a less-than-optimal code path. If you can make the software think you're using a "GenuineIntel" CPU you get the optimal path; otherwise, Intel's CPU dispatcher chooses a less-than-optimal one (many times still faster than other compilers' output, but still slower than it should be).

This is just a request to give it a shot and see if it makes a difference, until Intel puts out an unbiased version of their compiler (they already signed an agreement with AMD to do this for AMD chips, but that isn't going to help VIA if they keep using CPUID vendor strings to choose the code path rather than querying the CPU for its supported features).

Ars showed a 47.4% boost on the VIA Nano by faking the CPUID string as Intel, and a smaller boost (I think it was about 10%) by identifying the CPU as AMD...

Quote
My my. Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs.


http://arstechnica.com/hardware/reviews/20...no-review.ars/6 (http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6)

If the link won't work you can use Google's cache to view it (it's been loading very slowly the last few days).

This is what I'm talking about: huge differences just by changing the CPUID string.
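For contrast, here is a sketch of the feature-query approach argued for above, again in plain C with <cpuid.h> (an illustration only, not code from any shipped dispatcher): the code path is chosen from the CPUID feature bits that Intel, AMD and VIA CPUs all report, not from the vendor string.

Code
/* Sketch of feature-based dispatch: query CPUID leaf 1 for capability bits
 * instead of looking at the vendor string.  Illustration only. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("dispatch: generic path (CPUID leaf 1 unavailable)");
        return 0;
    }
    /* These feature bits mean the same thing on Intel, AMD and VIA parts. */
    int has_sse2 = (edx >> 26) & 1;  /* EDX bit 26 = SSE2 */
    int has_sse3 = ecx & 1;          /* ECX bit 0  = SSE3 */

    if (has_sse3)
        puts("dispatch: SSE3 path");
    else if (has_sse2)
        puts("dispatch: SSE2 path");
    else
        puts("dispatch: generic path");
    return 0;
}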
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: roozhou on 2010-01-28 16:52:21
And why not manually optimize the time-critical parts in assembly, just like the ffmpeg and x264 developers have done?
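For anyone unfamiliar with what that involves, here is a rough C sketch of hand-vectorizing one hot inner loop with SSE intrinsics (the lighter-weight cousin of the raw assembly ffmpeg and x264 use). It is a generic dot product, not code from libvorbis; it just shows the kind of per-routine work such optimization takes.

Code
/* Rough illustration of hand-vectorizing one hot loop with SSE intrinsics.
 * Generic dot product; not code from libvorbis, ffmpeg or x264. */
#include <xmmintrin.h>   /* SSE */
#include <stdio.h>

static float dot_sse(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    int i;

    for (i = 0; i + 4 <= n; i += 4)          /* 4 floats per iteration */
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    for (; i < n; i++)                       /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}

int main(void)
{
    float a[6] = {1, 2, 3, 4, 5, 6};
    float b[6] = {6, 5, 4, 3, 2, 1};
    printf("dot = %f\n", dot_sse(a, b, 6));  /* 6+10+12+12+10+6 = 56 */
    return 0;
}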
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: Yirkha on 2010-02-13 02:43:47
Because unless you can find a nice mathematical or programming trick to do things differently, the extra work is rarely worth it with today's optimizing compilers.
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: forart.eu on 2010-03-07 13:59:05
Another interesting solution could be to adopt Orc (http://code.entropywave.com/projects/orc/)*, which - according to the newest Schrödinger release (http://diracvideo.org/2010/03/schroedinger-1-0-9-released/) - seems to optimize code very effectively:
Quote
  • Orc: Complete conversion to Orc and removal of liboil dependency.
  • Added a lot of orc code to make things faster.  A lot faster.
Quote
we’ve switched over to using Orc instead of liboil for signal processing code.  Dirac is a very configurable format, and normally would require thousands of lines of assembly code — Orc generates this at runtime from simple rules.  (Hey, it was easier to write Orc than write all that assembly!)


I'm not a developer (nor a binary builder), so I simply don't know if it's applicable to Vorbis (and, why not, Theora) too.

BTW, hope that inspires...

* note: ORC means Oil Runtime Compiler, not Open Research Compiler...
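As a rough idea of what "simple rules" means in practice, below is a sketch of building one kernel at runtime with liborc's C API instead of writing the assembly by hand. The function and opcode names here are recalled from the liborc API and may not match the current headers exactly, so treat this as an assumption-laden illustration, not working Vorbis code.

Code
/* Assumption-laden sketch of a runtime-compiled kernel with liborc:
 * one rule ("addssw" = saturated signed 16-bit add) is compiled to native
 * SIMD on first use.  API names recalled from memory and may be inexact. */
#include <orc/orc.h>

static void saturated_add_s16(short *dest, const short *s1,
                              const short *s2, int n)
{
    static OrcProgram *p = NULL;
    OrcExecutor ex = { 0 };

    if (p == NULL) {
        orc_init();
        p = orc_program_new_dss(2, 2, 2);           /* 16-bit dest + 2 sources */
        orc_program_append_str(p, "addssw", "d1", "s1", "s2");
        orc_program_compile(p);                     /* native code emitted here */
    }

    orc_executor_set_program(&ex, p);
    orc_executor_set_n(&ex, n);
    orc_executor_set_array_str(&ex, "d1", dest);
    orc_executor_set_array_str(&ex, "s1", (void *) s1);
    orc_executor_set_array_str(&ex, "s2", (void *) s2);
    orc_executor_run(&ex);
}

int main(void)
{
    short a[4] = {1000, 2000, 30000, -30000};
    short b[4] = {1000, 2000, 30000, -30000};
    short d[4];
    saturated_add_s16(d, a, b, 4);   /* last two results saturate to +/-32767/-32768 */
    return 0;
}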
Title: aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips
Post by: X-Fi6 on 2010-03-30 08:07:06
"Runtime Compiler" really should give your answer away  not to be anti-Java or anything. Though a well-developed compiler with the right optimizations should typically produce code that's faster than a self-compiling program, unless it was done right... Could we see some benchmarks of Orc?