http://www.agner.org/optimize/blog/read.php?i=49
As this article explains, Intel's compiler builds multiple optimized code paths, and its CPU dispatcher picks one at run time. Any CPU the dispatcher detects as non-Intel is sent down a FAR slower code path. This most likely means that if this guide is followed
http://www.agner.org/optimize/#manual_cpp
the compiled software would end up being a lot faster on non-Intel CPUs such as AMD and VIA.
I was hoping somebody with the skill (john33, if he's got the time) could compile a non-CPU-biased version so we can see how this extremely biased dispatcher is affecting non-Intel users.
BTW, in PCMark the difference on a VIA Nano was a 47.4% performance boost just from making the program see it as an Intel CPU rather than a VIA one. That's HUGE...
Obviously you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php
You can also test the aoTuV 5.7 P4 compile vs. P3 vs. generic... Usually the ICC compile is faster than the generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always).
Yes, but my point is that Intel's compiler sends AMD chips, and any non-Intel chip, a less-than-optimal code path. IF you can make the software THINK you're using a "GenuineIntel" CPU, you get the optimal path; otherwise Intel's CPU dispatcher chooses a less-than-optimal one (many times still faster than other compilers' output, but still slower than it should be).
This is just a request to give it a shot and see if it makes a difference, until Intel puts out an unbiased version of their compiler (they already signed an agreement with AMD to do this for AMD chips, but that isn't going to help VIA if they don't change the dispatcher to query the CPU for supported features instead of keying off the CPUID vendor string).
Ars showed a 47.4% boost on the VIA Nano by faking the CPUID string as Intel, and a smaller boost (I think it was around 10%) by identifying the CPU as AMD...
My my. Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs.
http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6
If the link won't work, you can use Google's cache to view it (it's been loading VERY slowly the last few days).
This is what I'm talking about: huge differences just from changing the CPUID string.
And why not manually optimize the time-critical parts in assembly, just like the ffmpeg and x264 developers have done?
Because unless you can find a nice mathematical or programming trick to do things differently, the extra work is rarely worth it with today's optimizing compilers.
Another interesting solution could be to adopt Orc (http://code.entropywave.com/projects/orc/)*, which, according to the newest Schrödinger release (http://diracvideo.org/2010/03/schroedinger-1-0-9-released/), seems to optimize code very much:
- Orc: Complete conversion to Orc and removal of liboil dependency.
- Added a lot of orc code to make things faster. A lot faster.
we’ve switched over to using Orc instead of liboil for signal processing code. Dirac is a very configurable format, and normally would require thousands of lines of assembly code — Orc generates this at runtime from simple rules. (Hey, it was easier to write Orc than write all that assembly!)
I'm not a developer (nor a binary builder), so I simply don't know if it's applicable to Vorbis (and, why not, Theora) too.
BTW, hope that inspires...
* note: ORC means Oil Runtime Compiler, not Open Research Compiler...
"Runtime Compiler" really should give your answer away not to be anti-Java or anything. Though a well-developed compiler with the right optimizations should typically produce code that's faster than a self-compiling program, unless it was done right... Could we see some benchmarks of Orc?