Topic: Ogg Vorbis acceleration project (Read 190109 times)

Ogg Vorbis acceleration project

Reply #151
Anyone that can give some guidance on how to compile this under Linux (Ubuntu)? Regards.

Reply #152
A few more statistics, transcoding 01:42:28 h of a 5.1 AC3 on a Phenom-II X4 945 using BeSweet with DPL-II downmix and fixed gain (to avoid including the normalization pass):

Generic
06:42 (686)
06:04 (P4)

Lancer
04:30 (SSE)
03:51 (SSE2)
03:50 (SSE3)

The gap between generic and extreme optimization is quite impressive, and even the gap between SSE and SSE2 is still remarkable. But decoding and downmixing take their time too, so a certain degree of saturation is to be expected.
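
To put the timings in this post on a common scale, here is a minimal sketch that converts them to a realtime factor (source length divided by encode time). The helper names `to_seconds` and `speed_factors` are made up for illustration; only the durations come from the post.

```python
def to_seconds(stamp: str) -> int:
    """Convert 'hh:mm:ss' or 'mm:ss' into a number of seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total


# Length of the transcoded AC3 source, as quoted in the post.
SOURCE_LENGTH = to_seconds("01:42:28")

# Wall-clock transcoding times per build, as quoted in the post.
ENCODE_TIMES = {
    "Generic (686)": "06:42",
    "Generic (P4)": "06:04",
    "Lancer (SSE)": "04:30",
    "Lancer (SSE2)": "03:51",
    "Lancer (SSE3)": "03:50",
}


def speed_factors(length_s: int, times: dict) -> dict:
    """Realtime factor per build: seconds of audio per second of work."""
    return {name: length_s / to_seconds(t) for name, t in times.items()}


if __name__ == "__main__":
    baseline = to_seconds(ENCODE_TIMES["Generic (686)"])
    for name, factor in speed_factors(SOURCE_LENGTH, ENCODE_TIMES).items():
        saved = 1 - to_seconds(ENCODE_TIMES[name]) / baseline
        print(f"{name:14s} {factor:5.1f}x realtime, {saved:5.1%} less time than 686")
```

On these figures the SSE3 build needs roughly 43% less wall-clock time than the generic 686 build. The times include decoding and downmixing, so the factors describe the whole BeSweet chain rather than the Vorbis encoder alone.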

Reply #153
@john33, any interest in also posting the cli encoder binary at Rarewares?

Done.

Thanks, but unfortunately, and unlike your previous builds, it no longer runs on older OSes (pre-XP SP2) on which the VC2010 runtimes can't be installed.

But this might be helpful: http://mulder.googlecode.com/svn/trunk/Uti...rLib/README.txt

Reply #154
There are reasons why such old OSes are deprecated. The one excuse would be running them offline.

Reply #155
Thanks, but unfortunately, and unlike your previous builds, it no longer runs on older OSes (pre-XP SP2) on which the VC2010 runtimes can't be installed.

But this might be helpful: http://mulder.googlecode.com/svn/trunk/Uti...rLib/README.txt

OK, what optimisation does your CPU support?

Reply #156
It's a question of PE building and linking rather than of CPU optimizations, john33. The limit is not the CPU but the OS and the set of Windows API functions it supports.


Reply #158
@john33, any interest in also posting the cli encoder binary at Rarewares?

Done.


Hi, John. But what is the difference between your new compiles and this?

I don't remember where I got it, but it was more than one year ago and actually this is also OggEnc v2.87 LancerMod(SSE3) based on aoTuV b6.03 [20110424]. Could you clarify?
🇺🇦 Glory to Ukraine!

Reply #159
OK, what optimisation does your CPU support?

MMX, SSE, SSE2, SSE3, SSSE3 and I'm usually using your P4 optimized builds.
Thank you

Try this: http://www.rarewares.org/files/ogg/oggenc2...cerSSE2_OLD.zip
and perhaps you could let me know if it's OK?

Reply #160
@john33, any interest in also posting the cli encoder binary at Rarewares?

Done.


Hi, John. But what is the difference between your new compiles and this?

I don't remember where I got it, but it was more than one year ago and actually this is also OggEnc v2.87 LancerMod(SSE3) based on aoTuV b6.03 [20110424]. Could you clarify?

I couldn't say with any certainty, but judging by the size of the executables, probably the only difference is that they were not compiled with the libsamplerate resampler.

Reply #161
Try this: http://www.rarewares.org/files/ogg/oggenc2...cerSSE2_OLD.zip
and perhaps you could let me know if it's OK?

Brilliant! Works like a charm, thanks a lot
Code:
G:\Test\>oggenc2 -h
OggEnc v2.87 (LancerMod(SSE2) based on aoTuV b6.03 [20110424])
(c) 2000-2005 Michael Smith <msmith@xiph.org>
& portions by John Edwards <john.edwards33@ntlworld.com>

Reply #162
My versions of oggenc2.exe don't include the SRC and FLAC libraries, and I commented out all the relevant options and calls.

@john33: in your compiles these options are disabled too. I think that's not what you want, so 3 source files with the options re-enabled are attached to this post.

Reply #163
Quote
Hi, John. But what is the difference between your new compiles and this?
I don't remember where I got it, but it was more than one year ago and actually this is also OggEnc v2.87 LancerMod(SSE3) based on aoTuV b6.03 [20110424]. Could you clarify?

Some tests (out of interest) on my PC show that john33's current binaries are slightly but noticeably faster than the ones in your link, at the very least.

Reply #164
My versions of oggenc2.exe don't include the SRC and FLAC libraries, and I commented out all the relevant options and calls.

@john33: in your compiles these options are disabled too. I think that's not what you want, so 3 source files with the options re-enabled are attached to this post.

Thanks, but the versions at Rarewares have these enabled.

EDIT: I just realised that the options were disabled in the oggenc2 code!  I had enabled the inclusion of the libs in the compiles and hadn't checked the code!

Reply #165
All of the above oggenc2 compiles have been updated at Rarewares. Sorry for the confusion!

Reply #166
All of the above oggenc2 compiles have been updated at Rarewares. Sorry for the confusion!


Great work, thanks a lot.

However, the version by lvqcl is still faster on my machine. I use the 32-bit SSE3 oggenc2 from here (index.php?act=findpost&pid=784966), and foobar converts a FLAC at around 49x, while your compile manages 42x.

Reply #167
Your machine. Aha.

We all know your machine.

Oh, no, this is your first post, so how could we?

Hint: http://hwinfo.com/

Reply #168
It's a Core 2 Duo laptop with a P8600 and 4 GB of RAM, running Win7.

Reply #169
However, the version by lvqcl is still faster on my machine. I use the 32-bit SSE3 oggenc2 from here (index.php?act=findpost&pid=784966), and foobar converts a FLAC at around 49x, while your compile manages 42x.


Try the LancerSSE2_OLD build. It is faster than the other versions (except x64).

Reply #170
With John's LancerSSE2_OLD build I get the same speed as with your SSE3 version.

Reply #171
Out of curiosity I tested all the 32-bit oggenc2 compiles again, and here are the results:


John33:

sse  35.69x
sse2 38.40x
sse3 38.60x

sse2old 47.19x


lvqcl:

sse  38.80x
sse2 47.94x
sse3 47.73x


I'm not familiar with compiling, so I wonder why there is such a huge step in speed from SSE to SSE2, while SSE2 and SSE3 are on the same level?
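
The step sizes in the results above can be worked out as percentages with a short sketch like the following. The speeds are copied from the post; the names `RESULTS`, `gain` and `step_gains` are only illustrative.

```python
# Conversion speeds (multiples of realtime) as listed in the post.
RESULTS = {
    "john33": {"sse": 35.69, "sse2": 38.40, "sse3": 38.60, "sse2old": 47.19},
    "lvqcl": {"sse": 38.80, "sse2": 47.94, "sse3": 47.73},
}


def gain(before: float, after: float) -> float:
    """Percentage change in speed when going from 'before' to 'after'."""
    return (after / before - 1) * 100


def step_gains(build: dict) -> dict:
    """Gain for each step up the instruction-set ladder within one build."""
    steps = [("sse", "sse2"), ("sse2", "sse3")]
    return {f"{a}->{b}": gain(build[a], build[b]) for a, b in steps}


if __name__ == "__main__":
    for author, build in RESULTS.items():
        for step, pct in step_gains(build).items():
            print(f"{author:7s} {step:10s} {pct:+5.1f}%")
```

By this arithmetic the large SSE to SSE2 jump (roughly 24%) shows up in lvqcl's builds, while john33's current SSE2 build gains under 8% over SSE. The SSE2 to SSE3 step is within about half a percent either way for both, which may well be measurement noise on a single run.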

Reply #172
This effect isn't down to compiling as such: the C compiler only translates the source routines, which are not heavily CPU-optimized, and in-depth instruction-set optimization is done more efficiently in hand-written assembler code.

The efficiency boost between different instruction sets depends on the algorithm to be optimized and on the differences between the instruction sets. Specifically for Vorbis encoding, SSE2 seems to introduce very useful new instructions (relative to SSE alone), but the new instructions in SSE3 (relative to SSE2) bring only marginal benefit for the Vorbis algorithms.

Reply #173
The efficiency boost between different instruction sets depends on the algorithm to be optimized and on the differences between the instruction sets. Specifically for Vorbis encoding, SSE2 seems to introduce very useful new instructions (relative to SSE alone), but the new instructions in SSE3 (relative to SSE2) bring only marginal benefit for the Vorbis algorithms.

Thanks for clarifying.

Is the reason there is no SSE4 compile also that it introduces too few useful instructions compared to SSE3?

Reply #174
Quote
Is the reason there is no SSE4 compile also that it introduces too few useful instructions compared to SSE3?


I was under the impression that the reason is more along the lines of SSE4* being an umbrella term for a clustermess of very different instruction sets, some of which only work on newish Intel CPUs and others only on newish AMD CPUs, and all of which can only be optimized for effectively with pretty new and specific compilers.
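
For anyone wanting to see which of these sets their own CPU reports, here is a rough sketch that assumes a Linux system, where the `flags` line of /proc/cpuinfo lists SSE3 as `pni` and the SSE4 family as the separate flags `sse4_1`, `sse4_2` and `sse4a`. The function names and the two sample flag strings are made up for illustration and are not dumps from real machines.

```python
# Flag names as they appear in the "flags" line of /proc/cpuinfo on Linux.
FLAG_NAMES = {
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",  # Linux reports SSE3 under the name "pni"
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "SSE4a": "sse4a",
}


def supported_sets(flags_line: str) -> dict:
    """Map each instruction-set label to True/False for a given flags string."""
    flags = set(flags_line.split())
    return {label: flag in flags for label, flag in FLAG_NAMES.items()}


def read_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of /proc/cpuinfo (Linux only)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


if __name__ == "__main__":
    # Illustrative, shortened flag strings (assumptions, not real dumps).
    samples = {
        "Intel-style sample": "fpu sse sse2 pni ssse3 sse4_1",
        "AMD-style sample": "fpu sse sse2 pni sse4a",
    }
    for name, flags in samples.items():
        have = [label for label, ok in supported_sets(flags).items() if ok]
        print(f"{name}: {', '.join(have)}")
```

The two samples show the point made above: a CPU can report one member of the SSE4 family without the others, so a single "SSE4" build would not run everywhere. Calling `supported_sets(read_flags())` on a Linux box gives the picture for the local CPU.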