Topic: aoTuVbeta6.02

  • john33
  • Developer
aoTuVbeta6.02
Reply #25
Quote
In the meantime you can test my compile.
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as the encoding backend for foobar2000 they are useless anyway.

What compiler optimisations are you using?
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/

  • Anakunda
aoTuVbeta6.02
Reply #26
I have compared the speed of my test oggenc compile posted earlier with the Rarewares one, and the speeds are on par. I used Intel C++ with maximum optimizations. I'd be curious too about the LancerMod compile parameters; maybe reduced floating-point precision? But still, I wouldn't expect so much speedup from that alone.
  • Last Edit: 05 May, 2011, 06:51:44 AM by Anakunda

aoTuVbeta6.02
Reply #27
Quick comparison (single thread):

john33 (x64) -  45.15x realtime

lvqcl (x64, SSE3) - 60.50x realtime

Windows 7 x64, Intel Core i3 530
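
The "x realtime" figures quoted throughout this thread are presumably audio duration divided by encode time. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the input figures are the test WAV and the 2m 57s encode time reported in Reply #35 further down. The WAV header is ignored, so the result is approximate.

```python
def realtime_factor(wav_bytes, sample_rate, channels, bits_per_sample, encode_seconds):
    """Seconds of audio encoded per second of wall-clock time (hypothetical helper)."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    audio_seconds = wav_bytes / bytes_per_second  # ignores the small WAV header
    return audio_seconds / encode_seconds


# Figures from Reply #35: 1,025,507,372-byte WAV, 16-bit, 48 kHz, 2ch, encoded in 2m 57s.
factor = realtime_factor(1_025_507_372, 48_000, 2, 16, 2 * 60 + 57)
print(f"{factor:.1f}x realtime")  # prints "30.2x realtime"
```

This lands close to the 30.196x reported for that run; the small gap is expected because the posted time is rounded to whole seconds.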

  • lvqcl
  • Developer
aoTuVbeta6.02
Reply #28
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.

My tests (Core2 Q9300 @2.5 GHz):
Code:
venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x

64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.


Quote
What compiler optimisations are you using?

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options?

  • john33
  • Developer
aoTuVbeta6.02
Reply #29
Quote
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
...

I didn't realise that you had ported some of the Lancer mods.

Quote
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options?

Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?

The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3
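
For readers comparing the two option sets posted so far, here is a small Python sketch that lists which switches are shared and which are unique to each build. It is only a textual comparison of the flags exactly as posted (lvqcl's in Reply #28, john33's x64 set above); it says nothing about which switch accounts for any speed difference. The IDE-level "Whole program optimization" setting also differs (Yes vs. No) and is not part of the lists.

```python
# Switches exactly as posted in this thread.
lvqcl = "/O3 /Ob2 /Oi /Ot /Qipo /GF /MT /GS- /arch:SSE3 /fp:fast=2".split()
john33 = ("/O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast "
          "/GF /EHsc /MT /GS /QaxSSSE3 /fp:fast").split()

# Exact string matching, so "/GS" and "/GS-" count as different switches.
common = [flag for flag in lvqcl if flag in john33]
only_lvqcl = [flag for flag in lvqcl if flag not in john33]
only_john33 = [flag for flag in john33 if flag not in lvqcl]

print("common:     ", " ".join(common))
print("lvqcl only: ", " ".join(only_lvqcl))
print("john33 only:", " ".join(only_john33))
```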
  • Last Edit: 05 May, 2011, 01:49:44 PM by john33

  • lvqcl
  • Developer
aoTuVbeta6.02
Reply #30
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
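
One rough way to check whether a ~0.3% gap is "within statistical error" is to repeat each encode several times and compare the gap between the averages with the run-to-run spread. The sketch below is a crude heuristic, not a formal significance test; the function name is invented and the timings are hypothetical placeholders, NOT measurements from this thread.

```python
from statistics import mean, stdev


def within_noise(runs_a, runs_b):
    """True if the gap between the two means is smaller than the combined run-to-run spread."""
    gap = abs(mean(runs_a) - mean(runs_b))
    spread = stdev(runs_a) + stdev(runs_b)
    return gap < spread


# Hypothetical encode times in seconds (placeholders, not real results).
fp_fast = [100.0, 100.6, 99.7, 100.3]
fp_fast2 = [99.8, 100.2, 99.5, 100.1]

print(within_noise(fp_fast, fp_fast2))  # prints True: the 0.25 s gap is below the spread
```

With only a handful of runs the spread estimate is itself noisy, so a result like this suggests "cannot tell the difference" rather than "no difference".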

Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again?

  • john33
  • Developer
aoTuVbeta6.02
Reply #31
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...

I'll give it a try.

  • _m²_
aoTuVbeta6.02
Reply #32
Quote
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

But how about sound quality? Is it affected? You know, 0.3% ain't much.
  • Last Edit: 05 May, 2011, 03:43:53 PM by _m²_

  • Destroid
aoTuVbeta6.02
Reply #33
Quote
Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again?

I am sorry to say that I have not tried compiling these encoders before.
But... I can concur with some of your other benches:
Quote
My tests (Core2 Q9300 @2.5 GHz):
CODE
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x

If this is about the ICL "bias" against AMD, I'm not 100% sure that this is the case.

Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile only ended at ASM (just a half-wit suggestion).

edit: lvqcl, I just realized it is a patch, not a compiler thing; I will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions; after all, this is an early Athlon64 processor and dilapidated :\

edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD by about 15-20 percent at the default Vorbis -q 3 setting.
  • Last Edit: 06 May, 2011, 07:51:44 AM by Destroid
"Something bothering you, Mister Spock?"

  • Destroid
aoTuVbeta6.02
Reply #34
Back with a new batch of test results. Same commentary track as the previous test in this thread, but at -q3 (still an overkill bitrate). I threw in the Blacksword Lancer compile only as a perspective on optimizations.
Code:
Oggenc 2.83 aoTuv 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s
The ICCPATCH really does have quite an impact on this particular AMD processor running Rarewares P4 compile.
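
To put a number on that impact, the rates in the table above can be expressed relative to the unpatched Rarewares P4 compile. A minimal Python sketch, using only the figures posted in this reply (the dictionary keys are shortened labels chosen here for readability):

```python
# Encoding rates (x realtime) copied from the table in this reply.
rates = {
    "Lancer 20061103 SSE2": 31.956,
    "lvqcl SSE2": 25.679,
    "john33 P4 w/ICCPATCH": 20.078,
    "Venc": 13.381,
    "john33 P4": 12.335,
}

baseline = rates["john33 P4"]  # unpatched Rarewares P4 compile
speedups = {name: rate / baseline for name, rate in rates.items()}

for name, ratio in speedups.items():
    print(f"{name:22s} {ratio:.2f}x of baseline")
```

On these figures the patched P4 compile runs at about 1.63 times the speed of the unpatched one on this AMD machine.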

  • Destroid
aoTuVbeta6.02
Reply #35
I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the last Blacksword compile (one whole week newer). I was also curious to test the LAME compiles from Rarewares with ICCpatch. Here are the results:

Code:
using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes

encoder & version (all run at -q2)             time    rate     filesize
_____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuv 5 Lancer 20061110 SSE2       2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 w/ICCpatch   3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch    4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                                6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4               7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch


version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
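
For anyone unfamiliar with what a bit-compare reports, the sketch below shows the general idea in Python: walk two decoded sample sequences, count the positions that differ, and note where the first and the largest difference occur. This is an illustration only and not foobar2000's actual implementation; the function name is invented, the sample values are made up, and it treats the audio as a single flat sequence rather than handling channels separately.

```python
def bit_compare(a, b, sample_rate):
    """Summarise differences between two decoded sample sequences, or return None if identical."""
    diffs = [(i, abs(x - y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not diffs:
        return None  # "No differences in decoded data found"
    first_index = diffs[0][0]
    peak_index, peak = max(diffs, key=lambda d: d[1])
    return {
        "count": len(diffs),
        "first_at_s": first_index / sample_rate,
        "peak": peak,
        "peak_at_s": peak_index / sample_rate,
    }


# Made-up samples at a toy rate of 2 samples per second (not data from this thread).
unpatched = [0.0, 0.5, 0.25, -0.5]
patched = [0.0, 0.5, 0.30, -0.25]

report = bit_compare(unpatched, patched, sample_rate=2)
print(report["count"], report["first_at_s"], report["peak"], report["peak_at_s"])
# prints: 2 1.0 0.25 1.5
```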

  • lvqcl
  • Developer
aoTuVbeta6.02
Reply #36
Code:
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code.  25.679/20.078 = 1.28, as expected.

Code:
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s

IMHO using /arch:.... option in addition to (or instead of) /Qax...  should increase encoding speed on non-Intel processors.