Hi, why are aoTuV beta 6.02 compiles not available at Rarewares? Could somebody be so kind as to make oggenc2 + DLLs available to the public? Thank you.
Do you mean 6.03?
sure, 6.03
It's been a couple of days.
Okay, I've made an attempt at an optimized OggEnc2 build. Please test it.
http://www.mediafire.com/?1pl3o7vqb3ljxe3
I replaced the oggenc2.exe from Rarewares with yours and the included DLLs, but I get the following message from foobar2000 when I convert:
Source: "D:\Audio\Pennywise - Live @ The Key Club\Pennywise - 02 - Wouldn't it Be Nice.flac"
An error occurred while finalizing the encoding process (Object not found) : "D:\Pennywise - 02 - Wouldn't it Be Nice.ogg"
Conversion failed: Object not found
I use the same command line that worked with the Rarewares 6.02 build:
%s -q 3.5 --advanced-encode-option lowpass_frequency=17 --advanced-encode-option impulse_noisetune=-15 -o %d
I use Windows XP SP3 on this laptop.
Regards
Do you get a crash or an error message during conversion?
I think it requires SSE 4.1 or higher to run, as it was compiled as an optimized build.
Try copying msvcr100.dll and libmmd.dll to the same directory.
Oh, I see. This Intel Centrino only has SSE2 I think. Regards.
Ah, so if I get some positive confirmation of proper conversion, I can try to make a more generic build tomorrow.
I just tried it on my Intel Q9550 with Windows 7 x64 and it works. The processor was in 1.9GHz power-saving mode rather than at the full 2.83GHz, and encoding was about 50x faster than realtime.
Just in case, I've made a more generic build that should run on any CPU:
http://www.mediafire.com/?81janz0arwccpva
Not tested.
The generic version worked on my Intel Centrino 1.75GHz and the speed was about 9x. Regards
By the way, thank you a lot for these builds! I really appreciate it. Regards.
Is it a lot of work to make an SSE2 version? I do not expect you to make one, but I just wonder if it is something you might be interested in making. Regards.
Not so much work. If there's some interest in SSE2, I might do that level as well.
Thanks for testing.
No problem at all. You are the one to be thanked! Regards.
In the meantime you can test my compile (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=32932&view=findpost&p=753991).
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as an encoding backend for foobar2000, they would be useless anyway.
Thanks for that. I tested the SSE2 version on an Athlon64 3000+ and got twice the encoding speed of Venc without any anomalous bitrate/quality differences (tested on a DVD commentary track at ~100 kbps). The speed of your OggEnc2 was about on par with Musepack and right between Nero and CELT.
If you want gory details I'll post later
This test was for multiple reasons: a) to expand my knowledge of the DVD video extraction process (I heard that CELT did better with 48KHz and figured stereo AC3 converted to WAV would work); b) to test CELT encoding speed in regard to the statement that it was an order of magnitude faster because of no psy-model and other reasons; c) just another general encoder speed test carried out on a not-so-modern machine. It seems CELT could benefit from optimizations. Anyhow, I ended up putting the results here since the OggEnc2 compile (SSE2) above seemed dramatically faster than Venc.
This test focused on lossy encoders, aiming for a VBR setting that would result in about 100 kbps, since I figured a DVD commentary track was less complex and needed no 5.1->stereo conversion. I think 100kbps for this material is overkill since there is virtually no audible winner here. Lossless codecs were added to show that the material was non-complex and easily compressed.
Encoder/Settings           File size      Bitrate*     Speed*     TimeThis
_________________________  _____________  ___________  _________  ___________
Musepack 1.30 -q 4.99      65,978,894     98.8 kbps    23.99x     222.687 sec
AoTuv 6.03 -q 4            65,996,209     98.8 kbps    13.37x     399.593 sec
OggEnc 2.7 SSE2 -q 4       65,996,560     98.8 kbps    26.71x     200.781 sec
CELT 0.11.2 --bitrate 100  66,843,352     [97.8 kbps]  [30.49x]   175.156 sec
Nero AAC 1.5.1.0 -q 0.4    67,192,851     [98.9 kbps]  [20.86x]   256.093 sec
OggEnc 2.7 SSE2 -q 4.1     67,515,708     101.1 kbps   26.57x     201.078 sec
AoTuv 6.03 -q 4.1          67,516,144     101.1 kbps   13.31x     401.265 sec
LAME 3.99 alpha16 -V 5     68,150,160     102.1 kbps   14.10x     379.500 sec
Musepack 1.16 -q 4.99      68,544,320     102.7 kbps   24.03x     222.281 sec
Helix MP3 5.1 -V90         69,001,032     103.4 kbps   [104.99x]  50.875 sec
Musepack 1.30 -q 5         72,427,994     108.5 kbps   23.73x     225.171 sec
Helix MP3 5.1 -V100        72,959,688     109.3 kbps   [102.59x]  52.062 sec
Musepack 1.16 -q 5         75,120,664     112.5 kbps   23.77x     225.406 sec

WAVE (16bit 48KHz 2ch)     1,025,507,372  1536 kbps    -          -
FLAC 1.2.0 -5              312,449,472    468 kbps     [88.90x]   60.078 sec
Monkey's Audio -c2000**    312,394,832    468 kbps     [166.22x]  32.328 sec
WavPack 4.60               311,117,884    465 kbps     [95.57x]   55.890 sec
TAK 2.1.0 -p 2             281,036,191    421 kbps     122.28x    43.718 sec

* values reported by encoder, brackets indicate user-calculated values
** old/deprecated version of codec used, don't ask
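For anyone who wants to check the bracketed speed values, here is a minimal Python sketch. It assumes a canonical 44-byte WAV header (the post does not state the header size) and that "speed" means audio duration divided by the TimeThis encoding time. The function names are made up for illustration.

```python
# Recompute "x realtime" speeds from the WAV size and the TimeThis timings.
WAV_BYTES = 1_025_507_372            # size of the test WAV, from the table
HEADER_BYTES = 44                    # assumption: canonical WAV header size
BYTES_PER_SECOND = 48_000 * 2 * 2    # 48 kHz, 2 channels, 2 bytes per sample


def duration_seconds() -> float:
    """Length of the audio in seconds, derived from the file size."""
    return (WAV_BYTES - HEADER_BYTES) / BYTES_PER_SECOND


def realtime_speed(encode_seconds: float) -> float:
    """Audio duration divided by the time the encoder needed."""
    return duration_seconds() / encode_seconds


# TimeThis values taken from the table
timings = {
    "CELT 0.11.2": 175.156,
    "Nero AAC 1.5.1.0": 256.093,
    "Helix MP3 5.1 -V90": 50.875,
    "FLAC 1.2.0 -5": 60.078,
}

for name, seconds in timings.items():
    print(f"{name}: {realtime_speed(seconds):.2f}x")
```

Under these assumptions the CELT, Nero, Helix and FLAC rows come out the same as the bracketed values in the table.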
Those were interesting numbers.
I think 100kbps for this material is overkill since there is virtually no audible winner here.
Since it is a DVD audio track, even 64-80 kbps will be more than enough (excluding MP3, perhaps).
A full set of beta6.03 compiles is now at Rarewares.
john33: I noticed your P4 compile is much slower than lvqcl's compile, close to Venc. This was running on an Athlon64. It seemed strange enough that I re-ran the tests on both compiles to confirm.
I've experienced slow encoding speed too with the Rarewares compile on my AMD CPU: http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=79682&view=findpost&p=726406
Great, it's the compiler. The Rarewares generic compile gets 10x speed, the SSE compiles get 12x. Meanwhile, lvqcl's gets 26x.
I have compared the speeds and lvqcl's oggenc is about 1.5x faster than every other compile.
When the raw streams are binary-compared, each compile turns out to generate different output. Maybe that is because each compiler operates with different floating-point precision.
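One plausible contributor (an assumption, nothing in this thread confirms it) is that optimizing compilers may reorder floating-point operations, especially in relaxed modes, and floating-point addition is not associative. A tiny Python illustration of the effect, unrelated to the actual Vorbis code:

```python
# Floating-point addition is not associative: adding the same three values
# in a different order can change the last bits of the result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right == right_to_left)      # False
print(abs(left_to_right - right_to_left))  # tiny, but not zero
```

A difference this small in an intermediate value could still flip a rounding or quantization decision somewhere, which would be enough to make two encoded streams differ at the byte level.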
john33: I noticed your P4 compile is much slower than lvqcl's compile, close to Venc. This was running on an Athlon64. It seemed strange enough that I re-ran the tests on both compiles to confirm.
Hmmm, I'll check the compiler options I've used.
In the meantime you can test my compile (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=32932&view=findpost&p=753991).
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as an encoding backend for foobar2000, they would be useless anyway.
What compiler optimisations are you using?
I have compared the speed of my test oggenc compile posted earlier with the Rarewares one and the speeds are on par. I used Intel C++ with maximum optimizations. I'd be curious too about the Lancer mod compile parameters; maybe reduced floating-point precision? But still, I don't expect so much speedup from that alone.
Quick comparison (single thread):
john33 (x64) - 45.15x realtime
lvqcl (x64, SSE3) - 60.50x realtime
Windows 7 x64, Intel Core i3 530
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
My tests (Core2 Q9300 @2.5 GHz):
venc: 20.9x realtime
Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)
My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x
64-bit SSE2: 47.8x
64-bit SSE3: 48.9x
I DIDN'T test these compiles on AMD processors.
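To put the figures above into relative terms, here is a short Python sketch that computes the speedup of each Lancer-enabled compile over the matching compile without Lancer code. The pairing of 32-bit builds with the 32-bit baseline and 64-bit builds with the x64 baseline is my assumption.

```python
# Speeds in "x realtime", copied from the Core2 Q9300 results above.
baseline = {"32-bit": 34.2, "x64": 36.8}

# (label, baseline key, speed) for the compiles with Lancer code
lancer_builds = [
    ("32-bit SSE", "32-bit", 38.1),
    ("32-bit SSE2", "32-bit", 46.1),
    ("32-bit SSE3", "32-bit", 46.0),
    ("64-bit SSE2", "x64", 47.8),
    ("64-bit SSE3", "x64", 48.9),
]


def speedup_percent(fast: float, slow: float) -> float:
    """Relative speed gain of `fast` over `slow`, in percent."""
    return (fast / slow - 1.0) * 100.0


for label, base_key, speed in lancer_builds:
    gain = speedup_percent(speed, baseline[base_key])
    print(f"{label}: +{gain:.1f}% over plain {base_key}")
```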
What compiler optimisations are you using?
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2
Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options?
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
...
I didn't realise that you had ported some of the Lancer mods.
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2
Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options?
Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?
The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3
I've not tried fast=2, does that win you anything?
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page (http://www.hydrogenaudio.org/forums/index.php?showtopic=74345&st=75)) and test again?
I've not tried fast=2, does that win you anything?
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...
I'll give it a try.
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
But how about sound quality? Is it affected? You know, 0.3% ain't much.
Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page (http://www.hydrogenaudio.org/forums/index.php?showtopic=74345&st=75)) and test again?
I am sorry to inform that I have not tried compiling these encoders before.
But... I can concur with some of your other benchmarks:
My tests (Core2 Q9300 @2.5 GHz):
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
As for the ICL "bias" against AMD, I'm not 100% sure that this is the case here.
Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile only ended at ASM (just a half-wit suggestion).
edit: lvqcl: I just realized it is a patch, not a compiler thing; I'll report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions. After all, this is an early Athlon64 processor and dilapidated :\
edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD, by about 15-20 percent at the default Vorbis -q 3 setting.
Back with a new batch of test results. Same commentary track as previous test in this thread but at -q3 (still overkill bitrate). Threw in blacksword lancer, which I included only as a perspective on optimizations.
Oggenc 2.83 aoTuv 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s
The ICCPATCH really does have quite an impact on this particular AMD processor running Rarewares P4 compile.
I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and changed to the last Blacksword compile (1 whole week newer). I was also curious to test LAME compiles from Rarewares with ICCpatch. Here are the results:
using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes
encoder & version (all run at -q2)             time    rate     filesize
_____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuv 5 Lancer 20061110 SSE2       2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 w/ICCpatch   3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch    4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                                6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4               7m 10s  12.421x  51,621,530 bytes
Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch
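For readers unfamiliar with what such a bit-compare report contains, here is a rough Python sketch of the same kind of comparison on two decoded sample sequences. It is only an illustration under my own assumptions (single channel, samples as floats, equal lengths) and is not how foobar2000 implements it. The function name is hypothetical.

```python
def compare_samples(a, b, sample_rate):
    """Compare two equally long sample sequences.

    Returns a dict with the number of differing samples, the time of the
    first difference, the peak absolute difference and the time of that peak.
    Times are in seconds; they are None when the sequences are identical.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")

    differing = 0
    first_index = None
    peak = 0.0
    peak_index = None

    for i, (x, y) in enumerate(zip(a, b)):
        diff = abs(x - y)
        if diff > 0:
            differing += 1
            if first_index is None:
                first_index = i
            if diff > peak:
                peak, peak_index = diff, i

    def to_seconds(index):
        return None if index is None else index / sample_rate

    return {
        "differing": differing,
        "first_at": to_seconds(first_index),
        "peak": peak,
        "peak_at": to_seconds(peak_index),
    }


# Toy data: 4 samples at 2 samples per second
report = compare_samples([0.0, 0.5, 0.25, -0.5], [0.0, 0.5, 0.125, -0.25], 2)
print(report)
```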
version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes
Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 25.679x 89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH 20.078x 89.9 kb/s
As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code. 25.679/20.078 = 1.28, as expected.
OggEnc 2.87 aoTuV 6.03 john33 P4 12.335x 89.9 kb/s
IMHO using the /arch:.... option in addition to (or instead of) /Qax... should increase encoding speed on non-Intel processors.
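To make the reasoning concrete, here is a deliberately simplified Python model of the behaviour being discussed. It assumes that the /Qax code path is only taken when a runtime dispatcher sees an Intel vendor string, while /arch sets a baseline that runs on any CPU supporting it. This is my reading of the thread, not a description of the real dispatcher, and every name in it is hypothetical.

```python
# Instruction-set levels, ordered from oldest to newest.
LEVELS = ["IA32", "SSE", "SSE2", "SSE3", "SSSE3"]


def select_code_path(vendor, cpu_level, baseline, dispatch_level,
                     vendor_check=True):
    """Return the level a hypothetical dispatcher would run.

    baseline       -- level set for everyone (modelled after /arch)
    dispatch_level -- optional faster path (modelled after /Qax)
    vendor_check   -- False models a binary with the vendor test patched out
    """
    cpu_supports = LEVELS.index(cpu_level) >= LEVELS.index(dispatch_level)
    vendor_ok = (vendor == "GenuineIntel") or not vendor_check
    if cpu_supports and vendor_ok:
        return dispatch_level
    return baseline


# An SSE2-capable AMD CPU with a build modelled on /arch:IA32 /QaxSSE2
print(select_code_path("AuthenticAMD", "SSE2", "IA32", "SSE2"))
# The same build with the vendor check removed
print(select_code_path("AuthenticAMD", "SSE2", "IA32", "SSE2",
                       vendor_check=False))
# A build whose baseline is already SSE2 (modelled on /arch:SSE2)
print(select_code_path("AuthenticAMD", "SSE2", "SSE2", "SSE2"))
```

In this model the unpatched build falls back to the baseline on the AMD CPU, while the patched build and the build with an SSE2 baseline both take the SSE2 path. That would fit the ICCpatch results above, if the model holds.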