HydrogenAudio

Lossy Audio Compression => Ogg Vorbis => Ogg Vorbis - General => Topic started by: Anakunda on 2011-04-29 10:55:06

Title: aoTuVbeta6.02
Post by: Anakunda on 2011-04-29 10:55:06
Hi, why are aoTuV beta 6.02 compiles not available at Rarewares? Could somebody be so kind as to make oggenc2 + DLLs available to the public? Thank you.
Title: aoTuVbeta6.02
Post by: alter4 on 2011-04-29 11:01:37
Do you mean 6.03?
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 17:15:26
Sure, 6.03.
It's been a couple of days.
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 18:58:47
Okay, I've made an attempt at an optimized OggEnc2 build. Please test it.

Code: [Select]
http://www.mediafire.com/?1pl3o7vqb3ljxe3
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 20:40:17
I replaced the oggenc2.exe from Rarewares with yours and the included DLLs, but I get the following message from foobar2000 when I convert:

Source: "D:\Audio\Pennywise - Live @ The Key Club\Pennywise - 02 - Wouldn't it Be Nice.flac"
An error occurred while finalizing the encoding process (Object not found) : "D:\Pennywise - 02 - Wouldn't it Be Nice.ogg"
Conversion failed: Object not found

I use the same command line that worked with the Rarewares 6.02 build:
%s -q 3.5 --advanced-encode-option lowpass_frequency=17 --advanced-encode-option impulse_noisetune=-15 -o %d

I use Windows XP SP3 on this laptop.
Regards
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 20:53:02
Do you get a crash or an error message during conversion?
I think it requires SSE4.1 or higher to run, as it was compiled as an optimized build.
Try copying msvcr100.dll and libmmd.dll to the same directory.
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 20:54:08
Oh, I see. This Intel Centrino only has SSE2 I think. Regards.
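Whether a given build can run comes down to the instruction sets the CPU reports. As a rough sketch that is not part of the original posts, the check can be modelled in Python against a Linux-style `/proc/cpuinfo` flags line. The function name `missing_cpu_flags` and the sample flag strings are assumptions made up for illustration.

```python
def missing_cpu_flags(cpuinfo_text, required):
    """Return the required flags that the CPU does not advertise."""
    available = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            available.update(value.split())
    return sorted(set(required) - available)


# Hypothetical flag lines: an SSE2-only CPU versus one that also has SSE4.1.
SSE2_ONLY = "flags : fpu mmx sse sse2"
SSE41_CPU = "flags : fpu mmx sse sse2 pni ssse3 sse4_1"

print(missing_cpu_flags(SSE2_ONLY, ["sse2", "sse4_1"]))  # ['sse4_1']
print(missing_cpu_flags(SSE41_CPU, ["sse2", "sse4_1"]))  # []
```

An empty result means every required instruction set is present; anything else names what the build would be missing.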
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 20:57:37
I see. If I get positive confirmation that conversion works properly, I can try to make a more generic build tomorrow.
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 21:10:54
I just tried it on my Intel Q9550 with Windows 7 x64 and it works. The processor was in 1.9GHz mode to save power, not at the full 2.83GHz, and it was about 50x faster than realtime.
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 21:16:09
Just in case, I've made a more generic build that should run on any CPU:

Code: [Select]
http://www.mediafire.com/?81janz0arwccpva


Not tested.
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 21:22:22
The generic version worked on my Intel Centrino 1.75GHz and the speed was about 9x. Regards
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 21:27:43
By the way, thank you a lot for these builds! I really appreciate it. Regards.
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 21:32:35
Is it a lot of work to make an SSE2 version? I do not expect you to make one, but I wonder whether it is something you might be interested in making. Regards.
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-01 21:45:15
Not much work. If there is some interest in SSE2, I might do that level as well.
Thanks for testing.
Title: aoTuVbeta6.02
Post by: punkrockdude on 2011-05-01 21:47:15
No problem at all. You are the one to be thanked! Regards.
Title: aoTuVbeta6.02
Post by: lvqcl on 2011-05-01 22:05:36
In the meantime you can test my compile (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=32932&view=findpost&p=753991).
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as an encoding backend for foobar2000, they are useless anyway.
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-03 10:08:54
Thanks for that. I tested the SSE2 version on an Athlon64 3000+ and got twice the encoding speed of Venc without any anomalous bitrate/quality differences (tested on a DVD commentary track at about 100 kbps). The speed of your OggEnc2 was about on par with Musepack and right between Nero and CELT.

If you want the gory details, I'll post them later.
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-03 19:18:24
This test was run for several reasons: a) to expand my knowledge of the DVD video extraction process (I heard that CELT did better with 48 kHz and figured stereo AC3 converted to WAV would work); b) to test CELT encoding speed with regard to the statement that it was an order of magnitude faster because it has no psy-model, among other reasons; c) as just another general encoder speed test carried out on a not-so-modern machine. It seems CELT could benefit from optimizations. Anyhow, I ended up putting the results here since the OggEnc2 compile (SSE2) above seemed dramatically faster than Venc.

This test focused on lossy encoders, aiming for a VBR setting that would result in about 100 kbps, since I figured a DVD commentary track was less complex and needed no 5.1->stereo conversion. I think 100 kbps for this material is overkill since there is virtually no audible winner here. Lossless codecs were added to show that the material was non-complex and easily compressed.
Code: [Select]
Encoder/Settings               File size      Bitrate*     Speed*     TimeThis
_________________________  _____________  ____________  _________  ___________
Musepack 1.30 -q 4.99         65,978,894     98.8 kbps     23.99x  222.687 sec
aoTuV 6.03 -q 4               65,996,209     98.8 kbps     13.37x  399.593 sec
OggEnc 2.7 SSE2 -q 4          65,996,560     98.8 kbps     26.71x  200.781 sec
CELT 0.11.2 --bitrate 100     66,843,352   [97.8 kbps]   [30.49x]  175.156 sec
Nero AAC 1.5.1.0 -q 0.4       67,192,851   [98.9 kbps]   [20.86x]  256.093 sec
OggEnc 2.7 SSE2 -q 4.1        67,515,708    101.1 kbps     26.57x  201.078 sec
aoTuV 6.03 -q 4.1             67,516,144    101.1 kbps     13.31x  401.265 sec
LAME 3.99 alpha16 -V 5        68,150,160    102.1 kbps     14.10x  379.500 sec
Musepack 1.16 -q 4.99         68,544,320    102.7 kbps     24.03x  222.281 sec
Helix MP3 5.1 -V90            69,001,032    103.4 kbps  [104.99x]   50.875 sec
Musepack 1.30 -q 5            72,427,994    108.5 kbps     23.73x  225.171 sec
Helix MP3 5.1 -V100           72,959,688    109.3 kbps  [102.59x]   52.062 sec
Musepack 1.16 -q 5            75,120,664    112.5 kbps     23.77x  225.406 sec
WAVE (16bit 48kHz 2ch)     1,025,507,372     1536 kbps          -            -
FLAC 1.2.0 -5                312,449,472      468 kbps   [88.90x]   60.078 sec
Monkey's Audio -c2000**      312,394,832      468 kbps  [166.22x]   32.328 sec
WavPack 4.60                 311,117,884      465 kbps   [95.57x]   55.890 sec
TAK 2.1.0 -p 2               281,036,191      421 kbps    122.28x   43.718 sec


*  values reported by encoder, brackets indicate user-calculated values
** old/deprecated version of codec used, don't ask
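The bracketed speed values appear to be the track's playing time divided by the TimeThis elapsed time. The following is a minimal sketch of that arithmetic, assuming the WAV is plain 16-bit, 48 kHz, 2-channel PCM and ignoring its small header; the function names are made up for illustration.

```python
def pcm_duration_seconds(size_bytes, sample_rate=48000, channels=2, bits=16):
    """Approximate playing time of a PCM WAV file, ignoring the header."""
    bytes_per_second = sample_rate * channels * bits // 8
    return size_bytes / bytes_per_second


def speed_factor(size_bytes, elapsed_seconds):
    """Encoding speed as a multiple of realtime."""
    return pcm_duration_seconds(size_bytes) / elapsed_seconds


WAV_SIZE = 1_025_507_372  # bytes, taken from the WAVE row of the table

print(round(speed_factor(WAV_SIZE, 175.156), 2))  # CELT row: 30.49
print(round(speed_factor(WAV_SIZE, 50.875), 2))   # Helix -V90 row: 104.99
```

Both results match the bracketed figures in the table, which suggests this is how those figures were derived.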
Title: aoTuVbeta6.02
Post by: IgorC on 2011-05-03 19:56:28
Those were interesting numbers.

Quote
I think 100kbps for this material is overkill since there is virtually no audible winner here.

Since it is a DVD audio track, even 64-80 kbps will be more than enough (excluding MP3, perhaps).
Title: aoTuVbeta6.02
Post by: john33 on 2011-05-04 15:47:04
A full set of beta6.03 compiles is now at Rarewares.
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-05 00:46:50
john33: I noticed your P4 compile is much slower than lvqcl's compile, close to Venc. This was running on an Athlon64. It seemed strange enough that I re-ran the tests on both compiles to confirm.
Title: aoTuVbeta6.02
Post by: IgorC on 2011-05-05 03:00:34
I've also experienced slow encoding speed with the Rarewares compile on my AMD CPU: http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=79682&view=findpost&p=726406
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-05 04:17:17
Great, it's the compiler. The Rarewares generic compile gets 10x speed and the SSE compiles get 12x. Meanwhile, the lvqcl compile gets 26x.
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-05 09:43:19
I have compared the speeds, and lvqcl's oggenc is about 1.5x faster than every other compile.
When the raw streams are binary-compared, each compile produces binary-different output; maybe this is because each compiler operates with different floating-point precision.
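One plausible mechanism, offered as an illustration and not as a diagnosis of these particular builds: floating-point addition is not associative, so a compiler that reorders or vectorizes arithmetic can produce slightly different results from the same source code. A short Python demonstration:

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one evaluation order
right_to_left = a + (b + c)   # mathematically identical, rounded differently

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

In an encoder, differences this small can still change a quantization decision somewhere, which would be enough to make the output files differ at the byte level.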
Title: aoTuVbeta6.02
Post by: john33 on 2011-05-05 09:54:22
Quote
john33- I noticed your P4 compile is much slower than lvqcl's compile, close to Venc . This was running an Athlon64. Seemed strange enough that I re-ran tests on both compiles to confirm.

Hmmm, I'll check the compiler options I've used.
Title: aoTuVbeta6.02
Post by: john33 on 2011-05-05 11:06:53
Quote
In the meantime you can test my compile (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=32932&view=findpost&p=753991).
It doesn't have built-in FLAC reader and resampler but since you use oggenc2 as encoding backend for foobar2000 they are useless anyway.

What compiler optimisations are you using?
Title: aoTuVbeta6.02
Post by: Anakunda on 2011-05-05 11:49:46
I have compared the speed of my test oggenc compile posted earlier with the Rarewares one, and the speeds are on par. I used Intel C++ with maximum optimizations. I'd be curious too about the LancerMod compile parameters; maybe reduced floating-point precision? Still, I wouldn't expect so much speedup from that alone.
Title: aoTuVbeta6.02
Post by: Steve Forte Rio on 2011-05-05 13:02:06
Quick comparison (single thread):

john33 (x64) -  45.15x realtime

lvqcl (x64, SSE3) - 60.50x realtime

Windows 7 x64, Intel Core i3 530
Title: aoTuVbeta6.02
Post by: lvqcl on 2011-05-05 17:59:51
Some SSE optimizations from the Lancer code (for aoTuV 5) are still applicable to aoTuV 6, but the speed increase is only 25...30%.

My tests (Core2 Q9300 @2.5 GHz):
Code: [Select]
venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x

64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.


Quote
What compiler optimisations are you using?

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?
Title: aoTuVbeta6.02
Post by: john33 on 2011-05-05 18:45:40
Quote
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
...

I didn't realise that you had ported some of the Lancer mods.

Quote
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?

The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3
Title: aoTuVbeta6.02
Post by: lvqcl on 2011-05-05 19:38:03
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page (http://www.hydrogenaudio.org/forums/index.php?showtopic=74345&st=75)) and test again?
Title: aoTuVbeta6.02
Post by: john33 on 2011-05-05 19:48:15
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...

I'll give it a try.
Title: aoTuVbeta6.02
Post by: _m²_ on 2011-05-05 20:43:37
Quote
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

But how about sound quality? Is it affected? You know, 0.3% ain't much.
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-06 12:22:22
Quote
Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page (http://www.hydrogenaudio.org/forums/index.php?showtopic=74345&st=75)) and test again?

I am sorry to say that I have not tried compiling these encoders before.
But I can concur with some of your other benchmarks:
Quote
My tests (Core2 Q9300 @2.5 GHz):
Code:
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x

As for the ICL "bias" against AMD, I'm not 100% sure that this is the case.

Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile only ended at ASM (just a half-wit suggestion).

edit: lvqcl, I just realized it is a patch, not a compiler thing; I will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions. After all, this is an early Athlon64 processor, and a dilapidated one :\

edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD, by about 15-20 percent at the default Vorbis -q 3 setting.
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-06 21:10:15
Back with a new batch of test results. Same commentary track as in the previous test in this thread, but at -q3 (still an overkill bitrate). I threw in the Blacksword Lancer build only to give some perspective on optimizations.
Code: [Select]
OggEnc 2.83 aoTuV 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
OggEnc 2.87 aoTuV 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s
ICCPATCH really does have quite an impact on this particular AMD processor when running the Rarewares P4 compile.
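To put a number on that impact, the rates in the table can be turned into ratios. This is only a small arithmetic sketch; the helper names are made up, and the rates are copied from the table above.

```python
def speedup(new_rate, old_rate):
    """Ratio of two encoding rates, both given as multiples of realtime."""
    return new_rate / old_rate


def percent_faster(new_rate, old_rate):
    """How much faster new_rate is than old_rate, in percent."""
    return (speedup(new_rate, old_rate) - 1.0) * 100.0


JOHN33_P4 = 12.335           # unpatched Rarewares P4 compile
JOHN33_P4_PATCHED = 20.078   # same compile after ICCPATCH
LVQCL_SSE2 = 25.679          # lvqcl SSE2 compile

print(round(speedup(JOHN33_P4_PATCHED, JOHN33_P4), 2))      # 1.63
print(round(percent_faster(JOHN33_P4_PATCHED, JOHN33_P4)))  # 63
print(round(speedup(LVQCL_SSE2, JOHN33_P4_PATCHED), 2))     # 1.28
```

On this machine the patch makes the P4 compile roughly 63% faster, and the lvqcl compile is a further 28% or so ahead of the patched one.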
Title: aoTuVbeta6.02
Post by: Destroid on 2011-05-07 07:03:13
I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the last Blacksword compile (one whole week newer). I was also curious to test the LAME compiles from Rarewares with ICCpatch. Here are the results:

Code: [Select]
using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes

encoder & version (all run at -q2)            time    rate     filesize
____________________________________________  ______  _______  ________________
OggEnc 2.83 aoTuV 5 Lancer 20061110 SSE2      2m 57s  30.196x  52,521,704 bytes
OggEnc 2.87 aoTuV 6.03 lvqcl SSE2             3m 27s  25.801x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 lvqcl SSE2 w/ICCpatch  3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch   4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                               6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4              7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch


version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
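For readers unfamiliar with the bit-compare output, the kind of report shown above (number of differing samples, where they start, and the peak difference) can be sketched as follows. This is an illustrative reimplementation on plain Python lists, not foobar2000's actual code; the function name `compare_decoded` and the single-channel input are assumptions.

```python
def compare_decoded(a, b, sample_rate):
    """Compare two decoded sample sequences of equal length.

    Returns None when they are identical, otherwise a summary dict.
    """
    if len(a) != len(b):
        raise ValueError("length mismatch")
    diffs = [(i, abs(x - y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not diffs:
        return None
    peak_index, peak = max(diffs, key=lambda d: d[1])
    return {
        "count": len(diffs),
        "first_at": diffs[0][0] / sample_rate,  # seconds
        "peak": peak,
        "peak_at": peak_index / sample_rate,    # seconds
    }


# Tiny made-up example at 4 samples per second.
track_a = [0.0, 0.5, 0.25, 0.0]
track_b = [0.0, 0.5, 0.5, -0.125]

print(compare_decoded(track_a, track_a, 4))  # None
print(compare_decoded(track_a, track_b, 4))
```

A result of `None` corresponds to "No differences in decoded data found"; any other result means the two encodes decode to different audio.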
Title: aoTuVbeta6.02
Post by: lvqcl on 2011-05-07 10:09:14
Code: [Select]
OggEnc 2.87 aoTuV 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code.  25.679/20.078 = 1.28, as expected.

Code: [Select]
OggEnc 2.87 aoTuV 6.03 john33 P4        12.335x      89.9 kb/s

IMHO using /arch:.... option in addition to (or instead of) /Qax...  should increase encoding speed on non-Intel processors.
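The pattern in these results is consistent with how Intel's auto-dispatch (`/Qax`) builds were widely reported to behave at the time: the optimized path was taken only when the CPU identified itself as an Intel part, while `/arch:` sets the baseline that every CPU runs. The following is a toy model of that reported behaviour, not Intel's actual dispatcher; `select_code_path` and all of its parameters are hypothetical.

```python
def select_code_path(vendor, cpu_features, baseline, optimized, vendor_check=True):
    """Pick a code path the way a vendor-gated dispatcher might.

    baseline  -- instruction set every CPU gets (the /arch: level)
    optimized -- extra dispatched path (the /Qax level)
    """
    if optimized not in cpu_features:
        return baseline
    if vendor_check and vendor != "GenuineIntel":
        return baseline
    return optimized


athlon64 = {"sse", "sse2"}  # assumed feature set for illustration

# /arch:IA32 /QaxSSE2, as described for the P4 compile:
print(select_code_path("AuthenticAMD", athlon64, "ia32", "sse2"))  # ia32
print(select_code_path("GenuineIntel", athlon64, "ia32", "sse2"))  # sse2

# Same binary with the vendor check neutralised (what a patch would do):
print(select_code_path("AuthenticAMD", athlon64, "ia32", "sse2",
                       vendor_check=False))                        # sse2

# Raising the baseline instead, e.g. /arch:SSE2:
print(select_code_path("AuthenticAMD", athlon64, "sse2", "sse2"))  # sse2
```

Under this model, raising the baseline helps non-Intel CPUs without any patching, at the cost of the binary no longer running on CPUs below that baseline.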