Some Japanese developers are working on speeding up libvorbis with SSE. Blacksword (or 637) launched an Ogg Vorbis acceleration project (in Japanese only) (http://homepage3.nifty.com/blacksword/) and has released an oggenc binary and a libvorbis patch (http://homepage3.nifty.com/blacksword/OggEnc_SSE_20041101ArcherB03.zip) based on libvorbis 1.1. The optimization includes SSE implementations of FFT, MDCT, windowing, channel coupling, sorting, psymodel, floor/residue encoding, and so on. On my computer (Pentium IV 2.4GHz), an ICL 8.1 compiled oggenc binary of the optimized version (Archer Beta03) encodes at 23.4x, while the unoptimized one (ICL 8.1 compiled but without the SSE patches) encodes at 15.5x. Hence, this optimization achieves a speed gain of about 1.5x.
Unlike GoGo-no-coder, it is not a fork: he releases a patch for the libvorbis source code that does not change the algorithms or data structures at all. This is very good for source code maintenance, since it makes it easy to keep up with the official libvorbis, but it limits the room for optimization to some degree. In fact, the author says in readme.txt that there is little room left for optimization. So I think it's time for quality evaluation, although the optimization is still in development. After several bugs were found and fixed over the last week, bitrates are quite similar to the reference encoder for all quality values. If you find any bugs or quality regressions relative to the official 1.1, please tell us.
Contributors are:
- Blacksword (or 637)'s SSE optimization (Japanese only) (http://homepage3.nifty.com/blacksword/): A number of functions in libvorbis are vectorized to take advantage of the SSE instruction set as well as OptSort and wuvorbis. For the complete list of optimized functions, see readme.txt (in Japanese, but you can easily find the list) included with the binary.
- Manuke's OptSort (http://www.cug.net/~manuke/vorbis-optsort-en.html): Optimization of the qsort function, which consumes 20% of compression processing time, based on the assumption that the _vp_quantize_couple_sort and _vp_noise_normalize_sort functions in psy.c call qsort with 8 or 32 elements. This accelerates the whole compression process by 10%.
- W.Dee's wuvorbisfile (Japanese only?) (http://kikyou.info/tvp/#side_product): wuvorbis.dll is a fast Ogg Vorbis decoder with SSE and 3DNow!, which is a part of KiriKiri software (useful for developing multi-media contents or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) and 1.5x-1.9x faster (3DNow!) than official libvorbis.
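To illustrate why a known element count helps in the OptSort entry above, here is a minimal sketch in Python (not Manuke's actual code, which is C inside libvorbis). When the length is known to be exactly 8, a generic qsort with its comparison callback can be replaced by a fixed sequence of compare-and-swap steps. The network below is Batcher's odd-even merge sort for 8 inputs; the names SORT8_NETWORK and sort8 are made up for this example.

```python
# Batcher's odd-even merge sort network for exactly 8 inputs (19 comparators).
# Each pair (i, j) means: make sure out[i] <= out[j], swapping if needed.
SORT8_NETWORK = [
    (0, 1), (2, 3), (4, 5), (6, 7),
    (0, 2), (1, 3), (4, 6), (5, 7),
    (1, 2), (5, 6),
    (0, 4), (1, 5), (2, 6), (3, 7),
    (2, 4), (3, 5),
    (1, 2), (3, 4), (5, 6),
]


def sort8(items):
    """Return a sorted copy of an 8-element sequence using a fixed network."""
    if len(items) != 8:
        raise ValueError("sort8 only handles exactly 8 elements")
    out = list(items)
    for i, j in SORT8_NETWORK:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out
```

Because the sequence of comparisons never depends on the data, a compiler can unroll it completely, which is the kind of gain a fixed-size sort can offer over a general-purpose one.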
Happy encoding!
fefe (http://www.fefe.de/diffs/) was working on a (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations only affect encoding or decoding as well?
I achieved an almost 100% (actually more like 85%) speed increase (against ICL 8.1 on an AMD Athlon XP 1800+)
ICL 8.1: 9,8x
Optimized 18,0x.
Pretty good
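For reference, the percentage can be worked out from the two rates quoted above. This is just the arithmetic, with a helper name invented for the example:

```python
def speed_gain_percent(baseline_rate, optimized_rate):
    """Percentage increase of optimized_rate over baseline_rate."""
    return (optimized_rate / baseline_rate - 1.0) * 100.0


# Rates from the post above: ICL 8.1 at 9.8x, optimized build at 18.0x.
gain = speed_gain_percent(9.8, 18.0)
print(f"{gain:.1f}% faster")  # 83.7% faster
```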
Wow! Now that is FAST. My results were similar to ilikedirtthe2nd's (actually a little better).
Oh, I didn't know about fefe's optimization. I'll check whether it can benefit Blacksword's optimization.
IMHO this optimization affects both the encoding and decoding sides, although an optimized oggdec has not been tested or released. Several functions for decoding (e.g., vorbis_synthesis_blockin, mapping0_inverse, mdct_backward, etc.) are optimized too.
Whoa, it's really fast
On my P4 2.4 GHz:
ICL compiled oggenc from rarewares: 13.2x
SSE optimised oggenc: 20.5x
Pretty nice speedup here too:
oggenc from rarewares 10.4x
SSE optimized 15.3x
Hello!
Well, I have an older machine (P3 700) and got a speedup from 4.4x to 9.3x realtime.
Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/?
I wonder how big the speedup with this build is for P4 and AMD64 CPUs.
According to my tests...
ICL 8.1 Standard:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
ICL 8.1 Pentium 4:
File length: 4m 58,0s
Elapsed time: 0m 17,0s
Rate: 17,5529
Average bitrate: 236,7 kb/s
SSE:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
SSE2:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
Tested with "Toto - Africa" on a Pentium 4 with 3.2 GHz, 512 MB RAM, running Windows XP Professional Service Pack 1.
I got a good increase, too...
SSE2
File length: 5m 23.0s
Elapsed time: 0m 12.0s
Rate: 26.9556
Average bitrate: 175.3 kb/s
ICL 8.1
File length: 5m 23.0s
Elapsed time: 0m 19.0s
Rate: 17.0246
Average bitrate: 175.3 kb/s
But I can't seem to get it to work on FLAC files...
ERROR: Input file "01.flac" is not a supported format.
Am I missing something??
Thanks,
~esa
:edit: typo
Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.
Regards; ilikedirt
The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.
It sure is.
Standard oggenc doesn't input lossless files directly.
The standard oggenc supports FLAC input perfectly.
Well, I can't say that the issue is any clearer for me now...
It's a compile-time option AFAIK.
That means oggenc can read FLAC input if this is enabled at compile time. So in general it can read FLAC, but this build cannot.
Ah... thank you for the clarification!
~esa
I could not find any speed difference between the SSE and SSE2 versions on my Pentium IV machine. Has anybody seen a speed increase? The author wants to know the effect in order to decide whether he should continue the SSE2 version.
Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.
I got a good increase, too...
ICL 8.1
File length: 5m 23.0s
Elapsed time: 0m 12.0s
Rate: 26.9556
Average bitrate: 175.3 kb/s
SSE2
File length: 5m 23.0s
Elapsed time: 0m 19.0s
Rate: 17.0246
Average bitrate: 175.3 kb/s
Huh? The ICL 8.1 compile is faster.
Nope, they're not my own compiles.
Whoops! No, that's a typo... I'll edit immediately...
OK, here are some partial translations:
OggEnc_SSE_20041101ArcherB03.zip
Changes regarding/surrounding comments
Improved low-bitrate quality
Current problems are:
- When encoding at low bitrates, treble quality suffers, and size bloat occurs.
- Could hang immediately on running, depending on the environment
- Bugs due to changes to comment handling?
Thanks for the translation. I think all of the current problems listed above are solved in Archer B03. These problems existed in Archer B02.
IIRC, SSE2 is optimised for double-precision floating point, so maybe there isn't that much difference from SSE since libvorbis doesn't use many doubles?
Tested on my AMD64 3400+, 1GB RAM
ICL 8.1:
File length: 4m 27.0s
Elapsed time: 0m 14.0s
Rate: 19.1190
Average bitrate: 132.9 kb/s
ICL 8.1 (John33):
File length: 4m 27.0s
Elapsed time: 0m 11.0s
Rate: 24.3333
Average bitrate: 132.9 kb/s
SSE/SSE2 Optimized:
File length: 4m 27.0s
Elapsed time: 0m 08.0s
Rate: 33.4583
Average bitrate: 132.9 kb/s
SSE2 optimization doesn't change encoding speed
As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.
Actually, he expects SSE2 to give higher quality (or speed) for float-to-integer conversion and vice versa, but at the same time he doubts the effect. I'll tell him about these results.
That explains why my SSE and SSE2 tests achieve the same result.
OK, for the newb with no ability for critical thinking (me), would you recommend switching to this version from "OggEnc v2.3 (libvorbis 1.1.0)"? I'd like to have the extra speed, but if it introduces bugs I can wait.
I thought of encoding with both then comparing the files, but the size was a few bytes different and they were not identical (there were 80ish different bytes every Y bytes). What's that about?
IMO, it's best to stick to the normal compile of oggenc. More testing is required.
I see no speed gain when compared to the Pentium 4 builds from RareWares.
I'd be more interested in decoder speedups - especially for portable devices. Vorbis playback in my Tungsten T3 eats battery like crazy.
Here's a late reply. I tested on two titles and the results look great. The sse2 version offers zero speed increase; the numbers are exactly the same. System: Athlon 64 3000. Turns out I've been previously using Ogg Vorbis 1.1 rc1 from rarewares. Oh well. Quality level is 5.
Die fantastischen Vier - Mein Schwert [hip hop-ish, CD rip]
1.1rc1 - 14,9936
sse/sse2 - 22,9893
G&M Project - Sunday Afternoon (Nu Nrg Mix) [trance, wav previously decoded from mpc q7]
1.1rc1 - 15,7454
sse/sse2 - 27,9919
I was evaluating whether I should use ogg or mp3 on my soon-to-be-shipped iRiver, so I will do a lot of transcoding. I don't know if the fact that I am using an already lossy source accounts for the speed increase.
These speeds even surpass mpc encoding (usually 22-23x)! Lame 3.96.1 clocks in at about 8x for aps and 17x for apfs.
BTW: version "Archer B04" is out, which is claimed to be even a bit faster.
edit2: well, not for me. Speeds are identical to B03.
how should i apply the patch? i get all hunks failed...
using linux, official libvorbis-1.1.0 and the same happens for both B03 and B04
I remember trying to apply it; there were a bunch of whitespace diffs, so try 'patch -l ...'
Oops, actually, it was the case with current svn.
For 1.1.0, running dos2unix on the patch should do.
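For anyone wondering why that helps: if the patch was saved with Windows (CRLF) line endings, its context lines will not match the LF-only files in the libvorbis tarball, and every hunk gets rejected. dos2unix strips the carriage returns. A rough Python equivalent, with a hypothetical patch file name, would be:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert Windows line endings (CRLF) to Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(crlf_to_lf(data))


# Example (hypothetical file name):
# convert_file("libvorbis-sse.patch")
```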
oh crap it was that simple... haven't thought of that, thanks. compiling right now
Weird...
I got a slight speed increase with B04 (23.73x) over B03 (23.37x).
on the first run i got 38.2381x, on the second run 33.4583 which is the same as B03
It looks like the resample option is broken? I get a crash using the resample option on B04. I'm trying to resample a 16 kHz stereo wav file to a -q0 44100 ogg.
Does anyone have the sse optimizations in the form of a patch to 1.1?
I'd like to try building a linux binary of this.
The patch is the first file on the project web page:
http://homepage3.nifty.com/blacksword/
This is great stuff by the way, I hope that development / testing continues.
I have tried it and it doesn't work for me.
It does compile after some editing, but both encoding and playback are badly broken.
Could you give a better description of "badly broken"? Actually I didn't compile B04, but my own compile (ICL8.1) of B03 worked fine.
Archer Beta05 has been released, mainly to fix the resample crash reported above.
- Use of libogg 1.1.2 (upgraded from 1.1.1)
- Fixed a crash (16-byte alignment exception) in the resample/downmix routines in audio.c (for oggenc and oggdropXPd)
- Updated build script for automake/autoconf
- Activated FLAC reading support in oggenc, using FLAC 1.1.1 (ICL compile)
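Some background on the alignment item in the list above, as a sketch rather than the actual change in audio.c: aligned SSE loads and stores such as movaps require their memory operands to sit on a 16-byte boundary and fault otherwise, so a buffer handed to such code has to start at an address that is a multiple of 16. A common trick is to over-allocate and round the address up. The arithmetic looks like this in Python (function names are invented for the example):

```python
SSE_ALIGNMENT = 16  # bytes; aligned SSE loads/stores need addresses divisible by 16


def is_aligned(address, alignment=SSE_ALIGNMENT):
    """True if address is a multiple of alignment (which must be a power of two)."""
    return address & (alignment - 1) == 0


def align_up(address, alignment=SSE_ALIGNMENT):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For example, align_up(0x1003) gives 0x1010, while an address that is already aligned is returned unchanged.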
One thing I forgot to mention: it is strongly recommended to use gcc 3.3. The patch does not work with gcc 3.4 and other versions.
Could anybody please provide a linux binary? As my box is using gcc 3.4.3, I am not able to compile it on my own.
Thanks
Hi nyaochi
I have just tested B05. It applied and compiled cleanly, but it has the same problem as B04.
It encodes, but the result is a big file that sounds like noise (with normal oggenc, castanets2.ogg is 97247 bytes; with oggenc-sse it is 221705 bytes).
Playing normal ogg files doesn't work either: they sound like noise too, and the player segfaults when reaching the end of the file. Vorbisgain also segfaults when reaching the end of a file.
I'm on a SuSE 9.1 Linux system, gcc 3.3.3, glibc 2.3.3, libogg 1.1.2, Athlon XP 2600 (Barton core).
This is the gdb output
(gdb) run castanets2.ogg
Starting program: /usr/bin/ogg123 castanets2.ogg
Reading symbols from /usr/lib/libvorbisfile.so.3...(no debugging symbols found)...done.
...
Dispositivo de sonido: Advanced Linux Sound Architecture (ALSA) output
[New Thread 1087495088 (LWP 27008)]
Reproduciendo: castanets2.ogg
Ogg Vorbis stream: 2 channel, 44100 Hz
Tiempo: 00:06,63 [00:00,00] de 00:06,63 ( 0,0 kbps) Búfer de Salida 0,0% (EOS (Fin de flujo))
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1077510816 (LWP 27005)]
0x402e51bd in _int_free () from /lib/tls/libc.so.6
(gdb) bt
#0 0x402e51bd in _int_free () from /lib/tls/libc.so.6
#1 0x402e55fb in free () from /lib/tls/libc.so.6
#2 0x400552ed in vorbis_comment_clear () from /usr/lib/libvorbis.so.0
#3 0x00000000 in ?? ()
#4 0x400386f0 in ?? () from /usr/lib/libvorbisfile.so.3
#5 0x08070590 in ?? ()
#6 0x00000001 in ?? ()
#7 0x40036a96 in ov_clear () from /usr/lib/libvorbisfile.so.3
...
Thank you for the detailed information. I've just got an email from the author. He found a bug around the ov_read function that probably causes your crash. He also told me that he doesn't use the Makefile generated by the configure script; instead he uses the Makefile in Win32_MinGW, which is based on a project converted from MSVC, and compiles with gcc version 3.3.1 (mingw special 20030804-1).
I suppose Linux support in B05 is not adequate at present, so we have to investigate what causes the bitrate bloat/noise problem. Although I have Fedora Core 1 with gcc 3.3.1, unfortunately I'm not familiar with Linux programming and have little time to debug it now. The author is aware of this problem, but can anyone solve it?
In fact, the SSE/SSE2 optimized versions are slower by about 1x, as seen in my test results posted earlier in this thread.
does anyone have binary aotuvb3 oggenc w/ sse patch applied?
I've uploaded linux binaries in this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=29974).
thank you
do you have one for windows?
Have a look at this page:
http://homepage3.nifty.com/blacksword/
thnx
something wrong was with my eyes... i visited that page earlier, but haven't seen subj
strange thing...
i compressed a track with standard aotuvb3 and with the sse optimized one. then i decompressed them to waves. the waves were not the same...
maybe this is not a very fair optimisation?
Even when you use different compilers for the same source code you can get different vorbis streams. So, it's to be expected that assembly optimizations will introduce differences.
It's up to the users to test and see if these differences are noticeable.
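A small illustration of why this happens, in Python rather than the encoder's C: floating-point addition is not associative, so adding the same numbers in a different order (as a vectorized loop that keeps several partial sums does) can round differently. The helper names and the input values are made up for the demonstration.

```python
def sum_sequential(values):
    """Add values one after another, like a plain scalar loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_lanes(values, lanes):
    """Keep `lanes` partial sums (element i goes to lane i % lanes),
    then add the partial sums together, like a SIMD loop would."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_sequential(partial)


data = [1e16, 1.0, -1e16, 1.0]
print(sum_sequential(data))   # 1.0
print(sum_in_lanes(data, 2))  # 2.0
```

Both results come from the same four numbers; only the order of the additions differs. In an encoder such tiny differences can tip a quantization decision, which is enough to change the output stream.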
I used the build from this url (http://homepage3.nifty.com/blacksword/OggEnc_SSE_20041213ArcherB10.zip).
Here are my results with an Athlon XP 3200+:
ArcherB10 oggenc:
File length: 5m 05.0s
Elapsed time: 0m 11.0s
Rate: 27.7879
Average bitrate: 161.5 kb/s
rarewares icl oggenc:
File length: 5m 05.0s
Elapsed time: 0m 20.0s
Rate: 15.2833
Average bitrate: 152.5 kb/s
I didn't notice other people getting different bitrates out of their tests. I did a simple:
oggenc -q5 testl.wav
Ideas?
Seems like a significant difference. Which specific ICL oggenc from rarewares did you use? There are a couple there.
The optimized binary (Archer Beta10) is based on aoTuV b3. Did you use the aoTuV b3 binary from rarewares for comparison?
My mistake... correct data:
Done encoding file "testl.ogg"
File length: 5m 05.0s
Elapsed time: 0m 12.0s
Rate: 25.4722
Average bitrate: 161.5 kb/s
Done encoding file "testl.ogg"
File length: 5m 05.0s
Elapsed time: 0m 20.0s
Rate: 15.2833
Average bitrate: 161.5 kb/s
Has any testing been done on these builds with regard to output quality? (i.e. no noticeable differences vs regular 2.1)
BTW, GCC 4.0 alpha snapshot from yesterday compiles the SSE version fine.
BTW, encoding time for a test file went down from 5.5 to 3.0 seconds on my Athlon XP...
Hmm, but on the other hand a non-SSE version compiled with gcc-3.4 only needs 4.3 seconds... so GCC 4.0 still needs a lot of work before it is ready for prime time.
OK, it seems using more conservative flags is better for gcc 4.0: Using
CFLAGS="-O2 -fweb -frename-registers -mno-ieee-fp -D_REENTRANT -fsigned-char -march=athlon-xp -mfpmath=sse -fomit-frame-pointer"
gcc4.0 is about as fast as gcc 3.4.3 w/o SSE.
Do I dare asking John33 for an english OggdropXPd SSE optimized version when the time is ready? Would be like a dream for an Iriver H140 owner
I wonder if it would be a good idea, considering we still haven't seen any listening tests comparing the optimized version with the official one.
I'm not averse to the idea. I simply haven't managed to get a clean compile to work with yet!! I keep taking another look, and I'll continue to do so, but until then, it's a 'no can do'!
You're right. Although, with the short abx tests I did yesterday, I am very satisfied with the quality of the sse version vs aoTuV b3 and official 1.1, so I would definitely use it. The speed gain, weighed against a possible difference between the versions (which I couldn't abx), is reason enough. EDIT: for me
Fully understandable, John. You're already doing a great job.
Hello,
I'm newb here, so please be patient
OK, my question is:
Can using these compiles (actually OggEnc_SSE_20041213ArcherB10) instead of the "normal" one (I'm using aoTuV b3 - OggEnc Win32 version - from Aoyumi's pages) speed up decoding of Vorbis files when played on a portable player (iAudio M3 in my case) and lower energy consumption?
An increase in encoding speed isn't important for me, but if it happens...
One off-topic subquestion: mp3 files play almost gaplessly compared to Vorbis ones on my player (also when encoded with this SSE compile). Is this because of slow decoding (with the Tremor decoder?), is this a hardware issue (slow processor / Vorbis requirements?) or is this a firmware issue (...)?
So maybe I'm totally off topic, maybe not? THX for your reactions.
Encoding and decoding are different processes (different engines), so I think decoding speed is a matter of the player.
hmm, Tremor is a Vorbis decoder, so it may be because of the mp3 decoders.
Originally mp3 can't play gaplessly (some players cut the gap when playing, so it seems like gapless playback).
Archer Release-Candidate 1 is out.
Thanks. Now is the time to test again.
By accident I found a very strange (rare) bug in Archer RC1:
With one sample (Laurie Anderson / Big Science / song no. 08 - Let X=X) the encoder fails, but only when -q4 is used (e.g. -q4,1, -q3, -q5 etc. cause no problem).
The length of the song is 3:54; when encoding fails, it ends at 3:27 (the whole song up to this point is encoded and tags are properly added). It doesn't matter whether the source for encoding is wav or flac. It happened with EAC as well as with Foobar:
INFO (foo_clienc) : CLI encoder: C:\Program Files\Eac\Encoders\Vorbis\OggEnc_SSE_20050312ArcherRC1\oggenc.exe
INFO (foo_clienc) : Destination file: file://C:\Documents and Settings\Martin Radimecky\My Documents\My Music\OGG\Rock\Anderson, Laurie\Big Science\08 - Let X=X.ogg
INFO (foo_clienc) : Source file: file://C:\Documents and Settings\Martin Radimecky\My Documents\My Music\FLAC\Rock\Anderson, Laurie\Big Science\08 - Let X=X.flac
INFO (foo_clienc) : 44100Hz 32bps 2ch
ERROR (foo_clienc) : Writing to encoder failed
INFO (foo_clienc) : Encoding took 10828 milliseconds, speed 19.24x
INFO (CORE) : attempting to edit file info : file://C:\Documents and Settings\Martin Radimecky\My Documents\My Music\OGG\Rock\Anderson, Laurie\Big Science\08 - Let X=X.ogg
INFO (CORE) : file info update successful on : file://C:\Documents and Settings\Martin Radimecky\My Documents\My Music\OGG\Rock\Anderson, Laurie\Big Science\08 - Let X=X.ogg
ERROR (foo_diskwriter) : Conversion failed.
Encoding does not fail when other compiles (e.g. oggenc2.41-aoTuVb3P3 from RW or the Aoyumi reference compile) are used.
The strangest thing is that only the full-length wav causes the failure. I tried to isolate just the part of the sample that causes the failure so I could upload it here, but that part encodes without problems. Even when a small part is cut off from the very beginning of the wav, it encodes well. But when the whole wav is resaved, the problem stays the same.
(the whole sample in flac is 20 MB, so I can't upload it here)
Edit: oops, I forgot this: OggEnc_SSE_20041213ArcherB10 performs without problems on this sample !?!
Edit 2: Anyway, encoding speed is amazing
I also have a WAV which fails to encode with RC1 (it doesn't print any error message, just creates an empty 0-byte OGG file) but encodes just fine with previous versions.
I can confirm the bug here too:
Opening with wav module: WAV file reader
Encoding "Wilco - Spiders (Kidsmoke).wav" to
"Wilco - Spiders (Kidsmoke).ogg"
at quality 4.00
[ 52.3%] [ 0m08s remaining] \
The encoder cuts out at exactly that spot every time. The file is fine and playable even, however the encoder stops at that point. I would assume it is a sample problem, as the part of the song it fails in would probably pose a problem to the encoder. I'm not sure though. I did test the encoder on about 10 other files of varying length and genre. All of the other files encoded without fail.
I agree though, the speedup of this encoder over the standard ICL aoTuV encode is amazing on my A64 3500+:
Opening with wav module: WAV file reader
Encoding "Death Cab for Cutie - Stability.wav" to
"Death Cab for Cutie - Stability.ogg"
at quality 4.00
[100.0%] [ 0m00s remaining] /
Done encoding file "Death Cab for Cutie - Stability.ogg"
File length: 12m 21.0s
Elapsed time: 0m 20.0s
Rate: 37.0800
Average bitrate: 116.6 kb/s
20 seconds for a twelve and a half minute song, nice!
Well, but it's not applicable (specially to batch encoding) with such unpredictable results as posted above
Archer RC2 is out.
Regrettably, exactly the same problem (with the same sample) as RC1 detected here
(RC1 bug report can be found here (http://www.hydrogenaudio.org/forums/index.php?showtopic=29161&view=findpost&p=281593))
What's the point of posting the report at a forum the developer probably doesn't read?
If I were you I would send him an e-mail, and hope that he speaks at least some English.
I just sent him an email referencing this thread and that specific post. It would probably be helpful if someone could supply a test file.
Edit: I finally found a file that I have that crashes the encoder. It's track 14 off the Pain of Salvation - Be album. Let's see.. crash at around 65,3% completed....
Edit 2: Very tricky to pin down. This track only crashes at -q 3 of the different qualities I tried.
Edit 3: Okay, running under debugger:
"oggenc_archer.exe The instruction at 0x0042D568 referenced memory at 0xBF4EB730. The memory could not be read."
.text:0042D512 cvtss2si ecx, [eax+edi*4+0Ch]
.text:0042D518 cvtss2si ebx, [eax+edi*4+8]
.text:0042D51E cvtss2si esi, [eax+edi*4+4]
.text:0042D524 cvtss2si eax, [eax+edi*4]
.text:0042D529 mov edi, [esp+50h+var_20]
.text:0042D52D add ecx, edi
.text:0042D52F add ebx, edi
.text:0042D531 add esi, edi
.text:0042D533 add edi, eax
.text:0042D535 mov eax, [esp+50h+var_18]
.text:0042D539 imul eax, [edx+8]
.text:0042D53D mov [esp+50h+var_20], edi
.text:0042D541 mov edi, [esp+50h+var_14]
.text:0042D545 add eax, [edi+ecx*4]
.text:0042D548 imul eax, [edx+8]
.text:0042D54C add eax, [edi+ebx*4]
.text:0042D54F imul eax, [edx+8]
.text:0042D553 add eax, [edi+esi*4]
.text:0042D556 imul eax, [edx+8]
.text:0042D55A mov edx, [esp+50h+var_20]
.text:0042D55E add eax, [edi+edx*4]
.text:0042D561 mov edx, [esp+50h+var_10]
.text:0042D565 mov edx, [edx+8]
.text:0042D568 cmp dword ptr [edx+eax*4], 0 <--------------
.text:0042D56C jle loc_42D6FE
.text:0042D572 mov edx, [ebp+arg_C]
.text:0042D575 mov ecx, [edx+10h]
It's a function that starts at 0x42D2FC and takes four parameters. It seems to be called explicitly from only one place, but its address is taken twice, so it could be called through a function pointer too. I'm not familiar enough with the code to identify it any further, and I don't think I even have the tools to build the source.
Alright, the author got back to me. I'm going to assume he won't mind me posting his reply here:
"This BBS is seen.
This problem occurs in local_book_besterror_dim8.
I received the data which a problem occurs by RC1.
The data can be normally encoded by RC2.
The samples of the data which a problem occurs are insufficient. "
I'm going to try and figure out if he's got the bandwidth to receive "samples" from us or not, though it doesn't seem _too_ hard to reproduce if you've got one or two whole albums to encode at different quality settings.
Edit: Some potentially good news:
"Probably, it will be unnecessary.
I found the clear problem in local_book_besterror_dimX. "
Looking forward to RC3.
Edit 2: Got to run a pre-release of RC3 where the bug seems to have been fixed --- at least my only test-case is working now. Furthermore, this does not seem to have impaired encoding speed in the least. (I do however detect a slight change in file size)
My point was to warn other people against using the compile, because I think it's just luck whether you hit this bug (with RC1 I encoded 8 albums without any problems before), and to give evidence that RC2 does not solve the problem (it was easy for me to check this because I know the "wrong" sample). Maybe I'm naive, but it's so tempting to use a compile like this...
Of course you are right about sending him an e-mail. I wasn't quick enough (as posted above)
Here the problem seems to be fixed too (with Oggenc_rc3_pre01.exe on the same sample as before). I've sent an email to blacksword about this result too.
I'm looking forward to RC3
Archer RC3 is out.
F:\wav\archer>oggenc_archer -v
OggEnc v1.1 (Archer RC1 based on AoTuV Beta03)
Still displaying the wrong version, but at least the files are tagged correctly. Odd that these are different strings.
Well, bad news I think - my WAV still doesn't encode with RC3 (outputs a 0 bytes dummy OGG). What I noticed is that sampling rate of that WAV is 32KHz, I tried with another 32KHz file and it didn't encode either, so it looks like a problem with this certain sampling-rate. I also tried some 22KHz, 44KHz, and 48KHz files and they encode fine.
EDIT: I checked with -q-2 only.
I can confirm that 32KHz files don't work at all at negative quality settings:
.text:0041A53D mov eax, [esp+60h+var_24]
.text:0041A541 mov edi, [eax+esi*4] ; <-------- crash at negative quality, 32KHz
.text:0041A544 movss xmm1, dword ptr [ebx+edi*4]
.text:0041A549 mov edx, [ebx+edi*4]
.text:0041A54C movss xmm0, xmm1
.text:0041A550 mulss xmm0, xmm0
Looks like the base register isn't set up correctly (eax).
Hopefully it's just a problem with the loader.
edit:
F:\wav\archer>oggenc_archer --resample 32000 -q -1 Posbe14.wav
Opening with wav module: WAV file reader
Resampling input from 44100 Hz to 32000 Hz
Encoding "Posbe14.wav" to "Posbe14.ogg" at quality -1,00
<crash>
No such luck.
Samplerates < 26000 and > 39999 == "works" (encoder doesn't crash).
Edit 2:
"Root cause has become clear.
I was not testing 32KHz wav file.
In this case, (loop count mod 16) was not zero in
_vp_noise_normalize.
This question is corrected by RC4." -- Mebius1
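For anyone wondering what "(loop count mod 16) was not zero" means in practice, here is a rough sketch in Python. It is not the Archer/Lancer source: `sum_squares_blocked` and the block width of 16 are hypothetical, chosen only to mirror the author's remark. The idea is that a loop written to process 16 elements per step needs a separate tail for the leftover `n % 16` elements, otherwise it either skips them or reads past the end of the buffer.

```python
BLOCK = 16  # assumed block width, taken from the "(loop count mod 16)" remark


def sum_squares_blocked(values):
    """Sum of squares, processed in blocks of BLOCK with a scalar tail."""
    n = len(values)
    full = n - (n % BLOCK)  # largest multiple of BLOCK that fits in n
    total = 0.0
    # Main loop: stands in for the vectorized body, which only ever
    # touches complete blocks and therefore never reads past the end.
    for start in range(0, full, BLOCK):
        total += sum(v * v for v in values[start:start + BLOCK])
    # Tail loop: handles the remaining n % BLOCK elements one at a time.
    for i in range(full, n):
        total += values[i] * values[i]
    return total
```

A buffer of 35 elements, for example, goes through two full blocks and a 3-element tail. Code that was only ever tested with sizes that happen to be multiples of 16 would never exercise that tail, which would fit the 32KHz-only failure described above.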
... and RC4 is out (http://homepage3.nifty.com/blacksword/).
Seems to work fine now
Bump for the new Lancer 20050528 release (based on aotuv-pb4_20050412).
oh great....you made me wet my pants again
EDIT: Encoding from a pipe appears to be broken in this release
Speed increased slightly on my system (AMD XP 1800+):
from 19.8x to 20.4x (3% speedup).
("Archer RC4" against "Lancer")
Run with the input file disk-cache hot.
Archer -q 6: 22,8144x, Average bitrate: 199,3 kb/s
Lancer -q 6: 23,2464x, Average bitrate: 195,5 kb/s
These speeds are wicked fast, so fast that any improvement is basically unnecessary.
I'd be more interested in what could be done to the decoder. I fear that the next generation sound cards and game consoles will have _greatly_ accelerated hardware decoding and mixing of "mp3", which might slow down or even _revert_ vorbis adoption by game devs -- which I consider today the largest and most important "market" where vorbis is successfully competing.
It would be a shame to see that happen. :-(
The next generation consoles won't be pushing mp3. Microsoft will certainly push wma for Xbox titles and Sony its own Atrac3.
Why is the bitrate different? Rounding errors? If so, are these versions safe to use?
If you mean the bitrate difference between Archer and Lancer, the reason is of course the different encoder version (AoTuV b3 vs. AoTuV pb4). Otherwise I didn't find any bitrate or file size difference between Lancer [20050528] and the original AoTuV pb4 [20050412] (on which Lancer is based).
BTW I love it. I didn't expect they will release it so quickly. WONDERFUL !!!
We would need some listening tests to be sure...
Not really. If you can determine that it produces output identical to AoTuv pb4, then you only really need to perform listening tests on either AoTuv pb4 or Lancer.
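One way to check for identical output is a plain byte comparison of the decoded audio. The sketch below is only an illustration; `digest` and `same_output` are hypothetical helper names, and it assumes you have already decoded both .ogg files to raw bytes with the same decoder. Comparing the decoded data rather than the .ogg files themselves sidesteps header differences such as the vendor string.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_output(decoded_a: bytes, decoded_b: bytes) -> bool:
    """True only when both decoded streams match byte for byte."""
    return digest(decoded_a) == digest(decoded_b)
```

In practice the two byte strings would come from something like `open("a.wav", "rb").read()`. A single differing sample is enough to make the digests differ, so this answers "identical or not" but says nothing about how large the difference is.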
Pipe seems to work fine here.
Yeah, the point is that the AoTuv and Archer/Lancer outputs are not identical. I don't know if it's due to different compilers or just the fact that SSE instructions are used, but they never were identical IIRC.
the bitrates are identical
but the resulting files differ in filesize by a few bytes (between lancer and aotuv pb4), i can't explain why
Could different vendor strings be the cause of it (the few bytes' difference)?
And yeah, piping works well in this version.
Nope, actual audio data differs too.
But the differences are only sporadic.
If you do a wave subtraction, you will see large amounts of absolute silence and a number of spikes.
I raised the question here (http://www.hydrogenaudio.org/forums/index.php?showtopic=32764) but there was no final answer.
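For reference, a wave subtraction of the kind described can be sketched like this. It is a minimal example that assumes both decodes are 16-bit little-endian PCM with the header already stripped; the function names are made up for the illustration.

```python
import struct


def pcm16_samples(raw: bytes):
    """Unpack little-endian signed 16-bit PCM bytes into a list of ints."""
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw[:count * 2]))


def subtract(a, b):
    """Sample-by-sample difference over the common length."""
    return [x - y for x, y in zip(a, b)]


def summarize(diff):
    """Return (number of non-zero samples, largest absolute difference)."""
    nonzero = sum(1 for d in diff if d != 0)
    peak = max((abs(d) for d in diff), default=0)
    return nonzero, peak
```

"Large amounts of absolute silence and a number of spikes" would show up here as a small non-zero count relative to the total number of samples, with the peak telling you how big the largest spike is.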
The size of aoTuV pre-beta4 [20050412] is 1.36 MB
and the size of Lancer [20050528] is 401 KB.
Why is there such a big difference in size?
Lancer is probably packed with UPX or something and aoTuV is not.
Lancer's DLL's are pretty big too, 3 meg unpacked, 450kb zipped. Beats me why they're that inflated.
Is the current Lancer (Archer) version a stable release? And why was its name changed?
The name changed from Archer to Lancer because it now uses AoTuV pre-beta 4, while the Archer versions used beta 3; at least that's what I suppose. It is pretty stable, but it may still have various bugs to be fixed, and on the other hand it is based on a pre-beta, which may have problems of its own.
new version is out,
Lancer 20050621(Based on aotuv-b4_20050617)
Many thanks for the heads up. =)
Will test it out in a moment...
Almost a week has passed. Any test results?
I use it regularly without any problems (mostly q3 & q4), and the speed is fantastic. I've also noticed that it uses only about 90% CPU when running on my AMD 2700+ machine, whereas Archer used the full CPU.
Ooh. That probably implies that it's fast enough that it's actually sometimes blocking on IO. Impressive!
Well, compared to the Lancer that uses aoTuV pre-beta 4, no real noticeable difference.
U might wanna check that out a few posts (or pages?) back about previous Lancer's performance.
To the Archer, it's just like Josef said.
Anyway, the speed gain (over the aoTuVb4 compiles found at Rarewares.org) on slower PIII systems is adequate.
(I tested it on a PIII 600MHz)
No numbers yet (since I kinda forgot...), but I think it was about 1.15x to 1.30x faster.
As always, cmiiw. =)
From http://www.tom.womack.net/x86FAQ/faq_features.html:
For the P3, Intel skimped somewhat on the implementation, using only a two-wide ALU, so the average performance of SSE and 3DNow will be the same - I've constructed sequences of instructions which are faster on 3DNow. It's possible they'll use a four-wide one on later chips, which would make SSE roughly twice as fast as 3DNow.
This is also why P3's are notoriously poor on certain games such as UT2003/4.
New Lancer build based on the aoTuVb4 library merged with libvorbis 1.1.1. Previous was based on aoTuVb4 with libvorbis 1.1.0
http://homepage3.nifty.com/blacksword/index_e.htm as always
Just tried the latest version to replace the built in Foobar and it's well over twice as fast, especially when converting from FLAC. Amazing!
Just finished a test with besweet. Encoding time dropped from 1:54 to 1:13 (mins:secs).
FYI - New 'Lancer' builds of oggdropXPd v1.8.6 and libvorbis.dll
http://homepage3.nifty.com/blacksword/index_e.htm as always
I have a question. Will Lancer and P-III optimized (from rarewares.org) versions work and have any gain on Celeron 128kb cache (not Tualatin)?
Only if your Celeron supports SSE... try a program like cpuz (http://www.cpuid.org/cpuz.php) to see whether it has the SSE instructions.
Got a question, guys.
I did some encodes of about 25 different songs with foobar 0.8.3, using the built-in Vorbis (which is 1.1) and using Lancer (which is libvorbis 1.1.1 merged with aoTuV b4), and I'm getting a size increase of 5-10% on all samples; actually 20 samples were at 10% and the rest were between 5 and 10%. They were classic rock songs, encoded on a P4 with XP SP2.
It is quite a bit faster, which is nice, but what confuses me is that at the beginning of the thread all tests show the same bitrate. I encoded at q5; everything was default from foobar.
By 'built-in', did u mean the official vorbis libraries?
The difference is probably caused by the different tunings used.
The official one doesn't use AoTuV b4 tunings yet.
(is it still b2 or something...? I forgot...)
I meant the one that comes by default in foobar, already configured in diskwriter.
mrq and foobar report it as 1.1.
Both were encoded with default preferences: q5 and no other parameters.
The bitrate difference you're seeing is aotuv vs 1.1, lancer may change bitrates compared to the regular aotuv, but only by a tiny bit.
Exactly.
I don't think that version (1.1.0) already uses the AoTuV b4 tunings.
So if the resulting file size difference is quite big, it's probably because of the different tunings used.
Yes, CPUZ says my CPU has SSE. I tried 'Lancer oggenc' and it is really faster than the P-III compile.
However, I also tried 'Lancer OggDropXPd' and it doesn't work.
When I drop wavs on it, nothing happens. Does anybody know why?
My PC is an Intel Celeron 1000 MHz (not Tualatin) with Windows 98SE. If anybody has Windows 98SE installed, please check: does 'Lancer OggDropXPd' work?
Thanks.
Try these updates (http://www.msfn.org/board/index.php?showforum=91)...
Nice speed increase here on AMD x2 4400+
aoTuVb4 - no enhancements (oggenc)
File length: 72m 28.0s
Elapsed time: 5m 07.0s
Rate: 14.1643
Average bitrate: 192.2 kb/s
aoTuVb4 - SSE version (oggenc2)
File length: 72m 28.0s
Elapsed time: 4m 16.0s
Rate: 16.9861
Average bitrate: 192.2 kb/s
aoTuVb4 - SSE2 version (oggenc2)
File length: 72m 28.0s
Elapsed time: 3m 30.0s
Rate: 20.7068
Average bitrate: 192.2 kb/s
lancer20050709 (oggenc2)
File length: 72m 28.00s
Elapsed time: 2m 3.66s
Rate: 35.1655
Average bitrate: 192.2 kb/s
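To put those numbers side by side: the "Rate" field appears to be audio length divided by elapsed time, so relative speedups follow directly from the reported rates. A small sketch (the helper names are made up for this example, and the four rates are simply copied from the results above):

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Y.Ys' style duration to seconds."""
    return minutes * 60 + seconds


def rate(length_s, elapsed_s):
    """Encoding speed as a multiple of real time."""
    return length_s / elapsed_s


def speedup(rate_new, rate_base):
    """How many times faster rate_new is than rate_base."""
    return rate_new / rate_base


# Rates as reported in the results above.
REPORTED = {
    "aoTuVb4 generic": 14.1643,
    "aoTuVb4 SSE": 16.9861,
    "aoTuVb4 SSE2": 20.7068,
    "lancer20050709": 35.1655,
}


def speedups_vs(baseline_name, rates=REPORTED):
    """Speedup of every build relative to the named baseline."""
    base = rates[baseline_name]
    return {name: speedup(r, base) for name, r in rates.items()}
```

Calling `speedups_vs("aoTuVb4 generic")` puts the Lancer build at roughly 2.5 times the speed of the unoptimized compile. Recomputing the rate from the printed times, e.g. `rate(to_seconds(72, 28), to_seconds(2, 3.66))`, lands very close to the reported figure; the small gap is presumably because the printed file length is rounded.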
Is this optimizing done via implementing SSE and SSE2 only, or also via assembling some parts of code?
Can such optimizing work be done with Ogg Vorbis decoder?
I recall reading vorbis was all x87 code, that is, floating point, but not accelerated.
What the ICC compiler does is a process called autovectorisation, it's a very clever piece of software that examines routines and attempts to implement them using the faster SSE(2) instructions. At least, that is how i understand it.
What lancer does is replace certain standard routines in vorbis with hand written SSE implementations. This is not assembly (i think), but it is vectorisation (making use of SSE) done by a human.
The SSE instruction set works at a lower precision than the regular x87 instructions, but i don't think that's ever reduced sound quality in a noticeable way.
I'm not an expert on this, but i hope this explanation is accurate enough to answer your questions.
There is also an accelerated vorbis decoder, look at the first post of this topic.
- W.Dee's wuvorbisfile (Japanese only?) (http://kikyou.info/tvp/#side_product): wuvorbis.dll is a fast Ogg Vorbis decoder with SSE and 3DNow!, which is a part of KiriKiri software (useful for developing multi-media contents or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) and 1.5x-1.9x faster (3DNow!) than official libvorbis.
Babelfish translation (http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=ja_en&url=http%3A%2F%2Fkikyou.info%2Ftvp)
I'll see if i can get this to work and bench it.
EDIT: can't make much sense of the japanese even with babelfish, the .dll supplied at least doesn't work as a regular vorbisfile.dll or vorbis.dll.
@toot - very impressive speedup, imagine if it were multithreading!
Lancer 20051118 is out
http://homepage3.nifty.com/blacksword/
Basically nothing you said was correct.
1) autovectorisation is not the same as using SSE or SSE2 instructions
2) hand written SSE implementations are assembly or intrinsics
3) hand written (or automatically generated) SSE does *not* imply vectorisation
4) SSE or SSE2 does not automatically imply lower precision than floating point.
Fantastic! Thanks to all people involved!
Here is a small Ogg Vorbis CLI encoder speed comparison between John33 and Lancer builds:
oggenc2.6-aoTuVb4.5generic.exe
Elapsed time: 0m 11.0s
Rate: 10.0169
Average bitrate: 151.0 kb/s
oggenc2.6-aoTuVb4.5P4.exe
Elapsed time: 0m 07.0s
Rate: 15.7409
Average bitrate: 151.0 kb/s
OggEnc_SSE_20041213ArcherB10.exe
Elapsed time: 0m 05.0s
Rate: 22.0373
Average bitrate: 148.3 kb/s
OggEnc_SSE_20050320ArcherRC4.exe
Elapsed time: 0m 05.0s
Rate: 22.0373
Average bitrate: 148.3 kb/s
oggenc2_lancer20050528_1.exe
Elapsed time: 0m 4.44s
Rate: 24.8335
Average bitrate: 141.0 kb/s
oggenc2_lancer20050621.exe
Elapsed time: 0m 4.30s
Rate: 25.6426
Average bitrate: 151.0 kb/s
oggenc2_lancer20050709.exe
Elapsed time: 0m 4.23s
Rate: 26.0242
Average bitrate: 151.0 kb/s
oggenc2_lancer20051118.exe
Elapsed time: 0m 4.27s
Rate: 25.8290
Average bitrate: 151.0 kb/s
Test environment:
Pentium4 2.4GHZ, Windows XP SP2, 512MB ddr266 sdram.
Test with 18.5 MB, 44.1khz, stereo, 1min 50sec audio file, and -q4 switch.
NOTE: the results above might not be accurate...
To explain:
3DNow, SSE, SSE2 are alternate instruction sets for floating point processing. These instruction sets have some major advantages over the old x87 mode:
1) They have register based access, instead of stack based
2) They have the *possibility* to operate on 2 or 4 values at the same time (vectorisation)
SSE and 3DNow have 32 bit accuracy, SSE2 has 64 bit accuracy. x87 has 32 or 64 bit accuracy and a possibility (that shouldn't be used and I'm pretty sure vorbis doesn't use it!) to do 80 bit accuracy arithmetic.
Using these instruction sets can be done in the following ways: code for them manually (in assembler or with intrinsics), use a compiler that can use the SSE(2) instructions for floating point instead of x87, or use a compiler that can *vectorize* computations for SSE/SSE2.
Currently (besides manually writing in assembly), ICC is the best at vectorization, and some very recent GCCs have the capability too. MSVC2005 and older GCCs can generate SSE(2) floating point instructions (without vectorisation).
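To make the 32 bit vs 64 bit point concrete, here is a small sketch that mimics single-precision arithmetic in Python by rounding to a 32-bit float after every step. It has nothing to do with the actual Vorbis code; `to_float32` and `accumulate` are hypothetical helpers for the illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, single_precision):
    """Sum values, optionally rounding after every step as 32-bit math would."""
    total = 0.0
    for v in values:
        if single_precision:
            total = to_float32(total + to_float32(v))
        else:
            total += v
    return total
```

Summing a thousand copies of 0.1 both ways gives two slightly different totals. The gap is tiny, but it illustrates how a build that does its float math at a different width, or merely in a different order, can end up with output that is close to the reference without being bit-identical.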
For some reason on the site right now the links are crossed out and removed.
EDIT: I see why now. aoTuV b4.51 came out.
really ?
Ogg Vorbis acceleration project (http://homepage3.nifty.com/blacksword/index_e.htm)
they must've pulled it, maybe since 4.51 bugfix was released almost simultaneously.
It looks like it.. according to google's surprisingly legible translation..
November of 2005 19th
Release is discontinued to completion of the aoTu V beta4.51 base.
Thanks for the explanation, Garf. Reading mostly about the use of 3DNow/SSE with regard to 3D work i didn't realise vectorisation was only a possibility. Or that x87 had precisions other than 80 bit.
OK, lancer_20051121 patches against aotuv4.51 are out.
Can I patch the source under Linux?
What would be the exact command and what I need (besides aotuv source) ?
I have downloaded Lancer_20051121 and tested the OggEnc2.exe. Here's the log of what I have done (_j33 is John33's compiled, _lancer is Lancer version):
D:\Music\!Reprocess>oggenc_j33 -q 2 --output=Mamma_Mia_j33.ogg "ABBA - Mamma Mia.wav"
Opening with wav module: WAV file reader
Encoding "ABBA - Mamma Mia.wav" to
"Mamma_Mia_j33.ogg"
at quality 2.00
[ 99.7%] [ 0m00s remaining] -
Done encoding file "Mamma_Mia_j33.ogg"
File length: 3m 32.0s
Elapsed time: 0m 16.0s
Rate: 13.3078
Average bitrate: 101.4 kb/s
D:\Music\!Reprocess>oggenc_lancer -q 2 --output=Mamma_Mia_lancer.ogg "ABBA - Mamma Mia.wav"
Opening with wav module: WAV file reader
Encoding "ABBA - Mamma Mia.wav" to
"Mamma_Mia_lancer.ogg"
at quality 2.00
[ 99.7%] [ 0m00s remaining] -
Done encoding file "Mamma_Mia_lancer.ogg"
File length: 3m 32.00s
Elapsed time: 0m 8.78s
Rate: 24.2484
Average bitrate: 101.4 kb/s
Wow! It's amazingly fast (I use my brother's AthlonXP 2400+). However, the next step I took made me pause:
D:\Music\!Reprocess>dir M*.ogg
Volume in drive D is Data
Volume Serial Number is 20E6-C9A1
Directory of D:\Music\!Reprocess
2005-11-25 01:10 2,701,832 Mamma_Mia_j33.ogg
2005-11-25 01:10 2,701,784 Mamma_Mia_lancer.ogg
2 File(s) 5,403,616 bytes
0 Dir(s) 2,493,083,648 bytes free
Whoa! Significant difference? It can't be because of a different comment, can it? I checked with EditPlus, and I think the files are mostly identical. So I decoded both to WAVs and got the same size:
D:\Music\!Reprocess>dir *.wav
Volume in drive D is Data
Volume Serial Number is 20E6-C9A1
Directory of D:\Music\!Reprocess
2005-11-15 02:42 37,560,040 ABBA - Mamma Mia.wav
2005-11-25 01:18 37,560,040 Mamma_Mia_j33.wav
2005-11-25 01:18 37,560,040 Mamma_Mia_lancer.wav
4 File(s) 253,711,964 bytes
0 Dir(s) 2,493,083,648 bytes free
Same as original. Not knowing what else to do, I try EAQUAL:
D:\Music\!Reprocess>eaqual -fref Mamma_Mia_j33.wav -ftest Mamma_Mia_lancer.wav
EAQUAL - Evaluation of Audio Quality
Version: 0.1.3alpha
Author: Alexander Lerch, zplane.development
_______________________________________________________
Reference File: Mamma_Mia_j33.wav
Test File: Mamma_Mia_lancer.wav
Sample Rate: 44100
Number of Channels: 2
Press Escape to cancel...
Processed: 212.93 seconds of audio file
Time elapsed: 82.25
Resulting ODG: 0.11
Resulting DIX: 3.64
BandwidthRef 16082.5596
BandwidthTest 16082.5192
NMR -34.2508
WinModDiff1 0.3679
ADB -0.1596
EHS 0.0345
AvgModDiff1 0.1880
AvgModDiff2 0.3213
NoiseLoud 0.0132
MFPD 0.9995
RDF 0.0000
And it seems there are differences.
I tried listening to the results but to my ears they sound the same.
Anyone can shed a light as to why they differ?
Read this page with machine translation if you really want to know the reason (the first item in Frequently Asked Questions)
http://homepage3.nifty.com/blacksword/readme_j.htm
In short, SSE arithmetic has 32bit precision while FPU (i.e., without SSE optimization/compile) arithmetic has 80bit precision. The computational error in floating point arithmetic may make the difference but is so small that you probably cannot hear the difference. I bet you also get a difference between John33's compile and reference binary distributed by Aoyumi.
SSE2 has 64 bit accuracy, and the FPU is generally used with only 64 bit accuracy (using 80 bit mode is not possible in a portable way, and as I said, vorbis is not doing it).
Note that AMD64/EM64T use SSE/SSE2 exclusively instead of the FPU.
But yes, in this case the difference is likely just minor rounding error. Note that positive ODG means that there is no audible difference (actually: encoded sample is better than the original, but that's a limitation in the way EAQUAL works).
Whoa! Thanks for the clarification. I was afraid that the Lancer optimizations were buggy and would degrade the output, but this puts my fear to rest. I am very amazed at the encoding speed increase and will change over to Lancer (oggenc, oggdrop, and libvorbis.dll).
One question: How do I decode the result of EAQUAL? Any pointer will be appreciated. Thanks a lot.
ODG = Objective difference grade
From memory
0 = Imperceptible
-1 = Perceptible but not annoying
-2 = Slightly annoying
-3 = Annoying
-4 = Very annoying
Positive value = better than perfect
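If it helps, the scale above can be turned into a tiny lookup. This is only a sketch of the list as quoted from memory, not an official mapping; `describe_odg` is a made-up name, and values between the listed grades are simply snapped to the nearest one.

```python
# Grades as listed above, indexed by how far below zero the ODG is.
ODG_LABELS = [
    "imperceptible",                 # 0
    "perceptible but not annoying",  # -1
    "slightly annoying",             # -2
    "annoying",                      # -3
    "very annoying",                 # -4
]


def describe_odg(odg: float) -> str:
    """Map an ODG value to the nearest verbal grade from the list above."""
    if odg > 0:
        return "better than perfect (treat as imperceptible)"
    # Snap to the nearest listed grade and clamp to the -4..0 range.
    index = min(len(ODG_LABELS) - 1, max(0, round(-odg)))
    return ODG_LABELS[index]
```

With the ODG of 0.11 from the EAQUAL run above, this lands in the "better than perfect" case, i.e. no audible difference between the two decodes as far as the tool is concerned.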
OK, managed to patch aotuv sources with lancer dif (don't know what went wrong last time), but now oggenc segfaults. Switching back to original aotuv helps.
Any comments ?
My box is amd64 Gentoo with gcc 3.4.4.
Try compiling with gcc 3.3.x.
For me gcc 3.4 can't compile the sources; 4.0 compiles them, but oggenc displays a mystic error on start; 3.3 compiles them and oggenc works afterwards.
I'm using Ubuntu 5.10 with gcc 3.3.6, 3.4.4 and 4.0.1 (actually 4.0.2 pre) on an Athlon XP 2200+. libvorbis 1.1.2 compiled by gcc 4 with the default package options (--host=i486-linux-gnu) gives ~11x, with -march=athlon-xp -mfpmath=sse about 14x, and with the Lancer patches compiled by gcc 3.3 with -march=athlon-xp about 17x.
With the late November Lancer oggenc2 , I get MediaCoder encoding speeds from Flacs around 29x on my AMD 3300+ Win X64 system (q 6.16).
I know this thread is about speed, but I wonder if others disagree with my perception that quality now is comparable to MPC at rates around 200 kbps.
The quality at which -q setting...?
Say, in the range of nominal 200 kbps, which is q 6.16 to 6.24.
I recall that MPC was considered near as dammit to transparent at q 8. So, I'm getting at whether there's a sweet spot in the latest Japanese tweaked Ogg Vorbis encoders in that 6 to 8 range.
I'm pretty sure that that glitch on the 6.0 boundary for the official release that made those just north sound much better has been solved...
I know that I've ditched mpc "insane" for ogg q7. The poor seeking and limited hardware support for mpc and the improvements in Vorbis are what convinced me.
I don't think Guru has tested beyond the 170 to 180 range yet, which showed ogg to be on par with (and in some cases better than) mpc.
No need to, it's a waste of time IMO. Most people, with the exception of a few like GuruB, can't tell the difference; I can't. If it was a low-bitrate test then sure, why not.
I agree that it's probably futile...but when I still see mpc recommended and touted as the best quality codec in the higher bitrate range, I have to wonder if some more tests are needed to debunk the myth. Finding a significant number of golden ears and "artifact professionals" would be difficult, though.
I'm referring above to claims like the following:
"Highest quality lossy codec at high bitrates" at dbpoweramp's codec central
and the general exuberance and confidence on display for MPC at the Musepack forums, even though it hasn't seen any real quality improvements since its superiority was initially discovered... They even dismiss Guru's recent 170-180 kbps test as insignificant and make sweeping claims that 128 kbps is not transparent (and that testing at such bitrates is not interesting), which I think the latest 128 kbps test will disprove.
OK... thanks to HotshotGG and Vinnie97. I do suspect that the inherent superiority of MPC at 192+ kbps is a shibboleth that goes unquestioned because only those able to read Japanese can keep up with what those codec developers have been doing.
I'm delighted by how very simple and quick it has been to convert my FLACs for use in my 60 Gb iAudio -- one week to get it two thirds full. It was because that player has a 9,999 file limit that I decided to go up to around 200 kbps *.ogg . I was surprised that I so readily detected improvement over nominal 160 kbps LAME VBR.
Yes, that reminds me... an MPC fan said that the result is "inconclusive" because it is low bitrate and MPC achieves transparency at high bitrates...
I mean if Vorbis already achieves transparency at -q 5 or -q 6, what's the point of using -q 7 and higher?
If I need exact transparency I'll use FLAC. For my daily use, -q 2 suffices
New version seems to be out.
Not sure what the differences are, but I encoded a couple dozen files with the older version and the new one. I noted an increase of speed of about one percent.
36.28 seconds average for the old version and 36.64 for the new version. Significant or not? I dunno, but hey one percent is one percent
New version of what? Where?
The new version of the encoder discussed in this thread, Archer/Lancer. The new version has a build date of January 31st 2006.
http://homepage3.nifty.com/blacksword/
The previous version was based on OggDropXPd 1.8.6. This one is based on 1.8.7. I think that is the only difference.
From the Google-translated page, I think there are also some new optimizations and/or bug fixes.
Via the Google Japan-English (beta) translation:
Libogg in 1.1.3 update
Oggenc in 2.8 update
OggdropXPd in 1.8.7 version rise
With SSE optimization of mapping_forward and _2class analysis of ICL imperfectly correspondence
It seems that Lancer 20060131 oggenc2.exe piping from STDIN is broken. Can anyone test it?
Works here...
"You can specify taking the file from stdin by using - as the input filename.
In this mode, output is to stdout unless an output filename is specified
with -o"
What did you do or use (software and command line arguments) when you had the problem?
I use it as the compressor in dMC; I haven't actually tested it from the command line.
I tested Lancer 20060131 oggenc2.exe today. Feeding a complete wave file from the CLI or through a pipe to oggenc2 does encode. *BUT* I don't know what dMC / fb2k feed to oggenc2. The only thing I know is that oggenc2_lancer20051121 doesn't have this issue.
Works fine here with fb2k & piping.
People, I need the previous Lancer 20051121, both oggenc2 and OggDropXPd, but the links on the official site do not work. Can anybody give me working links? Big thanks in advance!
could anyone provide a static linux oggenc with lancer?
please?
If you will wait I will upload this morning. In RAR format okay?
Yeah. Or 7z.
By the way I already have oggenc2 so you can post OggDropXPd only. Thanks !!!
Yup, I simply replaced my old oggenc2 and foobar2000 did its thing on my FLACs. But I did see that it was faster.
The new oggenc2 works even faster with Stanley Hwang's MediaCoder using mplayer. But my output Oggs have Genre:Unknown and track numbers without preceding 0, e.g., 01. That was so with the November oggenc2, also; can anybody suggest CLI for MediaCoder?
Uploaded. Get the OggDropXPd Lancer 20051121 here (http://marupa.mine.nu/EquityLinks/oggdropxpd_lancer20051121.rar).
Just in case, you can also get OggEnc 2.6 Lancer 20051121 here (http://marupa.mine.nu/EquityLinks/oggenc_lancer20051121.rar).
I just tried it with dBpowerAMP... the older version pipes just fine; this one doesn't.
I'll see if I can get it to work....
dBpowerAMP error messages:
Encoding standard input to
"D:\~dmcout.ogg"
at quality 3.00
Internal error: attempt to read unsupported bitdepth 16
Done encoding file "D:\~dmcout.ogg"
File length: 0m 00.0s
Elapsed time: 0m 00.0s
Rate: 0.0000
Average bitrate: 1.$ kb/s
^^ without any extra options, just ' - --output=D:\~dmcout.ogg'
after I removed this from the options file....
--raw --raw-chan=[Channels] --raw-bits=[BitsPerSample] --raw-rate=[SamplesPerSec]
I got this error
ERROR: Input file "(stdin)" is not a supported format
I also tried using [WriteWaveRIFF], but it gave me the same error as above (and other junk), and it made my computer 'beep' at me.
Even though piping works fine with it in foobar2000, it is obvious to me that something is wrong.
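The two error messages above correspond to the two ways a frontend can feed audio over stdin: headerless PCM described by the --raw options, or a stream that carries its own RIFF/WAVE header. The Python sketch below only illustrates that difference; it is not a diagnosis of this particular failure. `wrap_pcm_as_wav` and `raw_input_args` are hypothetical helper names, and the 16-bit stereo 44.1 kHz defaults are assumptions.

```python
import io
import wave


def wrap_pcm_as_wav(pcm_bytes, channels=2, bits_per_sample=16, sample_rate=44100):
    """Prepend a RIFF/WAVE header to headerless PCM samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits_per_sample // 8)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()


def raw_input_args(channels=2, bits_per_sample=16, sample_rate=44100):
    """Build the raw-input options quoted in the post for headerless PCM."""
    return [
        "--raw",
        f"--raw-chan={channels}",
        f"--raw-bits={bits_per_sample}",
        f"--raw-rate={sample_rate}",
    ]


if __name__ == "__main__":
    silence = bytes(4 * 44100)  # one second of 16-bit stereo silence
    wav_data = wrap_pcm_as_wav(silence)
    # The header is what lets an encoder detect the format without --raw.
    print(wav_data[:4], len(wav_data) - len(silence))
    print(raw_input_args())
```

Saving both variants to disk and feeding each to the encoder by hand is one way to check which input path a given build mishandles.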
I also tested it with a .pcm file saved from CoolEdit Pro and the same error occurred.
I'd like someone experienced to make a static linux build too, as I failed every time trying to do one
From the author's reply (http://64.233.179.104/translate_c?hl=en&sl=ja&u=http://pc7.2ch.net/test/read.cgi/software/1136822006/266&prev=/search%3Fq%3Dhttp://pc7.2ch.net/test/read.cgi/software/1136822006/%26hl%3Den%26hs%3DEOq%26lr%3D%26client%3Dfirefox%26rls%3Dorg.mozilla:en-US:unofficial%26sa%3DG), it is a bug with oggenc 2.8.
If you are looking for a (more or less) perfect frontend for transcoding your FLAC, Monkey's Audio or WavPack files to Ogg Vorbis, including perfect transfer of the tags from the original and ReplayGain scanning (automatically after the encoding process), just stick with this frontend:
Universal-Front -All-In-One- (http://home.pages.at/thursdaychild/Universal-Front.7z)(7-zip-packed, 3,29 MB)
(Note: All necessary codecs are already included.)
Lancer 20060301 is out.
http://homepage3.nifty.com/blacksword/
Now SSE and SSE2 optimizations!
Translated by excite.co.jp
--
2006/03/01 Lancer 20060301
The optimization option when compiling is reexamined.
Oggenc is renewed to Ver.2.81.
Because the project management function of Visual Studio is unstable, the development environment was completely shifted to the command line.
The optimization code for SSE2 is implemented.
The optimization code that uses an in-line assembler for bark_noise_hybridmp and seed_curve is implemented.
The SSE optimization of mdct_forward is changed.
Double-Step Bresenham algorithm is implemented on render_line and render_line0.
AMD CodeAnalyst is introduced into the code analysis.
EDIT: Fine-tuned the translation.
Test on Pentium M 715 (2MB L2 Cache, 1.5GHz, 400MHz FSB)
c:\Temp\test>oggenc_aotuv_451.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[ 99,6%] [ 0m00s remaining] /
Done encoding file "test.ogg"
File length: 2m 46,0s
Elapsed time: 0m 20,0s
Rate: 8,3069
Average bitrate: 105,0 kb/s
c:\Temp\test>oggenc2_lancer_20060131.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[ 99,6%] [ 0m00s remaining] /
Done encoding file "test.ogg"
File length: 2m 46,0s
Elapsed time: 0m 07,9s
Rate: 21,1372
Average bitrate: 105,0 kb/s
c:\Temp\test>oggenc2_lancer_20060209_sse2test.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[ 99,6%] [ 0m00s remaining] /
Done encoding file "test.ogg"
File length: 2m 46,0s
Elapsed time: 0m 07,8s
Rate: 21,3080
Average bitrate: 105,0 kb/s
c:\Temp\test>oggenc2_lancer_20060301_p3.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[ 99,6%] [ 0m00s remaining] /
Done encoding file "test.ogg"
File length: 2m 46,0s
Elapsed time: 0m 07,9s
Rate: 21,0970
Average bitrate: 105,0 kb/s
c:\Temp\test>oggenc2_lancer_20060301_p4.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[ 99,6%] [ 0m00s remaining] /
Done encoding file "test.ogg"
File length: 2m 46,0s
Elapsed time: 0m 07,6s
Rate: 21,8345
Average bitrate: 105,0 kb/s
c:\Temp\test>
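To compare the builds at a glance, the elapsed times printed above can be turned into speed figures. This is a small Python sketch using only the numbers from the log; `parse_time` is a hypothetical helper, and because the log rounds elapsed time to 0.1 s, the computed rates differ slightly from the "Rate" lines oggenc prints.

```python
def parse_time(text):
    """Convert a string such as '2m 46,0s' or '0m 07.9s' to seconds."""
    minutes, seconds = text.replace(",", ".").split("m")
    return int(minutes) * 60 + float(seconds.strip().rstrip("s"))


# Elapsed times copied from the log above (Pentium M 715, quality 3).
RESULTS = {
    "oggenc_aotuv_451": "0m 20,0s",
    "oggenc2_lancer_20060131": "0m 07,9s",
    "oggenc2_lancer_20060209_sse2test": "0m 07,8s",
    "oggenc2_lancer_20060301_p3": "0m 07,9s",
    "oggenc2_lancer_20060301_p4": "0m 07,6s",
}

if __name__ == "__main__":
    length = parse_time("2m 46,0s")  # duration of test.wav
    baseline = parse_time(RESULTS["oggenc_aotuv_451"])
    for name, elapsed in RESULTS.items():
        secs = parse_time(elapsed)
        print(f"{name}: {length / secs:.1f}x real time, "
              f"{baseline / secs:.2f}x vs aoTuV")
```

On these figures every Lancer build is roughly 2.5x to 2.6x faster than the plain aoTuV 4.51 build, and the differences among the Lancer builds are within a few tenths of a second.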
May I post a bug report here? Thanks!
I can't build the new Lancer code with gcc on Linux. The error is:
../../lib/psy.c: In function 'seed_curve':
../../lib/psy.c:745: error: 'post05' undeclared (first use in this function)
../../lib/psy.c:745: error: (Each undeclared identifier is reported only once
../../lib/psy.c:745: error: for each function it appears in.)
../../lib/psy.c:768: error: 'post06' undeclared (first use in this function)
I think it's because of the declarations at line 635:
#if defined(_MSC_VER)
int post07 = ((post1-i)&(~1));
int post06 = (post07&(~3));
int post05 = (post06&(~7));
and post05, post06 and post07 are used in the #else block.
Well, I moved these declarations before "#if defined(_MSC_VER)" and the code now compiles.
But with gcc 4.0.2 I get:
% oggenc test.wav -o /dev/null
Opening with wav module: WAV file reader
Encoding "test.wav" to
"/dev/null"
at quality 3,00
Mode initialisation failed: invalid parameters for quality
With gcc 3.4.5 I get an error at compile time again, but a more mysterious one:
../../lib/floor1.c: In function `floor1_encode':
../../lib/floor1.c:2333: internal compiler error: in trunc_int_for_mode, at explow.c:54
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions,
see <URL:file:///usr/share/doc/gcc-3.4/README.Bugs>.
OK... Finally I tried gcc 3.3.6, which compiles the code fine, just like gcc 4.0.2. And oggenc works! But the bitrate is not the same as with generic aoTuV 4.51: for aoTuV it's 115.3 kb/s but for Lancer it's 111.8 kb/s. And the difference between the tracks in Audacity is up to 0.2 of the amplitude range.
Is this code untested with any compiler other than MS Visual C? Is GCC support planned? (A 2x-3x speedup is a very good thing!)
Lancer 20060302 is released!
Because there was a problem with the decoding function of the SSE2 edition of Lancer 20060301, Lancer 20060302 was released.
The fixed-point straight-line drawing algorithm, "Extremely Fast Line Algorithm", was improved and implemented.
It is slightly faster because it became easier to optimize with SIMD.
It doesn't seem to be working...
In foobar (0.8.3), I get the following error:
INFO (foo_clienc) : CLI encoder: C:\Program Files\Codec\Vorbis\Lancer 2.81 2006 03-02\oggenc.exe
INFO (foo_clienc) : Destination file: file://C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
INFO (foo_clienc) : Source file: file://F:\Albums\Bad Company\Bad Company (AF Gold)\05 Bad Company.wv
INFO (foo_clienc) : 44100Hz 24bps 2ch
INFO (foo_clienc) : Encoding took 62 milliseconds, speed 1.68x
ERROR (foo_input_std) : Ogg stream is corrupted : C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
ERROR (foo_speex) : Ogg stream is corrupted : C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
ERROR (foo_input_std) : Ogg stream is corrupted : C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
ERROR (foo_speex) : Ogg stream is corrupted : C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
INFO (CORE) : attempting to edit file info : file://C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
WARNING (CORE) : file info update failure on : file://C:\Documents and Settings\372\Desktop\05 Bad Company.ogg
ERROR (foo_diskwriter) : Conversion failed.
...and from the command line:
Encoding standard input to "C:\Documents and Settings\372\Desktop\05 Bad Company.ogg" at quality 6.00
can't write .WAV data, disk probably full!
** ERRORS:
General errors: 1
Press any key to continue...
I'm using oggenc281_p4_lancer20060302 on a P4 3.2GHz system with 1G of RAM.
The old oggenc28_p4_lancer20060131 works fine.
I notice in the foobar error report it says:
INFO (foo_clienc) : 44100Hz 24bps 2ch
...but the file is 16bps, not 24.
Any suggestions?
Thanks!
~esa
@esa372
Set "Maximum bitdepth" to 16bits, works here with fb2k 0.9RC
Cheers,
Tim
I was wrong, on a second file, foobar reports:
"Error writing to file (Encoder has terminated prematurely with code -1073741819; please re-check parameters)"
What's your CPU?
There's a report that this build doesn't work on an Athlon64 3000+ but does work on a P4 2.6G.
EDIT: Typo.
Set "Maximum bitdepth" to 16bits, works here with fb2k 0.9RC
I was wrong, on a second file, foobar reports:
"Error writing to file (Encoder has terminated prematurely with code -1073741819; please re-check parameters)"
Too bad... thanks anyway.
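The exit code in the foobar2000 message becomes easier to read once it is reinterpreted as an unsigned 32-bit value. A minimal Python sketch (`to_ntstatus` is a hypothetical helper name):

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


if __name__ == "__main__":
    code = -1073741819  # value reported by foobar2000 above
    print(hex(to_ntstatus(code)))
```

The result is 0xC0000005, which Windows defines as STATUS_ACCESS_VIOLATION. That suggests, though it does not prove, that the encoder crashed on an invalid memory access rather than rejecting its parameters.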
What's your CPU?
Intel P4 3.2GHz
The good news is that today's P3 build works here (P4 2.66). I just encoded 4 CDs to be sure. (Yesterday's P4 build worked too; today's P4 build gives me strange errors...)
Cheers,
Tim
Mr 637 said that there are some problems with the straight line drawing algorithm and the compiler. A bug-fix version will be released next Monday.
Good to know - thanks!
cool... this one (march 2, 2006 for those reading this a long time from now) works with piping in dbpoweramp... but it's slower than the november 05 release
edit: actually, the new one is faster... silly me must have been running too many other programs at the time of my initial test nov. lancer went 20x in dbpoweramp, and the new one at 24x
Mr. 637 always comes as a surprise!
Lancer 20060303 is released!
--
Mr 637 said in 2ch board,
"After all, it is a lapse of judgment of the priority of the operator. It is shameful. "
The bug with the variable declarations in psy.c is gone, but the resulting bitrate still differs from that of generic aoTuV 4.51 (compiled with gcc 3.3.6 and -march=athlon-xp, run on an Athlon XP 2200+ under Ubuntu 5.10).
Excellent - thank you!
It's working fine on my system now.
Test @ -q6:
Bad Company (2006 AudioFidelity remaster)
aoTuV b4.51 P4 - 2:15 encode time
Lancer 20060303 P4 - 1:27 encode time
Identical bitrates and file sizes.
Athlon64 X2 3800+ E3 (512 L2 Cache, 2x2000, 200Mhz FSB)
oggenc281_p4_lancer20060303
D:\test>oggenc2.exe test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 3,00
[100,0%] [ 0m00s remaining] |
Done encoding file "test.ogg"
File length: 73m 11,0s
Elapsed time: 2m 15,9s
Rate: 32,3169
Average bitrate: 114,6 kb/s
so, what about dual core ?
Launching two encodes in parallel, not setting the affinity by hand like so:
C:\temp>type code.bat
@echo off
start codeone 1
start codeone 2
C:\temp>type codeone.bat
@echo off
oggenc2 -q 5 %1.wav
pause
Results in:
Done encoding file "1.ogg"
File length: 4m 16,0s
Elapsed time: 0m 07,9s
Rate: 32,4412
Average bitrate: 171,7 kb/s
Done encoding file "2.ogg"
File length: 4m 16,0s
Elapsed time: 0m 08,0s
Rate: 32,0600
Average bitrate: 171,7 kb/s
They start and complete in almost exactly the same time, so the effective rate is about 64x real-time!
(The source file(s) are "Reflect the storm" off the new In Flames album, which I made disk-cache hot by checksumming them prior to starting the script. The CPU is an AMD X2 3800+ at 2.0GHz (the default) with cool'n'quiet enabled)
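The same experiment can be scripted so that the combined speed is computed rather than eyeballed. This is a rough Python sketch, not the poster's method: it assumes an `oggenc2` executable on the PATH that accepts `-q` as in the batch file above, and `encode_all` and `effective_rate` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def effective_rate(durations_s, wall_clock_s):
    """Total seconds of audio encoded per second of wall-clock time."""
    return sum(durations_s) / wall_clock_s


def encode_all(wav_files, encoder="oggenc2", quality="5", workers=2):
    """Run one encoder process per file, at most `workers` at a time.

    Threads are enough here because each one only waits on a child process;
    the operating system spreads the encoder processes over the cores.
    """
    def run(path):
        return subprocess.run([encoder, "-q", quality, path]).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, wav_files))


if __name__ == "__main__":
    # Two 4m16s tracks finishing together in about 8 seconds, as reported above.
    print(effective_rate([256.0, 256.0], 8.0))
```

With the figures from the post (two 256-second files done in about 8 seconds of wall-clock time) this gives the quoted 64x.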
You can test this easily with foobar2000 0.9 Release Candidate: it will automagically encode in parallel on multiprocessor machines.
But what if I have one really big file? 700 MB of audio in a single track or so...
I remember there was a nice app out there, gogo2; it now runs at 250x on my PC, and I think that is a tiny int counter limitation.
Its homepage was also at homepage1.nifty.com/herumi
So I would like Ogg to be faster. By the way, multicore LAME 3.97 beta 2 gives me only 45x, so a possible 64x Ogg speed on the x64 platform would be an amazing thing, wouldn't it?
update:I wrote to the author about multi-core and got this:
Hello,
The MultiThreading function is certainly cool. But I do not have
dual-core machine. I have Athlon-XP 1700+ and Pentium4M only.
It should be detailed to the algorithm of vorbis to do the work
and requires more memory in process.
Therefore, it is not easy. However, if a new machine is obtained,
I will challenge it.
Lancer 20060310 is released.
Lancer 20060317 is released!
Translated by excite.co.jp, fine tuned by me
--
2006/03/15 Lancer 20060317
When the DownMix function of oggdropXPd is used, the hang issue is corrected.
Oggpack_write new is added to the vorbis side for the performance improvement of DLL.
The SSE optimization code of _ve_amp is updated.
An unnecessary SSE optimization code in floor1.c is deleted.
The code in which the bug of GCC is evaded with inspect_error is added.
Loop unrolling and register renaming are applied to the code related to mdct_forward, bark_noise_hybridmp, and fft.
The loop division point of bark_noise_hybridmp is calculated beforehand and it changes.
Lancer 20060331 is released! (http://translate.google.com/translate?u=http%3A%2F%2Fhomepage3.nifty.com%2Fblacksword%2F&langpair=ja%7Cen&hl=en&newwindow=1&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools)
Thank you!
Lancer 20060506 (http://homepage3.nifty.com/blacksword/)
Now supports SSE3 and multithreading too.
Nice! I love Lancer's Vorbis Tunings SSE2 and MT for me.
Some encode times with an AMD X2 4400+.. MT is very nice
oggenc283_sse3mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 28.0s
Rate: 51.8139
Average bitrate: 192.8 kb/s
oggenc283_sse2mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 32.4s
Rate: 49.3090
Average bitrate: 192.8 kb/s
oggenc283_sse3_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 00.8s
Rate: 37.7426
Average bitrate: 192.8 kb/s
oggenc283_sse2_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.9s
Rate: 37.3896
Average bitrate: 192.8 kb/s
oggenc283_sse_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.4s
Rate: 37.5532
Average bitrate: 192.8 kb/s
Uhhh... so which one is which?
I have an AthlonXP 2400+ (IIRC Barton core). Which one should I get?
Sorry my mind is a bit swimming at the moment...
SSE Version
Lancer's DLL has crashed my Winamp + Oddcast3 since the 20060310 build.
It is still not fixed.
Hmm, I guess this is still based off the old code and not Aoyumi's recent tunings? Multithreading is so cool, the problem is that it takes longer to read off the hard drive (or decode a flac) than to encode now. XD
From the site:
Based on aotuv-b4.51_20051117
4.51beta is the latest version from Aoyumi.
http://www.geocities.jp/aoyoume/aotuv/index.html
No need to worry
Lancer 20060512 (only MT) Released!
Home page (http://translate.google.com/translate?u=http%3A%2F%2Fhomepage3.nifty.com%2Fblacksword%2Findex.htm&langpair=ja%7Cen&hl=en&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools)
Hey thanks for that link. I tried going there last night but couldn't figure out a single thing I'm just getting into ogg vorbis now...
But I just tried those (oggenc2) and I got a Fatal error: This program is not designed to run on this machine. I have a P4, 2.1 (Dell), with Windows XP sp2. Are these new builds meant for AMD processors only?
edit: Oh wait a minute, I was trying the SSE3 version, give me a few minutes to try the SSE2 version. BTW, what is SSE2/3 anyway? I think I'm in way over my head here.
edit 2: The SSE2 one does work. Ok, I'm a hoser
Hey I got confused too for a moment (see my post up there). I should've checked wikipedia first... it lists processors with SSE, SSE2, and SSE3.
What are these SSE-thingies? In a nutshell, they are special instructions that enable CPUs to perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.
Soooo... I put in a (very very very) simplified guide on which version of Lancer you should use, in the Lancer page of HA Wiki (http://wiki.hydrogenaudio.org/index.php?title=Lancer).
Hey thanks for adding that wiki page, makes way more sense now! But does the type of processor you have (32 bit vs. 64 bit) factor into it in any way? I know I have a 32 bit processor (does that sound right?), and a single core computer. The SSE2MT version does work on my computer.
Um, the bit-ness of your processor is not strictly related to SSEx. For instance, on the Intel side the P4 is 32-bit, yet it supports SSE2 instructions. AMD did not get the opportunity to embed SSE2 instructions into their 32-bit line, and opted to add SSE2 to their 64-bit line.
So, whether your processor supports a certain version of SSEx or not, depends more on its release date than its bit-ness.
Edit: Updated the wiki page above slightly to explain the (theoretical) benefit of MT versions.
Yeah, that really needs to be clarified for a lot of folks. I gathered some information about them and rewrote that section in the wiki. It does help in the long run though.
The most sure-fire way to know which SSEx version your processor supports is to download all 5 Lancer OggEnc2 encoders and run them one by one. If your processor does not support the SSEx, OggEnc2 will exit gracefully, informing you so.
I've added this to the Lancer wiki page (http://wiki.hydrogenaudio.org/index.php?title=Lancer). Hope it helps.
Edit: stupid typo. Note to self: don't type something long while holding a lighted cigarette.
There are also Windows programs that easily tell you the processor instructions for your system; wcpuid (http://hp.vector.co.jp/authors/VA002374/src/download.html) and cpu-z (http://www.cpuid.com/cpuz.php)
Does anyone know of programs for Linux and Mac with a similar function?
cat /proc/cpuinfo will work on Linux at least.. and maybe on Mac too since the newer OSs are Unix based I think..
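On Linux the check can be automated by reading the "flags" line of /proc/cpuinfo. The Python sketch below is Linux-only; note that Linux lists SSE3 under the name "pni". `cpu_flags` and `best_sse_level` are hypothetical helper names, and mapping the result to a particular Lancer build is left to the reader.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def best_sse_level(flags):
    """Return the highest of SSE, SSE2 and SSE3 present in the flag set."""
    if "pni" in flags or "sse3" in flags:  # Linux reports SSE3 as "pni"
        return "SSE3"
    if "sse2" in flags:
        return "SSE2"
    if "sse" in flags:
        return "SSE"
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(best_sse_level(cpu_flags(handle.read())))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

A quick manual equivalent is `grep flags /proc/cpuinfo` and looking for sse, sse2 and pni in the output.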
Info about wcpuid and cpu-z is now part of the Lancer page (http://wiki.hydrogenaudio.org/index.php?title=Lancer). I'm not into Linux or any Unix, so please complement the info there if need be. Thanx.
I have an Athlon 64 X2 3800+.
This new sse3 version is the cat's meow for fully utilizing both cores.
In comparison, I did find a strange phenomenon though. Encoding a whole album of 10 songs with the sse3mt version took 10 seconds longer in total than launching two 2006/03/31 versions and oggdropping 5 songs into each simultaneously. Can anyone else reproduce this? Is the threading overhead higher in the sse3mt version?
Not that I'm complaining, mind you. The convenience factor is great with the sse3mt version!
Excellent work. The encoding speed is typically over 50x regardless!
My idea of a "killer app" here!
Edit: minor edit for grammar.
It's normal that a multi-threaded app running on two cores won't give you a 100% boost over one core. It's usually something like 70% faster. In fact, a mere 10 seconds' difference for a whole album is very good!
Of course double speed is seldom possible using multithreading since many problems can't be parallelised (or whatever it's called) completely. But in the case of encoding a batch of files with OggdropXpd, wouldn't it make more sense then to run one normal encoding thread per core, because that's 100% parallelised? Of course, when only one file is encoded, the multithreaded version is (if above post is correct) only slightly slower, but I think if it gives a speed advantage (which is what Lancer builds are all about I believe) one could implement this parallel encoding into the frontend.
Hope I'm making sense, I need some sleep...
MedO
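The one-encoder-per-core idea for batches can be sketched in a few lines. This is only an illustration, not how OggdropXpd works: encode_batch and run_oggenc are hypothetical names, and the command line assumes an oggenc2 binary on the PATH that accepts -q as in the examples in this thread.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_oggenc(wav_path, quality="5", exe="oggenc2"):
    """Encode one file with a single-threaded encoder process (assumed to be on PATH)."""
    return subprocess.run([exe, "-q", quality, wav_path]).returncode


def encode_batch(wav_paths, encode_one=run_oggenc, workers=None):
    """Run encode_one on every file, keeping at most one job per core in flight."""
    workers = workers or os.cpu_count() or 1
    # Threads are enough here: each one just waits for its encoder process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_one, wav_paths))


if __name__ == "__main__":
    import sys
    print(encode_batch(sys.argv[1:]))
```

Passing a different encode_one makes it easy to try the scheduling without an encoder installed.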
Here we go: Lancer 20060529 Release (http://homepage3.nifty.com/blacksword/index.htm)
Changelog (by babelfish):
- Correcting the trouble of the decoding section.
Is there any chance that a version in the near future will work correctly when compiled with GCC (preferably the 4.x branch)?
The latest version that works correctly when compiled with gcc 3.3.6 is 20051121 (tested on an Athlon XP 2200+ with SSE-only support in Ubuntu Linux). All versions after this one give a different bitrate in the generated .ogg (compared to the 'standard', aoTuV b4.51).
Ogg Vorbis is the standard lossy codec in the Linux world, and an SSE(2,3)-optimized version for Linux would be good support for the community.
If the bitrate difference is only slight, this is probably normal. IIRC, even the P3-Optimised version of the original AoTuV-encoder gives slightly different results than the generic build.
The difference is significant, about 2-3 kbps for -q 3 (on my test.wav it gives 112.3 kbps instead of 115.3 kbps in aoTuV 4.51).
I've tried compiling aoTuV with gcc 3.3/3.4/4.0 with and without compiler SSE optimization (-march=athlon-xp -mfpmath=sse), and the bitrate differed only in the hundredths. So this is definitely a bug.
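Remember that standard aoTuV uses the x87 FPU and Lancer uses SSE. They represent real numbers with different bit lengths (the x87 FPU can keep up to 80 bits internally, while SSE single-precision math works on 32-bit floats), and may thus produce slightly different compression.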
If in doubt, ABX.
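How much precision alone can move a result is easy to demonstrate. The sketch below is not Lancer or aoTuV code; it just sums the same values once with every intermediate rounded to a 32-bit float (as single-precision SSE math does) and once in Python's 64-bit floats, standing in for a wider accumulator. to_f32, accumulate and the sample values are made up for illustration.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, narrow):
    """Sum values; if narrow is set, round inputs and every partial sum to 32 bits."""
    total = 0.0
    for v in values:
        if narrow:
            total = to_f32(total + to_f32(v))
        else:
            total += v
    return total


if __name__ == "__main__":
    samples = [0.1] * 10000
    print("32-bit accumulator:", accumulate(samples, narrow=True))
    print("64-bit accumulator:", accumulate(samples, narrow=False))
```

The two sums differ slightly, which is the kind of drift that can tip a quantization decision without implying a bug. Whether it explains a 3 kbps gap is a separate question.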
The precompiled oggenc2/oggDropXPd builds for Windows give the same bitrate as unoptimized aoTuV, but they were compiled with MSVC or similar, not GCC. I think GCC is simply untested with the new versions of Lancer.
Lancer 20060616 released.
Edit: Fixed bug in decoding(SSE2)
Translated by google:
2006/06/16 Lancer 20060616
In one for AMD CPU replacing the CPU distinction processing of the DLL file
Optimizing vorbis_oggpack_look with the inline assembler
Adding SSE3 optimization processing to _mm_add_horz*
SSE optimizing oggdec, it adds
Correcting the trouble of the SSE2 optimization of ov_read_float2pcm
The decoding section of oggdropXPd SSE optimization
Optimization profile for multithread operation for single thread and joint ownership conversion
Didn't use Vorbis for some time but now I needed an encoder for some previews and tried the Lancer (2006 06 16th) one.
The speed is just sick, thanks to all involved in that
Works great on my Athlon XP, normal SSE version.
LOL yeah I still get the warm fuzzy feeling every time I encode using Lancer
New release out today.
Changes:
* replace inline assembly with intrinsics as much as possible
* abolish the original memory transfer code in block.c
* bitreverse uses a lookup table
* fix the speed regression of vorbis_book_decodevv_add introduced in Lancer 20060529
* remove optimization prevention code in vorbis_book_decodevv_add
* pre-calculate tables for triggers in mdct
* simplify code in which high frequencies are removed by mdct_backward
* add decode-only funcs: mdct_butterflies_backward, dct_butterfly_first_backward
* improve SSE optimization: bark_noise_hybridmp
* add SSE optimization: render_line, vorbis_noise_normalize, _vp_noise_normalize
* add SSE3 optimization: mdct_bitreverse
* add pre-calculation code: seed_loop, max_seeds
* optimize: seed_chase
* add SORT16 to psy.c
* auto loop unrolling: SORT8, SORT32 in psy.c
* use lddqu in non SSE environment for unaligned memory load
* improve loop condition code in inline assembly code
* add t option for oggdec benchmarks (without outputting file)
(courtesy of pub at cyanet.jp)
Good to see the asm being replaced with intrinsics.
* add SSE optimization: render_line, vorbis_noise_normalize,
_vp_noise_normalize
SSE optimizations to the noise normalization code ey? that's interesting. Must be very fast
Hope this guy will work on theora or dirac in the future !
I also hope to see the SSE/SSE2/SSE3 builds merged into one that auto-selects the optimizations on the fly (like FLAC...)
Lancer 20060722 is temporarily unavailable due to memory issue(unconfirmed).
What kind of memory issue? I've been using 20060722 with no problems.
It's impossible to download from the page.
If someone could post Lancer 20060722 release (at least "oggenc2.83"), that would be great. Or just a link for dl, of course.
Lancer 20060802 (http://homepage3.nifty.com/blacksword/exprimental/index.htm) released.
This is a bug-fix release.
20060722 had a memory leak according to the author.
And the crash issue of Lancer DLL is going to be fixed.
Experimental (http://homepage3.nifty.com/blacksword/exprimental/index.htm) Lancer 20060806 is up.
Altavista says:
"Being heap memory access error occurs with vorbis_oggpack_write it abolishes, the optimization module in oggpack_write movement
oggpack_look SSE optimization of optimization
_ve_amp cash control processing of modification
accumulate_fit being imperfect with correction
MDCT-RELATED cash control rearranging unnecessary zero data exception processing the SSE optimization description section of deletion _encodepart with correction inspect_error"
Err.. right.
Wow! Optimization of optimization, that's gotta be fast!
None of the many "recent" optimizations provided a major speedup for me. Maybe it's 20x encoding with an old lancer and 22x with the sse2-optimized version. I have a Celeron M 1400Mhz, so the MT speedups don't help here. Still, the speed is great. What are your experiences/speeds/setups? Just curious...
What are your experiences/speeds/setups? Just curious...
I'm experiencing a speedup of about 1.7x with Lancer 20060806 compared to the latest OggEnc from rarewares (average speed is around 29x vs 17x, depending significantly on the sound material).
EDIT: My system is WinXP SP2 on an Athlon 64 3400+ (Venice).
If anybody has a Core 2 Duo, I would be interested in how fast the SSE2 version is, as this new CPU doesn't break SSE2 instructions up into 2 parts and thus can compute them directly.
Athlon XP 3200+ (2.2GHz), SSE:
Lancer 20060616 - 37.85x
Lancer 20060805 - 38.56x
So it's about 1.9% faster, nice but hardly significant.
I also recall that a previous version of lancer was about .5x faster than 06/16. Like linux kernels, not every next version is faster on every system.
The SSE3 multithreading version should fly on Core2. I wouldn't be surprised if it encodes over 100x.
What are your experiences/speeds/setups? Just curious...
Machine: AMD Athlon64 X2 3800+ (2GHz) @ 2.4GHz
Track: Machinae Supremacy - Elite.wav (4m 24.0s)
Options: -q 5
OggEnc (vorbis-tools Rev.10381):
13.9284x (19s)
OggEnc v2.83 (Lancer [20060805](SSE3MT) based on aoTuV b4b):
57.605572x (4.594s)
It seems that there was something wrong with the latest Lancer version (Lancer 20060802, based on aotuv-b4.51_20051117), because all its files were taken down (crossed out) on the download server. See his page (http://homepage3.nifty.com/blacksword/).
20060807 has been released on the main page
2006/08/07 Lancer 20060807
Correcting the SSE optimization of mdct_forward and mdct_backward
Only static edition reviving vorbis_oggpack_write
Correcting the problem of local_book_besterror_dim1x4
... and now it too is "crossed out".
"2006 August 9th
Continuing with oggdropXPd of Lancer, 20060807 when you encode, because the problem which becomes output of abnormal bit rate was discovered it stops release at one time."
Try a newer build.
http://homepage3.nifty.com/blacksword/exprimental/index.htm (http://homepage3.nifty.com/blacksword/exprimental/index.htm)
And now 20060811 is released on the main page.
2006/08/11 Lancer 20060811
Correcting the SSE optimization of mdct_backward
2006/08/10 Lancer 20060810 (bit rate abnormal problem, for multithread operation problem evaluation)
Correcting the problem of the SSE optimization of _ve_amp
Correcting the problem where the pattern which each time differs in multithread operation edition is output
Maybe this was suggested before, but it would be beneficial to the project if he/she did the page in English or, if that's a problem, asked a friend to translate it.
ありがとう - それは非常に速くある (Thank you - it is very fast)
http://translate.google.com/ (http://translate.google.com/)
I think there's a problem with the multi-threaded versions of 2006/08/11 Lancer 20060811.
I'm on a Core Duo, and I've never had problems with the multithreaded releases up to this one. Now, I'm getting speed slowdowns to 1.x and 3.x, when it was going 40-50x before. The non-multithreaded compiles work fine (40-50x); it's only the SSE2 and SSE3 multithreaded that are screwed up. Can anyone confirm?
Can anyone confirm?
On my system, the "sse3_mt_lancer20060811" just freezes up.
I've gone back to using the "sse3_mt_lancer20060807", which is working fine.
is there an online archive of past releases, or did you just already have it on your system?
I keep several "layers" of past releases on my computer. I don't know if there's an online archive.
Here's a link for the August 07 release, if you need it:
(right-click -> "Save Target As...") oggenc283_sse3mt_lancer20060807.zip (http://66.49.140.133/assets/ha/oggenc283_sse3mt_lancer20060807.zip)
I had problems with the latest MT as well and have reverted to an earlier version.
Cheers,
Pete
Lancer 20060815 Experimental (http://homepage3.nifty.com/blacksword/exprimental/index.htm)
Babelfish translation:
It released started Lancer 20060815 for multithread operation problem verification at the laboratory.
Lol, so much for Babelfish - Google Translate gives something similar.
It's really a problem that there's no translation available... this way we can't give the guy (girl?) feedback on how it runs/crashes on our systems, nor tell him he's doing cool and appreciated work (although he may guess that from the download numbers).
I think the author should spend some time, once the code is stable again, on getting it working with modern GCCs, for instance getting clean compiles under Linux w/ GCC 4.1.
I'd do it myself if I had the mad skillz, but I don't.
Lancer 20060818 (MT only) is out.
Changes:
It improves the multithread operation processing of mapping0_forward, increases the parallel processing section and accelerates
In order with codebook.* to make the parallel processing of floor1_encode possible, mounting the delay collective entry function of the Ogg stream
Way floor1_encode can be executed while parallel processing, modification
_vp_couple it corresponds to parallel processing
At the time of profile measurement way it does not enter into the infinite loop, modification
I will encode my whole FLAC archive (400 CDs, about 150 GB) this weekend on my AMD X2 4400+ to Ogg at q6 or q7.
Best regards
Franklin
I think that a bi-directional 2-pass MT encoder would be great.
I talked a guy with a Core 2 Duo E6600 @ 3.1GHz into running an encode, and here's the result he reported (for -q 5?):
File: "M - Pop Muzik"
OggEnc v2.83 (Lancer [20060818](SSE3MT) based on aoTuV b4b)
File length: 5m 01,0s
Elapsed time: 0m 3,432s
Rate: 87,950145
Average bitrate: 193,7 kb/s
So these new CPUs seem to be quite the little SSE monsters.
EDIT: that makes me feel ashamed of my newly acquired mid-range PC
Eloj, that's very interesting. Awesome performance; shame it falls short of my 100x prediction. Perhaps at -q2.
Yet looking at the numbers I get the feeling the multithreading doesn't speed it up that much. Is it possible for you to test the non-MT version and see how much slower it is?
At these speeds I think Disk I/O can be a bottleneck...
See the command I used for testing on the previous page. It doesn't write an output file to disk, and if you run it multiple times Windows will buffer the input file in memory. For me it leads to accurately repeatable results when you discount the first run.
Also, 100x is only about 17.6 MB/s for CD audio (176.4 kB/s times 100), which any vaguely modern hard drive can easily keep up with.
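A quick check of the disk-throughput estimate, assuming CD-format input (44.1 kHz, 16-bit, stereo). The exact figure shifts a little depending on whether a megabyte is counted as 10^6 or 2^20 bytes; this sketch uses 10^6. The function name is made up for this example.

```python
def pcm_read_rate_mb_s(speed, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Input data rate in MB/s (10**6 bytes) for an encoder running at `speed` x realtime."""
    bytes_per_second = sample_rate * channels * bytes_per_sample  # 176400 for CD audio
    return speed * bytes_per_second / 1e6


if __name__ == "__main__":
    for speed in (50, 100):
        print(f"{speed}x realtime reads {pcm_read_rate_mb_s(speed):.2f} MB/s")
```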
Hi,
new releases out: 20060824
Recently I converted 400 CDs with Lancer at a speed of about 50x on my X2 4400+.
Best regards
Franklin
The difference seems to be that it's built on aotuv Release 1.
(babelfished) ChangeLog:
2006/08/24 Lancer 20060824
Based cord/code modification to aotuv-r1_20051117
Adding SSE optimization to _vp_couple
Adding the cord/code for multi channel processing divisions to xmmlib.h
At the time of OpenMP use the singles lead-lead _vp_quantize_couple_memo and _vp_quantize_couple_sort which are operational modification to multithread operation operation
Lancer 20060903 is out.
Changelog
2006/09/03 Lancer 20060903
Efficiency of the inline assembler cord/code for ICL detailed survey, deleting the slow part
Efficiency of cash control-related cord/code detailed survey, deleting the slow part
Efficiency of memory transfer type cord/code detailed survey and SSE optimization cord/code part revival
Improving the SSE optimization of bark_noise_hybridmp
Knocking down the renewal frequency of the lapse indication of oggenc2.
Regards
Franklin
Awesome... as always !
trying to use the latest & best Lancer is like joining your Build-of-the-day club...
You ain't kiddin'!
Lancer 2006 09-15 (http://homepage3.nifty.com/blacksword/) is out.
2006/09/15 Lancer 20060915
Because binary for multithread operation from profile edition usually modification (the profile optimization effect at the time of MT is low in edition,)
Correcting the description mistake of the cord/code for MT of mapping0_forward
Executing loop unrolling with mdct_forward, mdct_backward and mdct_butterfly_generic under multithread operation environment
uhhh ...
I haven't even yet unzipped the previous version... and now a new build...
*dies*
Not that I despise BlackSword and his (her?) attempts... domo arigato gozaimasu !
I haven't even yet unzipped the previous version... and now a new build...
LOL
No sse build?
It looks like 20060915 build is a MT-only bugfix build.
2006/10/05 Lancer 20061005:
- Updated ICL to 9.1.030
- Improved MT optimization code for mapping0_forward
- Tweaked compile options
- Suppress some compiling warnings
- Discontinue GCC support
This release is memorable for me, as this binary (with -q4) runs faster than 100x on my new machine.
http://nyaochi.sakura.ne.jp/encoder-benchm...t-20061005.html (http://nyaochi.sakura.ne.jp/encoder-benchmark/result-20061005.html)
Many thanks to 637 (Blacksword) for the brilliant achievement!
This is really impressive. I remember the old days (pre-RC3 encoder) when Vorbis was painfully slow: 1.5x max on my Duron 800, up to 3-4 times slower than Musepack (not present in this big benchmark), and the same speed as LAME --alt-preset extreme.
2006/10/05 Lancer 20061005:
- Discontinue GCC support
Sad to read, but it is more honest this way, since a number of previous versions already did not work correctly when compiled with GCC.
But oggenc2.exe will work under wine anyway.
Man, that truly sucks. This should be written with GCC intrinsics, not ICC. Anyone tried building it with the linux version of ICC?
This release is memorable for me, as this binary (with -q4) runs faster than 100x on my new machine.
My congratulations to you!
2006/10/13 Lancer 20061013
Correcting the problem of the memory management cord/code
Regards
Franklin
Hmm. On my AMD 2400 it is slower (1x-2x) than ver 2005 11 21
Is it ok?
Man, that truly sucks. This should be written with GCC intrinsics, not ICC. Anyone tried building it with the linux version of ICC?
Agreed
New version out (20061103), here's the -babelfished- changelog:
Based cord/code modification to aotuv-b5_20061024
Modifying the SSE optimization of _vp_offset_and_mix, _vp_noise_normalize_sort and _vp_couple
ICL in 9.1.032 version rise
Correcting the description mistake of the optimization cord/code
website (http://homepage3.nifty.com/blacksword/index.htm)
Release 20061110 is out:
2006/11/03 Lancer 20061110
Correcting the trouble which cannot encode the monaural sound source in multithread operation edition.
Improving the SSE optimization of _vp_couple.
Modifying the calculation which disperses the load at the time of the multithread operation of _vp_couple.
Reducing the cord/code of _vp_offset_and_mix.
Regards
Franklin
I've made a little comparison between standard
OggEnc Win32 aoTuV beta5 2006/11/11
and
oggenc283_sse3mt_lancer20061110
Results are quite impressive:
c:\ogg>oggenc -q2 "thom yorke - harrowdown hill.wav"
Opening with wav module: WAV file reader
Encoding "thom yorke - harrowdown hill.wav" to
"thom yorke - harrowdown hill.ogg"
at quality 2,00
[100,0%] [ 0m00s remaining] -
Done encoding file "thom yorke - harrowdown hill.ogg"
File length: 4m 41,0s
Elapsed time: 0m 15,0s
Rate: 18,7956
Average bitrate: 92,6 kb/s
c:\ogg>oggenc2 -q2 "thom yorke - harrowdown hill.wav"
Opening with wav module: WAV file reader
Encoding "thom yorke - harrowdown hill.wav" to
"thom yorke - harrowdown hill.ogg"
at quality 2,00
[100,0%] [ 0m00s remaining] \
Done encoding file "thom yorke - harrowdown hill.ogg"
File length: 4m 41,0s
Elapsed time: 0m 2,834s
Rate: 99,482475
Average bitrate: 92,6 kb/s
Lancer is about 5.3 times faster...
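The ratio can be read straight off the two "Rate:" lines. A small sketch that parses oggenc's summary output as pasted above (comma as decimal separator, as in these logs); parse_rate is a made-up helper name.

```python
import re


def parse_rate(log_text):
    """Extract the 'Rate:' value from oggenc's summary, accepting ',' or '.' decimals."""
    match = re.search(r"Rate:\s*([0-9]+(?:[.,][0-9]+)?)", log_text)
    if match is None:
        raise ValueError("no Rate: line found")
    return float(match.group(1).replace(",", "."))


if __name__ == "__main__":
    reference = parse_rate("Elapsed time: 0m 15,0s\nRate: 18,7956\n")
    lancer = parse_rate("Elapsed time: 0m 2,834s\nRate: 99,482475\n")
    print(f"speedup: {lancer / reference:.2f}x")
```

Using the Rate lines rather than the elapsed times avoids the rounding in "0m 15,0s".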
Does anyone know where to get one of these optimized Vorbis builds for Linux? Can this patch (http://homepage3.nifty.com/blacksword/patch_vorbis_lancer20061110.zip) be applied to Vorbis' source code and then built? Using which compiler?
Thanks in advance!
Lancer is about 5.3 times faster...
Yeah, I wanted to modify the title of this thread but couldn't.
Does anyone know where to get one of these optimized Vorbis builds for Linux? Can this patch (http://homepage3.nifty.com/blacksword/patch_vorbis_lancer20061110.zip) be applied to Vorbis' source code and then built? Using which compiler?
Intel C/C++ compiler. I'm not sure whether it can be compiled with the Linux version of the compiler.
He dropped GCC support a while back, didn't he? Or was it just code that's untested on gcc?
Current LAME versions (3.98) can be compiled in 64-bit mode; that is how I am using it most of the time.
Using VC8 as the compiler, it increases encoding speed by about 20% compared to 32-bit mode.
While trying to sound as un-redundant as possible, I observed speeds as high as 24.2x (primarily in the range of 16-20x), compared to only 10.0'ish from what I remember last... and while I didn't run actual comparisons of the files using any kind of sophisticated process, I did ABX and compare bitrates/qualities of files using both aotuv-b5 and the latest lancer build and was unable to distinguish between the two. Beautiful ... thanks guys!
Steve Jobs disappointed me by not coming out with a 100 GB iPod to hold the 9,800 M4As I'd spent since Christmas transcoding from FLAC to Nero M4As.
So I've set to re-encoding as Ogg Vorbis q 4.5 in hopes that I can make them fit on my 60 GB Cowon iAudio.
It's proceeding right now on the Sony Vaio 2 GHz Core Duo notebook I bought Tuesday.
OMG, is it ever fast with the SSE3 MT build! I'm surprised how much faster it is than using SSE2 optimization on the Athlon64 3300+ desktop under Win X64.
Thanks so much to all the developers!
Good choice, more people should give Steve Jobs the finger.
New Vorbis optimization project here (http://softlab.technion.ac.il/project/OggVorbis/html/index.htm), check it out !
Someone test it and tell us how it compares to Lancer.
They tuned Xiph Vorbis 1.0.1 and boast a performance increase of 18%. I can't seem to find a date on their page or in the documentation. Also, the download size for the "binaries" package is an impressive 90MiB. Nothing to see here, move along...
i tested it and on their intel-optimized binary, the radio.wav file which was included took 45 seconds to encode on -q10. the non-intel binary took 49.
i also tested out Lancer's builds(SSE2-Threaded one) and i got 22 seconds.
as MedO said, nothing to see here.
Steve Jobs disappointed me by not coming out with a 100 GB iPod to hold the 9,800 M4As I'd spent since Christmas transcoding from FLAC to Nero M4As.
Honestly, do you need to carry 9800 songs with you?
Need isn't as significant as the ability to do so. It gives one the greatest variety of music when not tethered to their PC.
Some people travel and are not at home for weeks or months at a time. iTunes says my 55 GB "checked playlist" lasts 30.7 days. It was enough for the last time I was out of town for 3 weeks, and it's nice to know I didn't run out of music.
How much faster are SSE2 and SSE3 versions of lancer compared to SSE? My CPU only supports SSE, and I would like to know how much boost I can expect from a CPU upgrade.
Are the MT versions ~twice as fast on a dual core cpu, e.g. Core2Duo / Athlon64 X2 ?
FLAC -> OGG conversion runs at 21x on my system (Athlon XP-M 2600+ @ 10x200), how much can I expect from an Athlon64 X2 3800+ ?
How much faster are SSE2 and SSE3 versions of lancer compared to SSE?
SSE2/3 matters less for Lancer. But MT enables ~1.4 times faster encoding.
Benchmark on Athlon64 X2 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=29161&view=findpost&p=390175) and Core2Duo (http://nyaochi.sakura.ne.jp/encoder-benchmark/result-20061103.html) FYI.
Lancer's MT makes use of up to 2 cores per encoding. If you have a quad core, you have to run 2 instances at a time. Lancer is fast enough, though.
I kind of wonder why you don't just run two single-threaded encoders instead of the multithreaded one. Usually you'll encode multiple files anyway. That way it'd be ~2x as fast instead of 1.4x...
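As a back-of-the-envelope comparison, using the ~1.4x MT figure quoted above and assuming (optimistically) that independent encoder processes scale perfectly across cores. batch_throughput is a hypothetical helper, and the numbers are a model, not measurements.

```python
def batch_throughput(cores, threads_per_encoder, speedup_per_encoder):
    """Relative batch throughput: concurrent encoder instances times per-instance speedup."""
    instances = max(1, cores // threads_per_encoder)
    return instances * speedup_per_encoder


if __name__ == "__main__":
    for cores in (2, 4):
        mt = batch_throughput(cores, threads_per_encoder=2, speedup_per_encoder=1.4)
        st = batch_throughput(cores, threads_per_encoder=1, speedup_per_encoder=1.0)
        print(f"{cores} cores: MT builds {mt:.1f}x, single-threaded builds {st:.1f}x")
```

The model only applies to batches; for a single file the MT build is the only one of the two that can use the second core at all.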
We techheads will do that. But simpler users (i.e. the overwhelming majority of PC users) tend to encode one at a time.
Yeah, new problems are coming up: there are 4-core CPUs on the market, and soon there will be 8. We need a solution, like:
1. Lancer needs to check how many cores are in the system
2. use them all
3. it would be a bad idea to limit it to 8 cores... (maybe this is not the end?)
and add SSE5 and SSE6 support!!!
Does anyone know if there are lancer static-built binaries for windows I can download anywhere? This would be very useful, thanks.
http://homepage3.nifty.com/blacksword/index.htm (http://homepage3.nifty.com/blacksword/index.htm)
http://translate.google.com/translate?u=ht...Flanguage_tools (http://translate.google.com/translate?u=http%3A%2F%2Fhomepage3.nifty.com%2Fblacksword%2Findex.htm&langpair=ja|en&hl=en&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools) (same page translated by Google)
Can someone who owns the Intel compiler please build and make available Windows static library binaries (static-link libraries (*.lib) that do not depend on any ogg/vorbis DLL at run time)? Ideally these would be compiled using the release multi-threaded version of MSVCRT, thanks!
Lancer's web page is no longer available, but you can use a copy on the Web Archive:
https://web.archive.org/web/20100217183320/http://homepage3.nifty.com/blacksword/
The fastest versions:
https://web.archive.org/web/20100217183320/http://homepage3.nifty.com/blacksword/oggenc283_sse3_lancer20061110.zip (SSE3, single threaded)
https://web.archive.org/web/20100217183320/http://homepage3.nifty.com/blacksword/oggenc283_sse3mt_lancer20061110.zip (SSE3, multi threaded)
Still the fastest Ogg Vorbis encoder. What a pity that the author abandoned this project.
I think very few people care about audio encoder speed in the era of fast multicore processors.
Hi
Currently I'm using...
$ oggenc -v
OggEnc v2.83 (Lancer [Nov 14 2009](SSE) based on aoTuV b5)
With a single-core machine.
Is there anything faster out there - for LINUX?
I think very few people care about audio encoder speed in the era of fast multicore processors.
I'd rather wait 3-4 times less for almost the same quality.
oggenc_general_x64 - http://www.rarewares.org/files/ogg/oggenc2.88-1.3.5-x64.zip
oggenc_lancer_sse3mt - https://web.archive.org/web/20100217183320/http://homepage3.nifty.com/blacksword/oggenc283_sse3mt_lancer20061110.zip
C:\Users\VEG\Desktop>oggenc_general_x64 -q 0 test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 0.00
[100.0%] [ 0m00s remaining] -
Done encoding file "test.ogg"
File length: 9m 51.0s
Elapsed time: 0m 12.0s
Rate: 49.2648
Average bitrate: 51.8 kb/s
C:\Users\VEG\Desktop>oggenc_lancer_sse3mt -q 0 test.wav
Opening with wav module: WAV file reader
Encoding "test.wav" to
"test.ogg"
at quality 0.00
[100.0%] [ 0m00s remaining] -
Done encoding file "test.ogg"
File length: 9m 51.0s
Elapsed time: 0m 3.841s
Rate: 153.912430
Average bitrate: 57.6 kb/s
12.0 seconds vs. 3.8 seconds for one song.
I use it to encode music for my phone.
You mean from multithreading? I can get an 8-10x speed up using foobar and the stock encoder with no loss of quality. I don't think using the mt version makes sense.
Anyway, if you are interested in encoding speed, you should work on it. Multithreading may not make sense, but further x64 asm or SSE intrinsics would likely help a lot.
If you are interested in an up to date & fast (faster than stock libvorbis) Vorbis encoder, check out https://hydrogenaud.io/index.php/topic,109766.0.html
Bump
Has this thread died, or is it a very very difficult question?
I think dutch109 answered that question anyway with a link to faster builds for single core machines.
Did you see dutch109's post above? It has a link to newer, more optimized builds than you are running.
I don't see a link to optimized linux builds, it is a thread about patches.
https://www.freac.org/en
It has a link to newer, more optimized builds than you are running.
Unfortunately, it only uses the Lancer name; it runs much slower than the original Lancer. Quality is almost the same; all Vorbis encoders provide almost the same quality. libvorbis doesn't use patches from the latest aoTuV versions, simply because the improvements are barely perceptible, if they are perceptible at all. I don't think the libvorbis authors would ignore aoTuV patches if the improvements were indisputable.
What is the difference in speed between the current and original builds?
libvorbis doesn't use patches from the latest aoTuV versions, just because improvements are barely perceptible, if they are perceptible at all. I don't think that authors of the libvorbis would ignore aoTuV patches if improvements were indisputable.
They might also ignore AoTuv changes because they disagree with the result it produces, see the changelog from libvorbis 1.3.2:
...
* vorbisenc: Back out an [old] AoTuV HF weighting that was
first enabled in 1.3.0; there are a few samples where I
really don't like the effect it causes.
...
https://github.com/xiph/vorbis/blob/master/CHANGES#L36
They might also ignore AoTuv changes because they disagree with the result it produces, see the changelog from libvorbis 1.3.2
I noticed bitrate bloat too. Some samples average 200-385 kbps on aoTuV, yet libvorbis stays within 100-190 kbps with zero effect on sound quality?