HydrogenAudio

Hydrogenaudio Forum => Validated News => Topic started by: bryant on 2015-05-26 23:08:16

Title: WavPack 4.75.0 has been released
Post by: bryant on 2015-05-26 23:08:16
WavPack 4.75.0

Changes:

General information about WavPack can be found on the WavPack website (http://www.wavpack.com/index.html).

Source code and official binaries are available for download here (http://www.wavpack.com/downloads.html).
Title: WavPack 4.75.0 has been released
Post by: bryant on 2015-06-01 21:31:58
After discussions starting with this post (http://www.hydrogenaud.io/forums/index.php?s=&showtopic=109098&view=findpost&p=899192) on the release candidate thread, I went ahead and added assembly optimizations for the "extra" (-x) processing mode for mono audio. I originally did not do this because it seemed like sort of an edge case, but have since recalled that multichannel 5.1 audio contains two channels that are encoded in mono (FC and LFE) and that the original mono code was not very fast (except on ICC compiles with modified source -- thanks Case for reminding me of that).

This does not affect mono encoding without the "extra" mode, nor does it affect pure stereo at all, but for 5.1 audio it's now faster than the other compiles. Obviously it's too late for the 4.75.0 release, but maybe something will be found that can justify an update. In the meantime I have uploaded 32-bit and 64-bit binaries here.

These are the updated results on my i7 machine for a 24-bit 5.1 test file encoded with -hx2:

Code:
version      encode     decode
------------------------------
4.70.0       21.53s     7.63s
4.70.0 x64   25.46s     7.30s
------------------------------
4.75.0       19.19s     5.27s
4.75.0 x64   18.39s     5.12s
------------------------------
4.75.1       16.44s     -----
4.75.1 x64   16.02s     -----
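
For a rough sense of the gains, the encode times above can be turned into speedup ratios. This is only a minimal Python sketch: the numbers are copied from the table, and the dictionary and the helper name `speedup` are illustrative, not part of WavPack.

```python
# Encode times in seconds, copied from the table above (24-bit 5.1 file, -hx2).
encode_s = {
    "4.70.0": 21.53,
    "4.70.0 x64": 25.46,
    "4.75.0": 19.19,
    "4.75.0 x64": 18.39,
    "4.75.1": 16.44,
    "4.75.1 x64": 16.02,
}


def speedup(old: str, new: str) -> float:
    """Return how many times faster `new` encodes than `old`."""
    return encode_s[old] / encode_s[new]


pairs = [
    ("4.70.0", "4.75.1"),
    ("4.75.0", "4.75.1"),
    ("4.70.0 x64", "4.75.1 x64"),
    ("4.75.0 x64", "4.75.1 x64"),
]

for old, new in pairs:
    ratio = speedup(old, new)
    # Share of the old encode time that is no longer needed.
    saved = 100.0 * (1.0 - 1.0 / ratio)
    print(f"{old:>10} -> {new:<10}  {ratio:.2f}x faster, {saved:.1f}% less time")
```

These figures come from a single machine and a single test file, so treat them as indicative only.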


bb10, hope you can give this a shot! 

--David
Title: WavPack 4.75.0 has been released
Post by: bb10 on 2015-06-02 23:11:31
Quote
bb10, hope you can give this a shot!

--David


Thanks and of course!

CPU: Core i7 2630QM
Norah Jones - 12. Nightingale [4:12] (24-bit 6ch WAV) -h -x2 option:

Code:
build              |   encode 1    |    encode 2    |    decode 1    |    decode 2    |
4.70.0 x86         |     37.88s    |     38.05s     |     11.10s     |     11.28s     |
4.70.0 x64         |     39.88s    |     39.76s     |     10.73s     |     10.72s     |
4.75.0 x86         |     33.75s    |     34.06s     |      8.59s     |      8.73s     |
4.75.0 x64         |     32.79s    |     32.59s     |      8.48s     |      8.68s     |
4.75.1 x86         |     29.17s    |     29.19s     |       --       |       --       |
4.75.1 x64         |     28.60s    |     28.61s     |       --       |       --       |
gcc 4.75.1 x86     |     27.63s    |     27.72s     |      8.86s     |      8.97s     |
gcc 4.75.1 x64     |     27.19s    |     27.16s     |      8.28s     |      8.45s     |
lvqcl-icc x86      |     27.91s    |     27.86s     |      8.11s     |      8.17s     |
lvqcl-icc x64      |     27.54s    |     27.60s     |      7.65s     |      7.80s     |
lvqcl-msvc2k13 x86 |     28.75s    |     28.86s     |      8.84s     |      9.03s     |
lvqcl-msvc2k13 x64 |     28.60s    |     28.54s     |      8.45s     |      8.28s     |
icc-mmx 4.70.0 x32 |     36.99s    |     36.62s     |       --       |       --       |
icc-mmx 4.70.0 x64 |     28.10s    |     28.03s     |       --       |       --       |
icc-aw 4.75.1 x86  |     27.93s    |     27.97s     |      8.06s     |      8.19s     |
icc-aw 4.75.1 x64  |     27.09s    |     27.07s     |      7.65s     |      7.48s     |


Norah Jones - 12. Nightingale [4:12] (16-bit 6ch WAV) -h -x2 option:

Code:
build              |   encode 1    |    encode 2    |    decode 1    |    decode 2    |
4.70.0 x86         |     37.00s    |     36.98s     |     10.24s     |     10.26s     |
4.70.0 x64         |     36.19s    |     36.21s     |      9.90s     |     10.00s     |
4.75.0 x86         |     33.70s    |     33.56s     |      8.46s     |      8.34s     |
4.75.0 x64         |     32.56s    |     32.38s     |      8.29s     |      8.39s     |
4.75.1 x86         |     29.00s    |     29.05s     |       --       |       --       |
4.75.1 x64         |     28.49s    |     28.50s     |       --       |       --       |
gcc 4.75.1 x86     |     27.97s    |     27.88s     |      8.58s     |      8.64s     |
gcc 4.75.1 x64     |     27.22s    |     27.19s     |      8.28s     |      8.13s     |
lvqcl-icc x86      |     27.87s    |     27.74s     |      7.71s     |      7.67s     |
lvqcl-icc x64      |     27.51s    |     27.65s     |      7.46s     |      7.37s     |
lvqcl-msvc2k13 x86 |     28.81s    |     28.99s     |      8.44s     |      8.48s     |
lvqcl-msvc2k13 x64 |     28.51s    |     28.38s     |      8.12s     |      8.10s     |
icc-mmx 4.70.0 x32 |     41.94s    |     41.96s     |       --       |       --       |
icc-mmx 4.70.0 x64 |     28.89s    |     28.93s     |       --       |       --       |
icc-aw 4.75.1 x86  |     28.00s    |     28.07s     |      7.76s     |      7.69s     |
icc-aw 4.75.1 x64  |     27.06s    |     26.95s     |      7.36s     |      7.35s     |



4.75.1 is now the fastest build with 16-bit files! But the 4.70.0 icc build with apply_weight is still ever so slightly faster with 24-bit files.

Thanks to lamedude for providing the 4.75.0 icc binary. I wonder if a 4.75.1 icc binary would make a difference.
Title: WavPack 4.75.0 has been released
Post by: lamedude on 2015-06-03 15:01:11
4.75.1 (http://daman6009.home.comcast.net/wavpack.zip)
The 64-bit build has a modified apply_weight (using lines 456 & 465 (https://github.com/dbry/WavPack/blob/master/src/wavpack_local.h#L456)).
Title: WavPack 4.75.0 has been released
Post by: lvqcl on 2015-06-03 17:34:29
I suspect that the definition of apply_weight doesn't matter now that both 32-bit and 64-bit WavPack use asm code. And the difference between en/decoders made by different compilers should be rather small.

Anyway, I compiled the latest sources with GCC 4.9.2, MSVS 2013 Update 4 and ICC XE 2015 Update 4. (No changes in the source code, just vanilla settings.)
Title: WavPack 4.75.0 has been released
Post by: bryant on 2015-06-03 20:25:47
Quote
4.75.1 is now the fastest build with 16-bit files! But the 4.70.0 icc build with apply_weight is still ever so slightly faster with 24-bit files.

Thanks to lamedude for providing the 4.75.0 icc binary. I wonder if a 4.75.1 icc binary would make a difference.

Thanks so much for testing all these and posting the results! I suspect that the little difference for 24-bit files is now in the realm of CPU to CPU variation because on my i7 the icc version is slightly slower. I'll still take this as a win!

An icc binary of 4.75.1 would probably be a little faster still, but it would be pretty minor I think because as lvqcl says, almost all the time is spent in assembly language now.

Quote
4.75.1 (http://daman6009.home.comcast.net/wavpack.zip)
The 64-bit build has a modified apply_weight (using lines 456 & 465 (https://github.com/dbry/WavPack/blob/master/src/wavpack_local.h#L456)).

Sorry, I should have noticed that the apply_weight modification would have to be in two places now because of some "improvements" I made to those macros (they really should make more sense now). But anyway, I see you figured it out... 

Quote
Anyway, I compiled the latest sources with GCC 4.9.2, MSVS 2013 Update 4 and ICC XE 2015 Update 4. (No changes in the source code, just vanilla settings.)

Very cool...thanks for generating these! I'm glad to see that it wasn't too much trouble to build them on newer MSVS; it was a little bit of a hack for me to get that working on 2008 (I had to edit the project files by hand). Kudos to Microsoft if it just worked; kudos to you if it didn't!
Title: WavPack 4.75.0 has been released
Post by: lvqcl on 2015-06-03 22:58:33
Quote
I'm glad to see that it wasn't too much trouble to build them on newer MSVS; it was a little bit of a hack for me to get that working on 2008 (I had to edit the project files by hand). Kudos to Microsoft if it just worked; kudos to you if it didn't!

There's one small nuisance: MSVS 2013 tries to build the 32-bit exe with the /SAFESEH option, which the WavPack .asm modules don't use. So it's necessary either to remove /SAFESEH from the .exe linker options or to add this option to the *_x86.asm files. No problems with 64-bit builds.
Title: WavPack 4.75.0 has been released
Post by: bb10 on 2015-06-04 01:47:15
Updated the tables. lamedude's 4.75.1 64-bit icc encode takes the crown once again and is followed closely by gcc, though the difference between icc and the rest isn't as big as last time with 4.70.0.
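
The ranking can be checked directly against the 16-bit table earlier in the thread. This is a minimal Python sketch, assuming the rows below were copied correctly from that table. Only the two encode columns are kept, and the helper name `mean_encode_times` is purely illustrative.

```python
# Build name plus the two encode runs, copied from the 16-bit table in this thread.
TABLE_16BIT = """
4.70.0 x86         |     37.00s    |     36.98s
4.70.0 x64         |     36.19s    |     36.21s
4.75.0 x86         |     33.70s    |     33.56s
4.75.0 x64         |     32.56s    |     32.38s
4.75.1 x86         |     29.00s    |     29.05s
4.75.1 x64         |     28.49s    |     28.50s
gcc 4.75.1 x86     |     27.97s    |     27.88s
gcc 4.75.1 x64     |     27.22s    |     27.19s
lvqcl-icc x86      |     27.87s    |     27.74s
lvqcl-icc x64      |     27.51s    |     27.65s
lvqcl-msvc2k13 x86 |     28.81s    |     28.99s
lvqcl-msvc2k13 x64 |     28.51s    |     28.38s
icc-mmx 4.70.0 x32 |     41.94s    |     41.96s
icc-mmx 4.70.0 x64 |     28.89s    |     28.93s
icc-aw 4.75.1 x86  |     28.00s    |     28.07s
icc-aw 4.75.1 x64  |     27.06s    |     26.95s
"""


def mean_encode_times(table: str) -> dict[str, float]:
    """Parse 'build | run | run' rows into {build: mean encode seconds}."""
    results = {}
    for line in table.strip().splitlines():
        build, *runs = [cell.strip() for cell in line.split("|")]
        times = [float(run.rstrip("s")) for run in runs]
        results[build] = sum(times) / len(times)
    return results


# Sort builds from fastest to slowest mean encode time.
ranking = sorted(mean_encode_times(TABLE_16BIT).items(), key=lambda kv: kv[1])
for build, secs in ranking[:5]:
    print(f"{build:<20} {secs:6.2f}s")
```

With the posted numbers, the icc-aw 4.75.1 x64 build comes out first and gcc 4.75.1 x64 second, roughly 0.2s apart on average.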
Title: WavPack 4.75.0 has been released
Post by: BlueKnight on 2015-08-16 22:43:46
Nice!

wavpack -hx2 <input_pcm_s16le44kHz.wav>:
Code:
4.70.0: 4.13s
4.75.1: 3.16s


Thank you!
Title: WavPack 4.75.0 has been released
Post by: lamedude on 2015-11-04 04:52:09
Heads up: they nerfed MMX on Skylake. See page 69 (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf).
Quote
In processors based on the Skylake microarchitecture, the functionality of the MMX instruction set is unchanged from prior generations. But many MMX instructions are constrained to execute to one port with half the instruction throughput relative to prior microarchitectures.

Title: WavPack 4.75.0 has been released
Post by: lvqcl on 2015-11-04 08:52:43
It's actually page 11-69 (section 11.16.5):
Quote
The MMX instructions with throughput constraints include:
• PADDS[B/W], PADDUS[B/W], PSUBS[B/W], PSUBUS[B/W].
• PCMPGT[B/W/D], PCMPEQ[B/W/D].
• PMAX[UB/SW], PMIN[UB/SW].
• PAVG[B/W], PABS[B/W/D], PSIGN[B/W/D].

But WavPack uses only pcmpeqd and paddusw instructions, so maybe it won't suffer much from this.
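
To check a mnemonic against the quoted list, the bracket shorthand can be expanded mechanically. This is a minimal Python sketch: the group strings are copied from the quote above, the two mnemonics in `used` are the ones named in this post, and the helper name `expand` is only illustrative.

```python
import re

# Constrained MMX instruction groups, as quoted from the manual above.
CONSTRAINED_GROUPS = [
    "PADDS[B/W]", "PADDUS[B/W]", "PSUBS[B/W]", "PSUBUS[B/W]",
    "PCMPGT[B/W/D]", "PCMPEQ[B/W/D]",
    "PMAX[UB/SW]", "PMIN[UB/SW]",
    "PAVG[B/W]", "PABS[B/W/D]", "PSIGN[B/W/D]",
]


def expand(group: str) -> set[str]:
    """Expand shorthand such as 'PADDS[B/W]' into {'PADDSB', 'PADDSW'}."""
    match = re.fullmatch(r"(\w+)\[([\w/]+)\]", group)
    if match is None:
        # No bracket part: the string is already a full mnemonic.
        return {group}
    stem, suffixes = match.groups()
    return {stem + suffix for suffix in suffixes.split("/")}


constrained = set().union(*(expand(g) for g in CONSTRAINED_GROUPS))

# The two MMX instructions mentioned in this post.
used = ["pcmpeqd", "paddusw"]
for mnemonic in used:
    hit = mnemonic.upper() in constrained
    print(f"{mnemonic}: {'constrained' if hit else 'not listed'}")
```

Both mnemonics fall inside the listed groups, so any real-world impact would depend on how often they occur in the hot loops, which this sketch does not measure.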