I have tested the files that you (he-jo) suggested, using the suggested command line with a timer, on my two machines.
TESTED FILES:
41_30sec.wav
Bartok_strings2.wav
BigYellow.wav
bodyheat.wav
DaFunk.wav
dBTech_122-96_24bit_web.wav
EnolaGay.wav
Leahy.wav
Mama.wav
mytek_8X96_24bit_web.wav
NewYorkCity.wav
OrdinaryWorld.wav
prism_AD124_24bit_web.wav
Quizas.wav
rosemary.wav
SinceAlways.wav
thear1.wav
TheSource.wav
trust.wav
Twelve.wav
Waiting.wav
CELERON MACHINE SPECS:
OS: Microsoft Windows XP Professional Service Pack 2
CPU: Intel Celeron-A, 466 MHz (7 x 67)
MEMORY: 256 MB (SDRAM)
MOBO: Intel 82440BX/ZX
VIDEO: NVIDIA RIVA TNT (16 MB)
MONITOR: Compaq V55 [15 inch CRT]
SOUND: ESS Technology ES1938/ES1941/ES1946 Solo-1(E) Sound Card
HDD: ST310211A (10 GB, 5400 RPM, Ultra-ATA/100)
OPTICAL: HL-DT-ST CD-RW GCE-8480B (48x/16x/48x CD-RW)
ATHLON MACHINE SPECS:
OS: Microsoft Windows XP Professional Service Pack 2
CPU: AMD Athlon XP, 1666 MHz (12.5 x 133) 2000+
MEMORY: 512 MB (DDR SDRAM)
MOBO: ECS KT600-A (4 PCI, 1 AGP, 1 CNR, 3 DDR DIMM, Audio, LAN)
VIDEO: WinFast A340
MONITOR: Samsung SyncMaster 750(M)s(T) [17 inch CRT]
SOUND: VIA AC97 Enhanced Audio Controller
HDD: Maxtor 6 Y080M0 SCSI Disk Device (80 GB, 7200 RPM, SATA)
OPTICAL: HL-DT-ST CD-RW GCE-8520B (52x/24x/52x CD-RW)
RESULTS FOR CELERON:
OVERALL RESULTS [-f -x6]:
----------------------------------------------
TEST_ORIGINAL
Kernel Time = 3.304 = 00:00:03.304 = 0%
User Time = 471.017 = 00:07:51.017 = 98%
Process Time = 474.322 = 00:07:54.322 = 98%
Global Time = 479.960 = 00:07:59.960 = 100%
----------------------------------------------
TEST_BSR [+7.84%]
Kernel Time = 3.424 = 00:00:03.424 = 0%
User Time = 433.733 = 00:07:13.733 = 98%
Process Time = 437.158 = 00:07:17.158 = 98%
Global Time = 442.477 = 00:07:22.477 = 100%
----------------------------------------------
TEST_JFL2B [+2.08%]
Kernel Time = 3.324 = 00:00:03.324 = 0%
User Time = 461.123 = 00:07:41.123 = 98%
Process Time = 464.447 = 00:07:44.447 = 98%
Global Time = 469.715 = 00:07:49.715 = 100%
----------------------------------------------
TEST_MMX [+14.30%]
Kernel Time = 2.954 = 00:00:02.954 = 0%
User Time = 403.560 = 00:06:43.560 = 98%
Process Time = 406.514 = 00:06:46.514 = 98%
Global Time = 411.352 = 00:06:51.352 = 100%
----------------------------------------------
TEST_MMX-BSR [+22.64%]
Kernel Time = 3.304 = 00:00:03.304 = 0%
User Time = 363.642 = 00:06:03.642 = 97%
Process Time = 366.947 = 00:06:06.947 = 98%
Global Time = 372.085 = 00:06:12.085 = 100%
----------------------------------------------
TEST_MMX-JFL2B [+16.58%]
Kernel Time = 2.984 = 00:00:02.984 = 0%
User Time = 392.684 = 00:06:32.684 = 97%
Process Time = 395.668 = 00:06:35.668 = 98%
Global Time = 400.857 = 00:06:40.857 = 100%
RESULTS FOR ATHLON:
OVERALL RESULTS [-f -x6]:
----------------------------------------------
TEST_ORIGINAL
Kernel Time = 0.765 = 00:00:00.765 = 0%
User Time = 100.734 = 00:01:40.734 = 98%
Process Time = 101.500 = 00:01:41.500 = 99%
Global Time = 102.281 = 00:01:42.281 = 100%
----------------------------------------------
TEST_BSR [+3.65%]
Kernel Time = 0.953 = 00:00:00.953 = 0%
User Time = 96.843 = 00:01:36.843 = 98%
Process Time = 97.796 = 00:01:37.796 = 99%
Global Time = 98.484 = 00:01:38.484 = 100%
----------------------------------------------
TEST_JFL2B [+0.55%]
Kernel Time = 0.921 = 00:00:00.921 = 0%
User Time = 100.015 = 00:01:40.015 = 98%
Process Time = 100.937 = 00:01:40.937 = 99%
Global Time = 101.688 = 00:01:41.688 = 100%
----------------------------------------------
TEST_MMX [+12.78%]
Kernel Time = 0.812 = 00:00:00.812 = 0%
User Time = 87.718 = 00:01:27.718 = 98%
Process Time = 88.531 = 00:01:28.531 = 99%
Global Time = 89.078 = 00:01:29.078 = 100%
----------------------------------------------
TEST_MMX-BSR [+16.49%]
Kernel Time = 0.796 = 00:00:00.796 = 0%
User Time = 83.968 = 00:01:23.968 = 98%
Process Time = 84.765 = 00:01:24.765 = 99%
Global Time = 85.375 = 00:01:25.375 = 100%
----------------------------------------------
TEST_MMX-JFL2B [+13.50%]
Kernel Time = 0.968 = 00:00:00.968 = 1%
User Time = 86.828 = 00:01:26.828 = 98%
Process Time = 87.796 = 00:01:27.796 = 99%
Global Time = 88.484 = 00:01:28.484 = 100%
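NOTE ON THE PERCENTAGES:
The bracketed percentages in the results are consistent with the relative reduction in Process Time compared to TEST_ORIGINAL on the same machine. Below is a minimal Python sketch that reproduces them from the Process Time figures listed above. The names PROCESS_TIME, gain_percent and gains are made up for this illustration and are not part of any tool used for the tests.

```python
# Process Time in seconds, copied from the result tables in this post.
PROCESS_TIME = {
    "CELERON": {
        "TEST_ORIGINAL": 474.322,
        "TEST_BSR": 437.158,
        "TEST_JFL2B": 464.447,
        "TEST_MMX": 406.514,
        "TEST_MMX-BSR": 366.947,
        "TEST_MMX-JFL2B": 395.668,
    },
    "ATHLON": {
        "TEST_ORIGINAL": 101.500,
        "TEST_BSR": 97.796,
        "TEST_JFL2B": 100.937,
        "TEST_MMX": 88.531,
        "TEST_MMX-BSR": 84.765,
        "TEST_MMX-JFL2B": 87.796,
    },
}


def gain_percent(original, optimized):
    """Relative reduction in Process Time versus the original build, in percent."""
    return (original - optimized) / original * 100.0


def gains(machine):
    """Return {build name: gain in percent, rounded to 2 decimals} for one machine."""
    times = PROCESS_TIME[machine]
    base = times["TEST_ORIGINAL"]
    return {
        name: round(gain_percent(base, t), 2)
        for name, t in times.items()
        if name != "TEST_ORIGINAL"
    }


if __name__ == "__main__":
    for machine in PROCESS_TIME:
        print(machine)
        for name, gain in gains(machine).items():
            print(f"  {name}: +{gain:.2f}%")
```

Using Global Time instead of Process Time gives slightly different values (for example about 7.81% instead of 7.84% for TEST_BSR on the Celeron), so Process Time appears to be the basis used here.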
My conclusion from these tests is that MMX combined with BSR is the fastest build with the -f -x6 options: it needs about 22% less process time than the original on the slower machine (Celeron) and about 16% less on the Athlon. The JFL2B optimization on its own is only marginally faster (about 2% on the Celeron and 0.5% on the Athlon). Is this caused by my build? Almost certainly yes, because he-jo stated that this optimization is ~8% faster. I do not know where the problem in my implementation is; it would be nice if someone familiar with VC++ and NASM could check my sources.