I ran some tests with the various builds and included GCC 4.8.2 compiles in the mix too. These results are from a Core i7-4771. I ran each test twice and kept the faster result of the two. The parameters I used were the same ones bb10 used: -h -x2. The numbers are speed as a multiple of realtime.
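For anyone who wants to reproduce the figures, here is a rough sketch of how a "speed x realtime" number with best-of-two selection can be computed. The function names and the sample timings are made up for illustration; they are not taken from my test runs or from any tool's output.

```python
def speed_x_realtime(audio_seconds, elapsed_seconds):
    """Return how many times faster than realtime a run was."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return audio_seconds / elapsed_seconds


def best_of(audio_seconds, elapsed_runs):
    """Keep the fastest run, i.e. the highest speed multiple."""
    return max(speed_x_realtime(audio_seconds, t) for t in elapsed_runs)


# Hypothetical example: a 300 s track processed in 4.0 s and 3.75 s.
result = best_of(300.0, [4.0, 3.75])
print(f"{result:.2f}x")  # prints "80.00x"
```

Taking the faster of the two runs rather than the average reduces the influence of one-off disturbances such as disk caching or background tasks.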
official 32-bit:
16-bit: 78.35212648x
24-bit: 77.01991355x
official 64-bit:
16-bit: 78.62632693x
24-bit: 63.22435532x
lvqcl original 32-bit:
16-bit: 78.14190364x
24-bit: 86.93761979x
lvqcl original 64-bit:
16-bit: 79.32029858x
24-bit: 63.45351647x
lvqcl modified 32-bit:
16-bit: 78.17297656x
24-bit: 86.952771x
lvqcl modified 64-bit:
16-bit: 78.65516379x
24-bit: 76.97832292x
intel 32-bit:
16-bit: 69.03704896x
24-bit: 89.67199856x
intel 64-bit:
16-bit: 98.29642284x
24-bit: 90.80626081x
gcc 4.8.2 32-bit:
16-bit: 77.13143698x
24-bit: 72.1682216x
gcc 4.8.2 64-bit:
16-bit: 79.48867848x
24-bit: 72.89575572x
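To make the list easier to compare, here is a small sketch that loads the numbers above and reports the fastest build per bit depth plus each build's difference from the official 32-bit build. The helper names (`fastest`, `percent_vs`) and the shortened "gcc" labels are my own choices for this example.

```python
# (16-bit speed, 24-bit speed) as multiples of realtime, copied from the list above.
results = {
    "official 32-bit":       (78.35212648, 77.01991355),
    "official 64-bit":       (78.62632693, 63.22435532),
    "lvqcl original 32-bit": (78.14190364, 86.93761979),
    "lvqcl original 64-bit": (79.32029858, 63.45351647),
    "lvqcl modified 32-bit": (78.17297656, 86.952771),
    "lvqcl modified 64-bit": (78.65516379, 76.97832292),
    "intel 32-bit":          (69.03704896, 89.67199856),
    "intel 64-bit":          (98.29642284, 90.80626081),
    "gcc 32-bit":            (77.13143698, 72.1682216),
    "gcc 64-bit":            (79.48867848, 72.89575572),
}


def fastest(table, index):
    """Return the build name with the highest speed in the given column."""
    return max(table, key=lambda name: table[name][index])


def percent_vs(table, baseline, index):
    """Return each build's speed difference from the baseline, in percent."""
    base = table[baseline][index]
    return {name: 100.0 * (speeds[index] / base - 1.0)
            for name, speeds in table.items()}


for index, label in enumerate(("16-bit", "24-bit")):
    print(f"{label}: fastest is {fastest(results, index)}")
    for name, delta in percent_vs(results, "official 32-bit", index).items():
        print(f"  {name}: {delta:+.1f}% vs official 32-bit")
```

With these numbers the intel 64-bit build comes out fastest in both columns. Keep in mind these are single-machine, best-of-two measurements, so small differences between builds may be within noise.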