Results for v3 binary:
-j1: Average time = 22.844 seconds (3 rounds), Encoding speed = 473.30x
-j2: Average time = 18.255 seconds (3 rounds), Encoding speed = 592.27x
-j3: Average time = 9.570 seconds (3 rounds), Encoding speed = 1129.82x
-j4: Average time = 6.603 seconds (3 rounds), Encoding speed = 1637.35x
-j5: Average time = 6.646 seconds (3 rounds), Encoding speed = 1626.76x
-j6: Average time = 7.094 seconds (3 rounds), Encoding speed = 1524.18x
-j7: Average time = 6.446 seconds (3 rounds), Encoding speed = 1677.41x
-j8: Average time = 6.539 seconds (3 rounds), Encoding speed = 1653.46x
-j9: Average time = 7.046 seconds (3 rounds), Encoding speed = 1534.42x
-j10: Average time = 7.123 seconds (3 rounds), Encoding speed = 1517.90x
-j11: Average time = 6.800 seconds (3 rounds), Encoding speed = 1589.92x
-j12: Average time = 6.286 seconds (3 rounds), Encoding speed = 1719.92x
Scaling is almost perfect at the beginning (1 nc thread = 18.3 sec at -j2, 2 nc threads = 9.6 sec at -j3, 3 nc threads = 6.6 sec at -j4), but beyond that little or nothing is gained: from -j4 to -j12 the average time stays between about 6.3 and 7.1 sec. Does thread management eat up all the extra time the additional cores could provide?
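
To put numbers on the plateau, here is a small sketch that computes speedup and per-thread efficiency from the averages above. It assumes that -jN runs N-1 nc threads (my reading of the figures quoted in the comment, not something the binary reports), and it uses -j2 as the single-nc-thread baseline. The names `avg_time` and `scaling` are only for illustration.

```python
# Average times in seconds, copied from the v3 results above (-j value -> time).
avg_time = {
    1: 22.844, 2: 18.255, 3: 9.570, 4: 6.603,
    5: 6.646, 6: 7.094, 7: 6.446, 8: 6.539,
    9: 7.046, 10: 7.123, 11: 6.800, 12: 6.286,
}


def scaling(times, baseline_j=2):
    """Return {j: (nc_threads, speedup, efficiency)} relative to baseline_j.

    Assumption: -jN uses N-1 nc threads, so -j1 has none and is skipped.
    """
    base = times[baseline_j]
    result = {}
    for j, t in sorted(times.items()):
        nc_threads = j - 1
        if nc_threads < 1:
            continue  # -j1 has no separate nc thread under this assumption
        speedup = base / t
        # Efficiency of 1.0 would mean perfectly linear scaling.
        result[j] = (nc_threads, speedup, speedup / nc_threads)
    return result


table = scaling(avg_time)
for j, (nc_threads, speedup, efficiency) in table.items():
    print(f"-j{j}: {nc_threads} nc threads, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under that assumption, efficiency stays above 90% up to -j4 and then falls steadily, because the speedup stops growing while the thread count keeps rising. That pattern alone does not say whether thread management, a serial stage, or I/O is the limit; it only shows where the scaling stops.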