Tested on an older Ubuntu (desktop):
- gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
- Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Compiled with -O3. I ran:
./art -q -y -4 -r96000 INPUT OUTPUT
5 times, using:
for ((i=1;i<=5;i++)); do echo -n "$i "; { time ... ; } 2>&1 | grep "real" | cut -f2; done
and discarded the max and min times, which leaves three runs per configuration. The input is 30 s of float audio at 44.1 kHz.
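The "discard max and min" step can be sketched in C as follows. This is only an illustration of the selection rule, not part of art; the function name `drop_min_max` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy every timing except one minimum and one
 * maximum into `out`, preserving run order. Returns the number kept. */
static size_t drop_min_max(const double *in, size_t n, double *out)
{
    assert(n >= 3);

    size_t min_i = 0, max_i = 0;
    for (size_t i = 1; i < n; i++) {
        if (in[i] < in[min_i]) min_i = i;
        if (in[i] > in[max_i]) max_i = i;
    }

    /* If all timings are equal, both indices are still 0;
     * pick a second entry so that two runs are dropped anyway. */
    if (min_i == max_i)
        max_i = (min_i + 1) % n;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (i != min_i && i != max_i)
            out[kept++] = in[i];

    return kept;
}
```

With five timings this keeps three values, which is why each configuration in the results has three rows.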
Comparing this change:
- sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps);
- sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps);
+ sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
+ sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
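For reference, `n/8*8` in C integer arithmetic rounds a non-negative `n` down to a multiple of 8. The call is therefore unchanged when numTaps is already a multiple of 8, and otherwise skips up to 7 trailing taps. The sketch below shows this with a plain dot product standing in for apply_filter; `dot_product` and `round_down_to_8` are hypothetical names, and the real apply_filter may look different. One possible explanation for any speedup, stated here only as an assumption, is that the compiler can vectorize the loop without a remainder loop when it can see that the count is a multiple of 8.

```c
#include <assert.h>

/* Hypothetical stand-in for apply_filter: dot product over num_taps. */
static float dot_product(const float *a, const float *b, int num_taps)
{
    float sum = 0.0f;
    for (int i = 0; i < num_taps; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Round a non-negative count down to a multiple of 8.
 * Integer division truncates, so n / 8 * 8 drops the remainder. */
static int round_down_to_8(int n)
{
    assert(n >= 0);
    return n / 8 * 8;
}
```

For example, `round_down_to_8(256)` stays 256, while `round_down_to_8(263)` becomes 256 and the last 7 taps would be ignored.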
There seems to be a significant time improvement with version 4:
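(each cell shows the run number followed by the real time)

| taps argument | version 1 | version 2 | version 3 | version 4 |
|---|---|---|---|---|
| cxt->numTaps | 2 0m6.418s | 2 0m3.355s | 2 0m2.201s | 2 0m3.774s |
| | 4 0m6.458s | 3 0m3.357s | 3 0m2.200s | 3 0m3.772s |
| | 5 0m6.424s | 5 0m3.373s | 5 0m2.205s | 4 0m3.826s |
| cxt->numTaps/8*8 | 2 0m6.423s | 1 0m3.365s | 2 0m2.199s | 1 0m3.424s |
| | 4 0m6.421s | 3 0m3.363s | 3 0m2.197s | 2 0m3.388s |
| | 5 0m6.417s | 5 0m3.379s | 5 0m2.198s | 3 0m3.399s |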
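To put a number on the version 4 difference, the three retained desktop timings for each variant can be averaged. A minimal sketch, with the timings copied from the table above and hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Retained version 4 timings on the desktop, in seconds. */
static const double V4_NUM_TAPS[3]         = {3.774, 3.772, 3.826};
static const double V4_NUM_TAPS_ROUNDED[3] = {3.424, 3.388, 3.399};

/* Arithmetic mean of n timings. */
static double mean_seconds(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Reduction of `after` relative to `before`, in percent. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

The means come out at about 3.79 s for cxt->numTaps and about 3.40 s for cxt->numTaps/8*8, a reduction of roughly 10%. For versions 1 to 3 the two variants differ by only a few milliseconds.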
But I then tested on a somewhat newer Ubuntu (laptop):
- gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
- Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
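and there is no change:

| taps argument | version 4 |
|---|---|
| cxt->numTaps | 1 0m3.666s |
| | 2 0m3.683s |
| | 4 0m3.696s |
| cxt->numTaps/8*8 | 1 0m3.683s |
| | 3 0m3.668s |
| | 5 0m3.665s |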