Thanks for making this comparison easier!
My results:
SAMPLE              32 kHz  44 kHz  ABX 32 vs 44                MAIN ISSUE
01 Castanets        2.5     4.0     16 out of 16, pval < 0.001  Pre-echo
02 Fatboy           1.0     1.8     8 out of 8, pval = 0.004    Noise; grain; ugliness
03 EIG              2.0     3.5     5 out of 5, pval = 0.031    Irregular smearing
04 Bachpsichord     3.3     4.0     8 out of 8, pval = 0.004    Smearing or pre-echo
05 Enola Gay        1.5     2.5     12 out of 16, pval = 0.038  Tonal distortion
06 Trumpet          4.5     5.0     no test                     no obvious issue
07 Applaud          3.0     2.0     19 out of 20, pval < 0.001  44 kHz is noisier
08 Velvet           3.8     4.3     no test                     minor lowpass (?)
09 Linchipin        1.7     2.7     12 out of 12, pval < 0.001  excessive smearing
10 Spill the blood  4.2     4.5     no test
11 Female Speech    4.6     4.8     no test
12 French ad        4.4     5.0     no test
Mean                3.04    3.67
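For anyone who wants to check the numbers: the p-values in the ABX column are consistent with a one-sided binomial test against pure guessing (50% chance per trial). Here is a minimal sketch, assuming that is how the ABX tool computes them; the function name abx_pvalue is only an illustration, not something taken from the tool.

```python
from math import comb

def abx_pvalue(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    by guessing alone (one-sided binomial test with p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# (correct, trials) pairs copied from the ABX column above
abx_results = {
    "01 Castanets": (16, 16),
    "02 Fatboy": (8, 8),
    "03 EIG": (5, 5),
    "04 Bachpsichord": (8, 8),
    "05 Enola Gay": (12, 16),
    "07 Applaud": (19, 20),
    "09 Linchipin": (12, 12),
}

for sample, (correct, trials) in abx_results.items():
    p = abx_pvalue(correct, trials)
    print(f"{sample}: {correct} out of {trials}, pval = {p:.3f}")
```

Note that 5 out of 5 is the shortest perfect run that gets below the usual 0.05 threshold, which is why even the short tests above still count as significant.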
Except for sample #7 (applause), the 32 kHz resampled mode 2 is inferior. I did not run ABX tests systematically on every sample. The most common issue at 32 kHz is increased smearing / pre-echo / loss of sharpness / noise (again, except for sample #7). Tonal issues are also perceptible on the Enola Gay sample.
These 12 samples are probably not representative of a generic musical experience, but an educated guess would be that 32 kHz resampling lowers the quality of many transients, with no obvious benefit (at least to my ears) on other parts.
MINOR ISSUE: the test name is the same for every sample (Sample01 41_30sec). It's not a problem, but I'm reporting it anyway.
EDIT: tested with AKG Q701 headphones on a laptop's basic sound card / headphone output, at high volume.