I'm not sure I understand... Why is SoX internally converting to a higher bit depth, then?
I found this relevant information on the SoX website:
Specifically, by default, SoX automatically adds TPDF dither when the output bit-depth is less than 24 and any of the following are true:
• bit-depth reduction has been specified explicitly using a command-line option
• the output file format supports only bit-depths lower than that of the input file format
• an effect has increased effective bit-depth within the internal processing chain
For example, adjusting volume with vol 0.25 requires two additional bits in which to losslessly store its results (since 0.25 decimal equals 0.01 binary). So if the input file bit-depth is 16, then SoX’s internal representation will utilise 18 bits after processing this volume change. In order to store the output at the same depth as the input, dithering is used to remove the additional bits.
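To convince myself of the "two additional bits" claim in that quote, here is a small sketch in Python. It has nothing to do with SoX's actual code; the function name `extra_bits_needed` and the sample values are made up for illustration. It multiplies integer samples by an exact gain and counts how many fractional binary digits the exact result needs.

```python
from fractions import Fraction

def extra_bits_needed(gain, samples):
    """Return how many fractional bits are needed to store gain * s exactly
    for every integer sample s (hypothetical helper, not part of SoX)."""
    bits = 0
    for s in samples:
        product = Fraction(gain) * s
        d = product.denominator
        # Only dyadic results (denominator a power of two) fit in finitely many bits.
        if d & (d - 1) != 0:
            raise ValueError("result is not exactly representable in binary")
        bits = max(bits, d.bit_length() - 1)
    return bits

# A few 16-bit sample values, chosen arbitrarily for the example.
samples = [1, 2, 3, -5, 32767, -32768]
print(extra_bits_needed(Fraction(1, 4), samples))  # 2
```

With a gain of 0.25, odd samples such as 1 or 3 become 0.25 or 0.75, which need two bits after the binary point, so a 16-bit input ends up needing 18 bits.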
It's not clear to me how any of the three points apply. If I understood you right, you're talking about the third point, but I'm not changing the volume...
Edit: Oh, I think I have an idea why... When it is upsampling, it has to create new samples, basically interpolating their amplitude from their neighbours. If 16 bits aren't enough to accurately represent the value of a new sample, it uses a higher-precision internal representation and then has to dither to get back to 16 bits?! ... Maybe?
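A toy sketch of that idea in Python, under loud assumptions: it uses plain linear interpolation for a 2x upsample, which is NOT how SoX's resampler works (as far as I know it uses proper filters), and the function names are made up. It only shows that new in-between samples generally don't land on 16-bit integer values, so going back to 16 bits needs rounding, and TPDF dither (the sum of two uniform random values, each within half an LSB) can be added before that rounding.

```python
import random

def upsample_2x_linear(samples):
    """Insert the midpoint between each pair of neighbours (toy interpolation)."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(float(a))
        out.append((a + b) / 2)  # new sample, may fall between integer levels
    out.append(float(samples[-1]))
    return out

def tpdf_dither_to_int16(values, rng):
    """Add triangular (TPDF) noise of up to +/-1 LSB, then round and clip to 16 bits."""
    out = []
    for v in values:
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = round(v + noise)
        out.append(max(-32768, min(32767, q)))
    return out

original = [100, 103, 101]  # arbitrary 16-bit sample values
upsampled = upsample_2x_linear(original)
print(upsampled)  # [100.0, 101.5, 103.0, 102.0, 101.0]
print(tpdf_dither_to_int16(upsampled, random.Random(0)))  # integers again
```

The value 101.5 is the point: it needs one more bit than the input had, so something has to happen to it on the way back to 16 bits.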