When using ffmpeg alone, the height of the image must be a power of 2 to show the full spectrum, which is not very convenient.
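The snippet below is a minimal Python sketch of the usual workaround. It takes the power-of-two requirement above at face value and rounds the wanted height up before passing it to ffmpeg's showspectrumpic filter. The helper names are hypothetical, and the code only builds the command without running it. A wanted height of 1200 becomes 2048, which shows why this is inconvenient.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def showspectrumpic_args(src: str, dst: str, width: int, height: int) -> list[str]:
    """Build (but do not run) an ffmpeg command for a spectrogram picture."""
    # Assumption: the spectrum height must be a power of two to show
    # the full frequency range, so round the requested height up.
    h = next_power_of_two(height)
    return ["ffmpeg", "-v", "16", "-y", "-i", src,
            "-lavfi", f"showspectrumpic=s={width}x{h}", dst]
```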
SoX doesn't support Unicode. It can read audio piped from ffmpeg, and ffmpeg can in turn read the picture piped from SoX and write the Unicode text on it. However, SoX cannot determine the length of piped audio, so when generating a spectrogram from a pipe, the duration has to be set explicitly in the SoX command.
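SoX accepts the duration for -d either as a sample count or in seconds. A sample count is a number followed by "s", which is what foobar2000's %length_samples% supplies. The sketch below shows both forms. It also includes a hypothetical helper that gets the duration in seconds from ffprobe when foobar2000 is not available, assuming ffprobe is installed and on PATH. All function names are my own.

```python
import subprocess


def sox_length_from_samples(n_samples: int) -> str:
    # SoX reads a time value ending in "s" as a number of samples.
    return f"{n_samples}s"


def sox_length_from_seconds(seconds: float) -> str:
    # Plain seconds with a fractional part are also a valid SoX time.
    return f"{seconds:.3f}"


def probe_duration_seconds(path: str) -> float:
    # Hypothetical helper: ask ffprobe for the container duration in seconds.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True).stdout
    return float(out.strip())
```

The duration from ffprobe comes from the container and may differ slightly from the exact sample count. That only affects how precisely the audio fits the X axis.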
Because I am not good at writing complicated batch scripts, I use foobar2000 with the foo_run component to run ffmpeg and SoX together, which avoids the problems described above.
Command for foo_run:
cmd.exe /d /c ffmpeg -v 16 -i "%path%" -f sox -|sox -p -n spectrogram -d %length_samples%s -x 1600 -Y 1200 -t " " -o -|ffmpeg -v 16 -y -i - -vf "drawtext=''text=$replace(%filename_ext%,'',$char(8217)):font=ArialUnicodeMS:fontcolor=white:fontsize=$ifgreater($len(%filename_ext%),170,16,20):x=5:y=10''" "%path%.png"
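For readers who don't use foobar2000, here is a rough Python equivalent of the foo_run command. It assumes ffmpeg and sox are on PATH. It reproduces the apostrophe replacement ($replace with $char(8217)) and the font-size rule ($ifgreater at 170 characters). The function names, and passing the SoX length in as a parameter, are hypothetical choices for this sketch.

```python
import subprocess

FONT = "ArialUnicodeMS"  # same font as in the foo_run command


def drawtext_filter(name: str) -> str:
    """Build the drawtext filter that prints the file name on the image."""
    # Replace ASCII apostrophes with U+2019 so they cannot close the
    # single-quoted option string (what $replace(...,$char(8217)) does).
    safe = name.replace("'", "\u2019")
    # Smaller font for very long names, like $ifgreater($len(...),170,16,20).
    size = 16 if len(name) > 170 else 20
    return (f"drawtext='text={safe}:font={FONT}:fontcolor=white:"
            f"fontsize={size}:x=5:y=10'")


def spectrogram_commands(path: str, name: str, length: str) -> list[list[str]]:
    """Return the three argument lists of the ffmpeg | sox | ffmpeg chain."""
    return [
        # Decode the input and write SoX native format to stdout.
        ["ffmpeg", "-v", "16", "-i", path, "-f", "sox", "-"],
        # Read SoX format from the pipe (-p, as in the original command);
        # -d is required because SoX cannot know the length of piped audio.
        ["sox", "-p", "-n", "spectrogram", "-d", length,
         "-x", "1600", "-Y", "1200", "-t", " ", "-o", "-"],
        # Read the PNG from stdin and draw the (Unicode) file name on it.
        ["ffmpeg", "-v", "16", "-y", "-i", "-",
         "-vf", drawtext_filter(name), path + ".png"],
    ]


def run_pipeline(cmds: list[list[str]]) -> list[int]:
    """Chain the commands like "a | b | c" and return their exit codes."""
    procs = []
    prev_stdout = None
    for i, cmd in enumerate(cmds):
        last = i == len(cmds) - 1
        proc = subprocess.Popen(
            cmd, stdin=prev_stdout,
            stdout=None if last else subprocess.PIPE)
        if prev_stdout is not None:
            prev_stdout.close()  # the child process now owns this pipe end
        prev_stdout = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]
```

A call would look like run_pipeline(spectrogram_commands(path, os.path.basename(path), length)), where length is any SoX time, for example a sample count with an "s" suffix. Because the arguments are passed as lists without a shell, cmd.exe quoting rules do not apply. The single quotes around the drawtext options still reach ffmpeg exactly as in the original command.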
I use the font ArialUnicodeMS because it covers a very large part of Unicode. Any characters missing from the chosen font are rendered as squares.