Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

hello,
as pre-discussed in the wrong thread in
https://hydrogenaud.io/index.php/topic,119448.msg1058386.html#msg1058386
https://hydrogenaud.io/index.php/topic,119448.msg1058314.html#msg1058314
I'm looking for a way to change the pitch of multichannel movie audio tracks from the command line, using ffmpeg or a fork of ffmpeg.
The goal is to have as few audible artefacts and as little length change (1-2 ms is OK) as a pitch change in Audacity would have.
The common scenario when working with movie audio tracks is a two-step conversion. For example, a standard ffmpeg speed-up from 23.976 fps to 25 fps would be
Code: [Select]
 -af aresample=resampler=soxr:precision=20:osf=s16:dither_method=triangular,asetrate=50050 -ar 48000 -acodec pcm_s16le -f WAV %1%-25.wav 

In most cases the pitch is then correct, and no additional pitch shift is needed afterwards.
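The asetrate value of 50050 in the command above follows directly from the frame-rate ratio. A minimal sketch of that arithmetic, assuming a 48 kHz source and taking "23.976 fps" to mean exactly 24000/1001 (the function name is only illustrative):

```python
from fractions import Fraction

NTSC_FILM = Fraction(24000, 1001)  # "23.976 fps", assumed to be exactly 24000/1001
PAL = Fraction(25)

def asetrate_for_speed_change(sample_rate, src_fps, dst_fps):
    """Sample rate to declare so that the audio plays dst_fps/src_fps times as fast."""
    return sample_rate * Fraction(dst_fps) / Fraction(src_fps)

# Speed-up from 23.976 fps to 25 fps, and the reverse slow-down.
speedup = asetrate_for_speed_change(48000, NTSC_FILM, PAL)
slowdown = asetrate_for_speed_change(48000, PAL, NTSC_FILM)

print(speedup)          # exact value for the speed-up case
print(float(slowdown))  # not a whole number
```

The speed-up case comes out as exactly 50050. The slow-down value is not a whole number, so it would have to be rounded wherever an integer sample rate is expected.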
So if a dub was recorded at 23.976 fps and was only sped up to 25 fps for a PAL VHS or DVD release, it sounds correct again when slowed back down to 23.976 fps to match the frame rate of a Blu-ray.
However, some dubs were made at 25 fps, either because they were meant for a 25 fps PAL video release only or because the dub studio didn't know better. These dubs already have the 23.976 fps pitch at 25 fps speed and would therefore sound too low when slowed down to 23.976 fps.
In such cases a pitch correction is needed. A summary of the different pitch correction cases would be:
Code: [Select]
23.976fps to 25fps = -4.096% = -0.72
23.976fps to 24fps = -0.100% = -0.02
25fps to 23.976fps = +4.271% = +0.72
25fps to 24fps     = +4.166% = +0.71  
24fps to 23.976fps = -0.100% = +0.02
24fps to 25fps     = -4.000% = -0.68
Looking forward to whatever you guys come up with, and thank you very much.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #1
I don't have an answer for you, because if Audacity's pitch change algorithm (HQ setting) isn't good enough then I don't know what would be, certainly not in the "free" category.

Out of curiosity, what does the last figure in each of your rows represent?  For example: I don't see how "4,000% = -0,68" [I presume the comma is your decimal point, it is more conventional to use a "." and reserve commas as thousands separators].
It's your privilege to disagree, but that doesn't make you right and me wrong.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #2
OK, I just changed the "," to ".". Audacity is perfect; I was just looking for a command-line alternative, as Audacity doesn't offer useful command-line access. The "= -0.68" values are the semitones.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #3
I don't fully understand what the question/problem is but sox has pitch shifting:
Code: [Select]
sox INPUT OUTPUT pitch SHIFT
where SHIFT is "positive or negative 'cents' (i.e. 100ths of a semitone)".

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #4
Audacity is perfect I was just looking for a commandline alternative
I get that, it's just that you said:
The goal is to have as few audible artefacts and as little length change (1-2 ms is OK) as a pitch change in Audacity would have.
...so you're rowing back on that.

the "=-0.68" are the semitones.
Oh, OK.  And that's useful how?

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #5
I meant quality equal to Audacity's. That sentence was confusing, sorry.
The semitones are useful to know if you work with cents.
I don't fully understand what the question/problem is but sox has pitch shifting:
Code: [Select]
sox INPUT OUTPUT pitch SHIFT
where SHIFT is "positive or negative 'cents' (i.e. 100ths of a semitone)".
OK, I'll give it a go then, for a 25 fps track that is too high and needs to have the 24 fps pitch, so lowered by 68 cents:
Code: [Select]
 FOR %%A IN (%*) DO %sox% %%A "%%~nxA-sox-downpitch.wav" pitch -68 rate 

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #6
the semitones are useful to know if you work with cents.
OK, thanks.  That's new to me.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #7
I've spotted a couple of errors:

Code: [Select]
24fps to 23.976fps = -0.100% = +0.02 
...should read:
Code: [Select]
24fps to 23.976fps = +0.100% = +0.02 

Code: [Select]
24fps to 25fps     = -4.000% = -0.68
...should read:
Code: [Select]
24fps to 25fps     = -4.000% = -0.71
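The corrected figures can be checked from the frame-rate ratios. A minimal sketch, assuming the table's convention that the pitch factor is the first rate divided by the second, and using the standard relation semitones = 12 * log2(factor); the function name is only illustrative:

```python
import math

def pitch_correction(src_fps, dst_fps):
    """Return (percent, semitones) for the pitch factor src_fps / dst_fps."""
    factor = src_fps / dst_fps
    percent = (factor - 1.0) * 100.0
    semitones = 12.0 * math.log2(factor)
    return percent, semitones

# The six cases from the opening post, in the same order.
CASES = [(23.976, 25.0), (23.976, 24.0), (25.0, 23.976),
         (25.0, 24.0), (24.0, 23.976), (24.0, 25.0)]

for src, dst in CASES:
    percent, semitones = pitch_correction(src, dst)
    cents = round(semitones * 100)  # cents are hundredths of a semitone
    print(f"{src}fps to {dst}fps = {percent:+.3f}% = {semitones:+.2f} semitones ({cents:+d} cents)")
```

Rounded to two decimals, this gives -0.71 semitones for 24fps to 25fps and a positive 0.100% for 24fps to 23.976fps, in line with the two corrections above.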

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #8
There seems to be a rubberband filter in ffmpeg as well (in certain builds at least)
https://ffmpeg.org/ffmpeg-filters.html#rubberband
(I've never used that, but looks promising).

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #9
Independent time/frequency adjustment simply cannot be artifact-free: it has to rely on some form of re-synthesis or other tricks that can't be guaranteed to work perfectly in the general case.

I'd recommend choosing what is more important for the use case (speed or pitch) and aligning to that, using a resampler.

(and if you suspect those disks have such a time-stretching processing already applied to them, then from a purist perspective they'll be busted anyway, but I guess adding more time-stretching on top of that can only make it worse)
a fan of AutoEq + Meier Crossfeed
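To illustrate the trade-off described in this reply: a plain resample changes duration and pitch together, so only one of the two can be chosen freely. A minimal sketch of that coupling, assuming a 90-minute track as an arbitrary example length (the function name is only illustrative):

```python
import math

def resample_only(duration_s, src_fps, dst_fps):
    """Plain speed change: the new duration and the pitch shift that comes with it."""
    speed = dst_fps / src_fps                 # below 1 means slower playback
    new_duration = duration_s / speed         # slower playback lasts longer
    shift_cents = 1200.0 * math.log2(speed)   # pitch moves by the same factor
    return new_duration, shift_cents

# A 90-minute track slowed from 25 fps to 23.976 fps by resampling alone.
new_duration, shift_cents = resample_only(90 * 60, 25.0, 23.976)
print(f"{new_duration / 60:.2f} min, {shift_cents:+.1f} cents")
```

For this example the track becomes roughly 93.8 minutes long and about 72 cents lower, which is the pitch drop the thread is trying to undo without touching the length again.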

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #10
Rubberband was the worst, in both the R2 variant and the supposedly higher-quality R3 variant. Awful metallic sound.
sox was pretty impressive, but Audacity is still the winner here. It had the most natural overall sound, without any metallic touch or audible loss.
I also tried what is supposed to be sox's highest-quality setting by adding "-u", but the result was the same as with the default setting. If anyone knows how to set sox up for maximum quality, I would definitely give it another go. I couldn't make any sense of the online documentation.
As for file length, rubberband and sox were both accurate and didn't change the overall length, while Audacity changed it by 1 ms, which doesn't really matter since nothing goes out of sync because of 1 ms.
This test case was an ffmpeg/sox resampling from 25 fps to 23.976 fps, followed by a pitch change of +72 cents (+0.72 semitones).

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #11
(and if you suspect those disks have such a time-stretching processing already applied to them, then from a purist perspective they'll be busted anyway, but I guess adding more time-stretching on top of that can only make it worse)
The pitch change in all cases is less than a semitone.  Only somebody with "perfect pitch" would notice, and those are few and far between.  Is the OP barking at the moon here?

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #12
Well, the pitch of the speakers is what matters most here. Higher- or lower-pitched soundtrack music or effects tracks are not an issue; nobody would notice unless making a direct comparison. With voices, though, you quickly realise when somebody sounds as if they were on Valium or on helium. If you are used to certain voice dubbers, you'll notice that pretty quickly.
I have to admit that the differences between sox and Audacity are minimal, but it would still be amazing to reach parity with a higher-quality sox setting.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #13
I also tried what is supposed to be sox's highest-quality setting by adding "-u", but the result was the same as with the default setting. If anyone knows how to set sox up for maximum quality, I would definitely give it another go. I couldn't make any sense of the online documentation.
Where did you find this "-u"? AFAICT the pitch effect in sox only has "-q" option, which according to the  documentation makes it quicker but may sound worse.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #14
With voices, though, you quickly realise when somebody sounds as if they were on Valium or on helium.
Really?  At <1 semitone??  Only by direct comparison, which is not the normal use case.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #15
Where did you find this "-u"? AFAICT the pitch effect in sox only has "-q" option, which according to the  documentation makes it quicker but may sound worse.
Sorry, "-u" was a typo; I meant "-v".
The documentation on SourceForge is offline at the moment, but there is a quote in this post which explains the sox quality settings:
https://community.audirvana.com/t/explanation-for-sox-filter-controls/10848/9
Code: [Select]
-q      quick  
-l      low    
-m      medium 
-h      high   
-v      very high 
But it seems they belong to other options, such as those for resampling, and won't work for the regular pitch shift case.

Really?  At <1 semitone??  Only by direct comparison, which is not the normal use case.
Of course you won't notice any difference between 24 fps and 23.976 fps; that would be a bit too mental. But 72 cents gets noticed very often here. People grew up watching their favorite movies at a specific pitch on TV, VHS and DVD, and when all of a sudden all the voices are deeper on the Blu-ray release, a lot of customers go insane about it. Companies have even issued replacement discs because of that.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #16
After being inspired by this discussion, I refactored my audio-stretch code to work with 32-bit float data and added it to my resampler tool. Now you can use one command-line tool to do the whole process, and it should always generate exactly the same length. You can specify a pitch change in cents, or a tempo change either by ratio or by specifying the desired new duration, or a combination of the two.

I think the quality is pretty good, especially with small ratio changes. It certainly does not have the artifacts of the rubberband effect!

https://github.com/dbry/audio-resampler/releases/tag/0.4

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #17
ascale and ardftsrc provide much better quality anyway.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #18
exciting. Thanks so much. I will definitely test that :-)

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #19
But 72 cents gets noticed very often here. People grew up watching their favorite movies at a specific pitch on TV, VHS and DVD, and when all of a sudden all the voices are deeper on the Blu-ray release, a lot of customers go insane about it. Companies have even issued replacement discs because of that.
That's nuts.  It ain't gonna happen with speech, *might* happen for music (but not for the vast majority of listeners).

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #20
For movies it's the opposite: speech gets noticed more than music. I'm not talking about the majority, of course, but about the collectors and fan bases who buy high-quality products on physical media. Legendary voice dubbers have a big fan base here. The so-called majority doesn't buy physical media; they just stream their movies, and of course you can feed them the shittiest audio and video quality anyway, they don't care. Check out Universal's Saturday Night Live archive on Peacock. The videos are all deinterlaced, upscaled and encoded poorly, and the 96 kb/s AAC audio tracks have so many compression artefacts that they are unlistenable to my ears. This is sketch comedy, mostly speech, with 2.0 stereo and later 2.0 surround tracks. You don't do those in piss-poor 96 kb/s with the crappiest encoder. The old DVDs were a night-and-day difference, and that was 192 kb/s AC3. Still, this is all OK for the majority.
But anyway, this doesn't have anything to do with the main topic. I just wanted to point out that pitch shifting is not something for the average streaming viewer; it's for the buyer of high-quality products who has probably already bought the same movie at least 3-5 times before.

 

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #21
After being inspired by this discussion, I refactored my audio-stretch code to work with 32-bit float data and added it to my resampler tool. Now you can use one command-line tool to do the whole process, and it should always generate exactly the same length. You can specify a pitch change in cents, or a tempo change either by ratio or by specifying the desired new duration, or a combination of the two.

I think the quality is pretty good, especially with small ratio changes. It certainly does not have the artifacts of the rubberband effect!

https://github.com/dbry/audio-resampler/releases/tag/0.4
I did some testing and am very satisfied with the results so far. Pretty impressive. Is there a way to run this as a batch file that processes multiple files?
For example:
Code: [Select]
 FOR %%A IN ("*.wav") DO %ART% %%A "%%~nxA-24-down-pitch.wav" -4 --pitch=-72 
which would process all .wav files in one folder. That would be amazing. I tried it, but it wasn't supported. So far I have only used it as a drag-and-drop .cmd file, which worked.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #22
Glad it might work out for you!

I'm not sure what you mean about the batch processing. It definitely won't handle more than one file per invocation, but it should work with batch files like you show. What happened when you did that? How did it fail? Unfortunately I'm bad at Windows batch files, so I'm not sure if I can catch the error.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #23
Pretty impressive. Is there a way to run this as a batch file that processes multiple files?
For example:
Code: [Select]
 FOR %%A IN ("*.wav") DO %ART% %%A "%%~nxA-24-down-pitch.wav" -4 --pitch=-72 
which would process all .wav files in one folder. That would be amazing. I tried it, but it wasn't supported. So far I have only used it as a drag-and-drop .cmd file, which worked.
I can help here, but there's not enough context for me to work out what "%ART%" is supposed to do (I presume you have defined "ART" as a string representing the executable), and neither can I see how drag&drop is working even for one file (the dropped file would be "%1").  There must be more lines in the .bat you're not showing us.

I can have a guess, but I won't have time to do the analysis until later.  Meanwhile, it will be easier if you post or PM the whole .bat.

Re: Changing Pitch -0,1% - 4,271% (PAL/NTSC) without changing length, artefact-free

Reply #24
Drag and drop onto a .cmd file works, but a .bat that converts one file after another doesn't.
The complete .bat would be
Code: [Select]
@echo off
set PATH="C:\Program Files\Audio-Resampler\"
set ART=%PATH%\art.exe
FOR %%A IN ("*.wav") DO %ART% %%A "%%~nxA-24-down-pitch.wav" -4 --pitch=-72
That is supposed to pick up every file with the extension ".wav" in the folder and encode each one with the changed pitch, one after another. It fails with the error "extra unknown argument:!"

thanks for looking into it, fooball
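For comparison, the same per-file loop can be sketched in Python, where each argument is passed as a separate list item and no shell quoting is involved. This is only a sketch under assumptions: the thread does not state the tool's command-line syntax, so placing the options before the two file names is a guess; the "-4" and "--pitch=-72" options and the install path are copied from the .bat above; and the function names are hypothetical. Unlike "%%~nxA", the output name here drops the original ".wav" before the suffix is added, and files produced by an earlier run are skipped.

```python
import subprocess
from pathlib import Path

SUFFIX = "-24-down-pitch"
# Path taken from the .bat above; adjust to the actual install location.
ART = r"C:\Program Files\Audio-Resampler\art.exe"

def build_commands(folder, exe, options=("-4", "--pitch=-72")):
    """Build one argument list per input .wav, skipping earlier outputs."""
    commands = []
    for src in sorted(Path(folder).glob("*.wav")):
        if src.stem.endswith(SUFFIX):
            continue  # do not re-process files this script produced
        dst = src.with_name(src.stem + SUFFIX + ".wav")
        # Assumption: options come first, then input file, then output file.
        commands.append([str(exe), *options, str(src), str(dst)])
    return commands

def run_all(folder, exe):
    """Run the tool once per file; stops at the first failing invocation."""
    for cmd in build_commands(folder, exe):
        subprocess.run(cmd, check=True)

# Dry run: print the commands for the current folder without executing them.
for cmd in build_commands(".", ART):
    print(cmd)
```

Calling run_all(".", ART) would then execute the printed commands one after another. Separately from the argument order, note that the .bat overwrites the system PATH variable and stores the quotes as part of its value, so a differently named variable may be the safer choice there.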