
Block Switch algorithm..

When coding a sequence like "Fatboy", switching to short blocks gives better coding efficiency.
"Fatboy" sounds like a machine voice; its spectrum and waveform are also somewhat similar to a speech signal (it has a pitch and formants).

However, in my experiments it is better to stay with long blocks when coding a speech signal. That is strange.
Maybe coding each pitch period separately breaks their phases!

This raises another question: since "Fatboy" and speech have
similar spectra and waveforms (pitch structure), how can I discriminate between them so that each gets the suitable switching style?

High-pass filtering the waveform still leaves the pitch structure intact.
Since TNS is applied after the waveform has been fed into the psychoacoustic model, we
have no other way to shape the pitch structure!

Any comments will be appreciated.

Reply #1
Fatboy has short energy "pulses", or "attacks", which are unusual in human speech; speech is more or less stationary.
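The criterion described here (a short burst of energy that stands out from what came before) is what a simple energy-based attack detector tests. The sketch below is a hypothetical helper, not the detector of LAME or any other codec: it splits a frame into sub-blocks and reports the first sub-block whose energy exceeds the average of the earlier ones by an assumed factor `ratio`. The values of `sub_len`, `ratio` and `floor` are illustrative only.

```python
import math

def detect_attack(samples, sub_len=64, ratio=10.0, floor=1e-9):
    """Hypothetical energy-based attack detector.

    Splits `samples` into sub-blocks of `sub_len` samples and returns the index
    of the first sub-block whose energy exceeds `ratio` times the mean energy of
    all earlier sub-blocks, or None if there is no such sub-block. `floor` keeps
    the comparison meaningful after pure silence.
    """
    energies = [
        sum(s * s for s in samples[start:start + sub_len])
        for start in range(0, len(samples) - sub_len + 1, sub_len)
    ]
    for i in range(1, len(energies)):
        past_mean = sum(energies[:i]) / i
        if energies[i] > ratio * max(past_mean, floor):
            return i
    return None

burst = [0.0] * 768 + [math.sin(0.5 * n) for n in range(256)]
steady = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(1024)]
print(detect_attack(burst))   # 12: first sub-block after 768 silent samples
print(detect_attack(steady))  # None: energy stays roughly constant
```

Because each sub-block is compared against the running average of the earlier ones, a steady tone does not trigger it, while silence followed by a burst does. Whether a train of pitch pulses triggers it depends on the pulse spacing relative to `sub_len` and on `ratio`, which is the tuning problem raised in the opening post.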

These attacks are handled very badly by a long window, because the quantization noise is spread over the whole window, while the actual pulse is much shorter.
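A small experiment can illustrate why the noise spreads. The sketch below makes simplifying assumptions: it uses an orthonormal DCT-IV (the transform inside the MDCT) on non-overlapping blocks, with no window, no overlap and no psychoacoustic model, and a uniform quantizer with an arbitrary step of 0.25. A signal that is silent except for a burst in its last 8 samples is coded once as a single 64-sample block and once as eight 8-sample blocks, and the error energy before the attack is measured.

```python
import math

def dct4(x):
    """Orthonormal DCT-IV (the core of the MDCT, used here without window or
    overlap). With this scaling the transform is its own inverse."""
    n_len = len(x)
    scale = math.sqrt(2.0 / n_len)
    return [scale * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
                        for n in range(n_len))
            for k in range(n_len)]

def code_blocks(signal, block_len, step):
    """Transform each block, quantize coefficients uniformly, transform back."""
    out = []
    for start in range(0, len(signal), block_len):
        coeffs = dct4(signal[start:start + block_len])
        quantized = [round(c / step) * step for c in coeffs]
        out.extend(dct4(quantized))
    return out

def pre_echo_energy(signal, decoded, onset):
    """Error energy in the samples before the attack."""
    return sum((decoded[n] - signal[n]) ** 2 for n in range(onset))

onset = 56
signal = [0.0] * onset + [math.sin(1.3 * n) for n in range(8)]
long_err = pre_echo_energy(signal, code_blocks(signal, 64, 0.25), onset)
short_err = pre_echo_energy(signal, code_blocks(signal, 8, 0.25), onset)
print(f"one 64-sample block, pre-echo energy: {long_err:.4g}")
print(f"eight 8-sample blocks, pre-echo energy: {short_err:.4g}")
```

With the single long block, rounding the coefficients spreads error into the silent samples before the attack; with short blocks, the all-zero blocks quantize exactly and the error stays inside the block that contains the burst.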

Reply #2
Originally posted by Ivan Dimkovic
Fatboy has short energy "pulses", or "attacks", which are unusual in human speech; speech is more or less stationary.

These attacks are handled very badly by a long window, because the quantization noise is spread over the whole window, while the actual pulse is much shorter.

Encoding an MP3 with only short blocks should give something
between MP3 and MP2. The frequency resolution lies between
MP2 (32 subbands) and MP3 long blocks (576 lines); a short
block has 192 lines. Scalefactor bands are narrower at low
frequencies and wider at high frequencies, and Huffman coding
and nonlinear quantization are used.
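For a rough sense of these resolutions, the width of one spectral line is the band from 0 Hz to Nyquist divided by the number of lines. The sketch assumes a 44.1 kHz sample rate and uses 192 lines for an MP3 short block (32 subbands times 6 MDCT lines) alongside the 32 and 576 quoted above; `line_width_hz` is a hypothetical helper.

```python
SAMPLE_RATE = 44100  # Hz, assumed

LINES = {
    "MP2 subbands": 32,
    "MP3 short block": 192,  # 32 subbands x 6 MDCT lines
    "MP3 long block": 576,   # 32 subbands x 18 MDCT lines
}

def line_width_hz(lines, sample_rate=SAMPLE_RATE):
    """Width of one spectral line when 0 Hz..Nyquist is split into `lines` equal parts."""
    return (sample_rate / 2) / lines

for name, lines in LINES.items():
    print(f"{name:16s} {lines:4d} lines  {line_width_hz(lines):7.2f} Hz per line")
```

At 44.1 kHz this gives about 689 Hz, 115 Hz and 38 Hz per line, which is the sense in which short-block MP3 sits between MP2 and long-block MP3.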

A normal piece of music that sounds good at 192 kbps with MP3
and at 256 kbps with MP2 should then be encodable at around
224 kbps.

Tests in 2001, however, showed that MP3s using only short blocks
sound worse even at 320 kbps. The psychoacoustic model
for short blocks is totally broken: using only short blocks
does not increase the bitrate by 20...30 kbps, but by 200 kbps.

Therefore, for good attack encoding you need:
  * usable psychoacoustics for short blocks
  * the right switching algorithm between short and long blocks
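One common way to structure such a switching algorithm for MP3 is a small state machine with one granule of lookahead, using the four block types defined by the format (0 normal, 1 start, 2 short, 3 stop). The sketch below assumes a per-granule attack flag is already available; the function names are hypothetical, and the corner case of a stream that begins with an attack (which here starts directly in SHORT) is not handled.

```python
NORMAL, START, SHORT, STOP = 0, 1, 2, 3  # MP3 block_type values

# Shape of each window's left and right half; consecutive windows must match
# where they overlap (right half of the previous = left half of the next).
LEFT = {NORMAL: "long", START: "long", SHORT: "short", STOP: "short"}
RIGHT = {NORMAL: "long", START: "short", SHORT: "short", STOP: "long"}

def is_valid_transition(prev, cur):
    return RIGHT[prev] == LEFT[cur]

def choose_block_types(attacks):
    """Hypothetical switcher: one attack flag per granule in, one block type out."""
    types = []
    for i, attack in enumerate(attacks):
        prev = types[-1] if types else NORMAL
        next_attack = i + 1 < len(attacks) and attacks[i + 1]
        if attack:
            types.append(SHORT)
        elif prev == SHORT:
            # Stay SHORT if another attack follows at once (SHORT -> START does
            # not overlap correctly); otherwise close the short run with STOP.
            types.append(SHORT if next_attack else STOP)
        elif next_attack:
            types.append(START)  # announce the short run one granule ahead
        else:
            types.append(NORMAL)
    return types

print(choose_block_types([False, False, True, False, False]))  # [0, 1, 2, 3, 0]
```

The lookahead is what lets the encoder insert the START window before the attack granule; deciding the attack flags themselves is the psychoacoustic part of the problem.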

Another method of pre-echo suppression is out-of-order
encoding. Blocks with an attack in the right-hand half of the
MDCT window are re-encoded after encoding the following
block (which has the attack in its left-hand half).
These re-encoded frames use a cos² slope on the left and
subtractive resynthesis on the right.
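The post does not spell out the windows. Assuming "cos² slope" means a gain that falls along cos² while the complementary part rises along sin², the useful property is that the two gains always sum to one, so a cross-fade between the originally coded data and the re-encoded data keeps the signal level intact. The helpers below are hypothetical and show only that cross-fade, not the out-of-order encoder or the subtractive resynthesis.

```python
import math

def cos2_fade(length):
    """Fade-out gains following cos^2, from about 1 down to about 0."""
    return [math.cos(math.pi / 2 * (n + 0.5) / length) ** 2 for n in range(length)]

def sin2_fade(length):
    """Complementary fade-in gains following sin^2 at the same positions."""
    return [math.sin(math.pi / 2 * (n + 0.5) / length) ** 2 for n in range(length)]

def crossfade(old, new):
    """Cross-fade from segment `old` into segment `new` (equal lengths).
    Since cos^2 + sin^2 = 1, identical inputs come out unchanged."""
    g_out = cos2_fade(len(old))
    g_in = sin2_fade(len(old))
    return [a * go + b * gi for a, b, go, gi in zip(old, new, g_out, g_in)]

print([round(v, 3) for v in crossfade([1.0] * 8, [0.0] * 8)])
```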

Especially at 160+ kbps it is possible to compensate
pre-echoes with out-of-phase signals. Due to the modular
design of LAME it should be very easy to add this feature.
AFAIK the A52 (AC-3) standard encoder uses this.
--  Frank Klemm

Reply #3
Now the weakest part of Ogg Vorbis is pre-echo. Since we cannot wait for wavelets to (hopefully) solve this problem completely, the next improvement should be the new pre-echo control code. Does this code handle block switching more skillfully, and will the coming improvement speed up the release of RC4?

Reply #4
Vorbis Coding:

From what I have read, I believe a discrete wavelet implementation won't come to Vorbis until after the 1.0 release. However, some improvements to pre-echo control are being made in RC4 for the majority of bitstreams, as well as better switching between short blocks (for transients) and long blocks (for stationary signals). I believe this is just the usual temporal masking with some improvements in window switching.

I think that using a discrete wavelet transform to code transient signals in the time domain would, from what I have read, allow about 80% of the wavelet coefficients to be replaced with a noise model; this would also depend on the type of instrument. It would be helpful to see technical documents from universities or groups implementing wavelets in experimental encoders, together with the results they obtained and how they compare. I am quite eager to learn more fully how they work. I am open to any comments or ideas as well.
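The 80% figure is not reproduced here, but the underlying idea, that a transient is represented compactly by a few wavelet coefficients, can be illustrated with the simplest wavelet. The sketch below assumes an orthonormal multi-level Haar transform (chosen for brevity, not because any encoder uses it); `haar_forward`, `haar_inverse` and `coeffs_for_energy` are hypothetical helpers.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """Multi-level orthonormal Haar DWT; len(x) must be a power of two.
    Output order: [final approximation, coarsest details, ..., finest details]."""
    coeffs = list(x)
    length = len(coeffs)
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs

def haar_inverse(c):
    """Undo haar_forward level by level, from coarsest to finest."""
    coeffs = list(c)
    length = 1
    while length < len(coeffs):
        merged = []
        for a, d in zip(coeffs[:length], coeffs[length:2 * length]):
            merged.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        coeffs[:2 * length] = merged
        length *= 2
    return coeffs

def coeffs_for_energy(coeffs, fraction=0.99):
    """Smallest number of largest-magnitude coefficients holding `fraction` of the energy."""
    energies = sorted((c * c for c in coeffs), reverse=True)
    total = sum(energies)
    running = 0.0
    for count, e in enumerate(energies, start=1):
        running += e
        if running >= fraction * total:
            return count
    return len(energies)

click = [0.0] * 64
click[40:44] = [1.0, -0.8, 0.5, -0.2]  # isolated 4-sample transient
coeffs = haar_forward(click)
nonzero = sum(1 for c in coeffs if abs(c) > 1e-12)
print(f"{nonzero} of {len(coeffs)} Haar coefficients are nonzero")
print("coefficients holding 99% of the energy:", coeffs_for_energy(coeffs))
```

For this 4-sample click, 8 of the 64 coefficients are nonzero; in a scheme like the one described above, the many near-zero coefficients are the ones that could be handed to a noise model.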

AAC Coding:

Compensating pre-echo with out-of-phase signals is not a bad idea. If I am not mistaken, this is implemented in the Vorbis codec and known as phase stereo, and there are various subsets of it for quantizing in various phases (eight, six, four). It isn't exactly the same as what is stated above, but it is quite similar.

However, I believe one way to help stop temporal spreading artifacts is being looked at in MPEG-4 AAC, known as gain control. The signal is split into four bands by a polyphase quadrature filter bank (PQF), and gain control is applied to each band before the MDCT is performed; the decoder undoes the gain after the inverse MDCT. This way it can attenuate the pre-echoes that would otherwise appear as the bitstream is decoded.
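The principle behind gain control can be shown in a single band. The sketch below is a simplified assumption, not the AAC tool: it omits the PQF split and the signalling of gain curves, boosts the quiet part before the attack by an arbitrary factor, adds uniform noise standing in for quantization noise that the coder spreads evenly over the block, and divides the boost out again on the decoder side. `gain_control_demo` and `error_energy` are hypothetical names.

```python
import math
import random

def gain_control_demo(signal, onset, gain, noise_amp, seed=0):
    """Hypothetical single-band sketch: boost samples before `onset` by `gain`,
    add uniform noise as a stand-in for quantization noise, then undo the boost."""
    rng = random.Random(seed)
    gains = [gain if n < onset else 1.0 for n in range(len(signal))]
    boosted = [x * g for x, g in zip(signal, gains)]                   # encoder side
    noisy = [x + rng.uniform(-noise_amp, noise_amp) for x in boosted]  # stand-in for coding
    return [y / g for y, g in zip(noisy, gains)]                       # decoder side

def error_energy(signal, decoded, start, stop):
    return sum((decoded[n] - signal[n]) ** 2 for n in range(start, stop))

onset = 48
signal = ([0.01 * math.sin(0.3 * n) for n in range(onset)]
          + [math.sin(1.1 * n) for n in range(16)])
plain = gain_control_demo(signal, onset, gain=1.0, noise_amp=0.05)
controlled = gain_control_demo(signal, onset, gain=8.0, noise_amp=0.05)
print("pre-echo energy without gain control:", error_energy(signal, plain, 0, onset))
print("pre-echo energy with gain control:   ", error_energy(signal, controlled, 0, onset))
```

Dividing by the gain after decoding scales the pre-attack noise amplitude down by the same factor, so its energy drops by roughly the square of the gain, while the attack region is unchanged. In a real coder the boosted signal also changes what the quantizer sees, which this sketch ignores.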

If you have yet to see this you may want to take a look:
budding I.T professional

Reply #5
Frank: I don't really understand how to reduce pre-echo with an out-of-phase signal, nor your description of it.

If you have time, I'd be very thankful if you could elaborate.