
Topic: Human hearing beats FFT (Read 31115 times)

  • 2Bdecided
  • Developer
Human hearing beats FFT
Reply #50
Interesting paper, thank you.

Human hearing beats FFT
Reply #51

EST is a new transform that can explain the results of the article.

Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

EST derives frequencies from samples and is unrelated to Fourier/FFT.
The process of EST is deterministic, does not use non-linear equations, and can handle noise.

In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n
equally spaced real samples.

A noisy signal will require more samples, depending on noise level.

Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal
are needed to handle noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless
signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

EST can be evaluated using a demo program that implements it. There is also a paper that details the transform and its mathematical basis.

Those interested in seeing the paper and/or the demo program can email me at gringya atsign gmail dot com.

  • Woodinville
Human hearing beats FFT
Reply #52
Quote
Fourier-related transforms, like FFT, are just one way to find frequencies, and clearly not the best possible.

Which, of course, depends entirely on your definition of "Frequency", something that itself is trickier than some seem to realize.
Quote
EST derives frequencies from samples and is unrelated to Fourier/FFT.

What does "EST" stand for, in the first place? Does it use a complex exponential or a representation of a complex exponential?

Quote
The process of EST is deterministic, does not use non-linear equations, and can handle noise.

Which is true of the Fourier Transform, as well.
Quote
In the ideal case of a noiseless signal composed of n sinusoids, the frequencies, amplitudes and phases are precisely recovered from 3n
equally spaced real samples.

Sounds pretty good. What's the basis set you're using?  Sounds a lot like a * sin (b *t +c) where a,b,c are the 3 samples. Not sure what "equally spaced" means here, unless you're referring to the fact you can characterize a sine wave with 3 non-degenerate points.
Quote
A noisy signal will require more samples, depending on noise level.

No surprise.
Quote
Other than the minimum for the ideal case, accuracy does not depend on the number of samples (time). The additional samples for a noisy signal
are needed to handle noise.

EST can also transform samples into increasing/decreasing sinusoids, which is a better way to model audio. In such a case, for a noiseless
signal, 4 samples are required per increasing/decreasing sinusoid, and more for a noisy signal.

So it's Laplace-based instead of Fourier based, then?

Instead of bombarding us with a bunch of not-very-specific qualities, why not just tell us what the basis set is, and how the analysis works?

I am aware of approximately infinite (well, literally infinite but obviously I haven't generated them all!) numbers of basis sets, many of which this could describe.
-----
J. D. (jj) Johnston

Human hearing beats FFT
Reply #53
Yaakov, also check out the Reassigned spectrogram mode in iZotope RX. It “beats FFT” in terms of time and frequency resolution: it can precisely localize impulsive events in time and precisely display frequencies of harmonics, assuming that they do not overlap in FFT spectrum.

Human hearing beats FFT
Reply #54
EST stands for Exponential Sum Transform and it uses complex exponentials.

The basis is sigma(c*b^t) where b and c are non-zero complex numbers and the set of b is distinct. If all b are on the unit circle, then it is simply a spectrum.

When all b are on the unit circle and the samples are real, this becomes sigma(a*cos(b*t+c))

The samples must be equally spaced, not just non-degenerate.
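
To illustrate why equal spacing matters, here is a minimal Python sketch for the single-sinusoid case (n = 1, so 3 samples). It is not taken from the EST paper; it only uses the textbook identity x[0] + x[2] = 2*cos(w)*x[1], which holds for samples of a*cos(w*t + p) taken at t = 0, 1, 2. The function name is hypothetical, and the sketch assumes a noiseless signal, a non-zero middle sample, and 0 < w < pi.

```python
import math

def sinusoid_from_three_samples(x0, x1, x2):
    """Recover (amplitude, frequency in radians/sample, phase) of
    a*cos(w*t + p) from its noiseless samples at t = 0, 1, 2."""
    if x1 == 0:
        raise ValueError("middle sample is zero; shift the sampling window")
    # Equal spacing gives x0 + x2 = 2*cos(w)*x1, which is linear in cos(w).
    cos_w = (x0 + x2) / (2.0 * x1)
    w = math.acos(cos_w)
    # x0 = a*cos(p) and x1 = a*cos(p)*cos(w) - a*sin(p)*sin(w).
    a_cos_p = x0
    a_sin_p = (x0 * cos_w - x1) / math.sin(w)
    return math.hypot(a_cos_p, a_sin_p), w, math.atan2(a_sin_p, a_cos_p)
```

With unequally spaced samples the identity no longer holds, and the frequency would have to be found from a non-linear equation instead.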

It clearly looks more like Laplace than Fourier, but a specific relation, if one exists, is not known to me.

As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?


  • Canar
  • Global Moderator
  • Your mom's favourite moderator
Human hearing beats FFT
Reply #55
I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though.
1. Attack the argument, not the arguer.
2. Assume good faith.

Human hearing beats FFT
Reply #56
Quote
I think a lot of us here would be interested in a formal description, myself included. I think from what you've just said that we'll get it puzzled out though.

If I understand you correctly, you prefer a formal description of the process, and only that.

  • db1989
  • Global Moderator
Human hearing beats FFT
Reply #57
If I may guess, I think he means that this site has a significant number of users who would appreciate detailed descriptions. However, that is not to stop you from providing less technical information (i.e. ‘layman’s terms’) if you want to; there are probably other users who would like that, too.

  • Porcus
Human hearing beats FFT
Reply #58
I think I could very well use a formula or two ... point seven eighteen twentyeight ...

Quote
As for describing the analysis, I offered to send the detailed paper. Do you prefer an informal description?


I think I just got one that was a bit too rough  although I do suspect I have guessed the point.
  • Last Edit: 03 April, 2013, 03:37:18 PM by Porcus

Human hearing beats FFT
Reply #59
The following link:

http://www.mediafire.com/view/?ce47jurz43wzjce

is to a short document that describes the EST process for real noiseless samples.


  • Woodinville
Human hearing beats FFT
Reply #60
Hm.  Define "noiseless".  Most instruments have a chaotic part of their performance that in fact is noiselike in that it does not repeat, is not entirely stationary, depends on technique, and so on.

So, I'm not quite sure I know what you mean by noiseless.
-----
J. D. (jj) Johnston

Human hearing beats FFT
Reply #61
The paper described the mathematical basis of EST, which uses the ideal case of perfect increasing/decreasing sinusoids.

For realistic data, EST uses different processes that expect noise.

For audio, the EST process is as follows.
1. Find linear prediction coefficients, preferably using the covariance method and not the auto-correlation method.
2. Create the linear prediction polynomial.
3. Find the roots of the linear prediction polynomial to establish the basis set of an exponential sum function, as described in the paper.
4. Use the samples and the basis set to find the coefficients of the function.
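
The four steps above closely resemble what the signal-processing literature calls Prony's method. The following is a minimal NumPy sketch of that general pipeline, not the author's EST code: the function name `exponential_sum_fit` and its parameters are hypothetical, the covariance method is written as a plain least-squares problem over one block, and no noise handling or model-order selection is attempted.

```python
import numpy as np

def exponential_sum_fit(x, order):
    """Fit x[t] ~ sum_k c[k] * b[k]**t for t = 0 .. len(x)-1.

    Returns (b, c): the complex bases (polynomial roots) and coefficients.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2 * order:
        raise ValueError("need at least 2*order samples for this sketch")
    # Step 1: linear prediction, covariance-style (no windowing):
    # x[t] = a[0]*x[t-1] + ... + a[order-1]*x[t-order], t = order .. n-1.
    rows = np.array([x[t - order:t][::-1] for t in range(order, n)])
    a, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    # Steps 2 and 3: roots of z**order - a[0]*z**(order-1) - ... - a[order-1].
    b = np.roots(np.concatenate(([1.0], -a)))
    # Step 4: least-squares coefficients on the basis b[k]**t.
    basis = np.vander(b, N=n, increasing=True).T   # basis[t, k] = b[k]**t
    c, *_ = np.linalg.lstsq(basis, x.astype(complex), rcond=None)
    return b, c
```

For each root, `np.angle(b[k])` is the frequency in radians per sample and `abs(b[k])` is the per-sample growth or decay factor; a real signal yields roots in complex-conjugate pairs.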

The key point is that linear prediction coefficients and an exponential sum function are equivalent, with the exponential sum function having the distinct advantage of being an analytic function with a useful structure. The mathematical basis proves this equivalence.

Due to the equivalence, an exponential sum function models an audio signal with the same quality as linear prediction.

You may note that the best lossless audio compressors, like OptimFROG, use linear prediction. This is a strong indication of the power of linear prediction to model audio.

Since EST generates an analytic function, it is suitable for lossy audio compression, as well as other audio applications.

Once EST has generated an exponential sum function, you can do the following:
- Identify noise elements, using frequency and/or amplitude, and remove them.
- Identify inaudible elements, and remove them.
- Quantize the coefficients.
- Resample the audio signal, both sample rate and sample depth.
- And various other things.

Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.

  • Woodinville
Human hearing beats FFT
Reply #62
Quote
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz".  Let me know how big your covariance matrix is, too, ok?
-----
J. D. (jj) Johnston

Human hearing beats FFT
Reply #63
Quote
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz".  Let me know how big your covariance matrix is, too, ok?


In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size.

The size of the matrix and the block size will determine accuracy and an accuracy-speed trade-off.

This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function.

30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.

  • Woodinville
Human hearing beats FFT
Reply #64
Quote
Unlike Fourier related methods, which use a predefined basis, EST uses a basis derived from the data.

In short, EST for audio combines the flexibility and usefulness of an analytic function with the modeling power of linear prediction.


Try applying EST to the first 30 seconds of the track "We Shall Be Happy" by Ry Cooder off the album titled "Jazz".  Let me know how big your covariance matrix is, too, ok?


In a practical implementation the samples will be broken into blocks and there will be a chosen matrix size for that block size.

The size of the matrix and the block size will determine accuracy and an accuracy-speed trade-off.

This is also the way it is done when using linear prediction for lossless audio compression or for speech compression. The difference is that EST returns an analytic function.

30 seconds of audio will therefore be broken into many smaller blocks, and not treated as a single block.


I do know how coders work, so try your EST basis on We Shall Be Happy and get back to me, ok?  And tell me how many basis functions you need for that one, too. And how many are orthogonal. And then how many of those you have to code.
-----
J. D. (jj) Johnston

  • Specy
Human hearing beats FFT
Reply #65
Over 10 years ago, for my master's thesis, I wrote an algorithm that determines nearly exact frequency values from an FFT. It can find any frequencies as long as they are far enough away from each other and constant in tone and level.

The method is pretty simple:
1. Create an FFT using a window that's a lot bigger than the block of audio that you use
2. Find the highest peak in the FFT domain. This is an estimation of the loudest frequency present.
3. Write down the found frequency, phase and amplitude
4. Generate an FFT based on the found freq, phase, amp (this can be optimized for speed, since it's only a single tone).
5. Subtract a small percentage of this (I found that 5-10% works well) from the original FFT from step 1.
6. Go back to step 2.

This gives you a whole lot of values. Next, you need to combine all the values that have approximately the same frequency. This can be done as follows:
- If a frequency is new (no data within 0.5 FFT bin size), this is a new frequency that we haven't seen before.
- Otherwise combine this new measurement with the measurement closest to it.

Tones that are 1 bin apart will not be found perfectly (frequency and amplitude might be very slightly wrong), but they still clearly show up as separate signals. Tones that are 2 or more bins apart show up nearly perfectly.
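
The loop and the merging rule above can be sketched in Python with NumPy roughly as follows. This is an illustration under stated assumptions, not the thesis code: the "bigger window" is taken to mean zero-padding a rectangular-windowed block, amplitude is estimated as 2*|peak|/N, merged measurements are combined by summing complex amplitudes and averaging frequency by magnitude, and the function name and parameters (`extract_tones`, `pad_factor`, `gain`, `iterations`) are hypothetical.

```python
import numpy as np

def extract_tones(block, pad_factor=8, gain=0.1, iterations=300):
    """Return (frequency in cycles/sample, amplitude, phase) tuples,
    strongest first, by repeatedly subtracting part of the largest peak."""
    block = np.asarray(block, dtype=float)
    n = len(block)
    nfft = pad_factor * n
    t = np.arange(n)
    residual = np.fft.rfft(block, nfft)              # step 1: zero-padded FFT
    clusters = []                                    # merged measurements
    for _ in range(iterations):
        # Step 2: highest peak, ignoring the DC and Nyquist bins.
        k = int(np.argmax(np.abs(residual[1:-1]))) + 1
        if residual[k] == 0:
            break
        # Step 3, scaled by the step-5 fraction: complex amplitude a*exp(j*p).
        z = gain * 2.0 * residual[k] / n
        # Steps 4 and 5: spectrum of that single tone, subtracted from the FFT.
        tone = (z * np.exp(2j * np.pi * k * t / nfft)).real
        residual -= np.fft.rfft(tone, nfft)
        # Merge with the closest earlier measurement within half an
        # (unpadded) FFT bin; otherwise record a new frequency.
        near = [c for c in clusters if abs(c["bin"] - k) <= 0.5 * pad_factor]
        if near:
            c = min(near, key=lambda c: abs(c["bin"] - k))
            w_old, w_new = abs(c["amp"]), abs(z)
            c["bin"] = (c["bin"] * w_old + k * w_new) / (w_old + w_new)
            c["amp"] += z
        else:
            clusters.append({"bin": float(k), "amp": z})
    tones = [(c["bin"] / nfft, abs(c["amp"]), float(np.angle(c["amp"])))
             for c in clusters]
    return sorted(tones, key=lambda tone: tone[1], reverse=True)
```

Subtracting only a small fraction per pass lets neighbouring tones correct each other's leakage gradually, at the cost of needing many iterations.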

Test tones: [image not preserved]

Real signal (voice): [image not preserved]

Signal and its peak data: [image not preserved]

  • Last Edit: 17 August, 2013, 06:59:53 AM by Specy

Human hearing beats FFT
Reply #66
Several months ago, in posts in this topic, I provided some information about my transform, EST.

I now have a document with better explanations, actual results, and charts.

The link to the document is:
http://www.mediafire.com/?0bprdaoop81d0cx
Please note that viewing the document online will only display the text, and not the charts. It has to be downloaded to be fully viewed.

As a reminder, this topic followed an article that showed that human hearing performance in finding frequencies exceeds the Fourier uncertainty limit.

EST finds frequencies using a deterministic algorithm unrelated to Fourier transforms and not bound by the Fourier uncertainty principle.

This shows that the results of the article are not surprising.