
John D'Errico
on 17 Sep 2021 at 17:01 (edited 17:27)

Invariably, the data in such a problem will vary by multiple orders of magnitude, and that tends to imply the noise in your data is NOT normally distributed. It virtually cannot be Gaussian noise. Odds are, the noise is what may be called proportional noise, i.e. multiplicative noise. It might well follow a lognormal distribution, as would be common.

A serious problem with multiplicative noise is that if you then use a nonlinear regression that treats every data point as equally important, it places far too much weight on some of the data.

Anyway, that should give you the clue to how to solve this problem.

LOG your data. That is, take the log of your model (as well as of your data). So, if we have:

a = k1*b^k2*c^k3*d^k4

then

log(a) = log(k1) + k2*log(b) + k3*log(c) + k4*log(d)

Feel free to choose what log base you use, thus log10, or the natural log. Whatever floats your boat.

A really nice thing is that any multiplicative noise you may have had before is now purely additive noise, so a simple linear least squares will apply. Even better, the coefficients in this model can now be estimated using a simple linear regression, because those coefficients that were once exponents are now merely multiplicative constants in a simple additive linear model.

That is, you can now use any linear regression tool applied to the logs of your data, such as regress or fitlm. You can even use the backslash operator.

When all is done, remember to exponentiate the constant term in the model, since it too got logged in that transformation.
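For instance, a minimal sketch with fitlm, assuming a, b, c, d are column vectors of your (positive) data as named in the model above:

```matlab
% Fit log(a) = log(k1) + k2*log(b) + k3*log(c) + k4*log(d)
% a, b, c, d are assumed to be column vectors of positive data.
lmdl = fitlm([log(b), log(c), log(d)], log(a));
k = lmdl.Coefficients.Estimate;   % [log(k1); k2; k3; k4]
k1 = exp(k(1));                   % un-log the constant term
k2 = k(2); k3 = k(3); k4 = k(4);
```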

As an example, since I lack your data, here is how you would handle it, on some sample data.

n = 100;

X = rand(n,1);

Y = rand(n,1);

coeffs = [2 3 4]; % Ground truth

Z = coeffs(1)*X.^coeffs(2).*Y.^coeffs(3) .* lognrnd(0,.25,n,1);

plot3(X,Y,Z,'o')

view(21,15)

grid on

box on

Actually, the data is not too bad looking.

Now we can fit this using one of two models. First, a nonlinear regression.

ft = fittype('k1*X.^k2.*Y.^k3','indep',{'X','Y'})

mdl = fit([X,Y],Z,ft)

And that would appear to be not too terrible, but need I point out that the confidence intervals for all three coefficients do not even contain what we know to be the ground truth values of [2 3 4]?

I did not even feel the need to give it starting values; fit handled that well enough. Anyway, the problem is not a lack of convergence. Now let me use a linear least squares.

K123 = [ones(n,1),log(X),log(Y)]\log(Z);

K123 = [exp(K123(1));K123(2:3)]

And that seems to hit the ground truth values nearly dead on.
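If you also want confidence intervals from the log-linear fit, to compare against those from the nonlinear fit, fitlm will provide them. A sketch on the same simulated X, Y, Z:

```matlab
% Linear fit on the logs; the intercept estimates log(coeffs(1))
lmdl = fitlm([log(X), log(Y)], log(Z));
ci = coefCI(lmdl);        % 95% CIs, rows: [log(k1); k2; k3]
ci(1,:) = exp(ci(1,:));   % transform the constant's interval back
```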

So, how well did the nonlinear model do? Compared to the linear fit, the nonlinear fit was pure crapola in terms of how well the coefficients were estimated. The problem is that the noise is not properly handled by the nonlinear regression, since the noise is truly proportional in this case, and the result was poor coefficient estimates.

William Rose
on 17 Sep 2021 at 17:09

@John D'Errico makes some very good points! I like the "take the log of both sides" idea.
