Linear least squares method question?

I have to fit a line through a number of points with the least squares method. I am looking for a relationship y = Ax + B. I have a fixed number of measurements, but I can take them at any x within a range xmin ≤ x ≤ xmax.

My question: where should I take my measurements so that I get the most reliable fit? I am thinking about either spreading the x's evenly over the range or concentrating them around xmin and xmax.

2007-12-18 08:30:39 · 2 answers · asked by Anonymous in Science & Mathematics ➔ Mathematics
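For reference, these are the standard ordinary least-squares results behind the question. They assume the relationship really is exactly linear and the measurement errors are independent with a common variance σ². Those assumptions are not stated in the question.

```latex
\hat A = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{S_{xx}}, \qquad
\hat B = \bar y - \hat A\,\bar x, \qquad
S_{xx} = \sum_i (x_i - \bar x)^2

\operatorname{Var}(\hat A) = \frac{\sigma^2}{S_{xx}}, \qquad
\operatorname{Var}(\hat A x_0 + \hat B) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)
```

Under these assumptions the uncertainty depends only on where the x's are placed, not on the measured y's. The slope is pinned down best by making S_xx as large as possible. For a fixed n in [xmin, xmax], that means putting half of the points at each end.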

2 answers

That is a very good question, but in the real world the criterion should not be the most reliable fit but the result that best predicts future measurements.

That, in turn, depends on the underlying model: is the relationship you are looking for supposed to be linear, or are you just using a linear model because it is easy and should be good enough?

Another issue is just how reliable (i.e. repeatable) your measurements really are.

If they are very reliable and the underlying model really is a linear relationship, then spreading the values evenly is probably best.

If you are not certain of the reliability but the model is sure to be linear, concentrating the measurements at the two ends is best. (This also gives the best fit to the data, if not necessarily to the underlying phenomenon.)
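Here is a minimal sketch of why the ends win when the model is known to be linear. It assumes independent errors with a common standard deviation `sigma`. It compares S_xx = Σ(x − x̄)², and therefore the standard error of the slope A, for an evenly spaced design and an end-point design with the same number of points. The function names are hypothetical helpers written for this illustration.

```python
import math


def even_design(n, xmin, xmax):
    """n points spaced evenly from xmin to xmax inclusive (requires n >= 2)."""
    step = (xmax - xmin) / (n - 1)
    return [xmin + i * step for i in range(n)]


def endpoint_design(n, xmin, xmax):
    """Half the points at xmin, the rest at xmax (extra one at xmax if n is odd)."""
    return [xmin] * (n // 2) + [xmax] * (n - n // 2)


def sxx(xs):
    """Sum of squared deviations of the x's from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def slope_se(xs, sigma):
    """Standard error of the fitted slope A, assuming a linear model and
    independent errors with common standard deviation sigma."""
    return sigma / math.sqrt(sxx(xs))


if __name__ == "__main__":
    n, xmin, xmax, sigma = 10, 0.0, 1.0, 1.0
    for name, design in [("even", even_design(n, xmin, xmax)),
                         ("ends", endpoint_design(n, xmin, xmax))]:
        print(name, round(sxx(design), 4), round(slope_se(design, sigma), 4))
```

For 10 points on [0, 1], S_xx is 2.5 for the end-point design versus about 1.02 for even spacing, so the slope variance is roughly 2.45 times smaller. The price is that with only two distinct x values you cannot check whether the relationship is actually linear. Even spacing keeps that check possible.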

If you are going to be using the result to predict the results of subsequent measurements, then you want the samples densest where the future measurements will be taken (i.e. giving greater weight to that portion of the range).
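A minimal sketch of that effect follows. It assumes a linear model and independent equal-variance errors. It computes the standard error of the fitted value A·x0 + B at a point x0 using σ·√(1/n + (x0 − x̄)²/S_xx). Both designs and the future point `x_future = 0.9` are hypothetical choices made for the illustration.

```python
import math


def prediction_se(xs, x0, sigma=1.0):
    """Standard error of the fitted value A*x0 + B for design points xs.

    Assumes an exactly linear model and independent errors with common
    standard deviation sigma (illustrative assumptions)."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return sigma * math.sqrt(1.0 / n + (x0 - mean) ** 2 / sxx)


# Hypothetical designs: 10 measurements on [0, 1]
balanced = [0.0] * 5 + [1.0] * 5  # half at each end
skewed = [0.0] * 3 + [1.0] * 7    # weighted toward the upper end

x_future = 0.9  # hypothetical point where predictions will be needed

if __name__ == "__main__":
    for name, design in [("balanced", balanced), ("skewed", skewed)]:
        print(name, round(prediction_se(design, x_future), 3))
```

At x0 = 0.9 the skewed design gives a standard error of about 0.345 versus about 0.405 for the balanced one. The trade-off is worse precision near the other end of the range. The error is smallest at x0 = x̄, so moving the mean of the x's toward where you will predict is what helps.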

One of the classic problems with linear least squares is how a single outlier can grossly distort the outcome.

http://en.wikipedia.org/wiki/Least_squares

One way of reducing the effect is to double up on points, taking two measurements at each x. If one member of a pair is an outlier, the fit cannot move very far in its direction without moving too far from the other member of the pair.
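Here is a minimal sketch of the doubling idea. It uses hypothetical noise-free data on y = 2x + 1 in which one reading at x = 4 is off by +10. It fits the line with plain ordinary least squares, first with single measurements and then with every x measured twice, where only one reading of the pair at x = 4 is bad. `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = A*x + B; returns (A, B)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Hypothetical data: exact line y = 2x + 1 plus one outlier at x = 4
xs = [0, 1, 2, 3, 4]
clean = [2 * x + 1 for x in xs]

single = clean[:]
single[4] += 10  # the only reading at x = 4 is bad

xs2 = xs + xs        # every x measured twice
doubled = clean + clean
doubled[4] += 10     # only one of the two readings at x = 4 is bad

if __name__ == "__main__":
    print("single :", fit_line(xs, single))
    print("doubled:", fit_line(xs2, doubled))
```

The true line is (A, B) = (2, 1). The single-measurement fit comes out at (4.0, -1.0), while the doubled fit gives (3.0, 0.0). The outlier's pull is halved, not removed, because ordinary least squares effectively fits the average of the readings at each x.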

2007-12-21 10:57:50 · answer #1 · answered by simplicitus 7

I believe the spread of the data doesn't matter, just from glancing at some equations. But the problem here is more about what you are testing. If it's some physical system, it may be really hard to make measurements at your mins and maxes, but if you can, that's great. I gather from your case that you can make decent measurements just about anywhere, so possibly you can choose a subinterval of your [xmin, xmax] range and run your experiments in there. The other plus of spreading out your data is that it makes your extrapolations a bit more believable, and that's a big motivation for this best-fit stuff.

2007-12-18 09:32:09 · answer #2 · answered by zeul845 2