
Is a two-tailed 95% prediction or confidence interval (boundary) any less likely to contain the data when it is built around a Simple Average line than around a Linear Regression trendline with a very high R-squared value (above 0.97, for instance)? Put another way, is a 95% boundary more reliable for a Linear Regression trendline with a high R-squared value than for a Simple Average line?

Thank you in advance for your thoughts.

2007-01-16 06:15:38 · 4 answers · asked by Anonymous in Science & Mathematics Mathematics

4 answers

You are missing the point of what a 95% (or any other) confidence level, two-tailed or not, means.

You asked which is more or less reliable. The answer is that the confidence level itself tells you how reliable the statistic (regression or whatever) is, that is, the odds of it being in error. If the data are good and you use the statistic appropriately, the confidence level tells you the probability that the statistic correctly reflects the actual population being measured.

I'm not sure I explained that clearly enough, so I suggest you go back and look at the basic concept of confidence levels again.

2007-01-16 06:51:50 · answer #1 · answered by Anonymous · 0 0

Recall that linear regression is also called "Method of Least Squares" because the line arrived at has the smallest sum of squares of error between the line and the actual data. Dividing this sum by the number of data points and extracting the square root gives you an "rms error". Any other line through a given data set will have a greater rms error. Call this rms error ε.

With the fitted regression line written as y = a + bx, an approximate 95% prediction band would be bounded by the lines
(a + 2ε) + bx
and
(a - 2ε) + bx

Curiously, the area around the regression line does not seem any more likely to contain 95% of the data than a similar area formed around any other line, since for any other line ε will be larger, making the 95% area correspondingly larger.
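To make that comparison concrete, here is a minimal sketch in Python. It assumes synthetic data with a linear trend and Gaussian noise; the data, the seed, and the helper names fit_line, rms_error and coverage are illustrative and not part of the original question. It uses the simple ±2ε band described above, not an exact prediction interval (which would use a t quantile and widen away from the mean of x).

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rms_error(xs, ys, a, b):
    """Root-mean-square vertical distance between the data and the line a + b*x."""
    n = len(xs)
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)


def coverage(xs, ys, a, b, eps):
    """Fraction of points lying within +/- 2*eps of the line a + b*x."""
    inside = sum(1 for x, y in zip(xs, ys) if abs(y - (a + b * x)) <= 2 * eps)
    return inside / len(xs)


# Assumed synthetic data: a clear linear trend plus Gaussian noise.
random.seed(0)
xs = [float(i) for i in range(200)]
ys = [3.0 + 0.5 * x + random.gauss(0.0, 2.0) for x in xs]

# Regression line and its +/- 2*eps band.
a, b = fit_line(xs, ys)
eps_reg = rms_error(xs, ys, a, b)
cov_reg = coverage(xs, ys, a, b, eps_reg)

# Simple Average line: a horizontal line at the mean of y (slope 0).
mean_y = sum(ys) / len(ys)
eps_avg = rms_error(xs, ys, mean_y, 0.0)
cov_avg = coverage(xs, ys, mean_y, 0.0, eps_avg)

print(f"regression: eps = {eps_reg:.2f}, coverage = {cov_reg:.3f}")
print(f"average:    eps = {eps_avg:.2f}, coverage = {cov_avg:.3f}")
```

On trending data like this, the band around the average line should come out much wider than the band around the regression line, while both still contain most of the points. In other words, a high R-squared shows up as a narrower band rather than as a band that is more likely to contain the data.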

2007-01-16 07:33:16 · answer #2 · answered by Helmut 7 · 0 0

This seems very confused. A confidence interval for a proportion p uses the sample proportion ps, which only has an approximately normal distribution when n is large. A rule of thumb is that both nps > 5 and n(1 - ps) > 5. You do not use a t distribution for this. Also, the s.d. of ps is estimated from the sample, which adds another error factor. A hypothesis test for p uses a hypothesised value of p, so the s.d. is exact, but again the distribution of ps is only approximately normal. Hope you are not more confused than ever.
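As a rough illustration of that rule of thumb, here is a minimal Python sketch of the normal-approximation interval for a proportion. The function name proportion_ci and the sample counts are hypothetical; z = 1.96 is the usual two-tailed 95% normal quantile.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    Returns (low, high, approx_ok), where approx_ok reports whether the
    rule of thumb n*ps > 5 and n*(1 - ps) > 5 is satisfied.
    """
    ps = successes / n
    approx_ok = n * ps > 5 and n * (1 - ps) > 5
    # The s.d. of ps is estimated from the sample itself.
    se = math.sqrt(ps * (1 - ps) / n)
    return ps - z * se, ps + z * se, approx_ok


# Hypothetical sample: 40 successes out of 100 trials.
low, high, ok = proportion_ci(40, 100)
print(f"95% interval: ({low:.3f}, {high:.3f}), rule of thumb met: {ok}")

# Hypothetical sample where the normal approximation is doubtful.
_, _, ok_small = proportion_ci(2, 100)
print(f"rule of thumb met for 2 out of 100: {ok_small}")
```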

2016-05-25 01:34:20 · answer #3 · answered by Beverly 3 · 0 0

Why don't you stop screwing around with these clowns here?

2007-01-16 14:33:36 · answer #4 · answered by Murphy 3 · 0 1
