Generally, anything that is more than 3 standard deviations from the mean is classed as an outlier. Hope this helps.
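For illustration, here's a minimal Python sketch of that 3-standard-deviation rule (the function name and the use of NumPy are my own choices, not something from the answer):

import numpy as np

def three_sigma_outliers(data):
    # Flag points lying more than 3 standard deviations from the mean.
    data = np.asarray(data, dtype=float)
    mean, sd = data.mean(), data.std()
    return data[np.abs(data - mean) > 3 * sd]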
2007-01-07 07:03:03 · answer #1 · answered by Atlanta 3 · 0⤊ 0⤋
A more statistically rigorous method is to apply Chauvenet's criterion. This is preferable to a bare standard-deviation cut-off, and you'd be amazed how many scientific researchers don't bother with it. Although 2 standard deviations is a rough general guide, it won't be accurate if the number of data points is small. Chauvenet's criterion basically says that if the expected number of measurements as far from the mean as the suspect point (i.e. the number of data points times the probability of such a deviation) is less than 0.5, the point can be discarded.
Explaining the full procedure and the statistics behind it would take a while, so here's a nutshell description (with a short code sketch after the steps); check the link below for actual examples.
1. Find the standard deviation of all data
2. Find the number of standard deviations each data point lies from the mean, i.e. the z value, where z = (x − X)/σ, with x your data point, X the mean and σ ('sigma') the standard deviation.
3. Find the two-tailed probability that a measurement lies at least that far from the mean.
4. Multiply that probability by the number of data points. If the result (the expected number of points that extreme) is less than 0.5, reject the data point; otherwise retain it.
5. Recalculate the mean and standard deviation from the remaining data.
***Note that Chauvenet's criterion can only be applied ONCE to a data set; otherwise you'll end up eliminating all your data.
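For the curious, here is a minimal sketch of those five steps in Python, assuming NumPy and SciPy are available (the function name is hypothetical, not part of any standard library):

import numpy as np
from scipy.stats import norm

def chauvenet(data):
    # Apply Chauvenet's criterion once; returns kept points, rejected points,
    # and the recalculated mean and standard deviation.
    data = np.asarray(data, dtype=float)
    n = len(data)
    mean, sd = data.mean(), data.std()   # step 1: mean and SD of all the data
    z = np.abs(data - mean) / sd         # step 2: z value of each point
    p = 2 * norm.sf(z)                   # step 3: two-tailed probability of lying that far out
    keep = n * p >= 0.5                  # step 4: reject if the expected count of such points < 0.5
    kept, rejected = data[keep], data[~keep]
    return kept, rejected, kept.mean(), kept.std()   # step 5: recalculated mean and SD

As the note above says, this is applied once only; rerunning it repeatedly would keep shrinking the data set.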
By the way, if none of this makes sense, then you probably don't need to know it all. Just know that 2 standard deviations is a rough guide to the expected range, but it's not the most precise method.
2007-01-09 08:41:02 · answer #2 · answered by koala_paradise 3 · 0⤊ 1⤋
There is no objective definition of an outlier because the nature of data distributions varies for different types of observations, and each field of research has its own preferred methods. Observations for one type of data might span several orders of magnitude, while other types of data may have a very tiny range and naturally cluster at some characteristic point in a distribution. Outliers for these two types of distributions would be very different.
Rule sets are probably the most widely used method; for example, "all weights greater than 10 lbs. were considered outliers in this study." Multiples of the standard deviation are also widely used, such as excluding all values that were more than 2 or 3 standard deviations from the next highest value in the sample. All methods require some type of exploratory data analysis.
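For a rough sketch of those two ideas in Python (I'm using the answer's 10 lbs example for the rule set, and the more common from-the-mean form of the SD-multiple rule rather than the next-highest-value variant; the function names are mine):

import numpy as np

def rule_set_outliers(weights_lbs, limit=10.0):
    # Fixed-rule approach from the example: weights above the stated limit are outliers.
    weights_lbs = np.asarray(weights_lbs, dtype=float)
    return weights_lbs[weights_lbs > limit]

def k_sigma_outliers(data, k=2.0):
    # Multiple-of-SD approach: points more than k standard deviations from the mean.
    data = np.asarray(data, dtype=float)
    return data[np.abs(data - data.mean()) > k * data.std()]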
If you search for terms like outlier, method, and identification in Google Scholar, you will find all sorts of articles on the topic, and also see that there are numerous approaches.
2007-01-07 15:31:26 · answer #3 · answered by formerly_bob 7 · 0⤊ 1⤋
Outliers, by definition, are data points that don't fit the calculated curve that matches most of the data.
For example, if your data plots like a normal distribution but there is a cluster of outliers somewhere above or below the mean, it indicates that the data is affected by something not accounted for that does or does not happen.
Say your data plots like a normal distribution, but somewhere away from the main peak there is a cluster of outliers about 1/3 its height: that cluster represents an unaccounted-for event that happens about 25% of the time (a secondary peak a third the height of the main one holds roughly a quarter of all the observations).
If there are scattered peaks of data on either side of the main peak, then there are several events not accounted for.
Truly random events in large data sets won't produce clusters of outliers; they scatter throughout the data.
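A quick way to see the 1/3-height / 25% point is to simulate it; this is purely an illustration of the idea, with made-up numbers:

import numpy as np

rng = np.random.default_rng(0)

# Mix a main normal distribution (75% of the samples) with a second,
# shifted one (25% of the samples) standing in for the unaccounted-for event.
n = 100_000
main = rng.normal(loc=0.0, scale=1.0, size=int(0.75 * n))
secondary = rng.normal(loc=6.0, scale=1.0, size=int(0.25 * n))
counts, edges = np.histogram(np.concatenate([main, secondary]), bins=200)
centers = (edges[:-1] + edges[1:]) / 2

main_peak = counts[np.abs(centers) < 0.5].max()
secondary_peak = counts[np.abs(centers - 6.0) < 0.5].max()
print(secondary_peak / main_peak)   # roughly 1/3: the secondary cluster is
                                    # about a third the height of the main peak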
2007-01-07 19:13:23 · answer #4 · answered by mt_hopper 3 · 0⤊ 1⤋
I'm not positive, but from what I remember from algebra class, the outliers are the values in a data series that fall outside the general range. For example, in the series 4, 8, 12, 6, 2, 5, 3, 33, 11, the outlier is 33 because it is nowhere near the other values. Hope this helps.
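For what it's worth, one common textbook rule (the 1.5×IQR fence, which the answer doesn't name but which captures the same "outside the general range" idea) flags exactly that value; a quick Python check, assuming NumPy:

import numpy as np

data = np.array([4, 8, 12, 6, 2, 5, 3, 33, 11], dtype=float)  # the series from the answer

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the usual 1.5*IQR fences
print(data[(data < lower) | (data > upper)])    # -> [33.]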
2007-01-07 15:05:49 · answer #5 · answered by audrey 2 · 0⤊ 1⤋