Statistics in Excel: How to Deal with Outliers?

0 ⤊

Statistics in Excel: How to Deal with Outliers?

Hello. When evaluating a set of items for their popularity, I pull popularity metrics from two different sources, aggregating each score into an average. While I am satisfied with this metric (Let's call it the 'popularity score'), I'd like to create a simplified, more meaningful score.

My initial though was to create an index using the following formula:

(D3/MAX($D:$D))*100)

Where D3 is the individual popularity score and D represents all popularity scores.

I discovered, however, that outliers would have a disruptive effect on the data. For instance, if there where several VERY popular items, other items with acceptable popularity scores would appear very low in relative terms.

Any ideas on how to account for the Outliers while still retaining the meaning in an easy to understand way?

A colleague suggest that I take the lognormal distribution and then take the antilog. Any idea how to implement, or if this is a sound approach?

2007-03-30 02:46:37 · 2 answers · asked by MarleyB 1 in Science & Mathematics ➔ Mathematics

Any idea how to implement the LOGNORMDIST function in Excel?

2007-03-30 03:05:18 · update #1

2 answers

Hi. In manufacturing we would look for 'assignable cause' for any outliers. Usually we would find one or more causes and the outliers would be discarded. Not your answer, of course, but a realistic approach. Double check your cell formula for the number of parentheses. It makes a difference in the order of execution, I think.

2007-03-30 02:51:46 · answer #1 · answered by Cirric 7 · 0⤊ 0⤋

I think you have to go back to the assumptions made in forming a statistic as to the distribution characteristics. This statistic seems very subjective, so if you accept the distortions caused by log-normal distriubtions, go for it.

2007-03-30 09:52:33 · answer #2 · answered by cattbarf 7 · 0⤊ 0⤋