I was at the Federal Building in downtown San Diego for a consulting job and met some representatives of a life and disability insurance company who were giving away a big-screen HD TV to the person who came closest to guessing the number of M&Ms (chocolate and peanut butter filled) in a container. Because they do this often, I won't show the specific container they use.

I offered to make a guess of the total, but only if I could see all of the guesses so far. I was drawing on the Wisdom of Crowds example from Chapter 1 of the book, where the average of a set of independent guesses tends to outperform even an expert's best guess. I've run the same experiment many times in data mining courses I've taught and have found the same phenomenon.

I collected guesses from 77 individuals (including myself), shown here sorted for convenience (the order makes no difference to the analysis):

37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952

Note that there are a few flaky ones in the lot; the two most extreme (37 and 187952) were easy to spot. The idea, of course, is simply to take the average of the guesses.

Average all: 4932

Average all without 37 and 187952: 2626
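Both averages are plain arithmetic means, and the same kind of summary can be computed from the list with Python's standard `statistics` module. This is a minimal sketch: `GUESSES` is the list above transcribed by hand, and `mean_without_extremes` is a hypothetical helper that expresses "without 37 and 187952" as dropping the single smallest and single largest value.

```python
from statistics import mean

# Guesses transcribed from the list in this post (sorted ascending).
GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]


def mean_without_extremes(values):
    """Mean after dropping the single smallest and the single largest value."""
    ordered = sorted(values)
    return mean(ordered[1:-1])


print(f"Average all: {mean(GUESSES):.0f}")
print(f"Average without min and max: {mean_without_extremes(GUESSES):.0f}")
```

A single huge guess like 187952 drags the plain mean far above almost every individual guess, which is why removing it changes the answer so much.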

Then I looked at the histogram and decided that the guesses close to 10000 were also too flaky to include:

So I removed all data points greater than 8000, which took away two more samples (9886 and 10000), leaving this histogram and a mean of 2436.
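The trimming step amounts to a simple range filter. The sketch below assumes a lower cutoff of 100 (a hypothetical value chosen only so that 37 is excluded) together with the 8000 upper cutoff from the post; `keep_in_range` is likewise an illustrative name.

```python
from statistics import mean

GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]

# Hypothetical lower cutoff (excludes 37); the upper cutoff of 8000 is from the post.
LOW, HIGH = 100, 8000


def keep_in_range(values, low, high):
    """Return the guesses strictly between low and high, preserving order."""
    return [v for v in values if low < v < high]


kept = keep_in_range(GUESSES, LOW, HIGH)
print(f"kept {len(kept)} of {len(GUESSES)} guesses, mean {mean(kept):.0f}")
```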

So now for the outcome:

Actual Count: 2464

Average of trimmed sample: 2436 (error 28)

Best individual guess: 2500 (error 36)
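Scoring the outcome comes down to comparing absolute errors. A sketch, assuming the numbers reported in the post (`ACTUAL` = 2464, `CROWD_ESTIMATE` = 2436); `abs_error` and `rank_of_estimate` are hypothetical helpers, the latter counting how many individual guesses were strictly closer than the estimate.

```python
ACTUAL = 2464          # actual count reported in the post
CROWD_ESTIMATE = 2436  # trimmed-sample average reported in the post

GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]


def abs_error(guess, actual=ACTUAL):
    """Absolute distance between a guess and the actual count."""
    return abs(guess - actual)


def rank_of_estimate(estimate, guesses, actual=ACTUAL):
    """1-based rank of `estimate` among the guesses by absolute error.

    Only strictly closer guesses push the rank down, so ties share the better rank.
    """
    err = abs(estimate - actual)
    return 1 + sum(1 for g in guesses if abs(g - actual) < err)


best = min(GUESSES, key=abs_error)
print(f"crowd estimate {CROWD_ESTIMATE}: error {abs_error(CROWD_ESTIMATE)}")
print(f"best individual {best}: error {abs_error(best)}")
print(f"crowd estimate rank: {rank_of_estimate(CROWD_ESTIMATE, GUESSES)}")
```

Passing a different estimate (for example, the value reported to the agents in the PS) to `rank_of_estimate` shows where it would have placed.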

So, amazingly, the average won, though I wouldn't have been disappointed at all if it had finished 3rd or 4th, because it still would have been a great guess.

Wisdom of Crowds wins again!

PS: I reported a guess of 2423 to the insurance agents because I had omitted my original guess (made before I looked at any other guesses: 2550, if you must know) and my co-worker's guess of 3250; including these brought the mean up a bit. The average would have lost (barely) if I had not included them.
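The PS turns on how much two extra guesses move a mean. A minimal sketch, assuming the two guesses are simply appended to a filtered list; the filter cutoffs (100 and 8000) and the helper `mean_with_extra` are hypothetical and illustrate the arithmetic only, not the exact procedure behind the figures above.

```python
from statistics import mean

ACTUAL = 2464
GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]

# Hypothetical filter: drop 37 and everything above 8000.
kept = [g for g in GUESSES if 100 < g < 8000]


def mean_with_extra(values, extra):
    """Mean of `values` after appending the guesses in `extra`."""
    return mean(list(values) + list(extra))


base = mean(kept)
with_two = mean_with_extra(kept, [2550, 3250])
for label, m in (("without the two guesses", base), ("with the two guesses", with_two)):
    print(f"{label}: mean {m:.1f}, error {abs(m - ACTUAL):.1f}")
```

Two guesses above the current mean pull it up by roughly (their total excess over the mean) divided by the new count, which is why a couple of values can decide a close contest.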

PPS: So how will they split the prize, since two people guessed the same value (2500)? I won't recommend the saw approach. I hope they ask each of the two guessers to modify their guess, and require that each change it by at least one.
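Spotting the tie means collecting every guess that achieves the minimum error, not just the first one found. A sketch assuming the actual count of 2464; `closest_guesses` is a hypothetical helper.

```python
ACTUAL = 2464
GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]


def closest_guesses(guesses, actual):
    """Return the smallest absolute error and every guess that achieves it."""
    best_err = min(abs(g - actual) for g in guesses)
    return best_err, [g for g in guesses if abs(g - actual) == best_err]


err, winners = closest_guesses(GUESSES, ACTUAL)
if len(winners) > 1:
    print(f"tie between guesses {winners} (error {err})")
else:
    print(f"single winner {winners[0]} (error {err})")
```

Returning the whole list also catches ties between different values equally far from the answer (one above, one below).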

PPPS: The charts were made with JMP Pro 9 for the Macintosh.

## Friday, July 29, 2011


## 7 comments:

Shouldn't you compare the median rather than the average in this case?

Btw, your 'wisdom of crowds' link puts the book directly in the shopping cart. Great way to finance this blog, huh? ;)

The median is fine to use, though in this case it would do worse.

Sorry about the shopping cart; I didn't realize it was doing that (I'll fix it). And I don't get any money from Amazon anymore for links on this blog, thanks to the California legislature, so I *wish* it were the case that Amazon could finance the blog. :)

1. Interesting topic, and thanks for providing the actual data.

2. Other summaries might be preferred to the mean and median, such as a (mechanically) trimmed mean: the mean is sensitive to data that misbehave, and the median suffers from weak statistical efficiency (in many common circumstances).
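A mechanically trimmed mean sorts the data and drops the same fraction from each end before averaging. The sketch below uses one common definition (drop floor(p·n) values per tail); the function name `trimmed_mean` and the proportions tried are illustrative choices, not values taken from the discussion.

```python
import math
from statistics import mean

GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]


def trimmed_mean(values, proportion):
    """Symmetric trimmed mean: drop floor(proportion * n) values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))
    return mean(ordered[k:len(ordered) - k])


for p in (0.0, 0.05, 0.10):
    print(f"trim {p:.0%} per tail: {trimmed_mean(GUESSES, p):.0f}")
```

If SciPy is available, `scipy.stats.trim_mean(values, 0.05)` computes the same kind of statistic. Unlike the hand-picked trim discussed below, a fixed proportion needs no judgment about which points are outliers.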

Will, fully agree. My final answer was exactly a mechanically trimmed mean, with the top 3 and bottom 1 entries removed because they were such extreme outliers.

If the data isn't skewed after removing the outliers, the mean and median should be similar. If the data is skewed at this point, there is something else wrong because people don't typically guess 'skewed'.

For this data, the skew is only 0.5 (kurtosis is -0.15), so there is only some difference in the final guess: the mean guess is 2436 and the median guess is 2331. The median would have tied for 7th, still pretty good.
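Skewness and kurtosis can be sketched with plain moment formulas. This assumes the trim described in the comments (lowest guess and three highest removed) and uses population moments with no small-sample correction; statistics packages, possibly JMP included, may apply a bias correction, so their values can differ somewhat. `skew_and_excess_kurtosis` is a hypothetical helper.

```python
from statistics import mean, median

GUESSES = [
    37, 625, 772, 784, 875, 888, 903, 929, 983, 987,
    1001, 1015, 1040, 1080, 1080, 1124, 1245, 1250, 1450, 1500,
    1536, 1596, 1600, 1774, 1875, 1929, 1972, 1976, 1995, 2000,
    2012, 2033, 2143, 2150, 2200, 2221, 2235, 2251, 2321, 2331,
    2412, 2500, 2500, 2550, 2571, 2599, 2672, 2714, 2735, 2777,
    2777, 2803, 2832, 2873, 2931, 3001, 3101, 3250, 3333, 3362,
    3500, 3500, 3501, 3501, 3583, 3661, 3670, 3697, 3832, 3872,
    4280, 4700, 4797, 5205, 5225, 5257, 9886, 10000, 187952,
]


def skew_and_excess_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Drop the lowest guess and the three highest, as described in the comments.
trimmed = sorted(GUESSES)[1:-3]
skew, kurt = skew_and_excess_kurtosis(trimmed)
print(f"n={len(trimmed)} mean={mean(trimmed):.0f} median={median(trimmed)}")
print(f"skew={skew:.2f} excess kurtosis={kurt:.2f}")
```

Excess kurtosis near zero means the tails look roughly like a normal distribution's, which is consistent with the mean and median landing close together.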

Interesting analysis. Btw, I got a copy of Wisdom of Crowds on Kindle...

My cousin recommended this blog and she was totally right keep up the fantastic work!


I think your offer is good, but make some changes to it. Overall, your post is nice!

