Friday, August 31, 2007

Interesting real-world example of Simpson's Paradox

The MineThatData blog has a very interesting post on email marketing productivity that is a good example of Simpson's Paradox (as I posted in the comments there). The key (as always) is that the groups have disproportionate population sizes with quite disparate results. As Kevin points out in the post, there is a huge difference between the profit due to engaged customers vs. those who aren't engaged, but the number of non-engaged customers dwarfs the number of engaged ones.
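To make the mechanism concrete, here is a small sketch in Python. The numbers are invented purely for illustration and are not taken from Kevin's post; the two campaigns "A" and "B" and the helper profit_per_customer are hypothetical names. It shows how a campaign can win inside every segment and still lose in the aggregate when the segment sizes are lopsided.

```python
# Hypothetical data, invented for illustration only (not from the original post).
# (campaign, segment) -> (number of customers, total profit in dollars)
data = {
    ("A", "engaged"): (100, 1000.0),
    ("A", "non_engaged"): (10000, 5000.0),
    ("B", "engaged"): (2000, 18000.0),
    ("B", "non_engaged"): (2000, 800.0),
}


def profit_per_customer(campaign, segment=None):
    """Profit per customer for a campaign, optionally within one segment."""
    rows = [
        counts
        for (c, s), counts in data.items()
        if c == campaign and (segment is None or s == segment)
    ]
    customers = sum(n for n, _ in rows)
    profit = sum(p for _, p in rows)
    return profit / customers


# Within each segment, campaign A earns more per customer than campaign B.
for segment in ("engaged", "non_engaged"):
    a = profit_per_customer("A", segment)
    b = profit_per_customer("B", segment)
    print(f"{segment:12s} A: {a:6.2f}  B: {b:6.2f}")

# Pooled over both segments, the ranking flips, because A's customers
# are almost all in the low-profit non-engaged segment.
print(f"{'overall':12s} A: {profit_per_customer('A'):6.2f}  "
      f"B: {profit_per_customer('B'):6.2f}")
```

With these made-up figures, A beats B in both segments (10.00 vs. 9.00 and 0.50 vs. 0.40), yet the pooled numbers are 0.59 for A and 4.70 for B. Nothing about the campaigns changed; only the mix of engaged and non-engaged customers did.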

The problem we all have in analytics is finding these effects: unless you create the right features, you never see them. To create good features, you usually need moderate to considerable expertise in the domain to know what might be interesting. And yes, neural networks can find these effects automatically, but you still have to back out the relationships between the features found by the NNets and the original inputs in order to interpret the results.

Nevertheless, this is a very important post, if for no other reason than to alert practitioners that the relative sizes of groups of customers (or other natural groupings in the data) matter tremendously.

1 comment:

Sandro Saitta said...

Hi Will and Dean! I hope you come back to blogging soon. I really enjoy reading your blog, so I'm checking every day and waiting for new posts :-)