Wednesday, March 5, 2008

Free data!

I have a longer post coming about selecting data for modeling, but for now, just know that the UN has put its statistical data online. Perhaps the nicest feature is that the site will search across all of their published datasets. I like adding macro-level data to modeling and customer insight projects, and the UN is a good source.

Tuesday, March 4, 2008

We're number 1!

Interestingly, a Google search for "Business Analytics Blog" brings up Da Facto in the number 1 spot. It is a little niche-y, but still.

What is the secret sauce in direct marketing?

I am a big fan of what I call tribal wisdom. Kevin Hillstrom put together a post of database marketing tribal wisdom. The item that most resonates with me? Segmentation, and treating segments differently for marketing purposes. This was one of the first things I took from my McKinsey experience. Much of what he talks about is just specific instances of differential treatment of segments. Nice post.

More on data quality

I was speaking with someone about ways to assess data quality for predictive modeling. I have written about the tactics of ensuring quality data, but here is a little framework you can use when thinking about data quality. Your data needs to be accurate, granular, and complete.

Accuracy of data: Does the data accurately capture the attribute (e.g., income) that it was intended to?

Granularity of data: In every case, the more granular the data, the better the predictive modeling can be. Also, you can usually roll up granular data (from individuals to households, households to zip+4, etc.) to a higher level if you need to, for analytic or appending purposes.

Completeness of data: Any given dataset is going to have missing data. Missing data is a funny thing. Of the three (accuracy, granularity, and completeness), the last is the one that you can most influence. Obviously, the less missing data, the better. But before choosing an appropriate remediation method, you need to understand why the data is missing. If the problem is that the data feeds are broken, you are going to need to get the feeds fixed. If the data does not exist, but you have enough coverage to do some predictive modeling, you can predict the values of the missing data. Or you may just need to fill missing values with the mean or median.
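To make the completeness and granularity points concrete, here is a minimal Python sketch using only the standard library. The customer records, the field names (`household`, `income`), and the helper functions are all hypothetical, made up for illustration; the sketch covers only the simplest remediation mentioned above (filling with the mean or median), not predicting missing values with a model.

```python
from statistics import mean, median


def missing_rate(rows, field):
    """Share of rows where `field` is missing (absent or None)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def fill_missing(rows, field, strategy="median"):
    """Return copies of the rows with missing `field` values filled in.

    The fill value is the mean or median of the observed values.
    """
    observed = [r[field] for r in rows if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = median(observed) if strategy == "median" else mean(observed)
    return [
        {**r, field: fill if r.get(field) is None else r[field]}
        for r in rows
    ]


def roll_up(rows, key, field):
    """Roll granular rows up to a higher level: average `field` per `key`.

    Missing values are skipped rather than counted as zero.
    """
    groups = {}
    for r in rows:
        if r.get(field) is not None:
            groups.setdefault(r[key], []).append(r[field])
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical individual-level records (illustrative data only).
customers = [
    {"id": 1, "household": "A", "income": 40000},
    {"id": 2, "household": "A", "income": None},
    {"id": 3, "household": "B", "income": 60000},
    {"id": 4, "household": "B", "income": 80000},
]

print(missing_rate(customers, "income"))
filled = fill_missing(customers, "income", strategy="median")
print(filled[1]["income"])
print(roll_up(customers, "household", "income"))
```

Checking the missing rate first is deliberate: a high rate concentrated in one feed or time period suggests a broken data feed, which filling with the mean or median would only paper over.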

Monday, March 3, 2008

Linking analytics and psychology

Wired has an article on the $1 million prize Netflix is offering to the person (or team) who can improve its recommendation algorithm by 10%. Most of the competitors rely on fairly advanced math, but one guy is applying behavioral economics principles and is competing against the big boys.