Sunday, September 9, 2007

The Analytic Value Chain - Determine data requirements

Defining the analytic value chain post

This year, I have had three conversations with vendors trying to sell my company some piece of third-party data. We already have well over a thousand variables available to us for modeling and insight, before transformations; our problem is not a lack of data. Our problem is that we have too much of it. On a recent data mining project, we included over 6000 variables in the dataset. So why do I even mention determining data requirements as part of the analytic value chain? If you can include every variable under the sun, why not just grab 'em all? That way, you don't have to make any hard choices about what to include and what to leave out of the analysis.

I wish that things were that simple. For well-understood problems,
