Da Facto: Data Simplification

Monday, June 8, 2009

Data Simplification

When I was trained, I was told that you should never take continuous-level data and make it categorical. One of the guiding principles of regression analysis is that variance is good; never reduce it by simplifying your data and creating categories. Maybe an example is in order:

Say you have a variable like temperature. This data is continuous; it is not bounded (except in extreme cases) and the temperature of something can take a wide variety of values. In regression, there would be no reason to define ranges of temperature (0-10, 11-20, 21-30, etc.). The computer does the work, and if you created the ranges you might reduce the explanatory power of the data (or, if the data were used as a dependent variable, make it harder for other variables to predict its value). So, in research, categorizing data is a no-no. So said Professor Feldman.
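To make the "variance is good" point concrete, here is a minimal sketch of what happens when temperatures are collapsed into ranges. The readings are made up (uniform between 0 and 30), the 10-degree width mirrors the ranges above, and the helper name `bin_index` is my own; treat it as an illustration under those assumptions, not a general result.

```python
import random
import statistics

random.seed(0)

# Hypothetical temperature readings (an assumption for illustration only).
temps = [random.uniform(0, 30) for _ in range(1000)]


def bin_index(t, width=10.0):
    """Return the index of the fixed-width range a reading falls into."""
    return int(t // width)


# Group the readings by range, then replace each reading with its range's mean.
groups = {}
for t in temps:
    groups.setdefault(bin_index(t), []).append(t)
bin_means = {k: statistics.fmean(v) for k, v in groups.items()}
binned = [bin_means[bin_index(t)] for t in temps]

# Compare the variance before and after categorizing.
var_raw = statistics.pvariance(temps)
var_binned = statistics.pvariance(binned)
retained = var_binned / var_raw
print(f"variance retained after binning: {retained:.1%}")
```

Whatever variation existed inside each range is thrown away, which is exactly the loss the textbook rule warns about. How much that matters depends on how wide the ranges are.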

The funny thing was that I had a staff member (David) who kept telling me that while the theory was right, you can usually create categories without much loss of predictive power. And in certain applications, working with categories is much easier than working with continuous data (ad serving is one such application, but that is another post).

The other day, I went through the exercise. I took continuous-level data and made it categorical. David was right. Prof was wrong. At least in the world of Digital Marketing. The data retained its power, and it is easier for consumers of the data to use it. Having said that, you still need a fair number of categories (over 10) to retain the power. Even so, I thought this was a fact worth knowing.
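If you want to try a version of this exercise yourself, here is a rough sketch. It is not my actual data or analysis: the data is synthetic (a linear relationship plus noise), the function names are hypothetical, and the fit is measured in-sample. It compares the R-squared of a regression on the continuous variable with a regression on equal-width categories, for several category counts.

```python
import random
import statistics

random.seed(1)

# Synthetic data (an assumption): y depends linearly on x, plus noise.
n = 5000
x = [random.uniform(0, 100) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 20) for xi in x]


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_y = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def fit_continuous(x, y):
    """Ordinary least squares with one continuous predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi for xi in x]


def fit_categorical(x, y, n_bins, lo=0.0, hi=100.0):
    """Regression on category dummies, which predicts each category's mean y."""
    width = (hi - lo) / n_bins
    idx = [min(int((xi - lo) / width), n_bins - 1) for xi in x]
    groups = {}
    for i, yi in zip(idx, y):
        groups.setdefault(i, []).append(yi)
    means = {i: statistics.fmean(v) for i, v in groups.items()}
    return [means[i] for i in idx]


r2_cont = r_squared(y, fit_continuous(x, y))
r2_by_bins = {k: r_squared(y, fit_categorical(x, y, k)) for k in (2, 3, 5, 10, 20)}

print(f"continuous: {r2_cont:.3f}")
for k, r2 in r2_by_bins.items():
    print(f"{k:>2} bins:    {r2:.3f}")
```

On data like this, a handful of categories gives up a noticeable chunk of explanatory power, while the gap narrows as the number of categories grows. Real data with non-linear relationships or skewed distributions may behave differently, so run it on your own numbers before trusting the pattern.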


Recommended Books

Thinking, Fast and Slow My PhD was in Behavioral Economics and Kahneman co-invented the field. This book sums up Kahneman's observations over an almost 50-year career. An amazing overview of how people actually make decisions, though not for the uninitiated.

Competing on Analytics This is a great read if your organization is just starting to realize the importance of analytics as a competitive advantage.

Visual Display of Quantitative Information This is a classic book on displaying, well, quantitative data. If you want people to pay attention to your results, you need to present your analyses well.

The Pyramid Principle Another classic book on how to communicate complex ideas in a business environment. Not cheap but very worthwhile. The author is a former McKinsey consultant who had a major impact on how the Firm conducts its written communication with clients.

Say it with Charts This is another McKinsey book. Mr. Zelazny has been the major force in defining the look and feel of McKinsey PowerPoint decks. He also has a second book on presentations called, strangely enough, Say it with Presentations.

The Fifth Discipline The Fifth Discipline is all about systems thinking. I use the phrase "Business Dynamics" from this book a couple of times a week. The notion is that your business is made up of some core systems and that the behaviors of those systems can be represented with some basic building blocks. By understanding how these building blocks interact, you can radically improve your understanding of the systems that drive your business performance.

Made to Stick I want my ideas to have longevity in my organization. This is a good primer on how to make ideas last.

Information Dashboard Design My company is heavily invested in Business Objects, but our dashboards always looked clunky. As with most technology implementations, the difficulty is not in getting the technology to work; it is in getting people to use the technology effectively. This is a very pragmatic book that discounts the use of cool-looking gadgets in favor of functional design.

Business Process Management: Practical Guidelines to Successful Implementations A basic understanding of business process is critical in most corporate settings and serves as the cornerstone for process improvement. Here is a link to an Amazon search for other books on business processes.

Loyalty Myths Loyalty is a good thing, right? How about the loyalty of an unprofitable customer? The book lays out a number of fallacies about loyalty and can help you develop an educated perspective on whether loyalty is a good thing for your company.

The Leadership Challenge This book identifies the critical elements of leadership through survey research. It was the first good and actionable book I have read on leadership.

The book links go through my Amazon Associates account. All proceeds go toward my kids' education: buying them soldering irons, lathes, drill presses, that kind of thing.