Tuesday, February 12, 2008

The Analytic Value Chain - Do the analysis

Finally! Let's do the analysis! Actually, I want to spend more time on what not to do.

I really did not expect to write much about how to select the analysis required to solve your problem (what!). I have assumed that you know the appropriate analysis to conduct to answer your original research question. In retrospect, maybe that isn't a great assumption, but here was my thinking: My posts are designed for managers of analytic teams and the folks that work on those teams who are still developing their managerial skills. Those people (I thought) should know the right analyses for a given situation.

Increasingly, I am questioning this assumption. I have found that analysts who are well trained in advanced analytic techniques and remember their training are not the rule; those who also understand their business and can creatively apply their training to a new business problem are rare. Maybe 30 percent of the statisticians I have worked with wholly qualify under my criteria (mostly at A fOrmer empLoyer. Hiring a director is what inspired me to write this thread. I will do a post on how to identify a high potential statistician.) Most analysts are technicians, and they have a hard time suggesting analyses for problems they have not seen before. This is not an indictment of statisticians; applying statistical tools to business problems is just hard. How hard? Let me illustrate.

Conducting an Ordinary Least Squares regression when you should be using a logistic regression is a common mistake. By using the simpler OLS analysis, you can get totally wrong conclusions, leading to incorrect decisions. I am talking about answers that may not even be directionally correct. So, it is important that you use Logit, even though it is more complicated, when the situation warrants (when the thing you want to predict is a yes or no). I won't hire someone who does not know when to use logit.
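To make the OLS versus logit point concrete, here is a minimal sketch in plain Python. The data are made up for illustration (x is a hypothetical count of prior purchases, y is 1 if the customer responded), and the helper names `fit_ols` and `fit_logit` are my own, not from any library. The logit is fit with simple gradient ascent, which is fine for a toy example; in real work you would use a statistics package.

```python
import math

# Hypothetical toy data: x = prior purchases, y = 1 if the customer responded.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_ols(xs, ys):
    """Closed-form simple linear regression: y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, steps=5000, learning_rate=0.1):
    """Logistic regression fit by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += learning_rate * sum(errors) / n
        b += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return a, b


ols_a, ols_b = fit_ols(xs, ys)
logit_a, logit_b = fit_logit(xs, ys)

ols_preds = [ols_a + ols_b * x for x in xs]
logit_preds = [sigmoid(logit_a + logit_b * x) for x in xs]

for x, p_ols, p_logit in zip(xs, ols_preds, logit_preds):
    print(f"x={x}  OLS={p_ols:6.3f}  logit={p_logit:6.3f}")
```

On this toy data the OLS line predicts a "probability" below zero at the low end and above one at the high end, while the logit predictions always stay between zero and one. That is the mildest symptom of using the wrong tool; it does not by itself demonstrate the directionally wrong conclusions I mentioned.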

This mistake is so basic that no one using regression should make it. And yet it happens all the time. I know a company (none that I have ever worked for) that was using OLS when it needed to use Logit in a production environment, affecting hundreds of jobs for its clients a month. Did the statisticians who built the system know better? I don't know. I do know that the company has advanced statisticians working for it who know better and are advocating for change. This is why you need the talented experts. They stop you from doing really dumb things.

As much as I would like to, I can't map out the correct analysis for any given business situation. Having said that, there are some common situations that I have come across over the last couple of years that you should watch out for.

Regression:
If you are using regression, you need to pay attention to the frequency distribution of the dependent variable. Unless the dependent variable is continuous, has a relatively large range, and is normally distributed, Ordinary Least Squares regression is not going to give you the right answer. You may need a more sophisticated analytic technique. Some rules of thumb: If you are using OLS on a binary variable (think yes or no), you are going to need a more sophisticated technique, such as the logit discussed above. Also, watch out if the dependent variable has a natural floor or ceiling. Income is a good example. Very few people make less than zero dollars, so zero is the floor. If you have a floor, then you may need to go with a Tobit. It depends on the distribution of your dependent variable. If the dependent variable is normally distributed, then you are probably ok. If not, Tobit...
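Here is a rough simulation of why a floor matters. It does not fit a Tobit (that needs a statistics package); it only shows what happens to plain OLS when the dependent variable piles up at a floor of zero. Everything here is an assumption for illustration: the true slope of 2.0, the noise level, the sample size, and the helper name `fit_ols`.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SLOPE = 2.0  # assumed "real" effect of x on the latent outcome


def fit_ols(xs, ys):
    """Closed-form simple linear regression: y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Latent outcome: what we would see if there were no floor.
xs = [random.uniform(-2.0, 2.0) for _ in range(2000)]
latent = [TRUE_SLOPE * x + random.gauss(0.0, 1.0) for x in xs]

# Observed outcome: everything below zero is recorded as zero (the floor).
observed = [max(0.0, y) for y in latent]

_, slope_latent = fit_ols(xs, latent)
_, slope_observed = fit_ols(xs, observed)

print(f"OLS slope on latent data:   {slope_latent:.2f}")
print(f"OLS slope on floored data:  {slope_observed:.2f}")
```

In this setup roughly half of the observations sit on the floor, and the OLS slope on the floored data comes out well below the slope used to generate the data. A Tobit is designed to account for exactly that pile-up.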

Impact of Seasonality/Time:
Most folks come up with some arithmetic technique to model seasonality. I hate this. You can never unpack the drivers of your dependent variable. Instead, you should use some kind of time series technique; e.g., ARIMA. ARIMA will let you figure out what the real drivers of behavior are over time, as well as taking time into account.

Segmentation:
Check out the "Kenny's one rule" post on segmentation. The short version: For segmentation, don't try to do too much with one segmentation scheme. I think of segmentation like regression. You build a segmentation scheme for specific purposes.

Designing Experiments:
In a direct marketing context, at least when done correctly, testing is continuous. I have had a number of occasions where the tests were not readable because a business owner needed more volume and killed control groups or specific test cells in a complicated design.
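To show what "not readable" looks like in numbers, here is a small sketch using a standard two-proportion z-test. The response rates and cell sizes are invented for illustration, and the function name `two_proportion_z` is my own. The same 2.4 percent versus 2.0 percent response rates are compared twice: once with an even split, and once after most of the control group has been handed over for volume.

```python
import math


def two_proportion_z(resp_test, n_test, resp_ctrl, n_ctrl):
    """Two-sided z-test for a difference in response rates (pooled variance)."""
    p_test = resp_test / n_test
    p_ctrl = resp_ctrl / n_ctrl
    pooled = (resp_test + resp_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_ctrl))
    z = (p_test - p_ctrl) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical design as planned: 50,000 per cell, 2.4% vs 2.0% response.
z_full, p_full = two_proportion_z(1200, 50_000, 1000, 50_000)

# Same rates after the control group is cut to 2,000 names.
z_cut, p_cut = two_proportion_z(2352, 98_000, 40, 2_000)

print(f"planned design: z={z_full:.2f}, p={p_full:.5f}")
print(f"control cut:    z={z_cut:.2f}, p={p_cut:.3f}")
```

With the planned design the difference is clearly significant at the usual 5 percent level. With the control cell cut down, the identical lift is no longer distinguishable from noise, which is what I mean by a test you cannot read.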


Winding down
The big statistics software vendors have not yet developed bulletproof tools to help inexperienced analysts do the appropriate analyses. My advice is to hire an analytics expert and teach them your business. Don't try (too hard) to find someone who is an expert in both analytics and your industry. Industry knowledge is much easier to teach than analytic expertise. Doing the analysis is hard, and it can take a while (it once took someone on my team three months to build a data set and do the analysis for a difficult business problem). But he nailed it, and it changed how our business partners thought about the drivers of their business.
