Friday, December 4, 2009

Why Demos

I wrote a piece for Ad Exchanger on why internet marketers want demographic data. I'll post the text in the next couple of days.

Friday, October 23, 2009

Where is the US household file?

Conducting an offline direct marketing campaign is relatively easy. You can call any one of a number of data providers to get a US household file (that is, demographics on 115 MM US households), run a test campaign, figure out the profile of who responded, and you are off to the races. The data exists. You just have to crank up your favorite logit tool and you are in business. In the online space, not so much.
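To make that concrete, here is a minimal sketch of the offline workflow in Python: fit a logit on test-campaign responses, then score the household file. The file name and demographic columns are hypothetical placeholders, not any vendor's actual layout.

```python
# Sketch: profile test-campaign responders with logistic regression.
# "test_campaign_results.csv" and its columns are made-up placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# One row per household: demographics plus a 0/1 response flag
households = pd.read_csv("test_campaign_results.csv")

X = households[["age", "income", "home_value", "household_size"]]
y = households["responded"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Score the full file and mail the best-looking households first
households["response_score"] = model.predict_proba(X)[:, 1]
print(households.nlargest(10, "response_score"))
```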


In the online world, there is no single vendor that has the US online user file. Why? It is hard to identify users online in a way that protects privacy and is meaningful for marketers. Though they are all getting better, no single online method of tagging users for demographics covers more than 30% of the online audience. So, in order to know basic demographics, you need to combine data from multiple data sources.

Say you have signed up multiple data providers. Are you ready to go? No. You now need to make some trade-offs on accuracy and comparability. What do I mean by that? Well, all of the data sources have varying levels of accuracy. Is IP-based better than cookie-based? Do you have a truth file you can validate against? Also, the data sources may report data at the user level (though anonymous), the household, the ZIP+4, or the ZIP. It seems like you should be as close to the user level as you can be, but what about the accuracy issue? Is it better to have accurate data at the ZIP+4 or less accurate data at the user level? By the way, it is going to depend on the data type and category. All in all, very complicated stuff.
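If you do have a truth set, the trade-off stops being philosophical. A rough sketch of how you might score a vendor against it; the file names, the "user_id" key, and the "age_band" field are all made up for illustration:

```python
# Sketch: score one vendor's demographic data against a truth file.
import pandas as pd

truth = pd.read_csv("truth_file.csv")            # demographics you trust
vendor = pd.read_csv("vendor_a_user_level.csv")  # vendor's user-level data

merged = truth.merge(vendor, on="user_id", suffixes=("_true", "_vendor"))

# Coverage: what share of the truth set does the vendor even have?
coverage = len(merged) / len(truth)

# Accuracy: of the users the vendor covers, how often is the value right?
accuracy = (merged["age_band_true"] == merged["age_band_vendor"]).mean()

print(f"coverage={coverage:.1%}, accuracy={accuracy:.1%}")
# Repeat per vendor and per granularity (user, household, ZIP+4, ZIP)
# and let campaign lift, not theory, pick the winner.
```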

What is an online marketer to do? You have two choices here. Test and learn, knowing that you are going to need to invest time and resources in getting a good set of data providers in place. Or, self-servingly, work with someone who has already climbed the learning curve :).

Monday, October 12, 2009

Got some press

My new company put together a press release. Check it out here. Too funny.

Wednesday, September 30, 2009

How to start a new job

I was talking to a colleague the other day who was about to start a new job. She had been at her previous company for about 10 years and wanted some pointers on how to make a successful transition. We talked for a while about making a strong start and I gave her three pieces of advice. First, spend the first month listening. Take everyone you can out to lunch: your internal clients and partners, your staff, your vendors. Ask them how you can help them be more effective. Get invited to staff meetings and listen to the problems people are wrestling with. Just gather perspectives and try to listen very carefully. In every role, there will be opportunities that you can leverage to get a strong start. Try to find those opportunities. Low-hanging fruit and all that.

Second, take what you have learned and come up with a plan to harvest the fruit. Talk to your manager about what you are going to accomplish and get her agreement on the plan. You want fast wins here, so try to avoid committing to a project that is going to take two years to complete. Also, be very wary of projects that have been lying around, incomplete. There may be a reason why the "raw data feed" or some such project has made no progress in 6 months. You can take those projects on, but try to push them back until after you have a track record.

Third, and I hate to be this tactical: don't walk in and say things like "At [former employer] we filtered our web log data using SAS" or "At [former employer] we had the data warehouse take care of this problem." These may be true and relevant observations, but new co-workers react oddly to the comparisons. In some ways, talking about your former employer is like name-dropping a celebrity you once hung out with, and irritating for the same reason. Once is interesting, but it gets old fast. You can say "at a previous employer we did x," but don't use the company's name. It sounds trivial, but take my word for it, people get tired of you saying "We used Lotus Notes at McKinsey."

And on that note, I get a chance to follow my own advice. I am leaving IXI and moving to [X+1] as the Vice President for Analytics and Data Strategy. I only work for companies that have an X in their name. Look for the blog to be more active, as part of my role will be evangelism in the space.

Tuesday, June 9, 2009

Brute force vs. smarts

My dissertation was, in part, about how to encourage people, when solving problems, to think about the information they already have available to them and not just gather information for its own sake. I found that you can save a lot of money if you charge just a token amount for each new piece of information. When you charge for information, people think more deeply about the information in their possession and stop asking for information that they don't really need to solve the problem. I had a real-world brush with this phenomenon the other day.


I got a call from someone who had an enormous database (multi-petabytes) and was looking for some advice on how to "scale" the database. I almost choked. How much more scale do you need? They were saving every piece of information they gathered from their customers and were afraid to throw anything away. "We don't know what we are going to need in the future" was the refrain.

In my mind, the organization was not thinking hard about the data they had and how to use it efficiently. Rather, they let cheap storage lead them down an easy path. The brute force path. The engineering path. I can tell you from experience that, given enough money, the technical folks will solve the problem of increasing storage. But the organization still thought of their challenge as one of engineering: "How do we save even more data?" The engineers can't fix the underlying problem, which is that the organization was not being thoughtful about the data it was saving. At AOL, we did an analysis and found that for predictive modeling, we relied on a small set of data (less than 100 variables) and only used 10-15 for any given model. We had several thousand variables available to us, but most were either correlated with other variables (and could be deleted with no loss of usefulness), tied to a business we were no longer in, or saved for no reason we could determine, other than that it was easy.
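For the correlation piece, the pruning can be almost mechanical. Here is a rough sketch; the threshold, helper function, and file name are my illustration, not the actual AOL process:

```python
# Sketch: drop one variable from each highly correlated pair.
import pandas as pd

def prune_correlated(df: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Return column names that are near-duplicates of a kept column."""
    corr = df.corr(numeric_only=True).abs()
    cols = corr.columns
    drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # If a and b move together, keep a and flag b for deletion
            if a not in drop and b not in drop and corr.loc[a, b] > threshold:
                drop.add(b)
    return sorted(drop)

# variables = pd.read_csv("modeling_variables.csv")  # hypothetical file
# print(prune_correlated(variables))
```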

I would suggest that companies looking to "scale their databases" first do an inventory of the data they are saving and develop a simple process to determine if the data is worth saving. In AOL's case, each modeler assigned a letter grade to each variable and we "voted" on which data to discard. And after this process, at no time did we say "I wish we had kept variable x." The reality was that we still had too much data.
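The grading itself can be as simple as a spreadsheet. A toy version of the vote, with made-up variables and a made-up cutoff:

```python
# Sketch: average each modeler's letter grade per variable, keep the winners.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

votes = {
    "page_views_30d": ["A", "A", "B"],
    "dialup_minutes": ["F", "D", "F"],   # a business we were no longer in
    "zip_median_income": ["B", "B", "A"],
}

KEEP_CUTOFF = 2.5
for variable, grades in votes.items():
    score = sum(GRADE_POINTS[g] for g in grades) / len(grades)
    verdict = "keep" if score >= KEEP_CUTOFF else "discard"
    print(f"{variable}: {score:.1f} -> {verdict}")
```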

Monday, June 8, 2009

Thinking about BI recently

It occurred to me that using a BI tool is a hard way to gain insight. You are limited by your own imagination. I like hypothesis-driven analysis, but I think you can do a much better job of providing insight if you understand a bit of econometrics (logit and OLS). You can simply run a stepwise regression on the variable you are trying to understand (say, lifetime value of a customer) and see what pops out (say, the interaction of age and education). Once you see which variables pop, you can then use BI tools to illustrate the point. One critical piece: make sure you run all of the interactions. That is where the cool stuff lies.
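For illustration, here is a minimal sketch of that idea using statsmodels. Rather than true stepwise selection (which statsmodels does not ship out of the box), it fits the full model with all pairwise interactions and looks at which terms are significant; the file and column names are hypothetical.

```python
# Sketch: regress LTV on candidate variables plus pairwise interactions.
import pandas as pd
import statsmodels.formula.api as smf

customers = pd.read_csv("customers.csv")  # hypothetical file

# "** 2" expands to all main effects plus all pairwise interactions
model = smf.ols("ltv ~ (age + education + tenure + income) ** 2",
                data=customers).fit()

# Interaction terms (e.g. age:education) are where the cool stuff lies
print(model.pvalues[model.pvalues < 0.05])
```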

Data Simplification

When I was trained, I was told that you should never take continuous-level data and make it categorical. One of the guiding principles of regression analysis is that variance is good; never reduce it by simplifying your data and creating categories. Maybe an example is in order:


Say you have a variable like temperature. This data is continuous; it is not bounded (except in extreme cases) and the temperature of something can take a wide variety of values. In regression, there would be no reason to define ranges of temperature (0-10, 11-20, 21-30, etc.). The computer does the work, and if you created the ranges you might reduce the explanatory power of the data (or, if the data were used as a dependent variable, make it harder for other variables to predict its value). So, in research, categorizing data is a no-no. So said Professor Feldman.

Funny thing was that I had a staff member (David) who kept telling me that while the theory was right, you can usually create categories without much loss of predictive power. And in certain applications, working with categories is much easier than working with continuous data (ad serving is one such application, but that is another post).

The other day, I went through the exercise. I took continuous-level data and made it categorical. David was right. The prof was wrong. At least in the world of digital marketing. The data retained its power, and it is easier for consumers of the data to use. Having said that, you still need a fair number of categories (over 10) to retain the power. Even so, I thought this was a fact worth knowing.
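If you want to run David's experiment yourself, here is a sketch on synthetic data: fit the same model with the raw continuous variable, then again with it cut into ten bins, and compare the fit. The data here is made up, not the real digital marketing file.

```python
# Sketch: continuous vs. binned predictor, same model, compare R-squared.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"temp": rng.uniform(0, 100, 5000)})
df["y"] = 2.0 * df["temp"] + rng.normal(0, 10, 5000)

# Model 1: raw continuous variable
continuous = smf.ols("y ~ temp", data=df).fit()

# Model 2: ten categories, per the "over 10" rule of thumb above
df["temp_bin"] = pd.qcut(df["temp"], q=10, labels=False)
binned = smf.ols("y ~ C(temp_bin)", data=df).fit()

print(f"continuous R^2: {continuous.rsquared:.3f}")
print(f"binned R^2:     {binned.rsquared:.3f}")
```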