I got tired of looking at all the internet advertising and just installed some ad-blocking software. Link to the article that inspired me, here. The plugins are for Firefox. Updating the hosts file was simple. I just searched for the file name "hosts" and then added the big list of advertisers to both files that were found. No more ads, but Yahoo looks weird.
Tuesday, May 27, 2008
Thursday, May 15, 2008
Add more data? No, just more understanding
A couple of months ago, Anand Rajaraman, a professor at Stanford who teaches a class on data mining, wrote a blog post about a class assignment in which students have to do a non-trivial data mining exercise. A bunch of his students decided to go after the Netflix prize, a contest run by Netflix to see if anyone could improve its movie suggestion algorithm by more than 10%. I love these kinds of contests. One team in Anand's class added data from the Internet Movie Database to the Netflix-supplied dataset. Another team did not add data; instead, they spent their time optimizing the recommendation algorithm. Turns out that the team that added data did much better on movie recommendations. So good, in fact, that they made the leaderboard. So what should we take away from this?
Anand suggests "adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set." He is right, but he glosses over a critical word: "independent." That is, the data being added must not be correlated with the existing data. It needs to represent something new.
My take: The team that added the data were smart and operationalized descriptions of the movies better than the Netflix data did. They found a dataset that added a missing theoretical construct, good descriptions of movies, and that made the difference. Just adding a bunch of data is not the takeaway here. (At my previous employer, we had over 8000 variables at the household level (we did a lot of transformation) that we could use to predict if someone was going to take an offer. In a typical model, we used fewer than 20 variables. Across the 20 models we had in production, we used fewer than 200 of the 8000 variables.)
So what is the secret sauce to model improvement? Adding data that operationalizes a construct previously in the error term. In English: The team thought about what factors (what I was calling theoretical constructs) could possibly be used in a recommendation system and went to find data that could be a proxy for those factors. You should be willing to add (read: pay for) more data, but only if it measures something for which you don't already have an effective proxy.
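Here is a rough way to see the "independent" point in numbers. This is a toy sketch on simulated data, not what the students did: the variable names (`ratings_signal`, `genre_signal`, `redundant_copy`) are made up, and the outcome is assumed to be a simple sum of two factors plus noise. It measures how much R² goes up when a candidate variable is added to a model that already has one predictor.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def residuals(y, x):
    """Residuals from a least-squares fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

def r_squared_gain(y, existing, candidate):
    """Increase in R^2 from adding `candidate` to a model that already uses `existing`."""
    ry = residuals(y, existing)          # what the current model cannot explain
    rc = residuals(candidate, existing)  # the part of the candidate that is new
    total = sum((b - mean(y)) ** 2 for b in y)
    explained = sum(a * b for a, b in zip(rc, ry)) ** 2 / sum(a * a for a in rc)
    return explained / total

# Simulated data (assumption): the outcome depends on two unrelated factors.
random.seed(0)
n = 2000
ratings_signal = [random.gauss(0, 1) for _ in range(n)]  # already in the model
genre_signal = [random.gauss(0, 1) for _ in range(n)]    # the missing construct
outcome = [r + g + random.gauss(0, 0.5)
           for r, g in zip(ratings_signal, genre_signal)]

# A "new" variable that is really just a noisy copy of what we already have.
redundant_copy = [r + random.gauss(0, 0.1) for r in ratings_signal]

gain_redundant = r_squared_gain(outcome, ratings_signal, redundant_copy)
gain_independent = r_squared_gain(outcome, ratings_signal, genre_signal)
print(f"gain from redundant data:   {gain_redundant:.3f}")
print(f"gain from independent data: {gain_independent:.3f}")
```

In this setup the redundant variable adds almost nothing, while the independent one picks up a large share of what used to sit in the error term. A check like this on a sample is one way to decide whether a dataset is worth paying for.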
Posted by Unknown at 7:50 AM 0 comments
Labels: data exploration, data mining
Wednesday, May 14, 2008
Winds of Change video
Have you ever seen the Kodak "Winds of Change" video? It has nothing to do with analytics, but I love the way that they confront the brand perception of Kodak as no longer relevant and show that they understand the issue and are working to become relevant again. I heard the CMO speak and he said that he almost got fired over this video. It was an internal piece that got out. Turned out that it was a viral hit and did wonders for the re-branding.
Posted by Unknown at 7:54 AM 0 comments
Monday, May 12, 2008
Tips for implementing a BI project
I am speaking at AICPA in Las Vegas on Business Intelligence. My talk is supposed to be a "lessons learned" kind of case study on using BI. I developed 11 tips while rolling out a BI solution. Some of these may look very familiar:
Tip 1: When deciding what to show in your BI tools, use a balanced scorecard approach.
Balanced scorecards provide a nice framework for thinking about developing useful metrics.
Tip 2: Select the right hardware.
We needed a “Data Appliance” like Netezza. Feel free to overbuy. Your future self will thank you.
Tip 3: Take your time building requirements.
Figure out who is going to use the data and for what. What are the needs going to be a year from now? Three years from now?
Tip 4: Conduct a data quality audit.
Check for missing data, unusual variability, and unusual stability (a value that never changes is as suspicious as one that jumps around).
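A minimal sketch of those three checks for a single column, using only the Python standard library. The function name `audit_column`, the z-score threshold of 3, and the sample numbers are all assumptions for illustration; a real audit would run against the warehouse and use thresholds that fit the data.

```python
from statistics import mean, pstdev

def audit_column(values, z_threshold=3.0):
    """Report missing data, unusual variability, and unusual stability."""
    present = [v for v in values if v is not None]
    missing_rate = (len(values) - len(present)) / len(values)
    spread = pstdev(present) if len(present) > 1 else 0.0
    center = mean(present) if present else 0.0
    # Unusual variability: values far from the mean relative to the spread.
    outliers = [v for v in present
                if spread > 0 and abs(v - center) / spread > z_threshold]
    # Unusual stability: more than one value, yet no variation at all.
    suspiciously_stable = len(present) > 1 and spread == 0
    return {
        "missing_rate": missing_rate,
        "outliers": outliers,
        "suspiciously_stable": suspiciously_stable,
    }

# Hypothetical daily order counts with two gaps and one wild value.
daily_orders = [100, 102, 98, None, 101, 99, 100, 103, 97, 100,
                101, 99, 102, 98, 100, 101, 5000, 99, 100, None]
orders_report = audit_column(daily_orders)

# Hypothetical field that never changes, e.g. a default nobody overwrote.
region_code = [7, 7, 7, 7, 7]
region_report = audit_column(region_code)

print(orders_report)
print(region_report)
```

A plain z-score is a crude detector, since one extreme value inflates the spread it is measured against. With only a handful of rows it can miss an obvious outlier, so a median-based rule may be a better fit for small samples.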
Tip 5: Make your data warehouse responsible for fixing data quality problems.
Don’t try to build in work-arounds. You will have bought yourself a bunch of maintenance headaches. Let the guys who are supposed to give you clean data do their jobs.
Tip 6: Provide some context for each metric.
Show trends, goals, and/or min-max for each metric. This will allow the exec to decide if some metric is worth further attention.
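As a rough illustration, the context for one metric can be boiled down to a handful of numbers that sit next to the headline value. The function name `metric_context`, the field names, and the sample figures below are hypothetical.

```python
def metric_context(history, goal):
    """Summarize one metric: latest value, trend, goal gap, and min-max range."""
    latest, previous = history[-1], history[-2]
    return {
        "latest": latest,
        "change_vs_previous": latest - previous,  # simple trend indicator
        "gap_to_goal": latest - goal,             # positive means above goal
        "min": min(history),
        "max": max(history),
    }

# Hypothetical monthly signups with a goal of 140.
monthly_signups = [120, 135, 128, 150]
context = metric_context(monthly_signups, goal=140)
print(context)
```

With those five numbers beside the chart, an exec can tell at a glance whether 150 is good news or not.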
Tip 7: Enable drill down on your charts (but don’t overdo it).
When an exec sees something “anomalous” they are going to want to see if they can figure out what is going on. Computer users are trained to click on things they are curious about. Leverage this behavior.
Tip 8: Avoid being “flashy” and cool.
Keep your charts simple and redundant. Allow your audience to learn how to consume your BI quickly, not be impressed with your team's programming skills.
Tip 9: Conduct 1-on-1 sessions with senior execs to ensure that they find the BI tools useful and informative.
Adoption of these things is much harder than technical implementation. Do anything you can do to drive adoption.
Tip 10: Choose a software package for ease of integration.
Time spent integrating is not worth the loss of strategic use of the data. Remember, the time you take to get things working right has a cost to the business.
All of the major BI vendors have very similar functionality, and the differences are not likely to have any impact on business decision making.
Tip 11: Be ruthlessly simple about which metrics you show. Complexity is your enemy.
Strive for few, but very meaningful metrics. Too often, you are going to want to create complex reports. Fight the urge. They will never be looked at. In this context, I will always sacrifice completeness for simplicity.
Posted by Unknown at 1:19 PM 2 comments