Tuesday, December 11, 2007

Kenny's 1 rule of segmentation

I was reading the segmentation post on Analytic Engine and it reminded me that I wanted to post my standard segmentation rant. First, a story. (If you don't know anything about marketing segmentation, check out the Wikipedia entry.)

When I first got to A fOrmer empoLoyer (think a large web portal), I found that we were big users of Prizm clustering, by Claritas. For those of you who have not seen the Prizm product, it is cool. The folks at Claritas have taken the whole US population and run a cluster analysis on, well, us. They have classified each household into one of 66 unique segments. The segments group households that have similar lifestyle and socio-economic traits. They are a really neat way to market to a pre-defined demographic and get some basic insight into who your customers are.

One problem, though: A fOrmer empoLoyer's targeting efforts were not based on simple demographics. We built fairly sophisticated models to predict who would accept a given offer. In our case, Prizm clusters were rarely predictive, over and above the other variables we had available. People at the company had tried to use Prizm (and some of the other Claritas products) for all kinds of marketing-y things and they just did not provide value.

So, some bright person at A fOrmer empoLoyer said "we need to develop our own segmentation scheme. We'll provide a 'Universal Segmentation' (that is really what it was called) that can be used for any marketing or targeting activity." Strike two. The Universal Segmentation was a worse solution than Prizm and never made it out of the lab. Actually, A fOrmer empoLoyer took several runs at building a custom segmentation scheme using cluster analysis. None of them were found to be useful. As an aside, I mentioned the Universal Segmentation project to an expert in segmentation and he laughed and laughed. It was a bonding moment.

So why did A fOrmer empoLoyer have so much difficulty using a tool that is in use by marketers everywhere? In both cases, the segments were not built with A fOrmer empoLoyer's needs in mind. They tried to do too much.

To finish the story: A fOrmer empoLoyer had been sending out a mass email to the whole customer base as a way of stimulating engagement and generating page views. The program was, uh, less than effective. I convinced the Customer Engagement folks to let us build a custom segmentation scheme for their email newsletter program that categorized each customer on the basis of the content they visited. In the spirit of transparency, Omniture did most of the work; they had the data. The thought was that the content someone viewed was a good proxy for their interests.

From the segmentation, we found that A fOrmer empoLoyer had fewer than 20 unique customer segments, and that a handful of segments described most of our customers. The Member Engagement staff started to create newsletters for each of the large segments. We had just started to use the segmentation scheme before A fOrmer empoLoyer stopped marketing, but the first campaign had much higher open rates than the mass mailing approach.

The lesson here is that when you initiate a segmentation project, you need to be really thoughtful about what you are trying to accomplish. Don't use every variable that you have available and just start cranking the k-means. Your segments will not be interpretable, and thus (you never see thus any more) won't be actionable. You need focus.

Instead, think about what you are trying to accomplish (e.g., be able to classify your customers into demographic groups) and build datasets that only include variables that are actionable or would give you insight about your customers. Build a segmentation scheme for one purpose. And when the guys at Claritas say "we have been doing this for 20 years. We have this nut cracked. Use our segmentation scheme", ask yourself, why do they have several segmentation products?
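
To make this concrete, here is a minimal SAS sketch of the focused approach. Everything in it is hypothetical: the members dataset and the pct_* content-affinity variables are stand-ins, and eight clusters is just a starting point.

    /* Standardize the focused variable list first so that no one    */
    /* variable dominates the distance calculation.                  */
    proc standard data=members out=members_std mean=0 std=1;
       var pct_news pct_sports pct_finance pct_entertainment;
    run;

    /* K-means on a handful of purpose-chosen variables              */
    proc fastclus data=members_std out=segments maxclusters=8;
       var pct_news pct_sports pct_finance pct_entertainment;
    run;

    /* Profile each segment; with few, focused variables the         */
    /* segments stay interpretable (and thus actionable).            */
    proc means data=segments mean;
       class cluster;
       var pct_news pct_sports pct_finance pct_entertainment;
    run;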

I could say some other stuff about creating segments, but if you focus on what you are trying to accomplish and build your dataset accordingly, you should have good results.

Monday, December 10, 2007

Break Down The Wall!

The other day, someone asked me what I do to break down silos between functional teams. First, let me say that I don't think breaking down silos is an academic exercise; you almost always generate value by talking to folks outside your group, department, business unit, whatever. I can think of at least 10 examples of really impactful projects that have come out of "Silo Busting." Second, creating an environment for silo busting is not hard. It just requires some time. However, the way you attack your silos varies depending on the construction material.

Just keep in mind, some silos are harder to bust than others. If there are structural conflicts (e.g., a rational person would be worse off by cooperating) or long-standing animosity between silos, the problem becomes a lot tougher.

Set expectations.
When I first take responsibility for a team, I set the expectation that we are going to make things better. Every day. And that means innovating. And that, in my experience, innovation often comes when two people who are very different talk to each other about their areas of expertise and their problems. So there is a built-in incentive to get to know your colleagues and what they do; opportunities for improvement often result.

As an aside, I actually have an expectations document that I share with all of my staff; innovation in the service of continuous improvement, etc., is one of the core values listed in the document. Also, I set an example. I often go out to lunch with folks with whom I have no obvious business connection and I encourage my staff to do the same.

Information sharing
In most cases, people are working in silos not because they enjoy it but because they are too busy to share information. There are some easy tactics that you can employ to facilitate information sharing. A couple of things I do: whole-team staff meetings and monthly round tables. During these meetings, we discuss the projects we are working on and whether there are ways the group can help. Barriers fall very naturally. Also, as I mentioned above, I try to go out to lunch with people outside my business unit and encourage my staff to take their clients (and anyone else they think they should get to know better) out to lunch. To provide an incentive, I offer to pay for it.

Personality or Competence
Other times, it is a personality or competence issue. Here, I run at the problem head on and really set the expectation that the staff has to work together effectively. I then put the staff members in situations where they have to work together to be successful (e.g., a special business initiative). The expectation of being an effective collaborator also becomes part of their development plans, which helps make sure the problem gets fixed. If we get to the development plan stage, then I am going to be paying close attention and actively trying to help the staff member be an effective collaborator. Typically, by working together, people reach some accommodation, build some level of relationship, and break down the silos.

If it turns out that the problem is due to competence, well, no amount of relationship building is going to matter. If the staff member on my team is causing the problem, then a performance plan is going to be put in place. If it is a staff member outside my influence, I will likely have a conversation with their boss about their performance. Good luck.

Saturday, December 1, 2007

The Analytic Value Chain - Understand the relationships in your data

Once the dataset is together and the data is QA'ed, new analysts want to dive right in and start mining the data or building statistical models. More experienced hands just putter. They run descriptives, they check that relationships they expect actually exist (age and income are positively related, for example), and they create some histograms to look at the frequency distributions of the variables.
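
For the SAS users in the audience, the puttering looks something like this (mydata, age, and income are placeholder names):

    /* Basic descriptives: do the ranges and missing counts pass     */
    /* the smell test?                                               */
    proc means data=mydata n nmiss mean median min max;
       var age income;
    run;

    /* Do the relationships you expect actually show up?             */
    proc corr data=mydata;
       var age income;
    run;

    /* Eyeball the frequency distribution, one variable at a time    */
    proc univariate data=mydata;
       var income;
       histogram income;
    run;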

Why all the putterage? An effective analyst needs to develop an intuition about the data. Without that intuition, they won't have the common sense required to make some of the judgment calls they are going to need to make when they get down to modeling. Data analysis requires judgment (which is why you hire experts to do it), and without a good intuitive understanding of how the data behaves, the judgment calls to come are going to be on an unstable foundation. And you get to understand the business dynamics (by understanding what variables drive outcomes) in a way that the people who are running the business cannot.

Another point: you are going to find relationships that were unexpected. Take note of these; you are going to want to follow up on them and understand whether they are real. These unexpected relationships are the beginning of finding what I call "Game Changing Analyses," the home run of data analytics. What do I mean by game changing? Analyses that lead your business partners to change their business strategy, that indicate that activities fundamental to the business need to be reassessed. A good analyst should be able to have this kind of impact multiple times a year.

For those data analysts who read the blog, developing this intuition is critical to your ability to have impact and to your reputation. Imagine that you are presenting some work to a senior executive and they ask a question that you can't answer based on your analysis. If you have a good understanding of the data, you can say "I can't answer your question definitively, but given what I know of the data, I believe this to be the answer. Of course, I will check the answer as soon as I get back to my desk." To the extent that you consistently get these kinds of questions right, you become a trusted resource for the executive. I know more than one analyst whose job was saved because they proved over and over again that they were an expert in the dynamics of a business unit.

The upshot: spend the time getting to know your data. If you are a manager, build time to explore data into your project plans. Understanding the relationships in the data is a critical part of the data analyst's and manager's job.

Monday, November 12, 2007

Change Management - Creating a data driven boss

Avinash has a nice post on helping your boss become more data driven. I generally like lists of best practices and he is a gold mine of them. While he is focused on Web Analytics, most of what he espouses is relevant in the business analytics world. Check it out.

The Analytic Value Chain - QA the data

Related posts
The Analytic Value Chain introduced
Defining the problem
Determine Data Requirements
Locate Relevant Data
Extract, Transform, and Load the data

Sorry for the long delay between posts. I have been focused on my job search and just got my head above water.

I won't do a big song and dance on this part of the value chain; I already did a post on QA'ing your data, here.

A quick story on the prevalence of data quality problems. I was speaking with an expert in the direct marketing industry about her brief sojourn in consulting. She went into consulting because she likes doing novel things and she thought that consulting was going to provide that variety. She was lamenting that all of the engagements she worked on were similar, with most of her time spent "correcting data hygiene." The problem is endemic.

No matter where you get your data, a third party or an internal data mart, you have to check the quality of the data. And regularly. Assume data is guilty until proven innocent.

Tuesday, October 23, 2007

AOL Layoffs

Posts are going to be a bit slow for a couple of weeks. AOL laid off about 80% of my business unit (the access business) and I was, as they say, impacted. When you stop doing marketing, you don't need folks in Marketing Analysis. If anyone has a need for business analysts, please contact me. A number of my staff are looking for work.

My resume is here.

Saturday, October 13, 2007

"Growing" a SAS analyst

The other day, someone asked me how to “grow” a SAS business analyst. My first thought was “Let Capital One do it for you.” I then got to thinking about what it means to “grow” someone. I think the question was really “what skills does a SAS analyst need?” I was talking to David Ye, a senior manager on my stats team, about this problem, and he noted that there is often a difference between SAS programmers and SAS business analysts. Just to be clear, I am talking about a business analyst role.

Putting aside the things that make a good analyst (being a voracious and tenacious problem solver, having a good understanding of your business dynamics, having good written communication skills, etc.), an analyst who relies on SAS as their primary analytic tool needs to:

  • Be able to pull their own data (SQL skills and Proc SQL)
  • Know how to use SAS efficiently (can’t overstate the importance of this, think temp tables…)
  • Have a good understanding of the analytic procedures (and options) they need for their job (I like Proc Means, Anova, Reg, Corr, Cluster, Factor, Chart, Plot, and Tabulate. Also, if you are doing serious experimental work, you need GLM and Mixed)
  • Know how to leverage the various programming options (Macro, SAS code); a minimal sketch covering a few of these follows this list
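
To give a flavor of the first, second, and fourth items, here is a minimal sketch. The warehouse libref, table, and column names are all invented:

    /* Pull your own data with Proc SQL into a temp (work) table     */
    proc sql;
       create table work.active_custs as
       select customer_id, tenure_months, revenue
       from warehouse.customers
       where status = 'ACTIVE';
    quit;

    /* A macro variable keeps the variable list in one place         */
    %let keyvars = tenure_months revenue;

    proc means data=work.active_custs mean std;
       var &keyvars;
    run;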

Some of these are easier than others to develop. Most of these skills are hard won, so if you are trying to train someone up on anything but the most basic Procs, my advice would be to hire an experienced person and have them train new staff. Someone who is new to SAS, in my opinion, needs someone close by to answer questions.

Wednesday, October 3, 2007

The Analytic Value Chain - Extract, Transform, and Load the data

Related posts
The Analytic Value Chain introduced
Defining the problem
Determine Data Requirements
Locate Relevant Data

Extract, Transform, and Load (or ETL) is database administrator talk for getting data ready to analyze. For those who want a more formal definition, check out Wikipedia. I'll talk a little bit about each, but most of the action is in Transform.

From the data analyst perspective, there is not much to say on extraction; you need to get the data out of your systems and you need people who have the requisite skills. Think SQL. If you are hiring data analysts make sure they can write SQL and can architect a simple database. It will be a big help when they need to merge data sets or give requirements to your IT staff.
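
By way of illustration, merging two data sets is a one-step Proc SQL join. This is just a sketch; the librefs and columns are made up:

    proc sql;
       create table work.model_base as
       select a.customer_id,
              a.revenue,
              b.email_opens
       from warehouse.billing as a
            inner join warehouse.email_activity as b
            on a.customer_id = b.customer_id;
    quit;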

As I said above, transformation is much more interesting, involving both cleaning the data and creating new variables for analysis. Let's talk about data cleansing first. By data cleansing, I mean how your team handles missing values and outliers.

Though unglamorous, data cleansing is a critical task. Without having a consistent method for data cleaning, everyone will invent their own method. Once people start sharing data sets, a lack of consistency leads to bad analysis. Further, you want to ensure that your vendors and your data warehouse play by the same rules as your analysts. You might think it is obvious that missing values should be coded as missing, but your DBAs are not analysts or statisticians and they have different priorities.

True story on the handling of missing data: We had one data set that tracked values of something over time, call it Revenue per Week. If Revenue per Week for Time2 was missing, the DBA pulled the value from Time1 and plugged it into Time2. In our case, we found data from t1 was being used in t71. As a result, the data set was unnaturally stable over a very long period of time. Because of our data quality checking efforts, we found the problem, and now missing data is coded as missing.
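
If you want a cheap check for this kind of problem, ask how often this period's value exactly equals last period's. A sketch, not our production code, with invented table and column names:

    /* In SAS, (rev_t2 = rev_t1) evaluates to 1 or 0, so the mean is */
    /* the share of identical values. A share near 100% suggests     */
    /* values are being carried forward instead of coded as missing. */
    proc sql;
       select mean(rev_t2 = rev_t1) as pct_identical format=percent8.1
       from work.revenue_by_week;
    quit;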

The second part of transformation involves creating variables. As I have said before, I like a focused approach to data analysis. If you start transforming variables for analysis (taking squares and cubes, etc.), you add variables. Seems to run counter to taking a focused approach, I know. I am not suggesting creating every possible transformation, but you are going to need a few to help you capture non-linear effects or normalize your data.

In my experience, if you take the square, the log (base 10), and the inverse of your variables, you are going to get most of the value out of your transformations. You are going to need to use some common sense (what is the square of a categorical variable?), but in general, you are not going to need to go crazy creating new variables. However, your data analysts are going to want to create every possible transformation that they can think of; it is easy and they might need them later. Serious emphasis on might. In my opinion, the time would be better spent thinking about which transformations are actually useful. Also, each of those transformations creates another column of data that has to be processed, slowing down your analyses. My recommendation is to use the big three and, if you can logically think through other variables that may need to be transformed, tackle those on a one-off basis.
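
In SAS terms, the big three are a few lines in a data step. The variable names here are placeholders:

    data work.transformed;
       set work.model_base;
       revenue_sq = revenue**2;
       /* Guard against zeros and negatives before logging or        */
       /* inverting                                                  */
       if revenue > 0 then do;
          revenue_log = log10(revenue);   /* log base 10 */
          revenue_inv = 1/revenue;
       end;
       else do;
          revenue_log = .;
          revenue_inv = .;
       end;
    run;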

Another type of transformation is creating interaction terms. I had written a long and involved description of how interactions work and why you should care, but I deleted it; I think interactions deserve their own post. Interactions capture the additional impact of two variables combined that is not captured by considering each variable separately. The short version is that you can (and should) create interaction variables by multiplying variables together. The challenge: it is difficult to know which interactions to create. You could guess by now that I am not in favor of creating every possible interaction term; you would create an enormous data set that would not be useful. I am a fan of creating interactions that, given your understanding of your business, you think exist, and making them a regular part of your transformation process.
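
Mechanically, an interaction term is nothing more than a product. A one-line sketch, again with invented names:

    data work.with_interaction;
       set work.transformed;
       /* A hypothesized interaction: the effect of revenue depends  */
       /* on tenure (and vice versa)                                 */
       tenure_x_revenue = tenure_months * revenue;
    run;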

Last piece, Load. I have nothing to say here. In most analytic platforms, once you have extracted the data, it is loaded. Really, the process for data analysts should be called ELT.

Takeaways? You want to be thoughtful about your transformation process. People often create tons of variables as a proxy for thinking deeply about the problem they are trying to solve. Creating big, dumb datasets has real costs, not least of which is that they are hard to analyze. Also, create a standard method for data cleaning and make sure everyone knows the standard.

OOO

Sorry for the radio silence. I have been traveling for the last 10 days and have little access to an internet connection. Today, I am speaking at the "Optimization Summit" in San Francisco, traveling back to DC tomorrow, and back in the harness on Friday. I have a new "Value Chain" post in the hopper and I'll post it in a couple of days.

Monday, September 24, 2007

The Analytic Value Chain - Locate relevant data

Related posts
The Analytic Value Chain introduced
Defining the problem
Determine Data Requirements

This post overlaps with the previous Value Chain post on determining the data requirements, but is meant to be a bit more tactical. If you have followed my advice from the previous post, you have a good sense of what data you are going to need. Now you need to find it. In most organizations I have worked with, the data to solve any given business problem exists. The challenge is that the data often lives in a place that is not accessible to most of the folks in the organization. The data may not be in a production environment (it is sitting on a server under someone's desk), and if it is in production, the data warehouse might be so large that no one really knows what is in there (this is not an uncommon problem in real warehouses; an expert in physical warehousing once told me that a really good warehouse knows where a specific pallet can be found 80% of the time). I once had both problems at the same time. I found two data sets that, when combined, answered a critical business problem, but they were sitting on different desktops in two different parts of my client's organization. I found the data by luck, but wound up doing a very impactful analysis. My value was in carrying the data sets, on floppy disks, back to my PC. Obviously, you can't analyze data that you can't find.

So, what to do? I don't have a ton of advice here, but I have found some things that work pretty well in identifying the data that is out there and making it accessible. First, treasure the staff who really know your data infrastructure, because they have hard-fought knowledge (for those in my organization, you know who you are. And you know how much I value your contributions. You also know that I am understating.) There is no replacement for just having experience in your data infrastructure. Having said that, even though we rely on people power, metadata helps. And even the best staff are not going to be able to find useful data if you don't have data dictionaries for every dataset.

Second, create tables that aggregate your most useful data. We do this and can get our hands on useful data sets pretty quickly, in minutes. We evaluate the variables in those tables about once a year (or after any major strategy shift) to ensure that the data set maintains its usefulness. This data set has an additional value in that it can be shared with your entire organization and forms a common "fact base" for the organization.
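
The mechanics of the roll-up are simple. Here is a hedged sketch, with invented libref, table, and variable names:

    /* One row per customer, summarizing a detail-level table        */
    proc summary data=warehouse.transactions nway;
       class customer_id;
       var amount;
       output out=work.cust_summary (drop=_type_ _freq_)
              sum=total_spend mean=avg_order n=order_count;
    run;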

Third, try to think ahead and ask your team to be on the lookout for certain types of data. There are business questions that I know I am going to want to take a look at and by communicating to the team my topics of interest, they can make serendipitous discoveries.

Currently using Google Presentation and Spreadsheet

What pieces of crap. The flaws are too numerous to list. I have confidence that they'll get better over time, but for now, unusable. Back to Office.

It is too bad; this is the first time I have not liked a Google product.

Wednesday, September 19, 2007

The Analytic Value Chain - Determine Data Requirements

Related posts
The Analytic Value Chain introduced
Defining the problem

So, once you have a good understanding of the problem, you should probably grab every piece of data you can find, transform the variables every way you can think of, and start analyzing to figure out the variables that impact your dependent variable, right?

This way lies madness and spurious results. And it is an approach that a naive data analyst would take (experts in data-mining might have a different take, but I was trained as a research scientist and I don't like the kitchen sink approach). Before you start grabbing data, you should spend some time thinking about the behavior you are trying to predict and what phenomena might drive that behavior. Let me go a step further: I recommend that you generate hypotheses on what relationships you expect to find. Write 'em down. I go so far as to create something I call a conceptual map that graphically shows how I think all of the variables, including the dependent, impact each other. Once I see the relationships, I can quickly generate hypotheses. Maybe you are thinking, what does this have to do with determining your data requirements? Hang on, I am getting to it.

Once you have the hypotheses, then you can start thinking about what data you need to test them. In my world, it is important to have a good idea of what kinds of analyses we are going to be doing because most of the time we have access to many more variables than we can possibly look at. We can't go on fishing expeditions. Some other reasons to think hard about your data requirements:

  • In rapidly changing businesses, you'll spend more time finding data than analyzing it. So parsimony is key; you don't want to spend time getting data that is not going to be useful.
  • Most large enterprises have real restrictions on who can access specific types of data. So, even if you find the data, you'll have to figure out who owns the data.
  • Merging and analyzing larger datasets takes processor time. The smaller the datasets, the less time the analysts spend waiting for results.
  • The world is complicated. The more variables you have, the greater the chance of finding a spurious correlation. Also, if you have a good set of hypotheses and a conceptual map, you'll have a better sense of whether a relationship makes sense.

In sum, planning for your data needs, though it takes time up front, saves you time in the end.

Friday, September 14, 2007

SAS or SPSS

Someone asked me the other day if I preferred SAS or SPSS. As with so many other things, it depends. Here is a very good discussion of the differences between the two packages. From my perspective, you only go with SAS if you need their functionality, typically required if you are conducting very advanced statistics or handling very large data sets. Also, their support is very good (which you need because their product is complicated) and they have a very complete product suite (think end-to-end). SPSS is much more user friendly and produces better looking output that can be easily imported into Office applications, but their product suite is a little more limited. They also have solutions, but they are more of a research-focused company. So, for my everyday use, I like SPSS, but I like to have access to SAS for when I need the big guns. If you would like to talk about the differences, feel free to drop me a line at ken.rona@gmail.com. I am happy to offer guidance.

Thursday, September 13, 2007

Principles of Innovation

Over the past couple of years, I have been thinking about innovation. Specifically, what I do (and should do) with my teams to encourage innovation. Someone brought to my attention a video from Google's Vice President, Search Products & User Experience, Marissa Mayer.



Marissa talks about nine guiding principles that she believes are key enablers of innovation in her business. I would not argue that we all should adopt her principles, just that it was an interesting discussion and that having such a set of guiding principles is probably something worth thinking about. Mine are (somewhat redundantly with Marissa's):

1. Transparency is your friend.
2. Don't be afraid to iterate and experiment. Always test. Without it, you will not learn.
3. Noble failure is ok. The corollary is that stupid failure is not ok.
4. Share the credit. There is plenty to go around and you will likely have more than 1 good idea in your life.
5. Try to work around people you would be honored to hang with - and recruit someone when you find them! I am fortunate in this regard.
6. Don't have an opinion. Have facts.
7. You can build a business on users, but you need to run it on money. So, while you are innovating, think about how you are going to make money with the idea.
8. Be brave (but not disrespectful) in the presence of senior executives. That is how most of them got to be in their position.
9. Encourage dissent in your organization.
10. Be dissatisfied; it makes you want to make things better. But don't be a jerk.

Sunday, September 9, 2007

The Analytic Value Chain - Determine data requirements

Defining the analytic value chain post

This year, I have had three conversations with vendors who are trying to sell my company some piece of third-party data. We currently have well over a thousand variables available to us for modeling and insight, before transformations; our problem is not a lack of data. Our problem is that we have too much of it. On a recent data mining project, we included over 6000 variables in the dataset. So why do I even mention determining data requirements as part of the analytic value chain? If you can include every variable under the sun, why not just grab 'em all? That way, you don't have to make any hard choices about what to include and what to leave out of the analysis.

I wish that things were that simple. For well understood problems,

Thursday, September 6, 2007

The Analytic Value Chain - Defining the Problem

Related posts
The Analytic Value Chain introduced

One mistake analysts make when starting analytic work is to just jump right in with the data. I have had staff waste weeks of time learning that they (and I) misunderstood the problem being asked by our business partners. I am not proud of this, but it is a lesson learned. What have we done to mitigate the problem? We try to have kick-off meetings for every large scale analysis so that the person doing the analysis (and their manager) can get a good bead on the problem the business is trying to solve. In these conversations, we spend a lot of time trying to understand why our internal clients think something is a problem, try to get a sense of root causes (which drive our data needs and analytic choices), and get agreement as to the deliverable (as well as timing). We also try to set some ground rules around changing the problem in mid-stream. Once we kick off an analysis, any change in the request resets the timeline that we provided for the analysis. Another best practice here is for the manager to check in with the analyst a week or so after the analyst starts the work.

So, the first way that an analytic staff can add value is to help their internal clients define the problem.

Tuesday, September 4, 2007

Using segmentation in the real world

(note: This post might require a little bit of analytic expertise to understand. Let's see how it goes. I may need to write up a post on explaining clustering...)

I am always looking for creative uses of analytics in business, and I saw something interesting in Borders about a week ago; I noticed they were making product recommendations. Not on their web site, but in their stores. In effect, they were taking a page out of Amazon's playbook and moving product recommendations into the real world.

So what did I see? If you go into a Borders, you'll notice a couple of new aspects of how they are merchandising their books. First, they are running a 3-books-for-the-price-of-2 promotion, with about 20 books to choose from. Second, they have a section with a series of (maybe 6) pairs of books, side by side. The book on the right is labeled "If you liked this book" and the book on the left is labeled "You might like this book." I would bet that both of these merchandising efforts are being driven by someone who is mining Borders' purchase data. In each case, let me tell you what I think they are doing.

3/2

In the first example, the 3 for 2 promotion: they have figured out what set of books would be interesting to some targeted (though fairly large) set of customers and are both selling and cross-selling them at the same time. The tricky part is getting a large enough set of books so that people can find three books that they would want to buy as a package. Amazon gets around this by showing you something like 20 items for cross-sell on any given page (I went to the latest Harry Potter book and counted 27 cross-selling opportunities). Borders probably ran a cluster analysis on customers' book purchases. A cluster analysis is a way of grouping like things together; in this case, they were probably grouping the types of books purchased by homogeneous groups of customers. The goal in this clustering exercise is to find a set of books purchased by groups of customers and to figure out how many people are in each customer group (to give you market sizing). They don't even need to know anything about the customers except that they like a certain set of books. They may be doing other kinds of optimization to ensure that the promotion is profitable, but let's table that for now. So the output would be a series of lists of 10-20 books, each list of interest to a specific customer type.
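
For the analytically inclined, here is roughly what that exercise might look like in SAS. To be clear, this is my guess at the mechanics, not Borders' actual method, and the 0/1 purchase flags are invented:

    /* Cluster customers on what they bought                         */
    proc fastclus data=work.purchases out=work.book_segments
                  maxclusters=10;
       var bought_cooking bought_history bought_scifi bought_romance;
    run;

    /* Count the members of each segment for market sizing           */
    proc freq data=work.book_segments;
       tables cluster;
    run;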

<a quick aside>The number of books (10-20) is just illustrative. Could be more, could be less. In this case, the number of books is driven by a business need. Obviously, you can't have a 3 for 2 promotion with only 2 books. Alternatively, you could not reasonably display 100 books. This kind of segmentation must be closely linked to business needs. For those who have heard me on my segmentation soapbox...well, let's just say some folks are tired of hearing me talk about the limits of segmentation. </a quick aside>

For a pilot, they probably picked the largest segments (for example, foodies and history buffs), pulled the books of interest to those segments, and got the books out on the floor. My guess is that they actually have the books for a couple of different segments on display. So, say there are 20 books eligible for the promotion. I found 3 books I wanted to read, but would have had a hard time coming up with another set of three. It will be interesting to see if they refresh the merchandising with new books and if they create different segments for each store.

Pairwise

For the pairwise book recommendations, this is a pretty straightforward analytic exercise. I would have done the analysis by category (Sci-Fi, Business, Romance, etc.) to see which of the most popular books had been purchased by people who also purchased books with relatively few sales (and maybe high profit margins). My first thought is that a bivariate correlational analysis would be a good first stab at this. In effect, the most popular book is recommending the weaker selling book. I have seen local book stores do something similar by having staff recommendations in each section. I like the Borders approach better. It is data driven.
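
Here is a sketch of that first stab, again assuming invented 0/1 purchase flags within a single category:

    /* Which slower sellers move with the bestseller?                */
    proc corr data=work.purchases;
       var bestseller_flag;
       with midlist_title1 midlist_title2 midlist_title3;
    run;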

While I have suggested some possible ways that they have developed their promotions, there are other options. Also, I assume they are testing the heck out of this new merchandising approach. All in all, very clever.

Friday, August 31, 2007

Marc's Blog

One of my friends recommended Marc Andreessen's blog. Marc was one of the founders of Netscape and Loudcloud and seems to have a very good touch for picking ideas for (and executing on) start-ups. I have been reading his practical advice on founding start-ups. The posts are a good read and I recommend them. The first post talks about why not to do a start-up. I thought it was instructive, and that some of his reasons why start-ups are difficult are applicable even in fairly well established companies. Hiring is always a pain, though in different ways than he describes.

He also mentions the long hours. For anyone who wants to create something great, the hours are always long. If you really care about what you are doing, then you are going to put discretionary time into the work.

I would not argue with his comment about it being easy for the culture to go sideways, but as a company gets bigger, the culture is going to take a turn for the worse. I have worked at two large companies that had grown rapidly, and both of them had "old-timers" who spoke loudly about the degradation of the corporate culture over time. Creating a good culture is a constant battle, and as a company gets larger, it gets fought (day to day) not from the top, but in the middle. Having said that, a bad CEO can ruin the culture faster than you can drive a termite's minibike around a pea. I guess what I am saying is that a good corporate culture can go south at any time. Creating stuff is hard. Even in big companies.

Wednesday, August 29, 2007

The Analytic Value Chain

I am currently hiring for a Director for my statistics team and it is a hard position to hire for (if you think you might be qualified, please send me a resume.) A perfect candidate has to have skills all across the data analysis value chain. I imagine you are thinking "consultant speak"; maybe so, but I think of analyses as a product that needs to run through an analytic "factory." You can manufacture impactful analyses, just like GM manufactures cars. And I think of the value chain as having 9 steps.

1. Define the problem
2. Determine data requirements
3. Locate relevant data
4. Extract, Transform, and Load the data
5. QA data
6. Understand relationships in data
7. Do the analysis
8. Create presentation materials
9. Share results with business

It is hard to find someone who has the business skills to understand the problem faced by my internal clients, has the technical skill to manage the analysis, has the process re-engineering skills to improve the process (it is a factory, remember?), and the communication skills to present the results back to the business.

In the next couple of weeks, I'll discuss each of the steps in the value chain. Hopefully this set of posts will help folks as they set up their own analytic shops.

Tuesday, August 28, 2007

Reflections on setting up a blog and Google

Recently, someone I was speaking with referred to Google as a one trick pony and that observation resonated with me. At the time, I remember thinking "Of course, they only make money on ads. Why are they wasting money on creating all of these other products that don't serve ads?"

In retrospect, I don't think that is right. A couple of observations. First, I am someone who creates content; in effect, I am a publisher who, if I am doing my job right, is increasing the number of useful pages the web has to offer. There really are not that many folks who produce high quality content (I think the jury is still out on me, but I am in the game, swinging), and, in general, Google has a good relationship with these folks. Second, the growth of Internet advertising is driven both by having willing advertisers (who create demand) and by having more content (to drive supply). Google is working both sides of the equation by making it easy for advertisers to buy placements and for savvy content creators to webify their content. Let me tell you one way that Google is making content creation (and increasing supply) easy. At least for Bloggers.

I prefer integrated products. I have spent too much time in my life trying to get best of breed products to work (see this post.) Google has set Blogger up as a more or less integrated blog publishing platform. Actually, they allow you to select best of breed applications, but they make it easy to use their products. Even still, the point holds.

I routinely use 4 Google products to publish this blog. I create my posts in Google Docs, publish directly to the blog, have embedded AdSense advertising, and track site usage using Google Analytics. So what does this have to do with increasing supply? Google has made it easy not only to publish and track performance, they have made it easy to serve AdSense ads. I could use other vendors for word processing, ad serving, or web analytics, but why would I? I am pretty sure that Google has made it so that all of their products play nice with each other (and so far, they do), so why take the chance?

All this is to say that even though Google may seem like a 1 trick pony, they are really a 2 trick pony. And even though the first trick gets a lot of attention, the second trick, creating new distribution channels for their ads, is also a really good trick.

Tuesday, August 21, 2007

Best of Breed or One Vendor?

Analytics folks need a good set of tools to do their work. I won't get into a SAS vs. SPSS discussion (different tools for different situations. Ok, I got into it) but I did want to talk really briefly about choosing an analytic "stack." By stack, I mean the entire set of tools that enable high quality analytic work: the tools that you need for data extraction, transformation, and loading, data QA, analysis, reporting, data mining, etc. In my organization the stack includes Netezza and Sybase as our database platforms, SAS for our data ETL, QA, and access, SAS for our ad hoc analysis/data mining, and Business Objects for standard reporting.

If you look not too closely, you can see a common theme. SAS is the dominant vendor. Do we use a lot of SAS because they make the best products for each part of our stack? No, but their products are good and they are well integrated. By concentrating our purchases on one vendor, we reduce the integration costs; we are assured that the products are going to work well together, or at least that we can hold someone accountable if the disparate products don't play nicely. In fact, SAS wrote a Netezza database connector for us when we needed it.

The other option is to take a best of breed approach and buy your products from multiple vendors. The advantage in going best of breed is that you can really tailor your stack to your needs. For example, you may need a lot of control over the charts created by your analysis tools and only one vendor allows you to have that control (I once worked for a company whose tick marks on charts had to point away from the chart. You can actually select this option in Excel, but most vendors don't allow such fine control. Having such a fine level of control over the look of the output was considered a "requirement" for our business intelligence platform.)

So which approach should you take? My advice is to fight the logic of best of breed. I know it is difficult to pick products that are not the best fit for your situation. Fight it. In my experience, most of these products are more similar than different. Think about word processors. Most of us use Word, but would not shy away from trying Open Office (actually, I am writing this post using Google Documents). Word processors all have the same basic functionality. Word may be very full featured, but most of us just use it as a text editor with a touch of formatting. Similarly, the stats packages are more similar than different. Those little features or very specialized functions that seemed so important when you were making your purchase decisions won't make a lick of difference in the quality of your team's analytics. What does make a difference to your team's effectiveness is their ability to deploy applications and exchange data between them. There are times when you need to select a "best of breed" application: when you need to use proprietary analytic techniques that are offered by a single vendor. But in general, you are better off giving up the perfect analytical app for one that is well integrated into an analytic suite. My advice is to stick with one vendor for your analytic stack, or at least minimize the number of times you change vendors as data moves up your stack.

Sunday, August 12, 2007

Helpful tips for off-shoring

I may do a longer post about my experiences with my team in India, but for now, if you are thinking about off-shoring reporting and, to a lesser extent, analytics, here are some practical things you can do to help the effort be a success.

  1. Put process in place. By this I mean that you need to have a documented process (and process maps) for each report that is being run. Documentation includes things like data sources, business owners for the reports, how to run a report, what to do in cases of failure (with each type of failure enumerated), etc. This should be a living document that gets updated as reports find new ways to fail to run (we keep ours on a wiki), and the process of updating should be part of the process documentation. I recommend the book Business Process Management: Practical Guidelines to Successful Implementations as a good reference. By putting process in place, you make the responsibilities of both the US and off-shore side very clear. This clarity is critical. In fact, we won't start producing a report in India until the US side does documentation for an existing report. Trying to do off-shore reporting on an ad hoc basis will be very difficult.
  2. Be very careful about checking skills. We have not had a hard time finding qualified candidates on paper. We have had a very hard time finding qualified candidates in life. We have had a number of instances where candidates have grossly misrepresented their skills. Worse than anything I have seen in the US. Each candidate now has to pass through multiple screening tests of their skills. The tests are both oral, given during the phone screens and interviews, as well as written (given on site for candidates who have made it to an in-person interview). The tests are not hard, but you can't fake knowing what a proc freq is during an interview. We tried to give a pretty comprehensive written test to prospective candidates after the phone screen to be completed before they came in for an in-person interview, but we found significant cheating. Lesson learned.
  3. Check the references. Nuff said.
  4. Don't hire job hoppers. In the India market, anyway, there is some job hopping going on. It is not uncommon to find candidates who have taken several jobs and moved on. Don't think that they are going to treat you any differently. We invest a significant amount in hiring and training and we need to make that training pay off. Also, I want people to become part of the culture. We won't even look at hoppers' resumes.
  5. Make sure that the US side is invested in the success of the off-shoring efforts. In my case, we track utilization and report quality of the team at our weekly staff meetings. I am the person on the US side who is ultimately responsible for the India team's success, and sharing metrics with the rest of the leadership team ensures that both my staff and I stay focused. Also, if I find that someone is not using their India resource effectively, I will take the resource away.
  6. Use the off-shored staff for project based work where they can be fairly self sufficient. The original vision for the India staff was to be the equivalent of a US analyst. These were unrealistic expectations. The US staff work with their clients every day and are much more able to solve problems both proactively and on an ad hoc basis. The time difference makes it much more difficult for the off-shore staff to find the people they need to speak with, and they are, by the nature of the distance, more removed from the day to day needs of the business. In our case, production reporting was a perfect thing to move. Reports are produced on a regular basis, allowing the analyst time to get familiar with the infrastructure needed to run the reports and learn what the results mean. Processes can be documented and, in the case of staff turnover, easily transitioned to someone else. Ad hoc reporting stays in the US. Maybe over time it will move to India, but for now, we are staying put.
  7. Travel! Both you and your senior off-shore staff need to travel to each other's locations, at a bare minimum, a couple of times a year for a week. Also, think about bringing your more junior folks over once a year for 5-6 weeks. That will give them an opportunity to meet their US counterparts and build the relationships that they need to do their job effectively. I went to India recently and found the experience invaluable. The trip gave me a first hand appreciation of how difficult managing the time zone differences is. I also got to play cricket. Make sure you get your shots and carry a small pharmacy with you. I got a very small cut that turned into a bad infection in about 8 hours. Thankfully, I had Cipro and Bacitracin with me. I treated the cut myself and it turned out fine, but it was touch and go for a while. Just to reiterate, the trip was invaluable.
  8. Meet the staff regularly and use video. I have a weekly pull up with my manager in India and a monthly round table with the whole team. I also hold bi-weekly "office hours" where folks in the US can stop by and give the manager and me feedback on how things are going. I just got tired of all the complaining about things not working right and people not addressing their challenges in a forthright way. By having these forums, people have no excuses. Another thing: I found that when I was in India it was impossible for the off-shored staff to understand what was being said on speaker phone. The phones cut in and out. And no one said anything. We now try to use video for meetings whenever possible.
  9. Ask other folks what has worked for them. I got good counsel from a number of sources. Some of what I have listed above is redundant with that advice, but I agree with their advice.

I would not say that we have off-shoring nailed, but I think we are making good progress. Our next step is to actually off-shore advanced analytic work, but we have just started. Once we get our feet wet, I'll put up a lessons learned for advanced analytics. Are there other things people have learned that should get added to the list?

Tuesday, August 7, 2007

Data Quality on the Cheap

I had a funny moment about 3 months ago. I was chatting with the VP of Advertising for a large retail chain and we got to talking about data quality (He was also responsible for Direct Marketing). I asked him “How do you check your data quality?” and he replied “I don’t think we do. Should I?” Yes. Yes, you should. If no one is checking regularly, your data quality is bad and your resulting decisions are going to be…well, you know.

Data tables are like cars. They need regular attention to ensure that they are performing well. If no one is checking, then your tables are out of tune. And for those of you who get your data from an outside vendor, don’t think that they are doing regular data QA. In my experience, my vendors ensure that the tables are produced, but are not tracking to see if the values in a given variable are reasonable. So, how should you check your data quality? We did some very simple things to give ourselves a pretty complete picture of the state of our data.

The first thing we did was build a historical database of some basic statistics for each variable in our bi-weekly production table. We tracked: mean, median, mode, 25th percentile, 75th percentile, standard deviation, skew, and kurtosis. We also tracked number of 0 values and number of missing values.
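
For the curious, the per-variable statistics amount to one Proc Means call plus a zero count. This is a sketch rather than our production code; the prod and qa librefs and the variable names are placeholders:

    proc means data=prod.biweekly noprint;
       var revenue tenure_months;
       output out=work.qa_stats
              mean= median= mode= p25= p75= std= skew= kurt= nmiss=
              / autoname;
    run;

    /* Zero values need a separate count                             */
    proc sql;
       select sum(revenue = 0) as n_zero
       from prod.biweekly;
    quit;

    /* Append each period's stats to the running history so that     */
    /* drift shows up over time                                      */
    proc append base=qa.history data=work.qa_stats force;
    run;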

We found, straight away, that roughly five percent of our variables had large numbers of 0’s or missing values. We went back to our data provider for an explanation of why data seemed to be missing in these variables. Over the course of the next 2 weeks, they either found a problem with the data feed, found a problem with the logic used to create the variable, or gave us a satisfactory reason why the data looked like it had so many holes.

Our next step was to look at variables that were not stable over time. Our dataset included all US households; the variables associated with the household don’t vary much in the aggregate. We focused on calculating the means, medians, and standard deviations over time for each variable (the other metrics, skew, kurtosis, mode, etc., did not add any value over the basics). I think we went back 12 weeks (or 6 periods).

I was frankly shocked at how easy it was to find “suspect” variables! If you plot the values over time, suspects just jump out at you. We had some variables (I want to say 10% of the total number) whose means varied by greater than 10%, period over period. There were too many variables to chase down all at once, so we identified the 20 or so variables that were the worst offenders; their means varied by more than 50%, period over period. We went through the same process with the vendor as we did with the missing variables: fix the variable or justify why it varied so much. Over the next 8 weeks, we steadily reduced the amount of acceptable variation, going back and speaking with the vendor, variable by variable. This was a very valuable exercise. Our current variance threshold now hovers around 2%.
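
For those who want the mechanics: assuming the history has been reshaped to one row per variable per period (the qa.history_long table and its varname, period, and mean_val columns are hypothetical, with period as a numeric index), flagging the offenders is one query:

    proc sql;
       select a.varname,
              a.period,
              b.mean_val as prior_mean,
              a.mean_val,
              abs(a.mean_val - b.mean_val) / b.mean_val as pct_change
       from qa.history_long as a
            inner join qa.history_long as b
            on a.varname = b.varname and a.period = b.period + 1
       where calculated pct_change > 0.10;   /* we now use ~0.02 */
    quit;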

In the last part of the project, we made some process changes. First, I had a conversation with the vendor and offered them the SAS code we were using for data QA. They accepted immediately. They wanted to do the QA themselves, before we found a problem. We keep checking, but now the vendor can get ahead of the game and provide even better service. We review our data QA checks, bi-weekly, at my staff meeting. Typically, the person responsible now says “nothing to report.” In addition, we have created good SAS code to automate the process and have just moved the QA process to India (I guess I should do a “Lessons learned in off-shoring” piece). All in all, our ongoing QA process is relatively painless.

Was this the most bulletproof data QA process we could have put in? No. We are relying on changes in distribution to catch bad data. Some variables may be of poor quality and, because they have not varied much, it is possible that we may never catch them. I don’t think this is likely. Each variable is used in some project or another on a pretty regular basis. Once a variable makes it into a project, its quality is checked extensively. We have not found a new suspect variable this way yet, but you never know. I can say, pretty authoritatively, that our data quality has gotten much better.

Monday, August 6, 2007

Getting a pet project off the ground - Part 3

Getting pet projects off the ground - Part 1
Getting pet projects off the ground - Part 2

Conduct a successful pilot

I can imagine what you are thinking. “Conduct a successful pilot? What kind of advice is that? How am I supposed to know if a pilot is going to be successful? That is why I am running a pilot!” Obviously, you can’t ensure that your pilot is successful, but you can (and should) do everything you can to make sure that the execution of the pilot is high quality. In my case, all I did was facilitate and try to ensure that, as the business ran the pilot, I could help them make good decisions. I went to the planning meetings and I kept abreast of the decisions the business made (but really just advised; they did all the work).

I also paid a lot of attention to the results. In my company’s case, the pilot wave had a good result. I put together a one pager with the results and am now using it to show other business units the impact that site testing could have.

Be patient

The last step is not really a step; it is more of an approach. I would counsel patience in all parts of the process. Rushing these things turns folks off. You really need to build support and get buy-in. It took probably 9 months from my first discussion with the vendor to our first test. Let folks see the value and want to make the pet project their own. On that note, don’t hold onto the project too tightly. Let other folks take ownership. In my case, the first business unit, the one that piloted site testing, is off and running. I am evangelizing site testing with other parts of the business.

Getting a pet project off the ground - Part 2

Link to Step 1.
Step 2
I don’t control the web properties for any of my company’s web sites, so if I wanted to introduce the organization to the benefits of site testing, I would need partners. So, the next step was to start evangelizing site testing through the organization (BTW, the vendor was Optimost). This step has multiple sub-stages: create a sound bite, chat the project up, and educate senior business leaders and others.

Sound Bite
I think that a common mistake with junior folks who want their organization to try a new technology, software, whatever, is to lead discussions with the technology. “Look at how cool this is!”

The problem is that a business leader is not going to care about the technology. They care about the problem. You need to make the problem come to life. So, I gave a lot of thought to being able to quickly explain the business problem I was trying to solve using as little jargon as possible...

You are going to use this sound bite in front of senior execs, so it is worth getting it tight and getting it right. Once I had someone interested in the problem, I could get them interested in the solution. In my case, my problem statement was: “We make site design decisions based on opinion, and not fact. In order to know the facts, we need to be able to rapidly test our site designs. We can use a ‘site testing’ vendor to test billions of site design combinations in a matter of weeks. This will let us build a web page that optimizes for the things we care about, like generating page views, increasing the number of unique visitors, or contribution value.” We can argue metrics later; at this point, I was just trying to get some folks interested.

Chat it up!
So I had my sound bite, my next step was to start using it incessantly; in my meetings with my manager, her manager, my colleagues, you get the idea. I wanted as many people as possible to have heard the sound bite. This is really about laying the groundwork, getting people familiar with your problem and agreeing that this is a problem that needs to be solved. The reality is that big companies have any number of big problems needing to be solved. I was trying to get agreement that site design was an important one, one worthy of solving.

Over the course of the next month, I had a couple of meetings with Very Senior folks (regarding other projects) and worked my problem statement into the discussions (I really was shameless). Both of them agreed that our site design process did not take business needs into account, and that the organization really had no way of gathering those facts; facts that we needed to optimize our page design. I suggested to both of them that I bring in a vendor and invite each of their senior staffs to learn of the benefits of site testing.

This was an easy sell. Most Very Senior folks want their staff to be more innovative and are happy to give a little push. If you try to go from the bottom of the organization up, well, I have tried that method and have had little success. I am sure it is possible, but in my very big company, people are busy doing their regular jobs and need that push to take on additional responsibilities. I had actually tried to get the organization interested in site testing about 6 months previously. One of my colleagues and I invited a number of junior staff to an information session and nothing came of it. While they appreciated the session, no one felt empowered to kick off a pilot.

One last piece here, you better have become an expert in step 1. One of the business leaders had used site testing in a previous organization and knew his stuff. So, I needed to be able to have a pretty detailed conversation with him in order for him to be confident that I was the right guy to push this project forward. Almost home!

Educate senior business leaders and others

As mentioned above, I asked the Very Senior folks who should be invited to the educational sessions and they both suggested inviting all of their direct reports. I then put together an email that invited the directs to a meeting. The email explained site testing and offered to have a second session for their direct reports. This is a critical point. We actually had 2 meetings. One for folks who could reasonably sponsor a pilot and one for the folks who would be responsible for pulling it off. The types of discussions are different in those meetings and I wanted the leadership to be excited by the potential while I wanted their staff to be interested in the execution. I actually had at least 3 execution level meetings for various groups in the organization, but one Very Senior meeting.

It worked out well, though truthfully, I was trying to generate as much support as I could. If one of the more junior folks had expressed interest in implementing site testing, I am sure we could have worked something out. Once again, I don’t know if my approach would work in every situation, but I was trying to plant 100 flowers and watch to see which ones would bloom. The nice thing about having the Very Senior folks engaged was that their influence could help break log jams.

Next step: Conduct a successful pilot

Sunday, August 5, 2007

Google Analytics

I was wondering how to track visitors, etc., on this site. I just assumed that tracking functionality would be built into the back end. Turns out you can use any web tracking package and insert the script snippet directly into the page's header. Very cool. I really like that Google imposes as few constraints as possible. I went with Google Analytics just because they gave me a choice.

Getting a pet project off the ground - Part 1


My staff tells me that one of my core skills is getting large organizations to try complicated things. What I think they really mean is that I am good at getting the organization to try things senior folks don’t fully understand. For more junior folks, the folks who are closest to the technologies and the line, who understand the “thing,” it is very frustrating to try to get an organization to try something new and complicated. It does not have to be frustrating. Most recently, I got my employer, a multi-billion dollar web portal company, to start experimenting with multi-variate site testing. I have a pretty standard plan of attack for these kinds of things and followed the same strategy for getting site testing moving in the organization: know more, build support, conduct a successful pilot, and be very patient.

Know more
I like taking on pet projects. Even if they don’t go anywhere, I learn a lot. In the case of site testing, I spent about 4 weeks becoming the company expert. I started out by Googling like crazy, identifying vendors who offer site testing, reading their white papers, etc. I then called the vendors and set up informational discussions. My advice is “don’t be shy.” Most vendors are happy to take these calls. They love in-bound sales leads and they know most don’t go anywhere. I did 2 calls with my first vendor: an initial discussion of the technology and then a follow-up where we focused on implementation. I then spoke with 3 other vendors and explicitly asked them how their products differed from my first vendor’s. At the end of the 4 weeks, I had gotten a great education in the technology and what differentiated each company. I even put together a little one pager for myself to make sure I could articulate those differences.

Next part, we’ll talk about building support and evangelizing. Should be up in a couple of days.

Tuesday, July 24, 2007

First Blurb

After running my own website (www.facto.org) for nine years, I have finally moved over to a service provider.

I really thought I was done with blogging, that I had nothing else to say. But I find myself increasingly talking to people about business analytics. For good reason: it is my business. Also, my friend Jeff Demoff has started a company around data analysis and has me as a contributor on his podcasts – he sends me business-related articles, we chat about them and he posts them on his site (nameofsite.com). But in the podcasts we have done so far, I have found the format a little limiting – with only one take, you have to make up answers on the fly and the whole thing feels a bit rushed. I would rather be able to give more thoughtful comments on each article and talk about other stuff as I am interested. Hence this blog.