Da Facto

Thursday, March 19, 2015

Linkedin for Startups - Gust

Just discovered Gust.com. It is a social network for start-ups and investors. I am just starting with it, but it has, I guess I'll call it "funding workflow" functionality. You can use it to not only develop a presence in a relevant social network, but can use the site to exchange documents with potential investors. Looks good.

What Every Angel Investor Wants you to Know

Just finished "What Every Angel Investor Wants you to Know" by Brian Cohen and John Kador. I have mixed reactions to this one, but I think it is a must read. Cohen is a well known angel investor and he has good wisdom on how he thinks about evaluating a start-up. In my, admittedly, limited experience, I would take his advice as a data point, but not as a roadmap. He helps shine a light on the process of angel investing and provides a lot of detail that will help a new company prepare for the process. But don't confuse a successful funding round with a successful company. Financing is fuel. You still need to tend the fire.

Wednesday, March 18, 2015

On selecting a corporate lawyer

I feel very good about all aspects of our progress and state of the business, as it were, except for one thing. Our corporate legal representation. Lawyers are a bit like doctors. They have different specialties and what is common knowledge for one specialty might be totally foreign for another.

The first type of lawyer you will need is a corporate attorney. They will help you set up the company and get things like NDA's and employment agreements together. It is important to get this stuff right. I had figured that this early stage stuff is commodity kind of work and does not need some high powered, big priced attorney. I figured that this is an area where we could just select a competent local lawyer who has some tech experience and move on with our lives. I was wrong.

Upon further review, my advice for a technology start-up is to go with an experienced corporate lawyer who specializes in technology. There are terms that any experienced investor is going to expect to see in the founding documents. A lawyer who does not have the relevant experience will not include them in the documents and that could be detrimental later.

Some of the red flags that we saw, but did not recognize until later:

If they have never heard of a SAFE.
If there is no kind of vesting schedule or termination provisions for the founders in the shareholder agreement.
If they have you create the company in any state but DE (or maybe CA or NY, if you are based in either of those states).

We are changing out our corporate representation. Both of the firms we are looking at are not cheap. But some things are worth paying retail.

Tuesday, February 24, 2015

The Five Dysfunctions of a Team

When I was at Turner, my friend and colleague, Karen Painter, said that I absolutely had to read "The Five Dysfunctions of a Team." I had the best of intentions. I am about halfway through and I wish I had read it earlier. The notion of your peers being your primary team is one construct that I think is right and needs to be set from the start. Also, that the team has the same goals. Obvious, but I have not seen it in practice. Regardless, food for thought.

Monday, February 23, 2015

Hard Thing about Hard Things

The current book list is pretty analytics focused, but I have been reading some of the startup books that folks recommend. First one: Joe Zawadzki (founder of x+1 and MediaMath) called out "The Hard Thing about Hard Things" by Ben Horowitz. Ben is a co-founder of the VC firm Andreessen Horowitz. He was the CEO of LoudCloud and Opsware and has great experience building big companies, relatively quickly.

I liked the book, but Ben has a perspective that is obviously driven from his experience. And Ben's experiences are a bit rarefied. He was running B2B infrastructure companies and talks about needing to raise (and then spend) $100MM to ramp the businesses. Also, he had enviable advisors. Michael Ovitz and Bill Campbell, to name-drop two. But he also had to deal with serious threats to the business. Regardless, I would recommend the read. Short version, don't give up. There are going to be very hard times. As a CEO you need to be prepared for the hard times. He leaves the "how" as an exercise for the reader (I think he would say that every situation is different and you need to find your own path for your particular hardship), but the book provides plenty of food for thought.

I will also say that this is the only business book I have read twice. I aspire to build not just a company, but a team, as influential as Ben. I have promised myself that when we hit $50MM in revenue, I will read it again. And I read Ben's blog. He just did a piece on "The Prophets of Rage" as a prototypical personality in a company. I know several PoR from both Turner and AOL. Given my time as a sound engineer with Public Enemy, this piece resonated with me in both name and content.

As an aside, I saw an Amazon reviewer ding the book on use of the female pronoun. I did the same thing for my dissertation. One comment: Be the change you want to see.

Friday, February 20, 2015

Good reads

The funding process for the later rounds is a bit opaque. But there are tons of resources to help you understand the investor's perspective. I found the Reaction Wheel blog, written by Jerry Neumann, a couple of weeks ago. He is a long time angel investor. I have also been checking in on AVC by Fred Wilson. Both are worth reading. Fred publishes every day. I also read the "startup trades", VentureBeat and TechCrunch, daily.

Thursday, February 19, 2015

Where is Kenny?

Some folks are asking why I am not listed on the Capture Your Flag site as an interviewee. Not sure. I reached out to Erik to ask. Here is a link to all my interviews. I did the year 4 interview maybe 5 months ago and we talked about how I thought I was ready to be a CEO; that I thought I had been in training for the job and was ready to take the plunge. I did not think we were going to start our own thing. Surprise!

Wednesday, February 18, 2015

Starting a company is like making sourdough. Start with culture.

I know it is a bad title. I thought it was funny.

As a former academic-wannabe, I like to research things before doing them. I am wired to process information. So when Joe and I started to talk about the company, I read books and websites on entrepreneurship and founding a company. There is a lot out there. Over the next couple of months, I'll post resources that I think are worthwhile. To start:

Getting the company culture right (or at least not wrong) in the early days is critical. Once the culture gets ingrained, it is very resistant to change. Like impossible. Just so we are on the same page, I think of culture as the beliefs that drive behaviors in the organization. So the critical piece when trying to influence your company's culture is your first set of hires. They bring their beliefs with them and those beliefs drive behaviors. Of course, you have to know what beliefs you want to embed in the company. Then hire, in part, for those beliefs. And if you find that you made a wrong decision on one of your new colleagues, well, as the CEO, you need to make it right.

If you are starting a company, you should take a look at the Netflix culture deck. It provides great food for thought when thinking about what your own company's culture should aspire to be. There are some things I really like, but at its core, it is an exhortation to hire "stunning" colleagues. Having great people around you solves a lot of problems. Check out the discussion of what happens when your company gets large and specialized (around page 45). I have observed this very phenomenon and made the same observation. But more importantly, the way they avoid the problem is to just hire great people. And career development? No formal program. They provide great colleagues. You are expected to learn from them. And how do they find these unicorns? They pay top of the market.

I don't like that there are nine values. Seems a bit heavy. I'll share my thinking in a later post.

Tuesday, February 17, 2015

How do you finance a Friends and Family round

I am looking at various resources for startups. Our product will be consumer facing and we'll need to raise money at various stages. Currently, we are putting together our Friends and Family round. Typically, this round uses a convertible note where the investors lend you the money to start the business and then that loan gets converted to stock at the Series A valuation. We are using a similar instrument called a "SAFE" created by Y Combinator. Same notion, but it is not a loan. There are some important terms in the SAFE; the premium and the cap. Both are optional, but seem to be common. At least in my conversations.

First, you may specify a premium that the investor gets, over and above their investment. So, if an investor gives you $100k and the premium is 20%, when the shares get issued, they get $120k worth of stock. In effect, the risk premium for the investor is 20%, plus they get the upside of any future valuation.

Second, you can set a "Cap." The cap acts as a maximum valuation for the investors. Say the cap is $5MM and the investor puts in $100k. If the valuation at the Series A round is $2.5MM, the cap does not apply. Note that I did not put a premium on the investment. Hold that thought. Now, if the Series A round had a $10MM valuation, the cap applies. Without a cap, the investor would get 1% of the stock (100k/10MM=.01). With a cap, the maximum the denominator can be is $5MM. So, in the example, the investor would get 2% of the stock (100k/5MM=.02) regardless of the valuation.

Typically, the SAFE has both a premium and a cap, but the investor gets one or the other. If the Series A valuation hits the cap, then the investor does not get the premium. Of course, if the cap is not reached, then the premium applies and the cap is not used.
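A minimal sketch of the arithmetic above, in Python. The function and parameter names are made up for illustration, and it follows the simplified description in this post (cap or premium, never both), not the legal text of the SAFE. Talk to your lawyer before you rely on it.

```python
def safe_ownership(investment, series_a_valuation, cap=None, premium=0.0):
    """Fraction of the company the SAFE investor gets at the Series A.

    Simplified model from the post: if the valuation reaches the cap,
    the cap sets the price and the premium is ignored; otherwise the
    premium is added to the investment and the cap is not used.
    """
    if cap is not None and series_a_valuation >= cap:
        # Cap reached: the denominator can be no larger than the cap.
        return investment / cap
    # Cap not reached (or no cap at all): the premium applies.
    return investment * (1 + premium) / series_a_valuation


# The examples from the post: $100k invested, $5MM cap.
print(f"{safe_ownership(100_000, 10_000_000):.1%}")                 # 1.0%, no cap
print(f"{safe_ownership(100_000, 10_000_000, cap=5_000_000):.1%}")  # 2.0%, cap applies
print(f"{safe_ownership(100_000, 2_500_000, cap=5_000_000):.1%}")   # 4.0%, cap not reached
# Same $2.5MM round, but with a 20% premium on the investment.
print(f"{safe_ownership(100_000, 2_500_000, cap=5_000_000, premium=0.20):.1%}")  # 4.8%
```

Note that the sketch treats the valuation as a single number and ignores pre-money versus post-money distinctions.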

Some investors don't like the SAFE; they have no claims on the assets of the company if management needs to close the company down. A convertible note has a little more protection. But in the early rounds, the investors I have spoken with are not worried about a wind down. They are making a bet and assume that if we need to liquidate, we will do right by them. And they are right.

Left Turner, working on something new

A bit of news. I left Turner Broadcasting in December and have started a company with Joe Wilson. Joe was my VP of engineering at Turner and we have had a great professional relationship. I don't want to say too much about the company at this point, but we are working on a media product. More details as we get closer to having the product in a state where we can demonstrate it. I expect to do some postings on starting up a company. More to come.

Tuesday, November 26, 2013

Linkedin Premium Search Traffic

I needed to send some inMail messages, so I signed up for Linkedin Premium. You get a little bit more visibility in terms of who is looking at your profile and how they got there. The most interesting thing for me is how little search traffic comes from anything about my functional job; only 1% of search traffic to my profile is based on the phrase "Big Data." Almost all of my traffic is driven from what my SEO team would call "Branded" terms. That is, derivations of my name. Number one search term is "Kenneth", then "Rona", then "Ken Rona".

Monday, November 25, 2013

Standardizing our Interviews

My current team has grown to a bit over 50 people, including contractors. We are constantly hiring for some function or another and some of my staff seem better at hiring than others. Some teams seem to attract and retain great staff. Some struggle a bit. Even within a team, our hiring experiences vary.

I am not surprised that we have these challenges. The SVP of "People Operations" at Google, speaking about their hiring practices said "Years ago, we did a study to determine whether anyone at Google is particularly good at hiring. We looked at tens of thousands of interviews, and everyone who had done the interviews and what they scored the candidate, and how that person ultimately performed in their job. We found zero relationship. It’s a complete random mess, except for one guy who was highly predictive because he only interviewed people for a very specialized area, where he happened to be the world’s leading expert."

So what are we doing about it? A couple of things. First, we are putting together a small set of attributes that every candidate will be evaluated against and a set of questions that can be used to test for those attributes. We are going to try to improve consistency of our interviews and see if we can get everyone to adopt best practices.

Second, I now interview every candidate. As the leader of my organization, I need to be responsible for the quality of the staff. Problem is, I am not scalable and I bring my own biases. I know that the CEOs of some internet companies want to review all hires. I get why. And to be fair, I don't know that my involvement will fix the problem. But I can make sure that we are hiring people that I can stand behind.

Ah, well. First step is recognizing the problem. I'll tackle the scaling issue when it becomes acute.

Amazon does something interesting. As part of the interview loop, the candidate is evaluated on whether they will make Amazon smarter. And the person doing the eval is not part of the reporting structure. I think they are part of HR. I like the notion.

Thinking Fast and Slow Observations

I just finished reading "Thinking, Fast and Slow" by Kahneman. My PhD was in behavioral economics and it was great to get a refresher from the master. Given my now 13 years in business, I got to read the book with a different eye; one where I could think about common pathologies I have noticed in business decision making but brought back to first principles. I highly recommend the book. Some random observations:

1. If you have a choice between a sure bad outcome if you stop a project, or continuing with a small probability of a good outcome but a high likelihood of a disaster, take the bad outcome. You can explain a bad outcome. It is much harder to explain that you chose to go down a path that had a high probability of disaster.

2. If you see a structural impediment to accomplishing a goal, don’t proceed. See if you can fix it. If not, do something else. It is really hard to overcome a structural governor on change.

3. Take a look at the historical ability of a person, partner or team to do something. If the historical probability is low, do something else.

4. Organizational change is hard because someone always loses. And the change hurts the losers more than helps the winners. So the losers fight harder.

5. Experts do a good job of figuring out the important drivers of some phenomenon. But we are not good at using those mental models in a consistent way, in the moment of making a decision. Algorithms are much better at getting to good results. Even imperfect algorithms. Think about this in the context of hiring, or forecasting, or evaluations, or capital budgeting or ...

6. Don’t just evaluate one alternative. Always put two down, even if the other one is do nothing. I like to see if, when something is framed as a positive ("we are giving you a gift") I can reframe as a negative ("You are creating an obligation")

7. People conflate liking with smart. In a hiring context, managers wind up hiring nice people who they think are smart. Not actual smart people. As organizations get bigger, you wind up with a more likable, but less smart organization. Next thing you know, you have a large group of people who have a limited skillset and can't adapt to change.

Thursday, July 5, 2012

I put together a short style guide for chart creation. I got tired of giving my analysts the same spiel over and over again, so I codified it. I should have visuals with the style guide, but it would take forever and I think the text is pretty self-explanatory.

1. Every chart should have a governing thought or main message. The title at the top of a page should not be “Monthly Page Views.” Rather, there should be a point to why you are showing the user the chart. A better governing thought would be: “Monthly Page Views have increased by 12% from the previous month.”
2. Don’t include unnecessary elements in a chart. Sometimes I see a legend where there is only one data series in the chart. In this case, you would not need the legend; there is only one thing being shown on the chart. Another example is gridlines. If knowing the exact numbers of a metric is important to your story, turn on labels and show the numbers. And borders around charts. Let’s go minimalist in terms of the elements on the chart.
3. No 3d. It muddies the visual. See 2.
4. Don’t go over 4 digits on a scale for a chart axis. There is no room on a page for 7 digit numbers. One digit is even better.
5. Clearly label the scale. If it is not self evident (like months or business units), please clearly label both what the metric is (Page Views, not PVs) and the scale. If it is in thousands, put that on the axis. If it is in millions, put that on the axis. Above all, I am looking for clarity here. I don’t want people to spend a lot of time figuring out what the “rules of the road” are for a particular chart.
6. Don’t use our internal labels for external consumers. So, no labeling a chart about page views: “Monthly_pageviews_all.” Rather “Monthly Pageviews.” Use plain English, please.
7. Don’t use double axis charts. I hate them. If you want to show two different metrics on the same page, just put two charts on the page.
8. Make sure that all charts that use the same metric use the same scale. Changing the scale in the middle of a set of related charts messes with the viewer.
9. Don’t use line charts for anything that is not time/date based. Lines imply date or time to a viewer.
10. Wherever possible, provide some kind of basis for comparison on a chart. Some options are Year over Year or average. It is really hard to tell how things are going without a comparison.
11. Don’t vary chart types without good reason. For example, pie charts and column charts can show the same data. Viewers get used to seeing a particular type of chart and if you are changing types on them, they have to mentally change gears. Just pick one type for related charts and stick with it. And generally, I am not a big fan of pies. I would prefer waterfall charts, but am not inflexible about it.
12. If you are going to show percentages, then you need to show the total n on the slide. If someone needs to calculate the counts for the categories on the pie, they need the total n.
13. Always source the data. Tell the user where the data behind the chart is coming from.
14. If there are a material number of data points missing, you have to disclose it on the chart as a footnote or include a “missing” category on the chart. Either way, you need to be explicit about the limits of your analysis.
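Two of the rules above (a governing-thought title, and a short, clearly labeled axis scale) are mechanical enough to sketch in Python. This is only an illustration; the helper names, the unit wording, and the sample numbers are assumptions, not part of any charting library.

```python
def governing_title(metric_name, current, previous):
    """Build a title that states the point of the chart, not just the metric."""
    change = (current - previous) / previous * 100
    direction = "increased" if change >= 0 else "decreased"
    return f"{metric_name} have {direction} by {abs(change):.0f}% from the previous month"


def scale_for_axis(values, metric_name, max_digits=4):
    """Rescale values so the axis needs at most max_digits digits,
    and return an axis label that states both the metric and the unit."""
    units = [(1, ""), (1_000, "thousands"), (1_000_000, "millions"),
             (1_000_000_000, "billions")]
    peak = max(abs(v) for v in values)
    for divisor, unit in units:
        # Stop at the first unit that keeps the largest value short enough.
        if peak / divisor < 10 ** max_digits:
            break
    scaled = [v / divisor for v in values]
    label = f"{metric_name} ({unit})" if unit else metric_name
    return scaled, label


# Hypothetical monthly numbers, for illustration only.
monthly_page_views = [2_500_000, 2_800_000]
print(governing_title("Monthly Page Views", monthly_page_views[1], monthly_page_views[0]))
scaled, axis_label = scale_for_axis(monthly_page_views, "Page Views")
print(axis_label, scaled)
```

The scaled values and the label would then be handed to whatever charting tool you use.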

Any other suggestions?

Friday, August 26, 2011

Mentioned in a Wired article

Last bit of self-promotion today. About a month ago, my dissertation was referenced by Dan Ariely in a Wired article. I went to grad school with Dan. For me, the best thing about the mention was that I am now officially an analytics expert.

New Capture Your Flag Videos

Also have new capture your flag interviews. The first two are from last year. The new stuff is below.

Data Governance

I have not written anything in a while. More a problem of inspiration than anything else. I just didn’t have too much new to say. I now find myself inspired to discuss data governance. How exciting!

My company, Turner, is undergoing some profound changes in how we distribute our content. These changes are requiring us to retrofit existing measurement (that is, data collection) systems and stand up new systems. And the process is a bit painful. We are doing a good job making the changes and developing the systems, but despite our best (admittedly organic) efforts we are still wrestling with issues of who makes critical design decisions, how to handle new requests, and who gets informed when changes get made. Though the analytic part of my job is really around building and using the analytic platforms, I was finding myself facilitating discussions around data collection and measurement.

My boss noticed this and decided to make my responsibilities more formal. So, she asked me to lead our efforts in data governance and, despite my two degrees in political science, I had no idea what she was talking about. As we were having this discussion, I was thinking, “Do I need to set up a bicameral legislature? How about an independent judiciary?”

So what is it? If you do a Google search, you can find long and precise definitions of “Data Governance” but I find those definitions overly complicated. The short version on data governance is: “determining, in advance, who gets involved (and defining their role) when there is a change in the data collection and measurement requirements of the company.” At its core, data governance is about communication. Everything else is just tactics. I am admittedly focused on web marketing and analytics. So, my apologies to folks working in other industries if my experiences don’t translate.

In terms of tactics (think policies and procedures), there are a few management techniques that we are using to make sure we include the right folks when data collection and measurement requirements change. First thing we are doing is getting Service Level Agreements (SLA’s) in place that make expectations between internal groups very clear. Our SLA’s specify, in painful detail, for any given situation that we could think of, what our time table is to handle the situation (fix it, meet about it, diagnose it, whatever), who gets contacted, and what the responsibilities of each group are in managing the situation. I treat these things as contracts and we negotiate the “terms” with our internal partners. Also, there are penalties (typically escalating to someone’s boss) for not living up to your part of the contract. I think of the SLA’s as our legal canon; they specify the policies that we are all going to agree to adhere to and what happens when there is an exception.

Another tactic that we are embracing is process documentation. We are trying to get more formal about our internal processes. This is different than the SLA in that they may not be discussed with anyone from any other internal group. We may get their input and have them be part of the process. We may not. Depends on the process. We are using a six-sigma person to do the process mapping, create RACI documents, etc.

On staffing. We are in the process of hiring a “Data Steward.” Seriously. It is a real job. Don’t take my word for it. Look it up. This is the person who documents stuff and works with our internal partners to get the SLA’s in place, runs the meetings, etc. We are finding that for a company of our size, we need a person handling data quality and collection full time. The data steward will also act as a communications hub and make sure that the appropriate parties are speaking with each other. Note that this role is not a data cop. It is an influence and education type role, not so much a compliance role.

For those few people who have been reading the blog for a while, you know I am a big fan of ensuring that the analytic folks have high quality data to work with. To that end, you can do a bunch of automated data QA to ensure that your data is meeting your quality expectations. One new thing I have learned: you should also check to see that the relationship between variables is possible. For example, you can’t have fewer page views than unique users. If your data says otherwise, there is a problem. Data quality assurance is going to be a big part of our data governance. In effect, we are looking to ensure that our collection activities are following the “law.”
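The "is this relationship even possible" check is easy to automate. Here is a minimal sketch in Python; the field names (`page_views`, `unique_users`) and the sample rows are hypothetical, and a real check would run against whatever your collection system actually produces.

```python
def impossible_rows(rows):
    """Return the rows where page views are fewer than unique users.

    Every unique user has to generate at least one page view, so any
    row that fails this check points to a collection or processing bug.
    """
    return [row for row in rows if row["page_views"] < row["unique_users"]]


# Hypothetical daily rollup, for illustration only.
daily_rollup = [
    {"date": "2011-08-01", "page_views": 1_200, "unique_users": 400},
    {"date": "2011-08-02", "page_views": 350, "unique_users": 500},  # not possible
    {"date": "2011-08-03", "page_views": 900, "unique_users": 900},
]

for row in impossible_rows(daily_rollup):
    print(f"{row['date']}: {row['page_views']} page views < {row['unique_users']} unique users")
```

The same pattern extends to any pair of metrics where one has to be at least as large as the other.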

We are doing some other things, but the last thing I want to discuss is conducting prioritization meetings. We have found that if we don’t have dedicated meetings that show all outstanding requests (changes and bug fixes, mostly) it is very difficult to give our internal clients visibility into what we are doing. They are a very reasonable bunch, but they, understandably, get nervous when they don’t know what we are working on. Or not working on. You can prioritize on a number of issues, but basically it comes down to business impact, effort, and likelihood of success.

Wednesday, May 18, 2011

Moved again

I moved to Turner about 7 months ago and have been occupied in the new job. More posts coming. Next up, the difference between ad inventory from premium publishers vs. non-premium publishers. I have been on both sides of this fence and think I can see both sides pretty clearly.

Wednesday, August 4, 2010

[x+1] in the Wall Street Journal

A very fair piece, I thought. My job is to make sure that we: "Know something about everyone." Maybe not a lot, but something.

http://online.wsj.com/article/SB10001424052748703294904575385532109190198.html?mod=WSJ_hpp_LEADSecondNewsCollection

Monday, March 29, 2010

Interviewed by Capture Your Flag

I was interviewed by Erik over at Capture Your Flag. Here is the link. It was fun to do and valuable. I got good props from my wife on her mention. I was flattered to be asked.

Wednesday, February 3, 2010

Open data bridge, more coverage.

Thursday, January 28, 2010

Shameless promotion

Quoted in an article about our Open Data Bridge efforts.

Wednesday, January 13, 2010

Why Demos

The other day, I was speaking to an ad agency about use of third party data in online advertising. I spent a fair bit of time talking about my focus on building out [x+1]'s demographic data set. Toward the end of the talk, someone asked a very interesting question: "If you have buyer propensity or behavioral data, why do you need demographics?"

Hmmm. Why do you need demographics in a world of in-market and intender data? Let me talk a little bit about why demos are useful in online ad targeting, and more specifically, for media targeting.

First, demographics act as useful proxies for life stages and interests. An individual’s life stage and interests are powerful drivers of purchase intent. In fact, demos serve as inputs to the models used to create intender/interest segments (but not in-market status). They are foundational.

Second, demographics are an efficient type of data. An ad network can use the data across a wide variety of product categories. So, get the data once, use it many times. This reduces the amount of integration you need your engineering team to do and speeds time to market for product specific targeting.

Third, demographic data is available for large portions of the internet audience. Any of the large online data providers claim to have ~ 30% of the audience. By contrast, counts for intender data are fairly small. How many people at a given time are in-market for airline flights to Mexico? As a percentage of users seen during an ad campaign, the number is certainly in the low single digits.

Fourth, demographic data will be commoditized. I am not suggesting that it will become cheap. I mean in the classic sense of a commodity; one source is as good as another and also comparable to a standard. This is not the case today. Some providers are more accurate than others, but over time, I would think that there will be little to distinguish one data provider from another. This means that, unlike the intender and in-market data, we'll be able to "stitch" together multiple demographic providers to create a file that provides demographics for a fairly wide set of users. Each provider has a unique (but overlapping) set of users, so we are going to want to combine datasets. Demographic data is relatively easy to combine across providers. By contrast, each provider of intender and in-market data defines their own segments, meaning that we are going to need to treat each data source separately. For a longer discussion of creating an aggregate demographics database, see my article here.
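To make the "stitching" idea concrete, here is a minimal sketch in Python. It assumes each provider's file can be reduced to a mapping from an anonymous user id to standardized demographic attributes, and that providers are listed most trusted first. The function name, the attribute names, and the sample data are all hypothetical.

```python
def stitch_demographics(providers):
    """Combine provider files into one demographic file.

    providers: list of dicts {user_id: {attribute: value}}, ordered from
    most trusted to least trusted. When two providers disagree on an
    attribute for the same user, the earlier (more trusted) value wins.
    """
    combined = {}
    for provider in providers:
        for user_id, attributes in provider.items():
            record = combined.setdefault(user_id, {})
            for name, value in attributes.items():
                # Only fill in attributes we do not already have.
                record.setdefault(name, value)
    return combined


# Hypothetical provider files with overlapping users.
provider_a = {"u1": {"age": "25-34"}, "u2": {"age": "35-44", "income": "50-75k"}}
provider_b = {"u2": {"age": "45-54"}, "u3": {"income": "100k+"}}

stitched = stitch_demographics([provider_a, provider_b])
print(len(stitched), "users covered")
print(stitched["u2"])
```

This only works because demographic attributes are comparable across providers. Intender and in-market segments are defined differently by each provider, so they cannot be merged this way.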

Powerful predictors of likely relevance, broadly useful, for many users, simple, and standardized. All good. So, what’s the catch?

I can see three challenges on the demographic side of data. First, the cost to use demographic data has to be very affordable in order for ad networks and agencies to apply the data to all of their ad decisioning. Online data is not yet commoditized (in the classical sense), but I believe it will eventually become so.

Second, most companies don't yet know the number of unique users each data provider can reach. At [x+1] we use enough of it to have a pretty strong idea of what works for a given campaign, but most folks don't have enough experience to understand the reach they can get from each data provider. The value of each provider's data is additive to the extent that they provide data on unique users. If they are not providing data on unique users, then the path to commoditization begins. The providers would be supplying the same product. By definition, the data would be a commodity.
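One way to see how additive a provider is: count the users it reaches that the providers ahead of it do not. A minimal sketch in Python, assuming you can get a set of anonymous user ids per provider; the provider names and ids below are made up.

```python
def incremental_reach(providers):
    """Count the new users each provider adds, in the order given.

    providers: list of (name, set_of_user_ids) tuples.
    Returns a list of (name, number_of_users_not_seen_before).
    """
    seen = set()
    result = []
    for name, users in providers:
        new_users = users - seen  # users no earlier provider reached
        result.append((name, len(new_users)))
        seen |= users
    return result


# Hypothetical audiences, for illustration only.
audiences = [
    ("Provider A", {1, 2, 3, 4}),
    ("Provider B", {3, 4, 5}),
    ("Provider C", {1, 2}),
]

for name, added in incremental_reach(audiences):
    print(f"{name} adds {added} unique users")
```

A provider that adds zero users is supplying the same product as someone already on the list. The result depends on the order, so it is worth running it with your preferred provider first.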

Lastly, each of the data providers has a different degree of accuracy. Online, it is difficult to assess accuracy. You need to find a source of “truth” and advertisers are often reluctant to share their verified customer files with ad networks. Some ad networks rely on straight lift to assess the value of a data set; they don’t worry about accuracy. The problem with this approach is it tends to be brittle. Data sources that have some level of accuracy are useful for a little while, when they are being used to target users that they can accurately associate to a given data element. Over time, their predictive power degrades. I am a big believer in taking the time and care to find data sources that accurately represent the users’ age, income, whatever. As the accuracy of your data improves, you can be more confident in the longevity of your targeting strategy.

One last point; Should the data providers worry about commoditization of demographic data? If I were them, I would not be losing any sleep over it. In this case, I think commoditization would be good for the data providers. They would get less money per user on any given transaction, but they would truly make it up in volume and because their product has zero marginal cost this is a good thing. In the offline world, that dynamic has played out to the benefit of Acxiom, Equifax, Experian, InfoUSA, etc.

And for those interested, I have been giving talks to agencies and advertisers on the online 3rd party data landscape. I would be happy to talk to your teams about what kinds of third party data are coming on-line, why they should care, and how/when we expect to be able to use the data. There are very interesting capabilities being developed. Please contact me at krona@xplusone.com if you would like to know more.

Friday, December 4, 2009

Why Demos

I wrote a piece for Ad Exchanger on why internet marketers want demographic data. I'll post the text in the next couple of days.

Friday, October 23, 2009

Where is the US household file?

Conducting an offline direct marketing campaign is relatively easy. You can call any one of a number of data providers to get a US household file (that is, demographics on 115 MM US households), run a test campaign, figure out the profile of who responded to the campaign, and you are off to the races. The data exists. You just have to crank up your favorite LOGIT tool and you are in business. In the online space, not so much.

In the online world, there is no single vendor that has the US online user file. Why? It is hard to identify users online in a way that protects privacy and is meaningful for marketers. Though they are all getting better, no single online method of tagging users for demographics has over 30% of the online audience. So, in order to know basic demographics, you need to combine data from multiple data sources.

Say you have signed up multiple data providers. Are you ready to go? No. You now need to make some trade-offs on accuracy and comparability. What do I mean by that? Well, all of the data sources have varying levels of accuracy. Is IP based better than cookie based? Do you have a file that you can use to know truth? Also, the data sources may report data at the user level (though anonymous) or the household or the zip4 or the zip. Seems like you should be as close to the user level as you can be, but what about the accuracy issue? Is it better to have accurate data at the zip4 level or less accurate data at the user level? By the way, it is going to depend on the data type and category. All in all, very complicated stuff.

What is an online marketer to do? You have two choices on this. Test and learn, knowing that you are going to need to invest time and resources in getting a good set of data providers in place. Or, self-servingly, work with someone who has already climbed up the learning curve :).

Monday, October 12, 2009

Got some press

My new company put together a press release. Check it out here. Too funny.

Wednesday, September 30, 2009

How to start a new job

I was talking to a colleague the other day who was about to start a new job. She had been at her previous company for about 10 years and wanted some pointers on how to make a successful transition. We talked for a while about making a strong start and I gave her three pieces of advice. First, spend the first month listening. Take everyone you can out to lunch (your internal clients and partners, your staff, vendors) and ask them how you can help them be more effective. Get invited to staff meetings and listen to the problems people are wrestling with. Just gather perspectives and try to listen very carefully. In every role, there will be opportunities that you can leverage to get a strong start. Try to find those opportunities. Low hanging fruit and all that.

Second, take what you have learned and come up with a plan to harvest the fruit. Talk to your manager about what you are going to accomplish and get her agreement on the plan. You want fast wins here, so try to avoid committing to a project that is going to take two years to complete. Also, be very wary of projects that have been lying around, incomplete. There may be a reason why the "raw data feed" or some such project has made no progress in 6 months. You can take those projects on, but try to push them back until after you have a track record.

Third, and I hate to be this tactical: Don't walk in and say things like "At [former employer] we filtered our web log data using SAS" or "At [former employer] we had the data warehouse take care of this problem". These may be true and relevant observations, but new co-workers react funnily to the comparisons. In some ways, talking about your former employer is like name-dropping a celebrity that you once hung out with, and irritating for the same reason. Once is interesting, but it gets old fast. You can say "at a previous employer we did x", but don't use the company's name. It sounds trivial, but take my word for it, people get tired of you saying "We used Lotus Notes at McKinsey."

And on that note, I get a chance to follow my own advice. I am leaving IXI and moving to [X+1] as the Vice President for Analytics and Data Strategy. I only work for companies that have an X in their name. Look for the blog to be more active, as part of my role will be evangelism in the space.

Tuesday, June 9, 2009

Brute force vs. smarts

My dissertation was, in part, about how to encourage people, when solving problems, to think about the information they already have available to them and to not just gather information for its own sake. I found that you can save a lot of money if you charge just a token amount for each new piece of information. When you charge for information, people think more deeply about the information in their possession and stop asking for information that they don't really need to solve the problem. I had a real world brush with this phenomenon the other day.

I got a call from someone who had an enormous database (multi-petabytes) and was looking for some advice on how to "scale" the database. I almost choked. How much more scale do you need? They were saving every piece of information they gathered from their customers and were afraid to throw anything away. "We don't know what we are going to need in the future" was the refrain.

In my mind, the organization was not thinking hard about the data they had and how to use it efficiently. Rather, they let cheap storage lead them down an easy path. The brute force path. The engineering path. I can tell you from experience that, assuming money, the technical folks will solve the problem of increasing storage. The organization still thought of their challenge as one of engineering. "How do we save even more data?" But the engineers can't fix the underlying problem: the organization was not being thoughtful about the data they were saving. At AOL, we did an analysis and found that for predictive modeling, we relied on a small set of data (less than 100 variables) and only used 10-15 for any given use. We had several thousand variables available to us, but most were either correlated with other variables and could be deleted with no loss of usability, supported a business we were no longer in, or were being saved for no reason that we could determine, other than that it was easy.

I would suggest that companies looking to "scale their databases" first do an inventory of the data they are saving and develop a simple process to determine if the data is worth saving. In our case at AOL, each modeler assigned a letter grade to each variable and we "voted" on the data to give away. And, after this process, at no time did we say "I wish we had kept variable x." Reality was we still had too much data.

Monday, June 8, 2009

Thinking about BI recently

It occurred to me that using a BI tool is a hard way to gain insight. You are limited by your own imagination. I like hypothesis driven analysis, but I think you can do a much better job in providing insight if you understand a bit of econometrics (Logit and OLS). You can simply run a stepwise regression on the variable you are trying to understand (say Life Time Value of a customer) and see what pops out (say the interaction of age and education). Once you see which variables pop, you can then use BI tools to illustrate the point. Critical piece: make sure you run all of the interactions. That is where the cool stuff lies.
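To make the point about interactions concrete, here is a minimal sketch in Python with NumPy. It is not a stepwise procedure; it just fits plain OLS twice, once with main effects only and once with the age x education interaction added, and compares the fit. Everything in it is an assumption for illustration: the data is synthetic, and the names `ltv`, `age`, `education`, and `r_squared` are made up for the example.

```python
import numpy as np


def r_squared(X, y):
    """Fit OLS with an intercept via least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


# Synthetic customers (illustrative only, not real data).
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(20, 70, n)
education = rng.uniform(8, 20, n)  # years of schooling

# Hypothetical Life Time Value: main effects plus an interaction term.
ltv = (
    10 * age
    + 20 * education
    + 2 * (age - 45) * (education - 14)
    + rng.normal(0, 50, n)
)

main_only = np.column_stack([age, education])
with_interaction = np.column_stack([age, education, age * education])

r2_main = r_squared(main_only, ltv)
r2_inter = r_squared(with_interaction, ltv)

print(f"R^2, main effects only: {r2_main:.3f}")
print(f"R^2, with interaction:  {r2_inter:.3f}")
```

If the interaction matters, the second R^2 is noticeably higher, and that is the variable you would then go illustrate in the BI tool.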

Data Simplification

When I was trained, I was told that you should never take continuous level data and make it categorical. One of the guiding principles of regression analysis is that variance is good; never reduce it by simplifying your data and creating categories. Maybe an example is in order:

Say you have a variable like temperature. This data is continuous; it is not bounded (except in extreme cases) and the temperature of something can take a wide variety of values. In regression, there would be no reason to define ranges of temperature (0-10, 11-20, 21-30, etc). The computer does the work, and if you created the ranges you might reduce the explanatory power of the data (or, if the data was used as a dependent variable, make it harder for other variables to predict its value). So, in research, categorizing data is a no-no. So said Professor Feldman.

Funny thing was that I had a staff member (David) who kept telling me that while the theory was right, you can usually create categories without much loss of predictive power. And in certain applications, working with categories is much easier than working with continuous data (ad serving is one such application. But that is another post).

The other day, I went through the exercise. I took continuous level data and made it categorical. David was right. Prof was wrong. At least in the world of Digital Marketing. The data retained its power and it is easier for consumers of the data to use it. Having said that, you still need a fair number of categories (over 10) to retain the power. Even still, I thought this was a fact worth knowing.
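For anyone who wants to try the exercise, here is a rough sketch in Python with NumPy. It is not the analysis I ran; it uses synthetic data and made-up names (`x`, `y`, `one_hot_bins`, `r_squared`), and it assumes a simple linear relationship. It compares the R^2 from the continuous variable against the same variable cut into 3 and into 12 quantile-based categories.

```python
import numpy as np


def r_squared(X, y):
    """Fit OLS with an intercept via least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def one_hot_bins(x, k):
    """Cut x into k quantile bins and return dummy columns (first bin dropped)."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])  # k - 1 interior cut points
    idx = np.digitize(x, edges)  # bin index 0 .. k-1 for each value
    return np.eye(k)[idx][:, 1:]  # drop one column; the intercept covers it


# Synthetic data (illustrative only): y depends linearly on a continuous x.
rng = np.random.default_rng(1)
n = 10000
x = rng.uniform(0, 100, n)
y = 3 * x + rng.normal(0, 20, n)

r2_continuous = r_squared(x.reshape(-1, 1), y)
r2_3_bins = r_squared(one_hot_bins(x, 3), y)
r2_12_bins = r_squared(one_hot_bins(x, 12), y)

print(f"continuous:    {r2_continuous:.3f}")
print(f"3 categories:  {r2_3_bins:.3f}")
print(f"12 categories: {r2_12_bins:.3f}")
```

Under these assumptions, the 12-category version should land very close to the continuous fit while the 3-category version gives up noticeably more, which lines up with needing a fair number of categories.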