Tuesday, November 26, 2013
I needed to send some InMail messages, so I signed up for LinkedIn Premium. You get a bit more visibility into who is looking at your profile and how they got there. The most interesting thing for me is how little search traffic comes from anything about my functional job; only 1% of search traffic to my profile is based on the phrase "Big Data." Almost all of my traffic is driven by what my SEO team would call "branded" terms, that is, derivations of my name. The number one search term is "Kenneth", then "Rona", then "Ken Rona".
Monday, November 25, 2013
Standardizing our Interviews
My current team has grown to a bit over 50 people, including contractors. We are constantly hiring for some function or another and some of my staff seem better at hiring than others. Some teams seem to attract and retain great staff. Some struggle a bit. Even within a team, our hiring experiences vary.
I am not surprised that we have these challenges. The SVP of "People Operations" at Google, speaking about their hiring practices, said: "Years ago, we did a study to determine whether anyone at Google is particularly good at hiring. We looked at tens of thousands of interviews, and everyone who had done the interviews and what they scored the candidate, and how that person ultimately performed in their job. We found zero relationship. It’s a complete random mess, except for one guy who was highly predictive because he only interviewed people for a very specialized area, where he happened to be the world’s leading expert."
So what are we doing about it? A couple of things. First, we are putting together a small set of attributes that every candidate will be evaluated against and a set of questions that can be used to test for those attributes. We are going to try to improve consistency of our interviews and see if we can get everyone to adopt best practices.
Second, I now interview every candidate. As the leader of my organization, I need to be responsible for the quality of the staff. Problem is, I am not scalable and I bring my own biases. I know that the CEOs of some internet companies want to review all hires. I get why. And to be fair, I don't know that my involvement will fix the problem. But I can make sure that we are hiring people that I can stand behind.
Ah, well. First step is recognizing the problem. I'll tackle the scaling issue when it becomes acute.
Amazon does something interesting. As part of the interview loop, the candidate is evaluated on whether they will make Amazon smarter. And the person doing the evaluation is not part of the candidate's reporting structure. I think they are part of HR. I like the notion.
Posted by Unknown at 12:32 PM, 0 comments
Thinking Fast and Slow Observations
1. If you have a choice between a certain bad outcome if you stop a project, or a small probability of a good outcome paired with a small likelihood of disaster, take the certain bad outcome. You can explain a bad outcome. It is much harder to explain that you knowingly chose a path that could end in disaster.
2. If you see a structural impediment to accomplishing a goal, don’t proceed. See if you can fix it. If not, do something else. It is really hard to overcome a structural governor on change.
3. Take a look at the historical ability of a person, partner or team to do something. If the historical probability is low, do something else.
4. Organizational change is hard because someone always loses. And the change hurts the losers more than helps the winners. So the losers fight harder.
5. Experts do a good job of figuring out the important drivers of some phenomenon. But we are not good at using those mental models in a consistent way in the moment of making a decision. Algorithms are much better at getting to good results, even imperfect algorithms. Think about this in the context of hiring, or forecasting, or evaluations, or capital budgeting, or ...
6. Don’t just evaluate one alternative. Always put two down, even if the other one is "do nothing." I also like to check whether, when something is framed as a positive ("we are giving you a gift"), I can reframe it as a negative ("you are creating an obligation").
7. People conflate liking with smart. In a hiring context, managers wind up hiring nice people who they think are smart, not actually smart people. As organizations get bigger, you wind up with a more likable, but less smart, organization. Next thing you know, you have a large group of people with a limited skillset who can't adapt to change.
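Observation 5 above is worth making concrete. A deliberately crude scoring sketch, assuming equal weights on a fixed attribute list (the attributes and ratings here are all hypothetical):

```python
# A deliberately simple hiring "algorithm": rate each candidate on a fixed
# set of attributes (1-5 scale) and apply equal weights. Even this crude rule
# applies the same mental model consistently, which interviewers rarely do.

ATTRIBUTES = ["problem_solving", "communication", "domain_knowledge", "adaptability"]

def score_candidate(ratings: dict) -> float:
    """Average the candidate's ratings over the fixed attribute list."""
    return sum(ratings[a] for a in ATTRIBUTES) / len(ATTRIBUTES)

candidates = {
    "A": {"problem_solving": 5, "communication": 3, "domain_knowledge": 4, "adaptability": 4},
    "B": {"problem_solving": 3, "communication": 5, "domain_knowledge": 3, "adaptability": 3},
}

# Rank candidates by their consistent, rule-based score.
ranked = sorted(candidates, key=lambda c: score_candidate(candidates[c]), reverse=True)
print(ranked)  # A scores 4.0, B scores 3.5
```

The point is not that these weights are right; it is that the same rule gets applied to every candidate, every time.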
Posted by Unknown at 11:56 AM, 0 comments
Thursday, July 5, 2012
- Every chart should have a governing thought or main message. The title at the top of a page should not be “Monthly Page Views.” Rather, there should be a point to why you are showing the user the chart. A better governing thought would be: “Monthly Page Views have increased by 12% from the previous month.”
- Don’t include unnecessary elements in a chart. Sometimes I see a legend where there is only one data series in the chart. In that case, you don’t need the legend; there is only one thing being shown on the chart. Another example is gridlines. If knowing the exact numbers of a metric is important to your story, turn on labels and show the numbers. Same goes for borders around charts. Let’s go minimalist in terms of the elements on the chart.
- No 3D. It muddies the visual. See rule 2.
- Don’t go over 4 digits on a scale for a chart axis. There is no room on a page for 7-digit numbers. One digit is even better.
- Clearly label the scale. If it is not self-evident (like months or business units), please clearly label both what the metric is (Page Views, not PVs) and the scale. If it is in thousands, put that on the axis. If it is in millions, put that on the axis. Above all, I am looking for clarity here. I don’t want people to spend a lot of time figuring out what the “rules of the road” are for a particular chart.
- Don’t use our internal labels for external consumers. So, no labeling a chart about page views “Monthly_pageviews_all.” Rather, “Monthly Pageviews.” Use plain English, please.
- Don’t use double-axis charts. I hate them. If you want to show two different metrics on the same page, just put two charts on the page.
- Make sure that all charts using the same metric use the same scale. Changing the scale in the middle of a set of related charts messes with the viewer.
- Don’t use line charts for anything that is not time/date based. Lines imply date or time to a viewer.
- Wherever possible, provide some kind of basis for comparison on a chart. Some options are year over year or an average. It is really hard to tell how things are going without a comparison.
- Don’t vary chart types without good reason. For example, pie charts and column charts can show the same data. Viewers get used to seeing a particular type of chart, and if you change types on them, they have to mentally change gears. Just pick one type for related charts and stick with it. And generally, I am not a big fan of pies. I would prefer waterfall charts, but am not inflexible about it.
- If you are going to show percentages, then you need to show the total n on the slide. If someone needs to calculate the counts for the categories on the pie, they need the total n.
- Always source the data. Tell the user where the charts are coming from.
- If a material number of data points are missing, you have to disclose it on the chart as a footnote or include a “missing” category on the chart. Either way, you need to be explicit about the limits of your analysis.
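Several of these rules can be sketched in one place with matplotlib (assuming you have it installed; the numbers are invented): a governing-thought title, no legend for a single series, no gridlines, a metric labeled with its scale, and a sourced chart.

```python
# A minimalist chart following the rules above.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
page_views = [9.1, 9.8, 10.4, 11.6]  # in millions

fig, ax = plt.subplots()
ax.plot(months, page_views)  # line chart is fine here: the x-axis is time-based
ax.set_title("Monthly Page Views have increased by 12% from the previous month")  # governing thought
ax.set_ylabel("Page Views (millions)")  # metric and scale spelled out, no "PVs"
ax.grid(False)                          # no gridlines
fig.text(0.01, 0.01, "Source: web analytics warehouse")  # always source the data
fig.savefig("page_views.png")
```

Note what is absent: no legend (one series), no 3D, no second axis, no internal variable names.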
Posted by Unknown at 7:14 PM, 1 comment
Friday, August 26, 2011
Mentioned in a Wired article
Last bit of self-promotion today. About a month ago, my dissertation was referenced by Dan Ariely in a Wired article. I went to grad school with Dan. For me, the best thing about the mention was that I am now officially an analytics expert.
Posted by Unknown at 10:35 AM, 2 comments
New Capture Your Flag Videos
I also have new Capture Your Flag interviews. The first two are from last year. The new stuff is below.
Posted by Unknown at 10:18 AM, 0 comments
Labels: capture your flag
Data Governance
I have not written anything in a while. More a problem of inspiration than anything else. I just didn’t have too much new to say. I now find myself inspired to discuss data governance. How exciting!
My company, Turner, is undergoing some profound changes in how we distribute our content. These changes are requiring us to retrofit existing measurement (that is, data collection) systems and stand up new systems. And the process is a bit painful. We are doing a good job making the changes and developing the systems, but despite our best (admittedly organic) efforts we are still wrestling with issues of who makes critical design decisions, how to handle new requests, and who gets informed when changes get made. Though the analytic part of my job is really around building and using the analytic platforms, I was finding myself facilitating discussions around data collection and measurement.
My boss noticed this and decided to make my responsibilities more formal. So, she asked me to lead our efforts in data governance and, despite my two degrees in political science, I had no idea what she was talking about. As we were having this discussion, I was thinking, “Do I need to set up a bicameral legislature? How about an independent judiciary?”
So what is it? If you do a Google search, you can find long and precise definitions of “data governance,” but I find those definitions overly complicated. The short version: data governance is determining, in advance, who gets involved (and what their role is) when the data collection and measurement requirements of the company change. At its core, data governance is about communication. Everything else is just tactics. I am admittedly focused on web marketing and analytics, so my apologies to folks working in other industries if my experiences don’t translate.
In terms of tactics (think policies and procedures), there are a few management techniques that we are using to make sure we include the right folks when data collection and measurement requirements change. First, we are getting Service Level Agreements (SLAs) in place that make expectations between internal groups very clear. Our SLAs specify, in painful detail, for any given situation that we could think of, what our timetable is to handle the situation (fix it, meet about it, diagnose it, whatever), who gets contacted, and what the responsibilities of each group are in managing the situation. I treat these things as contracts, and we negotiate the “terms” with our internal partners. Also, there are penalties (typically escalating to someone’s boss) for not living up to your part of the contract. I think of the SLAs as our legal canon: they specify the policies that we all agree to adhere to and what happens when there is an exception.
Another tactic that we are embracing is process documentation. We are trying to get more formal about our internal processes. This is different from the SLAs in that the processes may not be discussed with anyone from another internal group. We may get their input and have them be part of the process. We may not. Depends on the process. We are using a six-sigma person to do the process mapping, create RACI documents, etc.
On staffing: we are in the process of hiring a “Data Steward.” Seriously. It is a real job. Don’t take my word for it; look it up. This is the person who documents stuff and works with our internal partners to get the SLAs in place, run the meetings, etc. We are finding that for a company of our size, we need a person handling data quality and collection full time. The data steward will also act as a communications hub and make sure that the appropriate parties are speaking with each other. Note that this role is not a data cop. It is an influence and education role, not so much a compliance role.
For those few people who have been reading the blog for a while, you know I am a big fan of ensuring that the analytic folks have high quality data to work with. To that end, you can do a bunch of automated data QA to ensure that your data is meeting your quality expectations. One new thing I have learned: you should also check that the relationships between variables are possible. For example, you can’t have fewer page views than unique users. If your data says otherwise, there is a problem. Data quality assurance is going to be a big part of our data governance. In effect, we are looking to ensure that our collection activities are following the “law.”
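A sketch of that kind of cross-variable QA check (the metric names are illustrative; a real system would pull a day's metrics from the warehouse):

```python
# Automated QA: beyond per-field checks, verify that relationships between
# metrics are logically possible. Every user views at least one page, so
# page views can never be lower than unique users.

def qa_check(metrics: dict) -> list:
    """Return a list of human-readable problems found in a day's metrics."""
    problems = []
    if metrics["page_views"] < metrics["unique_users"]:
        problems.append("page_views < unique_users: impossible relationship")
    if metrics["page_views"] < 0 or metrics["unique_users"] < 0:
        problems.append("negative counts")
    return problems

good_day = {"page_views": 120_000, "unique_users": 45_000}
bad_day = {"page_views": 30_000, "unique_users": 45_000}

print(qa_check(good_day))  # no problems
print(qa_check(bad_day))   # flags the impossible relationship
```

The value is in running checks like this automatically, every day, rather than waiting for an analyst to notice.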
We are doing some other things, but the last thing I want to discuss is conducting prioritization meetings. We have found that if we don’t have dedicated meetings that show all outstanding requests (changes and bug fixes, mostly), it is very difficult to give our internal clients visibility into what we are doing. They are a very reasonable bunch, but they, understandably, get nervous when they don’t know what we are working on. Or not working on. You can prioritize on a number of dimensions, but basically it comes down to business impact, effort, and likelihood of success.
Posted by Unknown at 10:13 AM, 0 comments
Wednesday, May 18, 2011
Moved again
Posted by Unknown at 6:13 AM, 0 comments
Wednesday, August 4, 2010
[x+1] in the Wall Street Journal
A very fair piece, I thought. My job is to make sure that we "know something about everyone." Maybe not a lot, but something.
Posted by Unknown at 5:47 AM, 0 comments
Monday, March 29, 2010
Interviewed by Capture Your Flag
I was interviewed by Erik over at Capture Your Flag. Here is the link. It was fun to do and valuable. I got good props from my wife on her mention. I was flattered to be asked.
Posted by Unknown at 7:13 AM, 1 comment
Labels: capture your flag, video
Wednesday, February 3, 2010
Open Data Bridge, more coverage
More on the Open Data Bridge.
Posted by Unknown at 5:36 AM, 0 comments
Labels: ETL, open data bridge, x+1, xplusone
Thursday, January 28, 2010
Shameless promotion
Quoted in an article about our Open Data Bridge efforts.
Posted by Unknown at 7:17 AM, 0 comments
Wednesday, January 13, 2010
Why Demos
The other day, I was speaking to an ad agency about use of third party data in online advertising. I spent a fair bit of time talking about my focus on building out [x+1]'s demographic data set. Toward the end of the talk, someone asked a very interesting question: "If you have buyer propensity or behavioral data, why do you need demographics?"
Hmmm. Why do you need demographics in a world of in-market and intender data? Let me talk a little bit about why demos are useful in online ad targeting, and more specifically, for media targeting.
First, demographics act as useful proxies for life stages and interests. An individual’s life stage and interests are powerful drivers of purchase intent. In fact, demos serve as inputs to the models used to create intender/interest segments (but not in-market status). They are foundational.
Second, demographics are an efficient type of data. An ad network can use the data across a wide variety of product categories. So, get the data once, use it many times. This reduces the amount of integration you need your engineering team to do and speeds time to market for product specific targeting.
Third, demographic data will be commoditized. I am not suggesting that it will become cheap. I mean in the classic sense of a commodity; one source is as good as another and also comparable to a standard. This is not the case today. Some providers are more accurate than others, but over time, I would think that there will be little to distinguish one data provider from another. This means that, unlike the intender and in-market data, we'll be able to "stitch" together multiple demographic providers to create a file that provides demographics for a fairly wide set of users. Each provider has a unique (but overlapping) set of users, so we are going to want to combine datasets. Demographic data is relatively easy to combine across providers. By contrast, each provider of intender and in-market data defines their own segments, meaning that we are going to need to treat each data source separately. For a longer discussion of creating an aggregate demographics database, see my article here.
Powerful predictors of likely relevance, broadly useful, available for many users, simple, and standardized. All good. So, what’s the catch?
I can see two challenges on the demographic side of the data. First, the cost to use demographic data has to be very affordable in order for ad networks and agencies to apply the data to all of their ad decisioning. Online data is not yet commoditized (in the classical sense), but I believe it will eventually become so.
Second, most companies don't yet know the number of unique users each data provider can reach. At [x+1] we use enough of it to have a pretty strong idea of what works for a given campaign, but most folks don't have enough experience to understand the reach they can get from each data provider. The value of each provider's data is additive only to the extent that they cover unique users. If they are not providing data on unique users, then the path to commoditization begins: the providers would be supplying the same product, and by definition the data would be a commodity.
One last point: should the data providers worry about commoditization of demographic data? If I were them, I would not be losing any sleep over it. In this case, I think commoditization would be good for the data providers. They would get less money per user on any given transaction, but they would truly make it up in volume, and because their product has zero marginal cost, this is a good thing. In the offline world, that dynamic has played out to the benefit of Acxiom, Equifax, Experian, InfoUSA, etc.
And for those interested, I have been giving talks to agencies and advertisers on the online 3rd party data landscape. I would be happy to talk to your teams about what kinds of third party data is coming on-line, why they should care, and how/when we expect to be able to use the data. There are very interesting capabilities being developed. Please contact me at krona@xplusone.com if you would like to know more.
Posted by Unknown at 4:14 PM, 0 comments
Labels: ad exchanger, data sourcing, demographic data
Friday, December 4, 2009
Why Demos
I wrote a piece for Ad Exchanger about why internet marketers want demographic data. I'll post the text in the next couple of days.
Posted by Unknown at 6:36 AM, 0 comments
Labels: demographic data
Friday, October 23, 2009
Where is the US household file?
Conducting an offline direct marketing campaign is relatively easy. You can call any one of a number of data providers to get a US household file (that is, demographics on 115 MM US households), run a test campaign, figure out the profile of who responded to the campaign, and you are off to the races. The data exists. You just have to crank up your favorite LOGIT tool and you are in business. In the online space, not so much.
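The "crank up your favorite LOGIT tool" step is just fitting a logistic regression to the test-campaign responders. A minimal sketch on synthetic data (a real model would use the household file's demographic fields as features, not one made-up income index):

```python
import math
import random

random.seed(0)

# Synthetic test campaign: one demographic feature (say, an income index)
# and a 0/1 response flag, where higher income tends to respond more.
households = [random.gauss(0, 1) for _ in range(500)]
responded = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x)) else 0
             for x in households]

# Fit a one-feature logistic regression by batch gradient descent.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(households, responded):
        p = 1 / (1 + math.exp(-(w * x + b)))  # predicted response probability
        gw += (p - y) * x
        gb += (p - y)
    w -= lr * gw / len(households)
    b -= lr * gb / len(households)

print(round(w, 2))  # recovers a positive income effect
```

Score the whole household file with the fitted model, mail the top deciles, and you are in business.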
Posted by Unknown at 12:02 PM, 0 comments
Labels: data sourcing
Monday, October 12, 2009
Got some press
My new company put together a press release. Check it out here. Too funny.
Posted by Unknown at 6:57 AM, 0 comments
Labels: self promotion
Wednesday, September 30, 2009
How to start a new job
Posted by Unknown at 2:47 AM, 4 comments
Labels: new job
Tuesday, June 9, 2009
Brute force vs. smarts
My dissertation was, in part, about how to encourage people, when solving problems, to think about the information they already have available to them and not just gather information for its own sake. I found that you can save a lot of money if you charge just a token amount for each new piece of information. When you charge for information, people think more deeply about the information in their possession and stop asking for information that they don't really need to solve the problem. I had a real-world brush with this phenomenon the other day.
Posted by Unknown at 1:28 PM, 0 comments
Labels: data simplification
Monday, June 8, 2009
Thinking about BI recently
It occurred to me that using a BI tool is a hard way to gain insight. You are limited by your own imagination. I like hypothesis-driven analysis, but I think you can do a much better job of providing insight if you understand a bit of econometrics (Logit and OLS). You can simply run a stepwise regression on the variable you are trying to understand (say, lifetime value of a customer) and see what pops out (say, the interaction of age and education). Once you see what variables pop, you can then use BI tools to illustrate the point. Critical piece: make sure you run all of the interactions. That is where the cool stuff lies.
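The interactions point can be sketched on synthetic data, where the outcome is driven by an age × education interaction that the main effects alone miss (the LTV formula below is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(20, 60, n)
education = rng.uniform(10, 20, n)

# Synthetic customer LTV driven almost entirely by the age x education
# interaction; neither main effect explains it on its own.
ltv = 0.2 * (age - 40) * (education - 15) + rng.normal(0, 1, n)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

r2_main = r_squared(np.column_stack([age, education]), ltv)
r2_inter = r_squared(np.column_stack([age, education, age * education]), ltv)
print(round(r2_main, 2), round(r2_inter, 2))  # the interaction is where the signal is
```

Main effects alone explain almost nothing here; add the interaction term and the fit jumps. That is the "cool stuff" a main-effects-only stepwise run would never surface.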
Posted by Unknown at 2:46 PM, 0 comments
Labels: business intelligence, regression
Data Simplification
When I was trained, I was told that you should never take continuous data and make it categorical. One of the guiding principles of regression analysis is that variance is good; never reduce it by simplifying your data and creating categories. Maybe an example is in order:
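An example along those lines, on synthetic data: binning a continuous predictor throws away within-category variance, and the fit suffers. (The variable names and bucket cutoffs are made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(18, 80, 2000)          # e.g. age, continuous
y = 0.5 * x + rng.normal(0, 2, 2000)   # outcome linear in age, plus noise

def r_squared(feature, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), feature])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

# Now simplify: collapse age into 3 coarse buckets, coded by bucket midpoints.
bins = np.digitize(x, [35, 60])             # 0: under 35, 1: 35-59, 2: 60+
midpoints = np.array([26.5, 47.5, 70.0])
x_binned = midpoints[bins]

print(round(r_squared(x, y), 3))        # continuous predictor
print(round(r_squared(x_binned, y), 3)) # binned predictor explains less
```

Same underlying data, same model, but the categorical version has thrown away the variance inside each bucket and can never get it back.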
Posted by Unknown at 2:29 PM, 0 comments
Labels: data simplification
Monday, October 13, 2008
On data quality
I believe that the single easiest and most impactful thing you can do to improve your analytics is to check your data quality. Sometimes, however, ensuring quality data can have a direct impact on improving your business results. Case in point: I have a client whose business depends on the accuracy of personal information given to them by their customers. However, they did no address verification on the data; they took whatever information was provided without checking its accuracy. We just did a check on the quality of their data. Turns out over 20% of the people do not give accurate personal information. This is an easy fix: put in an address verification system to make sure that at least the addresses they get are valid. If someone is going to make up an address, at least force them to give you a valid one. This can only make the problem better.
Posted by Unknown at 8:36 AM, 0 comments
Best Electoral Prediction Site
One of the things that drives me nuts during campaign season is the reporting of national polls. There is a little thing called the Electoral College, CNN. Ever heard of it? You need to look at the state-level polling. Problem is, the state-level polls often conflict. The margins of error can be large, or the results unreliable due to bias in the polling methodology. What is a political junkie to do? Go to FiveThirtyEight. This is the best site I have seen on predicting the outcome of the election. In fact, if you had asked me how to predict the election, I would have suggested something like this. Note that they don't say who is going to win, but rather the probability of a win by either candidate.
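Forecasts in that style come from simulating state outcomes and counting electoral votes, then reporting a win probability rather than a call. A toy Monte Carlo sketch (the state names, win probabilities, and vote counts below are made up for illustration):

```python
import random

random.seed(42)

# Hypothetical state-level win probabilities (from polls) and electoral votes.
states = {
    "A": (0.90, 55), "B": (0.60, 29), "C": (0.50, 38),
    "D": (0.45, 20), "E": (0.30, 29), "F": (0.80, 99),
}
needed = sum(ev for _, ev in states.values()) // 2 + 1  # majority of electoral votes

def simulate_election() -> int:
    """Flip a weighted coin per state; return the candidate's electoral votes."""
    return sum(ev for p, ev in states.values() if random.random() < p)

trials = 10_000
wins = sum(simulate_election() >= needed for _ in range(trials))
print(f"P(candidate wins) ~ {wins / trials:.2f}")
```

The output is exactly the kind of number the site reports: not a winner, but a probability that aggregates all the noisy state polls at once.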
Posted by Unknown at 7:49 AM, 1 comment
Thursday, August 28, 2008
Hammer on Analytics
I used to be a sound engineer and one of my clients was MC Hammer. In fact, at one show in LA, I told him that he was spending too much on his entourage and he was going to go broke. Well, here we are, 20 years later, and we are both commenting on analytics.
Posted by Unknown at 11:08 AM, 0 comments
Labels: hammer
Thursday, June 12, 2008
Made the papers
Here is a very brief writeup of my business intelligence talk at the AICPA. For those who attended, thanks for the warm welcome. I enjoyed getting in front of the CPA crowd.
Posted by Unknown at 7:38 AM, 0 comments
Tuesday, May 27, 2008
How to block ads
I got tired of looking at all the internet advertising and just installed some ad-blocking software. A link to the article that inspired me is here. The plugins are for Firefox. Updating the hosts file was simple: I just searched for the file named "hosts" and then added the big list of advertisers to both files that were found. No more ads, but Yahoo looks weird.
Posted by Unknown at 7:29 AM, 0 comments
Labels: advertising, Tips
Thursday, May 15, 2008
Add more data? No, just more understanding
A couple of months ago, Anand Rajaraman, a professor at Stanford who teaches a class on data mining, wrote a blog post about a class assignment: students have to do a non-trivial data mining exercise. A bunch of his students decided to go after the Netflix Prize, a contest, run by Netflix, to see if anyone could improve their movie suggestion algorithm by more than 10%. I love these kinds of contests. One team in Anand's class added data from the Internet Movie Database to the Netflix-supplied dataset. Another team did not add data; instead, they spent time optimizing the recommendation algorithm. Turns out that the team that added data did much better on movie recommendations. So good, in fact, that they made the leaderboard. So what should we take away from this?
Anand suggests "adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set." He is right, but glosses over a critical word: "independent." That is, the data being added must not be correlated with the existing data. It needs to represent something new.
My take: The team that added the data was smart and operationalized descriptions of the movies better than the Netflix data did. They found a dataset that added a missing theoretical construct, good descriptions of movies, and that made the difference. Just adding a bunch of data is not the takeaway here. (At my previous employer, we had over 8,000 variables at the household level (we did a lot of transformation) that we could use to predict if someone was going to take an offer. In a typical model, we used fewer than 20 variables. Across the 20 models we had in production, we used fewer than 200 of the 8,000 variables.)
So what is the secret sauce to model improvement? Adding data that operationalizes a construct previously in the error term. In English: the team thought about what factors (what I was calling theoretical constructs) could possibly be used in a recommendation system and went to find data that could be a proxy for those factors. You should be willing to add (read: pay for) more data, but only if it measures something for which you don't have an effective proxy.
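The "independent" requirement can be illustrated on synthetic data (the genre/director names are stand-ins for IMDb-style constructs, not real features): a noisy copy of a variable you already have adds nothing, while a variable that proxies a construct hiding in the error term helps a lot.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
genre = rng.normal(0, 1, n)     # a construct already in your data
director = rng.normal(0, 1, n)  # a construct currently hiding in the error term
rating = genre + director + rng.normal(0, 0.5, n)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

base = r_squared(genre.reshape(-1, 1), rating)

# A "new" variable that is just a noisy copy of genre adds almost nothing...
redundant = genre + rng.normal(0, 0.1, n)
with_redundant = r_squared(np.column_stack([genre, redundant]), rating)

# ...but an independent variable that proxies the missing construct helps a lot.
with_independent = r_squared(np.column_stack([genre, director]), rating)

print(round(base, 2), round(with_redundant, 2), round(with_independent, 2))
```

Paying for the redundant dataset buys you nothing; paying for the independent one roughly doubles the variance explained.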
Posted by Unknown at 7:50 AM, 0 comments
Labels: data exploration, data mining
Wednesday, May 14, 2008
Winds of Change video
Have you ever seen the Kodak "Winds of Change" video? It has nothing to do with analytics, but I love the way that they confront the brand perception of Kodak as no longer relevant and show that they understand the issue and are working to become relevant again. I heard the CMO speak and he said that he almost got fired over this video. It was an internal piece that got out. Turned out that it was a viral hit and did wonders for the re-branding.
Posted by Unknown at 7:54 AM, 0 comments
Monday, May 12, 2008
Tips for implementing a BI project
I am speaking at AICPA in Las Vegas on Business Intelligence. My talk is supposed to be a "lessons learned" kind of case study on using BI. I developed 11 tips for rolling out a BI solution. Some of these may look very familiar:
Tip 1: When deciding what to show in your BI tools, use a balanced score card approach.
Balanced scorecards provide a nice framework for thinking about developing useful metrics.
Tip 2: Select the right hardware.
We needed a “Data Appliance” like Netezza. Feel free to overbuy. Your future self will thank you.
Tip 3: Take your time building requirements.
Figure out who is going to use the data and for what. What are needs going to be a year from now? Three years from now?
Tip 4: Conduct a data quality audit.
Check for missing data, unusual variability, and unusual stability.
Tip 5: Make your data warehouse responsible for fixing data quality problems.
Don’t try to build in work-arounds. You will have bought yourself a bunch of maintenance headaches. Let the guys who are supposed to give you clean data do their jobs.
Tip 6: Provide some context for each metric.
Show trends, goals, and/or min-max for each metric. This will allow the exec to decide if some metric is worth further attention.
Tip 7: Enable drill-down on your charts (but don’t overdo it).
When an exec sees something “anomalous,” they are going to want to see if they can figure out what is going on. Computer users are trained to click on things they are curious about. Leverage this behavior.
Tip 8: Avoid being “flashy” and cool.
Keep your charts simple and redundant. Allow your audience to learn how to consume your BI quickly, not be impressed with your team’s programming skills.
Tip 9: Conduct 1-on-1 sessions with senior execs to ensure that they find the BI tools useful and informative. Adoption of these things is much harder than technical implementation. Do anything you can to drive adoption.
Tip 10: Choose a software package for ease of integration.
Time spent integrating is not worth the loss of strategic use of the data. Remember, the time you take to get things working right has a cost to the business.
All of the major BI vendors have very similar functionality, and the differences are not likely to have any impact on business decision making.
Tip 11: Be ruthlessly simple about what metrics you show. Complexity is your enemy.
Strive for few, but very meaningful metrics. Too often, you are going to want to create complex reports. Fight the urge. They will never be looked at. In this context, I will always sacrifice completeness for simplicity.
Posted by Unknown at 1:19 PM, 2 comments
Tuesday, April 15, 2008
At Ad-Tech
I don't think I ever posted that I took a new job. I am the Senior Vice President of Internet Products for IXI Corp. IXI is a financial data consortium that collects personal and business asset data from financial service providers, cleans it up, and provides it back to member firms for use in marketing, resourcing, and strategic decision making.
And on that note, I am in San Fran boning up on the latest in Web-based advertising. There are a number of uses of IXI's credit data for ad targeting and fraud prevention.
Posted by Unknown at 9:07 AM, 0 comments
Labels: conferences
Wednesday, March 5, 2008
Free data!
I have a longer post coming about selecting data for modeling, but for now, just know that the UN has put its statistical data online. Perhaps the nicest feature is that the site will search across all of their published datasets. I like adding macro-level data into modeling and customer insight projects, and the UN is a good source.
Posted by Unknown at 6:18 AM, 0 comments