Saturday, March 17, 2012

Article 2: Applications of Data - AB Testing

I was doing it wrong. I tried to write article 1 as a really good warm-up into data analysis. I had some code samples, free and easy data sets, and everything. It was really boring though, and the article didn't write itself, so I hereby skip that article and move straight on to article 2, realizing full well that I had been forewarned to "always work on things you love".

So, something interesting that I've been reading about lately, and that is becoming a new trend in the software lifecycle of web services, is AB Testing. AB Testing is all about deployment by data, where your software goes through a type of evolutionary process, at the end of which the fittest version of your software wins. It's the data that allows you to decide which version is fitter, and to expunge the less suitable. There is heavy overhead in setting up an environment around your product that makes AB testing possible, and it's for this reason that very few companies do it (and most of the ones that do have large user bases).

Before we look into AB testing any further, we should understand the context of where it fits in the software lifecycle.

Before there was AB testing, there were user feedback pages. You'd write some software, send it out into the world or publish it to your web service, and users would go to a special page if they felt one way or another, and provide feedback. If you had a user base of a few hundred or in the low thousands, this type of system worked, but it had several problems:
  • Doing feedback regularly irritates the very people you're trying to bring joy to by providing the software in the first place.
  • Open-ended "comments" boxes didn't (and don't) scale. Who is going to spend their morning reading through the hundreds or thousands of comments from the day before? Most of them are likely to be "You suck" or "BUY CHEAP PILLS AT medco.cn". Unless you have only a small number of users, the value from this kind of open-ended feedback is minimal. (And hey, if you only have a clutch of customers, just call them up and ask them what they think!)
  • Closed-answer surveys are only any good at finding out if something is bad, not necessarily what is bad or what alternative might be better. Surveys are also more time consuming, and people are more likely to ignore them.
  • Most feedback is negative. If people like what they have, they just use it, meaning that the only people who are likely to comment are those who really don't like it, and even then they'll probably only give you feedback if there's no alternative software they can use that's better.
This last point is interesting, because it's not related to scale, but rather is endemic to any software feedback. If we consider the old adage that "there's thousands of ways to be wrong, and only a few ways to be right", then negative feedback isn't very useful, as it doesn't help you uncover what the user wants, only what they don't want. Negative feedback isn't very constructive.

Extending on the problem of negative feedback and the points above, we see that what you will get is a small amount of feedback, most of it extremely negative. So when a single piece of user feedback says the big red button is absolutely heinous and destroys their will to live, and no other feedback says anything about it (or there may not be any other feedback at all), do you take it seriously? Is this user right and no one else can be bothered to say anything, or are they just having a bad day and decided to take it out on your software interface? If you change the big red button, will it piss off other users? (And indeed, with only negative feedback, what do you change it to?)

Clearly, this sucks. But it's easy to do.

Enter the realm of AB Testing. The concept is similar in nature to conducting medical trials, where one set of patients gets a medicine, and the other gets a placebo. In software, you show different users (or groups of users) several different styles of your software, and see which version is liked the most. Some users don't see any change at all; this is the control group. At a high level, there are three elements to this:
  • You have to actually make several different styles or versions of your software.
  • You have to have some way to present it to user groups.
  • You have to have some way to determine if they like it.
It would be easy to think the first one is all about man hours, but that's not really true. The key to making many changes to the software easily (usually to the interface) is to have a good architecture that separates the things you want to change from the other core functionality. This allows you to easily localize the change.
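To make that concrete, here's a minimal sketch (in Python, with entirely hypothetical function names) of what "localizing the change" might look like: the variant-specific rendering lives behind one lookup, and the core code never branches on the experiment.

```python
# Hypothetical example: two styles of the same button, kept away from core logic.

def render_save_button_v1():
    return '<button class="big-red">Save</button>'

def render_save_button_v2():
    return '<button class="small-blue">Save</button>'

# Core code never checks which experiment is running; it just asks the registry.
VARIANT_RENDERERS = {
    "control": render_save_button_v1,
    "trial": render_save_button_v2,
}

def render_save_button(variant: str) -> str:
    # Fall back to the control version if the variant name is unknown.
    return VARIANT_RENDERERS.get(variant, render_save_button_v1)()
```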

The second bit is where the overhead starts to creep in. You need some way of dividing the user base up so that some people are diverted to the trial product and others to the normal product. An easy way is to email users and give them the option to either install a new set of binaries, or go to a "-trial" URL instead of the normal one. But this is hit-and-miss, and as we'll see later it can lead to data skews. What you really want is an automated process by which you can select a set of users and force them to experience the new product. The framework and process development for this is considerable.
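One common approach (just a sketch, assuming every user has a stable user_id) is to hash the id into a bucket, so that assignment is deterministic and needs no stored state:

```python
import hashlib

def assign_group(user_id: str, trial_fraction: float = 0.05) -> str:
    """Deterministically assign a user to 'trial' or 'control'.

    The same user always hashes to the same bucket, so they see a
    consistent experience across sessions without any lookup table.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000          # a number in 0..9999
    return "trial" if bucket < trial_fraction * 10_000 else "control"

# Example: roughly 5% of users land in the trial group.
print(assign_group("user-12345"))
```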

The last step is actually a two-part item. First, you need to figure out what users do, and then you need to figure out whether what they're doing means they like your product (or new feature) more or less. So you instrument your product so that events are triggered by their actions. These events can be used to answer questions like "did they click save or press ctrl-s?" or "how long did they spend on that screen?" (i.e. how long did it take them to find what they were looking for?). With enough of the right kind of instrumentation, you can find out how your users actually use your product without invading their privacy (since how they use it is not the same as what they use it for, or what content they produce with it). Keep in mind that you need to instrument both the control product and the trial product. (There are actually a lot of other reasons why you should instrument your product; more on this later.)
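As an illustration only (the event names and the send_event() transport are made up, not any particular library), instrumentation can be as simple as emitting a small, content-free record per action:

```python
import json
import time

def send_event(event: dict) -> None:
    # In a real product this would post to a collection endpoint;
    # here we just print the JSON payload.
    print(json.dumps(event))

def log_action(user_id: str, variant: str, action: str) -> None:
    send_event({
        "ts": time.time(),       # when it happened
        "user": user_id,         # who (an opaque id, not their content)
        "variant": variant,      # control or trial
        "action": action,        # e.g. "clicked_save" or "pressed_ctrl_s"
    })

log_action("user-12345", "trial", "clicked_save")
```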

This instrumentation sends what it collects back to a central source. Aha! Data! The final step is now clear: figure out an algorithm to run on the data you have which will tell you quantitatively whether users like the product. Interestingly, one of the less understood problems with AB Testing is "do I have enough data?". For example, if only a few people are using the trial product and they stop using it when you switch them over, you could assume they don't like it, but you should ask "what is the likelihood that they were going to quit anyway, even if I hadn't changed the product?". The secret to reducing this random-behavior error is to use a bigger user set. But if you plan on launching a dramatic change in your product, you don't want to force it in the face of any more users than is absolutely necessary, in case it really is bad and they all do stop using it!
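As a taste of what "quantitatively" can mean, here's a sketch of one standard approach: a two-proportion z-test on a retention-style metric. The counts are invented purely for illustration.

```python
from math import sqrt, erf

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Compare two rates (e.g. fraction of users who kept using the product)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up example: 540/1000 control users stayed active vs 590/1000 trial users.
z, p = two_proportion_z(540, 1000, 590, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the control/trial difference is unlikely to be pure chance; whether your groups are big enough to detect a difference you actually care about is exactly the sample-size question below.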

There's no art to getting the right user size, it's science, and I'll say more on it in the next article! (And remember, science works - bitches!)

-Adam

Tuesday, January 3, 2012

Obligatory Disclaimer

As I'm researching areas to write about, I'm refining exactly what it is I'm trying to do here. Since I know very little about the area of Data Science, what I'm trying to do is explore the field, rather than document it.

What this means is that there will be things I miss, or haven't discovered. If you see me miss something, tell me about it and I'll likely further investigate it in a later article. Given that I don't know much about the field, I'm simply betting that there is enough diverse information out there for me to write 26 reasonably detailed articles. We'll see how that goes :)

I'm going to try to follow a logical flow with the articles, starting with basic data analysis and progressing through to more complex data mining problems / methods.

Adam.

Friday, December 30, 2011

Come with me on a journey!

Data mining and Data Science together form an exciting, emerging field in software engineering. I say exciting because it's like panning for gold, with volumes of simple data potentially hiding previously unknown facts about human behavior, weather patterns and planet health, human health, and more. Take the human genome: it's just a big set of data that you can put on your hard drive and access all of, but we don't know which bit is important for curing which illnesses. Imagine going panning through the sea of human genomes and pulling out a gem!

So I'm Adam, and I love data.

This last month I've been challenged by some reasonably hard data problems, and some exciting new ideas. I was asked today whether or not I have a New Year's resolution, and I said 'no', as I tend to think New Year's resolutions are largely self-defeating. But when I think about it, there is something I would like to get better at this year - data. I want to get better at solving hard problems and I want to make a contribution. I want to see where whatever ideas and creativity I have will take me!

Working in software has taught me that success must be measurable and quantitative; otherwise, how do you measure your progress against it? So to that end, this blog is the answer to my New Year's resolution. I want to learn something substantial about Data Science every two weeks, and I want to write about it here. If I can make 26 postings about interesting areas of Data Science, then I will have learned something substantial and improved myself over the year.

The source of this fresh passion over the past few months comes essentially from one guy whom I have huge respect for in the field of software engineering. He's a fantastic orator and has in the past written some interesting articles on his blog, as well as made some wonderful blunders. He's been a keynote speaker at various O'Reilly conferences over the past few years, as he's both intelligent and funny, and seems to be on a different wavelength to most. He usually presents from a new, interesting perspective in a way that challenges thinking rather than delivers information (and I would think that should have a much more lasting effect on the audience!). OSCON Data in 2011 was no exception, and even though he isn't a 'Data Guy', he gave this great presentation...


So I was challenged, and then a few weeks ago I saw this video on reddit.com, which talks about what Data Science is, and why it's such a rare skill.


So come with me on an exciting adventure to learn more about Data Science, Data Mining, and Data Analysis; and become prepared to solve the hard problems with big data sets and fun math!

Adam.