The Good and Evil of Time-Based Predictions
A few months back, Airbnb ran a great post about how its trust and safety data scientists build machine learning models to protect users from fraud by predicting bad actors. As the piece illustrated using Game of Thrones, a highly nuanced model is required to determine something like whether someone is “good” or “evil.” But what if people aren’t just born good or evil? What if they change over time? And wouldn’t it be great if you could not only predict whether or not they would betray you, but answer the question of when they’re likely to do so?
Applying Predictive Models to Sales & Marketing
In the predictive models our team builds for sales and marketing, the challenge of prediction over a time period is especially critical. That’s because we’re looking to uncover hidden states that can identify the precise time when someone is getting ready to make a purchase. Inspired by Airbnb, we’ll tackle another machine learning model for fantasy characters, but add a degree of difficulty that’s common in the real world of sales, where you want to know precisely when to reach out to a hot prospect. If you pretend that potential buyer is actually a citizen of Westeros, and blur the lines of “good” and “evil” in the Airbnb model, you have to consider that everyone is a potential candidate to betray you (aka buy your product) at any time.
So, how can you predict when someone is ready to make their move? Our first challenge is turning our training data (a list of behaviors or activities by different characters) into features that we can feed into our models. We’ll start by associating these activities with the characters that are responsible for them.
Behavioral Scoring Approaches
One approach here might be to count the total number of activities associated with each character, and use that to train our models (this is similar to the way marketing automation systems score leads). Unfortunately, that won’t allow us to distinguish between activities that occurred a long time in the past vs. recent developments. This is particularly important when trying to predict actions that might occur in the near future.
On the other hand, we could just look at the number of activities that have occurred in the recent past. This definitely helps us keep up-to-date, and solves the problem of ancient data biasing our evaluations. But what if a character hasn’t done anything recently? We’d still like our estimate of her trustworthiness to be influenced by her past actions. And we’d also like to keep some history around, because what seemed like a one-off event in the past may turn into a significant pattern and can shape future decision making.
We can benefit from a hybrid approach here. Suppose we combine features in the model that target activities from the entire past with a set of features that target recent data? In addition, we can use a series of windows to treat activities from the recent past differently. That way, we remember what happened three weeks ago, but we don’t give it the same weight as something that happened yesterday.
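A minimal sketch of this hybrid feature set, assuming each character's history is just a list of activity dates. The function name, the window boundaries (last day, rest of the last week, the two weeks before that), and the sample dates are illustrative assumptions, not the features used in any production model.

```python
from datetime import date

# Hypothetical window edges in days, half-open: [0, 1), [1, 7), [7, 21)
WINDOWS = [(0, 1), (1, 7), (7, 21)]

def activity_features(activity_dates, as_of):
    """Count one character's activities over all time and per recent window."""
    # Ignore anything after the scoring date so no future data leaks in.
    ages = [(as_of - d).days for d in activity_dates if d <= as_of]
    features = {"total": len(ages)}  # whole-history count
    for lo, hi in WINDOWS:
        # Recent activity is split into buckets so the model can weight
        # yesterday differently from three weeks ago.
        features[f"days_{lo}_{hi}"] = sum(1 for a in ages if lo <= a < hi)
    return features

# Made-up history for one character
history = [date(2015, 1, 10), date(2015, 8, 12), date(2015, 8, 28),
           date(2015, 8, 31), date(2015, 9, 5)]
features = activity_features(history, as_of=date(2015, 9, 1))
print(features)
```

Keeping the total alongside the windowed counts means a character with no recent activity still carries their history, while the buckets let the model learn how much each period should matter.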
It’s important to remember that the hidden state of a character can change over time. To see how this can impact our prediction target, let’s take a look at one character’s history:
Model Evaluation Considerations
You can see that in August, our model thinks that this character is about to betray us (buy the product) based on his recent pattern of activity. But despite our expectations, he served loyally for months. Of course, he did eventually betray us. Since someone’s internal state (whether they’re ready to betray) can change over time, our model needs to predict whether someone is about to betray us so that we know exactly when to reach out to them.
In order to know whether our model accurately reflects characters’ motives, each character should always have a score attached to them — our estimate of how trustworthy they are — and that score changes over time. This of course makes our evaluation very complex, since whether we are thinking of a character as “good” or “evil” will change over time, just like their own motives.
Another issue can occur when a score spikes for a while before leveling off again. To mitigate misleading forecasts that might cause us to temporarily mistrust a perfectly loyal character, we need to ensure that our model evaluation function looks at all the scores over time. We should penalize these mistaken scores when we retrain the model, and take them into account when judging which models perform better than others.
To evaluate a model, we’ll just consider the score we assigned to a character every time we scored (every day or every week), and see how well it predicts their actions in, say, the next week. If at the beginning of the week we said a character was likely to betray us, and they betrayed us on that Thursday, that’s a true positive and a victory for our model. If they didn’t betray us until next Thursday, though, we’d consider that a false positive — our model said they would betray us this week, and they didn’t. In that case we’ll also look at the score we gave them the following week.
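The labelling rule described above can be sketched roughly as follows. The 0.5 score threshold, the seven-day horizon, the function name, and the sample scores are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

def label_scores(scores, event_date, threshold=0.5, horizon_days=7):
    """Label each scoring run for one character as TP, FP, FN or TN.

    scores: list of (scoring_date, score) pairs, one per scoring run.
    event_date: date of the betrayal (purchase), or None if it never happened.
    """
    outcomes = []
    for scored_on, score in scores:
        window_end = scored_on + timedelta(days=horizon_days)
        # Did the event fall inside the window this score was predicting?
        happened = event_date is not None and scored_on <= event_date < window_end
        predicted = score >= threshold
        if predicted and happened:
            outcomes.append("TP")
        elif predicted:
            outcomes.append("FP")
        elif happened:
            outcomes.append("FN")
        else:
            outcomes.append("TN")
    return outcomes

# Made-up weekly scores; the betrayal happens during the second week.
weekly_scores = [(date(2015, 8, 3), 0.8),
                 (date(2015, 8, 10), 0.9),
                 (date(2015, 8, 17), 0.2)]
outcomes = label_scores(weekly_scores, event_date=date(2015, 8, 13))
print(outcomes)  # ['FP', 'TP', 'TN']
```

The first week's high score counts as a false positive because the betrayal came a week later than predicted, while the second week's score is a true positive. Every scoring run gets its own label, so a character contributes many rows to the evaluation rather than one.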
This fictional example gives you a glimpse into how much thought and expertise should go into evaluating behavior models and coming up with the right metrics to determine the accuracy of their resulting scores. When doing machine learning over a time series, it is especially important to monitor your models and watch for drift. Keep in mind that a model could end up having multiple “false positives” associated with the same character from week to week (i.e. if it kept incorrectly predicting betrayals that didn’t happen), and this would be a clear indication that it’s time for a model refresh.
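One simple way to watch for this kind of repeated error, assuming each scoring run has already been labelled "TP", "FP", "FN" or "TN" per character: track the longest run of consecutive false positives. The function names, the streak limit of three, and the placeholder character names are hypothetical; a real monitoring setup would tune the limit and likely track other drift signals as well.

```python
def longest_false_positive_streak(outcomes):
    """Return the longest run of consecutive 'FP' labels in a list of outcomes."""
    best = current = 0
    for outcome in outcomes:
        current = current + 1 if outcome == "FP" else 0
        best = max(best, current)
    return best

def characters_to_review(outcomes_by_character, max_streak=3):
    """List characters whose false-positive streak reaches the (assumed) limit."""
    return sorted(
        name
        for name, outcomes in outcomes_by_character.items()
        if longest_false_positive_streak(outcomes) >= max_streak
    )

# Made-up weekly outcome labels for two placeholder characters
weekly_outcomes = {
    "character_a": ["FP", "FP", "FP", "TN"],
    "character_b": ["FP", "TN", "FP", "TP"],
}
flagged = characters_to_review(weekly_outcomes)
print(flagged)  # ['character_a']
```

If the flagged list keeps growing from one scoring run to the next, that is a reasonable cue to retrain.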
If you address all of the factors covered above, behavior scoring can be extremely useful for a wide variety of business needs. Knowing when people are going to do something (as opposed to just the open-ended inevitability that they will) is a key to predictive success.