The Data Science of Chocolate Brownies
By Kirk Borne
One of the main benefits of working with a team is that occasionally someone in the group will bring brownies to work! (Okay, maybe this is not a frequent event, but it could happen – and it happened today with the group that I have been working with this summer at Syntasa.) One of the team members came into the office after lunch with such a delightful dessert in hand (brownies!), making the rest of us wish we had one. Then, another team member (Joe) volunteered that he had brought them in today – his wife had baked them, and they were now in the kitchen. So, two of us made the across-the-hall trip to the kitchen, and surprise! – Data Science happened! “What does that mean?” you are probably asking yourself. Well, here is the meaning:
As we entered the kitchen, we noticed several little sealed containers of treats, snacks, and other edible contributions from our fellow workers. But where were the brownies? We then found a slightly opened tin foil wrap, calling for our attention. After examining the contents within the foil-wrapped packaging, we determined through an objective data science methodology that we had found the brownies.
The data science methodology began with the development of a feature vector (in our case, 5 or 6 attributes) that contained the key distinguishing features of brownies: (1) the color (they were brown); (2) the size (they were the right size – about 4 cm across); (3) the shape (a rectangular solid); (4) the texture (it certainly looked like the average of chocolate cake and chocolate fudge, though I was not really sure how to take the mean value of such non-numeric attributes); (5) the wrapping (it was clearly an informal contribution to the kitchen that arrived today, and not a commercially packaged selection); and (6) the location (it was on public display in the kitchen). We doubted whether this last attribute was a defining characteristic for our brownies, since we limited our search to the kitchen anyway; nevertheless, we decided that an essential defining characteristic of Joe’s wife’s brownies was that they be found in the kitchen!
So, with this multi-feature model, the aforementioned snack was appropriately classified as Joe’s wife’s brownies, and the rest is history – well, the brownies are now history.
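For readers who like to see such things written down, here is a minimal sketch of that multi-feature model in Python. It is only an illustration, not what we actually ran in the kitchen: the class name SnackFeatures, the text labels for each attribute, the 3 to 5 cm size tolerance, and the match threshold of 5 are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnackFeatures:
    """Hypothetical six-attribute feature vector for a candidate snack."""
    color: str
    width_cm: float
    shape: str
    texture: str
    wrapping: str
    location: str


def brownie_score(snack: SnackFeatures) -> int:
    """Count how many of the six attributes match the brownie profile."""
    checks = [
        snack.color == "brown",
        3.0 <= snack.width_cm <= 5.0,   # "about 4 cm across"; tolerance is assumed
        snack.shape == "rectangular solid",
        snack.texture == "cake-fudge",  # stand-in label for the cake/fudge "average"
        snack.wrapping == "informal",
        snack.location == "kitchen",
    ]
    return sum(checks)


def is_joes_brownie(snack: SnackFeatures, threshold: int = 5) -> bool:
    """Classify as a brownie when enough attributes match (threshold is assumed)."""
    return brownie_score(snack) >= threshold


# Two made-up candidates found in the kitchen.
foil_packet = SnackFeatures("brown", 4.0, "rectangular solid",
                            "cake-fudge", "informal", "kitchen")
boxed_cookie = SnackFeatures("tan", 6.0, "disc",
                             "crunchy", "commercial", "kitchen")

print(is_joes_brownie(foil_packet))   # all six attributes match
print(is_joes_brownie(boxed_cookie))  # only the location matches
```

Note that the boxed cookie still earns one point for being in the kitchen, which is exactly why location alone could not settle the question.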
The lesson here is a very common data science lesson: when searching for instances of some class of objects in a database, it is imperative to identify key defining attributes that distinguish the different classes of behavior. Examples of such classes include a new customer, versus a customer on their first return visit, versus a loyal customer; a winning marketing strategy versus a weak strategy; or a customer who is likely to abandon their cart, versus one who is likely to add to their cart when offered an incentive, versus one who might be responsive to an upsell or cross-sell offer. These characteristic features may be (a) derived from first principles, then validated by a data science model; or (b) discovered through machine learning on historical data (e.g., using information gain metrics on the various attributes used in a decision tree classifier). If these characteristics can then be “personalized” as much as possible to the specific instance (e.g., not just any brownies, but Joe’s wife’s brownies), then there is a greater likelihood of success (e.g., successful upsell offers, or successful customer engagement, or successful marketing campaigns targeted and personalized to each individual customer’s preferences, motivations, and context).
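To make option (b) a little more concrete, here is a rough sketch of how information gain can rank attributes, using the standard entropy-based definition that decision tree learners commonly apply. The four-row "kitchen inventory" and the attribute names are invented for illustration; a real project would use historical data and, most likely, an existing machine learning library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(group) / len(rows) * entropy(group)
                    for group in groups.values())
    return base - remainder


# Made-up kitchen inventory: every snack is in the kitchen, but colors differ.
snacks = [
    {"color": "brown",  "location": "kitchen", "label": "brownie"},
    {"color": "brown",  "location": "kitchen", "label": "brownie"},
    {"color": "tan",    "location": "kitchen", "label": "other"},
    {"color": "yellow", "location": "kitchen", "label": "other"},
]

for attribute in ("color", "location"):
    print(attribute, information_gain(snacks, attribute, "label"))
```

In this toy data, color separates the brownies from everything else, while location never varies and therefore contributes no information gain. That matches our kitchen suspicion that an attribute shared by every candidate cannot distinguish one class from another.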
The ultimate goal of any data science modeling effort is a good descriptive, predictive, and/or prescriptive model of the set of objects (customers or products or events) that drive your business strategy. A descriptive analytics model provides hindsight and oversight (i.e., it describes the object’s characteristics, or how it has behaved in the past or is behaving in the present). A predictive analytics model provides foresight (i.e., it predicts how the object will behave in the future). A prescriptive analytics model provides insight (i.e., it discovers the most effective trigger or offer or incentive that will optimize the object’s behavior, e.g., product sales, or customer purchases, or customer loyalty).
At the end of the day, it is a win-win for everyone – you get the sale, and the customer gets the brownie!
For more from Kirk Borne, follow him on Twitter: @kirkdborne
Blog post originally published: June 26, 2014