Over the past few months I've been spending a fair amount of time working on personalization, leveraging one of my new favorite AWS services - Amazon Personalize. Needless to say, there is much more that goes into building and launching a personalization system than just turning on a few services and feeding in some data. In this article I'll focus on what it takes to launch a new personalization strategy, and how to architect it so it can evolve over time.
In many cases we have a classic chicken-or-the-egg scenario - we feel uncomfortable launching personalization features with unconfirmed performance and perhaps limited data, but without launching we won't have the feedback loop and data to measure and improve performance.
In some cases this is driven by the maturity and level of adoption of the application. If we don’t include ML recommendations, we are unprepared for growth in users, but without data the ML recommendations alone may not produce relevant enough results to engage users. In other cases we may have a mature product and user base, but still be dealing with considerable unknowns. In both cases we need to be able to experiment, measure, and adapt quickly.
Let's consider the special case where we are dealing with a relatively new application, in the early cycles of adoption. In these scenarios our ML algorithms might not be producing highly relevant results, and simple logic and/or a healthy dose of manual curation may perform better. Note that in the very early stages "performing better" may be more subjective than statistical or metric-driven.
However, there are several good reasons to start introducing these algorithms early:
- testing the infrastructure - work out any functional or non-functional issues early on.
- supplementing simple logic - use ML recommendations to add variety, and reduce the chance of recommendation depletion from simple hardcoded algorithms.
- being ready for scale - the tipping point when ML recommendations will need to take over is somewhat unpredictable.
- accelerating training - gathering implicit feedback from recommendation algorithms will help the models train faster.
My recommended method for integrating ML recommendations is through “candidate sourcing”. This method requires a service layer to be built on top of the recommendation infrastructure to combine the component algorithms. These components could be an endpoint backed by the Personalize User-Personalization recipe, perhaps a vector search against some text embeddings, and even a simplistic recommender based on manually curated entries and if-then-else logic.
When a user is to be served recommendations, the underlying service will draw “candidates” for recommendations from the appropriate components, and then combine them according to configured percentages/weights.
For a simple example let’s consider the "for you" page or "feed" scenario (something we are all familiar with). In this case the API will make requests to the three example component services previously mentioned. The results from each service will then be combined by configurable weights. Let's assume the user's feed is rendered from 50% user-personalization, 25% vector search, and the remaining 25% from the simplistic recommender.
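To make the 50/25/25 blend concrete, here is a minimal sketch of the combining step. Everything in it is an assumption for illustration: the function name `blend_candidates`, the source names, and the three stub sources (which return fake item IDs rather than calling Personalize, a vector index, or a curated list). A real service layer would replace the stubs with calls to the actual components.

```python
from typing import Callable, Dict, List

# A candidate source takes a user ID and a count and returns ranked item IDs.
CandidateSource = Callable[[str, int], List[str]]


def blend_candidates(
    user_id: str,
    sources: Dict[str, CandidateSource],
    weights: Dict[str, float],
    feed_size: int,
) -> List[str]:
    """Fill the feed from each source in proportion to its weight."""
    feed: List[str] = []
    seen = set()
    for name, weight in weights.items():
        quota = round(feed_size * weight)
        # Over-fetch so items already supplied by another source can be skipped.
        candidates = sources[name](user_id, feed_size)
        taken = 0
        for item in candidates:
            if taken >= quota or len(feed) >= feed_size:
                break
            if item not in seen:
                seen.add(item)
                feed.append(item)
                taken += 1
    return feed


# Hypothetical stand-ins for the three component services.
def personalize_source(user_id: str, n: int) -> List[str]:
    return [f"p{i}" for i in range(n)]


def vector_source(user_id: str, n: int) -> List[str]:
    return [f"v{i}" for i in range(n)]


def curated_source(user_id: str, n: int) -> List[str]:
    return [f"c{i}" for i in range(n)]


SOURCES = {
    "user_personalization": personalize_source,
    "vector_search": vector_source,
    "curated": curated_source,
}
WEIGHTS = {"user_personalization": 0.50, "vector_search": 0.25, "curated": 0.25}

feed = blend_candidates("user-1", SOURCES, WEIGHTS, feed_size=8)
print(feed)  # ['p0', 'p1', 'p2', 'p3', 'v0', 'v1', 'c0', 'c1']
```

This sketch simply concatenates each source's share; a production feed would probably interleave or re-rank the candidates, and would need a rule for backfilling when rounding or duplicates leave the feed short.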
Ideally we should be able to easily add a fourth algorithm for candidate sourcing, with a configurable percentage (perhaps we include 10% popular items).
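One way to keep that addition cheap is to treat the weights as plain configuration and rescale the existing entries when a new source arrives. The sketch below assumes the new source takes a fixed share and the existing sources shrink proportionally; that rescaling rule, the `add_source` helper, and the source names are all assumptions for illustration, not something prescribed above.

```python
from typing import Dict


def add_source(weights: Dict[str, float], name: str, share: float) -> Dict[str, float]:
    """Return new weights where `name` gets `share` and the rest shrink proportionally."""
    if name in weights:
        raise ValueError(f"source {name!r} is already configured")
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    # Scale the existing weights so the total stays at 1.0 after the addition.
    updated = {existing: w * (1.0 - share) for existing, w in weights.items()}
    updated[name] = share
    return updated


BASE_WEIGHTS = {"user_personalization": 0.50, "vector_search": 0.25, "curated": 0.25}

# Give a hypothetical "popular" source 10% of the feed.
updated = add_source(BASE_WEIGHTS, "popular", 0.10)
for source_name, weight in updated.items():
    print(f"{source_name}: {weight:.3f}")
```

With the 50/25/25 starting point, this yields roughly 45% / 22.5% / 22.5% / 10%. The original dictionary is left untouched, so the previous configuration can be restored if the new source underperforms.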
We should be able to tune these percentages, preferably without a deployment, and also run experiments (group A is 50/25/25, group B is 75/15/10). There are many things we could use to measure the performance of these algorithms, but most simply we could measure click-through rates by experiment group, as well as by component service, to guide tuning and further experimentation. With a bit of work, we could fully automate the deployment scenarios on top of these basic principles (blue/green, etc.).
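As a rough sketch of the experiment side, the snippet below assigns users to a group deterministically and computes click-through rate per group and per component. The names `EXPERIMENT_GROUPS`, `assign_group`, and `click_through_rates` are hypothetical, the mapping of 75/15/10 onto the three components in the same order is my assumption, and the in-code dictionary stands in for whatever external configuration store would let the weights change without a deployment.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Per-group weights; in practice these would be loaded from external config.
EXPERIMENT_GROUPS = {
    "A": {"user_personalization": 0.50, "vector_search": 0.25, "curated": 0.25},
    "B": {"user_personalization": 0.75, "vector_search": 0.15, "curated": 0.10},
}


def assign_group(user_id: str) -> str:
    """Hash the user ID so the same user always lands in the same group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    groups = sorted(EXPERIMENT_GROUPS)
    return groups[digest[0] % len(groups)]


def click_through_rates(
    events: Iterable[Tuple[str, str, bool]],
) -> Dict[Tuple[str, str], float]:
    """Compute CTR per (group, source) from (group, source, clicked) impressions."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for group, source, clicked in events:
        impressions[(group, source)] += 1
        if clicked:
            clicks[(group, source)] += 1
    return {key: clicks[key] / impressions[key] for key in impressions}


# Made-up impression log, purely for illustration.
events = [
    ("A", "curated", True),
    ("A", "curated", False),
    ("B", "vector_search", True),
]
rates = click_through_rates(events)
print(rates)  # {('A', 'curated'): 0.5, ('B', 'vector_search'): 1.0}

group = assign_group("user-1")
weights_for_user = EXPERIMENT_GROUPS[group]
```

Tagging every served item with both its experiment group and the component that sourced it is what makes the two breakdowns possible; without the source tag, only the group-level comparison is available.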