Flexible recommendations: a context-aware API
One API serving every surface — PDP, cart, wishlist, search, home. Context-aware delivery from a composable model library.
- 3× recommendation impression share
- Single API, every surface
- 3 cache tiers: page · slot · interaction
Recommendations appeared everywhere — product description pages, cart, wishlist, add-to-cart drawer, home, search — and on every platform (desktop, mobile web, iOS, Android). Each surface had a bespoke implementation. Each new slot was an engineering project. Cross-team coordination was the bottleneck for every experiment.
The underlying need was simple: serve the *right* recommendation for the *current context* — the page the customer is on, what they're looking at, their identity, their platform. The existing architecture made that easy only on the surfaces that had already been built; everywhere else, product asked engineering for months of work just to try something.
Reframe recommendations as a platform. One API that any surface can call with a context payload; the API selects the right model and returns the right recommendation. Product configures slots; engineering owns the platform.
- Public API with two primary calls: get_layout (what slots does this page have?) and get_recommendations (fill this slot for this context). The context payload carries everything the API needs to choose: product identifier, brand, category, customer identifier, platform, and location on the page.
- Split the API stack into a Backend-for-Frontend (BFF) that talks to the front-end, and a Gateway Service that runs the exclude / rank / merge logic across multiple sub-models. Both sit behind a single public surface so callers don't care about the internals.
- Composable sub-models: cart-based co-occurrence, customer-based co-occurrence, implicit-feedback collaborative filtering (after Hu, Koren and Volinsky 2008), the same with a custom feedback loop layered on top, trending, and most-viewed. Adding a new model means registering a model key, not rewriting a service. Slot configuration is decoupled from recommendation logic.
- Three explicit cache tiers — page-load (highly cached, layout per page), per-slot (moderately cached, model output by context), per-interaction (not cached, click and dismiss tracking). Each tier sized for its consumer.
- Exclusion logic centralised: Redis / Memcached track items the customer has already seen, per model key, per customer. Exclusions can be scoped to a single model or shared across models for a given customer.
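As a concrete sketch of the two-call surface, here is a minimal Python model of `get_layout` and `get_recommendations`. All field names, slot names, and page identifiers are illustrative assumptions, not the real schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    # Everything the API needs to choose a model; field names are hypothetical.
    page: str                          # "pdp", "cart", "wishlist", "search", "home"
    platform: str                      # "desktop", "mobile_web", "ios", "android"
    product_id: Optional[str] = None
    brand: Optional[str] = None
    category: Optional[str] = None
    customer_id: Optional[str] = None
    slot: Optional[str] = None         # location on the page

# Slot configuration lives outside recommendation logic: product edits this table.
LAYOUTS = {
    "pdp":  ["similar_items", "often_bought_with"],
    "cart": ["cart_addons"],
}

def get_layout(ctx: Context) -> list[str]:
    """What slots does this page have? (page-load tier: highly cacheable)"""
    return LAYOUTS.get(ctx.page, [])

def get_recommendations(ctx: Context) -> list[str]:
    """Fill one slot for this context. (per-slot tier: moderately cacheable)"""
    # Stand-in for the Gateway's model selection + exclude/rank/merge pass.
    if ctx.slot == "similar_items" and ctx.product_id:
        return [f"{ctx.product_id}-alt-{i}" for i in range(3)]
    return []

# A caller's flow: fetch the layout once per page, then fill each slot.
ctx = Context(page="pdp", platform="ios", product_id="sku123")
slots = get_layout(ctx)
ctx.slot = slots[0]
recs = get_recommendations(ctx)
```

The point of the shape: the caller never names a model, only its context, so front-ends stay identical across surfaces and platforms.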
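The Gateway's exclude / rank / merge pass over registered sub-models might look like the following sketch. The model keys, scores, and merge policy (keep the best score per item) are invented for illustration:

```python
from typing import Callable

# Registry: adding a model means registering a key, not rewriting a service.
MODELS: dict[str, Callable[[str], list[tuple[str, float]]]] = {}

def register(key: str):
    def deco(fn):
        MODELS[key] = fn
        return fn
    return deco

@register("cart_cooccurrence")
def cart_cooccurrence(product_id: str) -> list[tuple[str, float]]:
    return [("sku9", 0.9), ("sku7", 0.7)]   # hypothetical candidates + scores

@register("trending")
def trending(product_id: str) -> list[tuple[str, float]]:
    return [("sku9", 0.8), ("sku5", 0.5)]

def gateway(product_id: str, model_keys: list[str],
            excluded: set[str], limit: int = 10) -> list[str]:
    """Exclude / rank / merge across the requested sub-models."""
    scores: dict[str, float] = {}
    for key in model_keys:
        for item, score in MODELS[key](product_id):
            if item in excluded:
                continue                     # exclude: customer already saw it
            scores[item] = max(scores.get(item, 0.0), score)  # merge: best score wins
    return sorted(scores, key=scores.get, reverse=True)[:limit]  # rank

result = gateway("sku1", ["cart_cooccurrence", "trending"], excluded={"sku7"})
# sku9 appears in both models (kept at its best score), sku7 is excluded
```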
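Centralised exclusion tracking, keyed per customer and per model, can be sketched with an in-memory store standing in for Redis / Memcached (a real deployment would use Redis set commands such as SADD/SMEMBERS; all names here are illustrative):

```python
from collections import defaultdict

class ExclusionStore:
    """In-memory stand-in for the Redis-backed seen-item sets."""

    def __init__(self):
        # (customer_id, model_key) -> set of item ids already shown
        self._seen: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def mark_seen(self, customer_id: str, model_key: str, items: list[str]) -> None:
        self._seen[(customer_id, model_key)].update(items)

    def exclusions(self, customer_id: str, model_keys: list[str],
                   shared: bool = False) -> set[str]:
        """Exclusions scoped per model key, or shared across every model."""
        keys = model_keys
        if shared:  # union over every model this customer has seen items from
            keys = [mk for (cid, mk) in self._seen if cid == customer_id]
        out: set[str] = set()
        for mk in keys:
            out |= self._seen[(customer_id, mk)]
        return out

store = ExclusionStore()
store.mark_seen("cust42", "trending", ["sku9"])
store.mark_seen("cust42", "cart_cooccurrence", ["sku7"])
per_model = store.exclusions("cust42", ["trending"])          # only trending's items
shared = store.exclusions("cust42", ["trending"], shared=True)  # union across models
```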
Many surfaces · one API · composable sub-models. Three cache tiers.
3× product-impression share for recommended items across the surfaces that adopted the platform. Critically, new contexts (cart, brand pages, add-to-cart drawer) shipped without bespoke engineering for each one — product configured slots; the API delivered.
Experimentation cadence shifted: A/B tests that previously needed an engineering quarter could be set up with a model-key change and a slot config. The Gateway's exclude / rank / merge layer made interaction tracking and feedback signals first-class — recommendation quality became iterable, not bespoke.
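Under that model, a slot-level A/B test reduces to a config diff rather than new code. A hypothetical slot configuration (model keys and split are invented for illustration) might look like:

```python
# Hypothetical slot config: an A/B test is a model-key swap, not a new service.
SLOT_CONFIG = {
    "pdp/similar_items": {
        "control":   {"model_keys": ["implicit_cf"]},
        "treatment": {"model_keys": ["implicit_cf_feedback"]},  # feedback-loop variant
        "traffic_split": 0.5,
    },
}

def model_keys_for(slot: str, in_treatment: bool) -> list[str]:
    """Resolve which sub-models serve this slot for this experiment arm."""
    arm = "treatment" if in_treatment else "control"
    return SLOT_CONFIG[slot][arm]["model_keys"]
```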
The temptation was to build the second slot the same way as the first and worry about the platform later. We built the platform first and shipped slots into it. Net effect: every slot after the second one was effectively free.