AI Recommendation Engine for Global E-Commerce Marketplace
Client: ShopStream Global · July 10, 2025
What did ShopStream Global need from this AI project?
ShopStream needed a recommendation engine that could surface the right product to the right shopper across 12 million SKUs and 8 million monthly active users, replacing a stack that was leaking revenue at every touchpoint. The existing system — collaborative filtering plus hand-curated merchandising rules — was producing low click-through rates, almost no cross-category discovery, and a measurable engagement gap against competitors who had moved to modern personalization.
ShopStream's catalog and traffic profile broke the simple approaches. Cold-start was constant: thousands of new SKUs every week with no interaction history. Sessions were short and intent-driven; a shopper who came in for a laptop did not want yesterday's clicks recommended back to them. Cross-category lift — selling someone a phone case after they bought a phone — was where the real margin was, and the legacy system could not see across categories at all. Off-the-shelf recommendation SaaS struggled with the catalog scale and could not be tuned against the business rules (margin, inventory health, promotional priority) that the merchandising team needed to keep on every surface.
How did Clearframe Labs approach the build?
Phase 1: Unified customer data platform
We built a real-time feature store on top of a streaming pipeline that ingests browsing events, purchase history, search queries, product interactions, and contextual signals (device, time of day, location, referrer) into a single, queryable substrate. The platform processes 50M+ events daily, and the same feature store serves both offline training and online inference — so the models see the same view of the world they were trained against.
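The train/serve consistency guarantee comes from defining each feature exactly once. A minimal sketch of that idea (class and method names here are illustrative, not ShopStream's actual API):

```python
# Minimal sketch of a train/serve-consistent feature store: features
# are registered once, and the same definitions answer both offline
# training queries and online inference lookups.
from collections import defaultdict

class FeatureStore:
    def __init__(self):
        self._definitions = {}            # feature name -> transform fn
        self._events = defaultdict(list)  # user_id -> raw event dicts

    def register(self, name, transform):
        self._definitions[name] = transform

    def ingest(self, user_id, event):
        self._events[user_id].append(event)

    def get_vector(self, user_id):
        # Called verbatim by both the training pipeline and the online
        # ranker, so the two views can never drift apart.
        events = self._events[user_id]
        return {name: fn(events) for name, fn in self._definitions.items()}

store = FeatureStore()
store.register("n_views", lambda evs: sum(1 for e in evs if e["type"] == "view"))
store.register("last_category", lambda evs: evs[-1]["category"] if evs else None)

store.ingest("u1", {"type": "view", "category": "laptops"})
store.ingest("u1", {"type": "view", "category": "cases"})
print(store.get_vector("u1"))  # {'n_views': 2, 'last_category': 'cases'}
```

The point of the sketch is the single `_definitions` table: there is no second copy of the feature logic for serving to fall out of sync with training.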
Phase 2: Multi-model recommendation architecture
No single technique covers every recommendation surface, so we built four specialized models behind a unified ranking layer. Deep collaborative filtering — neural matrix factorization on user/product interactions — captures long-run affinity. Content-based models generate product embeddings from images (CLIP-style vision encoders) and text descriptions (sentence transformers), which solves cold-start by letting brand-new SKUs inherit the position of similar items in embedding space. Transformer-based session models read the current browsing trajectory and predict next-click intent in real time. Contextual bandits balance exploration of new inventory with exploitation of known winners, which is what keeps the catalog from collapsing into a popularity loop.
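The cold-start mechanism is worth making concrete. A toy sketch, with hand-written 3-d embeddings standing in for the CLIP-style and sentence-transformer vectors used in production:

```python
# Cold-start sketch: a brand-new SKU with zero interaction history is
# placed next to similar items via its content embedding. The vectors
# here are toy 3-d examples; in production they come from vision and
# text encoders at listing time.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

catalog = {                       # SKUs that already have history
    "phone_case_a": [0.9, 0.1, 0.0],
    "laptop_b":     [0.1, 0.9, 0.2],
    "phone_case_c": [0.8, 0.2, 0.1],
}

new_sku = [0.85, 0.15, 0.05]      # embedded the moment it is listed

neighbors = sorted(catalog, key=lambda s: cosine(catalog[s], new_sku),
                   reverse=True)
print(neighbors[0])  # 'phone_case_a' — the new SKU inherits its placement
```

Because the embedding exists on day one, the new SKU is recommendable anywhere its nearest neighbors are, before a single click accrues.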
Phase 3: Ranking layer and business rule integration
A final ranking layer combines model scores with business signals — margin, inventory level, promotional priority, supplier diversity rules — into a single score per (user, item, surface) tuple. Merchandisers configure the trade-offs through a UI, not a code change, so the team can tune for revenue, sell-through, or new-brand exposure without engineering involvement.
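The fusion itself can be as simple as a weighted sum over normalized signals. A sketch, with illustrative weight names and values rather than ShopStream's actual configuration:

```python
# Ranking-layer sketch: model relevance fused with business signals
# using merchandiser-configured weights. In the real system the weights
# come from a UI-backed config store, not a literal dict.
weights = {"relevance": 1.0, "margin": 0.3, "inventory": 0.2, "promo": 0.5}

def final_score(candidate, w):
    return (w["relevance"] * candidate["model_score"]
            + w["margin"] * candidate["margin_pct"]
            + w["inventory"] * candidate["inventory_health"]
            + w["promo"] * candidate["promo_priority"])

candidates = [
    {"sku": "a", "model_score": 0.92, "margin_pct": 0.10,
     "inventory_health": 0.8, "promo_priority": 0.0},
    {"sku": "b", "model_score": 0.85, "margin_pct": 0.40,
     "inventory_health": 0.9, "promo_priority": 1.0},
]

ranked = sorted(candidates, key=lambda c: final_score(c, weights),
                reverse=True)
print([c["sku"] for c in ranked])  # ['b', 'a'] under these weights
```

Retuning for a promotional season means changing `weights`, not redeploying a model — which is the separation the merchandising UI exposes.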
Phase 4: Multi-surface deployment and experimentation
Recommendations ship across every shopper touchpoint: homepage personalization, "customers also viewed" and "complete the look" on PDPs, search re-ranking, cart-page cross-sell, email and push, and post-purchase follow-up. We built an A/B testing platform with automated statistical analysis, guardrail metrics (so a CTR win that tanks AOV gets caught), and progressive rollout — every new model variant runs in shadow first, then in a holdout, then full traffic.
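The guardrail logic reduces to a simple decision rule. A sketch with illustrative metric names and tolerances (the real platform also runs significance tests before acting):

```python
# Guardrail sketch: a variant ships only if the primary metric improves
# AND no guardrail metric degrades past its tolerance. Thresholds are
# illustrative, not ShopStream's actual limits.
def evaluate(control, variant, guardrails):
    if variant["ctr"] <= control["ctr"]:
        return "no_ship"                  # primary metric did not win
    for metric, max_drop in guardrails.items():
        if variant[metric] < control[metric] * (1 - max_drop):
            return "rollback"             # CTR won, but a guardrail tanked
    return "ship"

control = {"ctr": 0.031, "aov": 58.0, "return_rate": 0.42}
variant = {"ctr": 0.042, "aov": 49.0, "return_rate": 0.43}
guardrails = {"aov": 0.05, "return_rate": 0.05}  # tolerate <=5% drop

print(evaluate(control, variant, guardrails))  # 'rollback': AOV fell >5%
```

This is the "CTR win that tanks AOV" case from the text: click-through improved 35%, but average order value fell 15%, so the variant is rolled back automatically.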
What were the results?
The new engine moved the metrics that matter on a marketplace at scale, with the strongest gains in cross-category discovery — exactly the surface the legacy system could not see.
- Revenue increase from recommendations: 22%
- Click-through rate improvement: 3.4x versus the legacy collaborative filter
- Average order value increase: 17%
- Customer return rate improvement: 28%
Users exposed to the new system explored 40% more product categories, which is how the AOV and return-rate gains compound: a shopper who finds adjacency once is materially more likely to come back.
What technical decisions made this work?
- Hybrid multi-model over a single recommender: collaborative filtering for affinity, content embeddings for cold-start, session transformers for intent, contextual bandits for exploration — each model carries the surface it is best at, and the ranking layer fuses them. A single end-to-end model would have been simpler to ship and worse at every individual surface.
- Image and text embeddings as the cold-start solution: every new SKU gets a position in embedding space the moment it is listed, so it is recommendable on day one. This eliminated the multi-week cold-start window that the legacy system imposed on new inventory.
- Business rules as a configurable ranking layer, not hard-coded: merchandisers tune margin, inventory, and promotion weights through a UI. The model serves the recommendations; the business decides the trade-offs. That separation is what made the system survive promotional seasons without hot patches.
- One feature store for training and serving: training features and serving features are defined once. Recommendation systems quietly degrade when the two drift; we removed the failure mode by construction.
- Guardrails on every experiment: every A/B test ships with revenue, AOV, and return-rate guardrails. A CTR win that tanks AOV is auto-rolled-back, which keeps the team from optimizing one metric at the cost of the business.
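The exploration point above — contextual bandits keeping the catalog from collapsing into a popularity loop — can be stripped to its simplest form, an epsilon-greedy policy. A production contextual bandit conditions on user and session features, but the exploration/exploitation trade is the same:

```python
# Epsilon-greedy sketch: with probability eps, recommend the item with
# the least data instead of the known winner, so new inventory keeps
# earning impressions. Item stats are illustrative.
import random

def pick(items, eps, rng):
    # items: sku -> (clicks, impressions)
    if rng.random() < eps:
        # explore: the item with the fewest impressions gets the slot
        return min(items, key=lambda s: items[s][1])
    # exploit: the best observed click-through rate gets the slot
    return max(items, key=lambda s: items[s][0] / max(items[s][1], 1))

items = {"bestseller": (900, 10000), "new_sku": (1, 20)}
rng = random.Random(7)
picks = [pick(items, eps=0.1, rng=rng) for _ in range(1000)]
print(picks.count("new_sku"))  # roughly 100 of 1000 slots go to exploration
```

A pure exploit policy would show `bestseller` 1000 times out of 1000 and the new SKU would never accumulate the data needed to compete — the popularity loop the bandits exist to break.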
Lessons for teams considering similar projects
- Hybrid recommendation architectures beat any single technique on real catalogs. The cost is integration complexity; the benefit is that each surface is served by the model that fits it.
- Cold-start is solved by content embeddings, not by waiting for interaction data. Stand up vision and text encoders early; everything downstream gets cheaper.
- A robust experimentation platform is not optional. Recommendation quality is a directional bet at every release, and only an A/B platform with guardrails tells you whether you actually shipped progress.
- Treat business rules as configuration, not code. Merchandisers will need to retune weekly, and engineering should not be in the loop for that.
- Recommendation gains compound. A 22% revenue lift sounds incremental until you watch the return-rate and AOV numbers move alongside it — the same shopper coming back more often is the strongest signal in the system.
What's next
ShopStream is extending the same architecture into post-purchase reordering, supplier discovery, and personalized marketing, using the unified feature store as the data layer for every personalization surface across the marketplace.