20.9 Building a Recommendation Prototype

Design a recommendation system that suggests items to users based on past behavior.

20.9 Building a Recommendation Prototype

Problem

Design a recommendation system that suggests items to users based on past behavior. The system should take user-item interactions, estimate which unseen items a user may like, and return a ranked list of recommendations.

Recommendation systems appear in e-commerce, video platforms, music apps, news feeds, online learning systems, developer tooling, and search products. The interface is simple:

recommend items for user U

The internal problem is more subtle. The system must infer preference from incomplete and noisy signals. A click may indicate interest. A purchase may indicate stronger interest. A skipped item may indicate disinterest, or simply poor timing.

A prototype should start with explicit, explainable algorithms before adding heavier machine learning infrastructure.

Solution

Use collaborative filtering. Represent users by the items they interacted with, compute similarity between users, and recommend items liked by similar users.

Example interactions:

Alice -> Graph Algorithms
Alice -> Dynamic Programming
Bob   -> Graph Algorithms
Bob   -> Shortest Paths
Cara  -> Dynamic Programming
Cara  -> Knapsack

Alice and Bob overlap on Graph Algorithms, so Bob's Shortest Paths item is a plausible recommendation for Alice.

Alice and Cara overlap on Dynamic Programming, so Cara's Knapsack item is also plausible.

Modeling Interactions

Start with a simple event table.

interactions = [
    ("alice", "graph_algorithms"),
    ("alice", "dynamic_programming"),
    ("bob", "graph_algorithms"),
    ("bob", "shortest_paths"),
    ("cara", "dynamic_programming"),
    ("cara", "knapsack"),
]

Build a user-to-items index.

from collections import defaultdict, Counter

def build_user_items(interactions):
    user_items = defaultdict(set)

    for user_id, item_id in interactions:
        user_items[user_id].add(item_id)

    return dict(user_items)

Build the reverse item-to-users index.

def build_item_users(interactions):
    item_users = defaultdict(set)

    for user_id, item_id in interactions:
        item_users[item_id].add(user_id)

    return dict(item_users)

These two indexes support most simple recommendation strategies.

User-Based Collaborative Filtering

To recommend items for a user, find users with overlapping items.

def jaccard_similarity(left, right):
    if not left and not right:
        return 0.0

    intersection = len(left & right)
    union = len(left | right)

    return intersection / union

Compute similar users:

def similar_users(target_user, user_items, limit=10):
    target_items = user_items.get(target_user, set())
    scores = []

    for other_user, items in user_items.items():
        if other_user == target_user:
            continue

        score = jaccard_similarity(target_items, items)

        if score > 0:
            scores.append((other_user, score))

    scores.sort(key=lambda pair: pair[1], reverse=True)

    return scores[:limit]

Recommend items from similar users:

def recommend_user_based(target_user, user_items, limit=10):
    target_items = user_items.get(target_user, set())
    candidates = Counter()

    for other_user, similarity in similar_users(target_user, user_items):
        for item in user_items[other_user]:
            if item in target_items:
                continue

            candidates[item] += similarity

    return candidates.most_common(limit)

Example:

user_items = build_user_items(interactions)

print(recommend_user_based("alice", user_items))

Possible output:

[('shortest_paths', 0.3333333333333333), ('knapsack', 0.3333333333333333)]

This is simple, transparent, and good enough for a first prototype.

Item-Based Collaborative Filtering

User-based filtering can be expensive when there are many users. Item-based filtering often behaves better because item relationships are more stable.

Two items are similar when many users interact with both.

Build co-occurrence counts:

def build_item_cooccurrence(user_items):
    cooccurrence = defaultdict(Counter)

    for items in user_items.values():
        item_list = list(items)

        for i, left in enumerate(item_list):
            for right in item_list[i + 1:]:
                cooccurrence[left][right] += 1
                cooccurrence[right][left] += 1

    return {
        item: Counter(related)
        for item, related in cooccurrence.items()
    }

Recommend items related to the user's existing items:

def recommend_item_based(target_user, user_items, cooccurrence, limit=10):
    seen = user_items.get(target_user, set())
    candidates = Counter()

    for item in seen:
        for related_item, count in cooccurrence.get(item, {}).items():
            if related_item in seen:
                continue

            candidates[related_item] += count

    return candidates.most_common(limit)

This tends to be fast at query time if the co-occurrence table is precomputed.

Weighting Interactions

Not all events have the same meaning.

Example weights:

Event Weight
view 1
click 2
bookmark 4
purchase 8
dislike -5

Represent weighted interactions:

weighted_interactions = [
    ("alice", "graph_algorithms", "view"),
    ("alice", "dynamic_programming", "bookmark"),
    ("bob", "graph_algorithms", "purchase"),
    ("bob", "shortest_paths", "view"),
]

Convert events to scores:

EVENT_WEIGHTS = {
    "view": 1,
    "click": 2,
    "bookmark": 4,
    "purchase": 8,
    "dislike": -5,
}

def build_user_item_scores(weighted_interactions):
    scores = defaultdict(Counter)

    for user_id, item_id, event_type in weighted_interactions:
        scores[user_id][item_id] += EVENT_WEIGHTS[event_type]

    return {
        user_id: dict(items)
        for user_id, items in scores.items()
    }

Weighted data supports richer ranking than binary interaction sets.

Cosine Similarity

For weighted vectors, cosine similarity is often more useful than Jaccard similarity.

import math

def cosine_similarity(left, right):
    shared = set(left) & set(right)

    numerator = sum(
        left[item] * right[item]
        for item in shared
    )

    left_norm = math.sqrt(
        sum(value * value for value in left.values())
    )

    right_norm = math.sqrt(
        sum(value * value for value in right.values())
    )

    if left_norm == 0 or right_norm == 0:
        return 0.0

    return numerator / (left_norm * right_norm)

This handles both overlap and strength of preference.

A user who merely viewed an item should contribute less than a user who purchased or bookmarked it.

Popularity Baseline

Every recommender should have a popularity baseline.

def popular_items(interactions, limit=10):
    counts = Counter()

    for _, item_id in interactions:
        counts[item_id] += 1

    return counts.most_common(limit)

Popularity is not personalized, but it is robust. It is especially useful for new users with no history.

def recommend_with_fallback(user_id, user_items, cooccurrence, interactions, limit=10):
    if user_id not in user_items or not user_items[user_id]:
        return popular_items(interactions, limit)

    recommendations = recommend_item_based(
        user_id,
        user_items,
        cooccurrence,
        limit,
    )

    if recommendations:
        return recommendations

    return popular_items(interactions, limit)

This handles cold start for users.

Cold Start Problems

Recommendation systems have two common cold starts.

Cold start Problem Common solution
New user No behavior history Popular items, onboarding choices, location, declared interests
New item No interaction history Content features, editorial boost, exploration traffic

A collaborative filtering system cannot recommend a new item unless someone has interacted with it. Content-based features solve this by comparing item metadata.

Example item features:

items = {
    "graph_algorithms": {"topic": "graphs", "level": "intermediate"},
    "shortest_paths": {"topic": "graphs", "level": "intermediate"},
    "knapsack": {"topic": "dynamic_programming", "level": "intermediate"},
}

A simple content-based fallback recommends items with matching topics.

def content_based_candidates(seen_items, item_features):
    seen_topics = {
        item_features[item]["topic"]
        for item in seen_items
        if item in item_features
    }

    candidates = []

    for item_id, features in item_features.items():
        if item_id in seen_items:
            continue

        if features.get("topic") in seen_topics:
            candidates.append(item_id)

    return candidates

Ranking and Filtering

A recommender usually performs candidate generation first, then filtering, then ranking.

user history
  -> candidate generation
  -> business filters
  -> scoring
  -> ranking
  -> diversity adjustment
  -> recommendations

Filters may remove:

  • Items already seen.
  • Unavailable products.
  • Region-restricted content.
  • Unsafe content.
  • Items below quality thresholds.
  • Items incompatible with user settings.

Ranking combines multiple signals:

def rank_candidate(candidate):
    return (
        0.60 * candidate["collaborative_score"]
        + 0.20 * candidate["popularity_score"]
        + 0.10 * candidate["freshness_score"]
        + 0.10 * candidate["quality_score"]
    )

The exact weights should be tuned using experiments and offline evaluation.

Diversity

Pure ranking can produce repetitive results.

Example:

shortest_paths
dijkstra
bellman_ford
floyd_warshall
a_star

These are relevant, but they may be too narrow. A useful list might include one or two graph topics, one dynamic programming topic, and one data-structure topic.

A simple diversity rule limits items per category.

def diversify(ranked_items, item_features, limit=10, per_topic=2):
    result = []
    topic_counts = Counter()

    for item_id, score in ranked_items:
        topic = item_features.get(item_id, {}).get("topic", "unknown")

        if topic_counts[topic] >= per_topic:
            continue

        result.append((item_id, score))
        topic_counts[topic] += 1

        if len(result) == limit:
            break

    return result

Diversity reduces short-term score greediness to improve the quality of the whole list.

Evaluation

Offline evaluation uses held-out interactions.

For each user:

train on earlier interactions
hide a later interaction
ask whether recommender retrieves it

Common metrics:

Metric Meaning
Precision@k Fraction of recommended items that were relevant
Recall@k Fraction of relevant items retrieved
Hit rate@k Whether at least one relevant item appears
NDCG@k Rewards relevant items near the top
Coverage Fraction of catalog that can be recommended
Diversity Variety within recommendation lists
Novelty Whether recommendations avoid obvious popular items

Offline metrics are useful, but they are only proxies. Online A/B tests are needed to measure product impact.

Complexity

For user-based collaborative filtering:

Operation Complexity
Build user-item sets O(n)
Compare one user to all users O(U × I)
Generate candidates O(S × I)

Where:

  • n is interaction count.
  • U is number of users.
  • I is average items per user.
  • S is number of similar users considered.

For item-based collaborative filtering:

Operation Complexity
Build co-occurrence O(sum of user history sizes squared)
Query recommendations O(H × R)

Where:

  • H is target user's history size.
  • R is average related items per item.

Item-based filtering shifts more work to preprocessing and usually makes online queries faster.

Testing

Start with small deterministic examples.

def test_user_based_recommendation():
    interactions = [
        ("alice", "a"),
        ("alice", "b"),
        ("bob", "a"),
        ("bob", "c"),
    ]

    user_items = build_user_items(interactions)
    recommendations = recommend_user_based("alice", user_items)

    assert recommendations[0][0] == "c"

Test that seen items are never recommended.

def test_seen_items_are_excluded():
    interactions = [
        ("alice", "a"),
        ("alice", "b"),
        ("bob", "a"),
        ("bob", "b"),
        ("bob", "c"),
    ]

    user_items = build_user_items(interactions)
    recommendations = recommend_user_based("alice", user_items)

    recommended_items = {
        item
        for item, _ in recommendations
    }

    assert "a" not in recommended_items
    assert "b" not in recommended_items

Test cold start fallback.

def test_cold_start_uses_popularity():
    interactions = [
        ("alice", "a"),
        ("bob", "a"),
        ("cara", "b"),
    ]

    user_items = build_user_items(interactions)
    cooccurrence = build_item_cooccurrence(user_items)

    recommendations = recommend_with_fallback(
        "new_user",
        user_items,
        cooccurrence,
        interactions,
    )

    assert recommendations[0][0] == "a"

These tests check the recommender contract before any scaling work begins.

Common Bugs

The most common recommender bug is recommending items the user has already consumed. Always filter seen items after candidate generation.

Another common bug is evaluating on training data. This produces inflated metrics because the system is rewarded for remembering interactions it already saw.

Popularity can overwhelm personalization. If every list returns the same top items, the recommender may look accurate offline while providing little discovery value.

Sparse data creates brittle similarities. Two users who share one rare item may appear overly similar. Use minimum overlap thresholds.

Item co-occurrence can favor globally popular items. Normalize by item frequency or combine with inverse popularity.

Cold start behavior is often forgotten. A recommender without fallback returns empty lists for new users and new items.

Recipe

Build the prototype in stages.

Start with a popularity baseline. Add user-item sets. Implement user-based collaborative filtering with Jaccard similarity. Add item-based co-occurrence for faster online queries. Add weighted interactions when event types matter. Filter seen and unavailable items. Add content-based fallback for cold-start cases. Add diversity rules. Evaluate with held-out interactions. Move to matrix factorization, embeddings, or learning-to-rank only after the simple system has a clear baseline.

The main lesson is that recommendation is a ranking pipeline, not a single algorithm. Collaborative filtering generates useful candidates, but production-quality recommendations require filtering, fallback logic, diversity, evaluation, and careful interpretation of user behavior.