Learn how to build a recommendation engine using collaborative filtering and content-based methods. Includes concepts, real-world examples, and Python code.
From Netflix suggesting your next binge to Amazon nudging your next purchase, recommendation engines are everywhere. These intelligent systems filter and predict items a user may like based on their behavior, preferences, or other users’ patterns.
In this post, we’ll break down the core types of recommendation systems, introduce the key algorithms, and walk through Python code examples to help you build a basic engine.
📚 What Is a Recommendation Engine?
A recommendation engine (or recommender system) is an algorithm that suggests relevant items to users. Commonly recommended items include:
- Products (Amazon)
- Movies (Netflix, Hulu)
- Music (Spotify)
- Content/News (YouTube, LinkedIn)
⚙️ Types of Recommendation Systems
1. Content-Based Filtering
- Recommends items similar to those a user liked in the past
- Based on item features (e.g., genre, category, tags)
- Independent of other users
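✅ Pros: Personalized to each user; can recommend new items as soon as they have features
❌ Cons: Cold start for new users (no history to match against), narrow scope (recommendations stay close to past likes)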
2. Collaborative Filtering
- Recommends items based on similar users or behaviors
- Ignores item metadata; focuses on user-item interaction
Two types:
- User-based: “Users like you also liked…”
- Item-based: “Users who liked this also liked…”
✅ Pros: Learns hidden patterns
❌ Cons: Needs enough interaction data (sparse matrix problem); struggles with new users and new items that have no interactions yet
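To make the item-based flavor concrete, here is a minimal sketch in plain Python. The toy `ratings` dictionary, the user names, and the helper functions are all made up for illustration; a real system would use a sparse matrix library and far more data. Each unseen item is scored by a similarity-weighted average of the ratings the user has already given.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Missing items are unrated.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 4, "B": 5, "D": 2},
    "carol": {"A": 1, "C": 5, "D": 4},
    "dave":  {"A": 5, "B": 4},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend_item_based(user, ratings, n=2):
    """Rank unseen items by a similarity-weighted average of the user's own ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue
        num = den = 0.0
        for rated_item, r in seen.items():
            sim = cosine(cand_vec, items[rated_item])
            num += sim * r
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend_item_based("dave", ratings))
```

Note that no item metadata appears anywhere: similarity comes purely from which users rated which items, which is also why this approach has nothing to say about an item nobody has rated yet.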
3. Hybrid Models
- Combines content-based + collaborative filtering
- Can include additional data like demographics, location, ratings
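One simple way to combine the two approaches is a weighted blend of their scores. The sketch below is only one possible design, under a few assumptions: the score dictionaries, the `alpha` weight, and both function names are hypothetical, and each source is min-max scaled first so that similarity scores and predicted ratings live on the same range.

```python
def min_max_scale(scores):
    """Rescale a dict of scores to [0, 1] so the two sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(content, collab, alpha=0.5):
    """Blend content-based and collaborative scores; alpha weights the collaborative side."""
    content_n = min_max_scale(content)
    collab_n = min_max_scale(collab) if collab else {}
    blended = {}
    for item, c_score in content_n.items():
        if item in collab_n:
            blended[item] = alpha * collab_n[item] + (1 - alpha) * c_score
        else:
            # Cold-start item: no interaction data yet, so fall back to content only
            blended[item] = c_score
    return blended

# Hypothetical scores for one user: content similarity and predicted ratings
content = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.4}
collab = {"A": 3.0, "B": 4.8, "D": 4.2}  # item C has no interactions yet

blended = hybrid_scores(content, collab, alpha=0.6)
ranked = sorted(blended, key=blended.get, reverse=True)
print(ranked)
```

Item C still gets ranked even though the collaborative side knows nothing about it, which is the main reason hybrids are a common answer to cold start.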
💻 Basic Collaborative Filtering in Python (with Surprise)
Let’s implement a simple collaborative filtering model using the Surprise library.
📦 Step 1: Install the library
pip install scikit-surprise
📄 Step 2: Load a Dataset (MovieLens 100k), Train, and Evaluate
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split
from surprise import SVD
from surprise import accuracy
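# Load the built-in MovieLens 100k dataset (Surprise offers to download it on first use)
data = Dataset.load_builtin('ml-100k')
# Hold out 20% of the ratings for testing; fix the seed for a reproducible split
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
# Build and train the model
model = SVD()
model.fit(trainset)
# Predict on the held-out ratings and evaluate
predictions = model.test(testset)
# accuracy.rmse prints the score itself by default, so silence it and print once
rmse = accuracy.rmse(predictions, verbose=False)
print("RMSE:", rmse)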
🔍 Step 3: Make a Prediction
# Predict the rating for user 196 and item 302 (raw IDs are strings in this dataset)
pred = model.predict(uid='196', iid='302')
print(f"Predicted rating: {pred.est}")
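A single predicted rating is rarely the end goal; usually you want the top few items per user. Here is a minimal sketch of that step. It relies only on the `uid`, `iid`, and `est` fields that Surprise's prediction objects expose. The `top_n` helper and the `Pred` stand-in are hypothetical names, used so the example runs without the library or the dataset.

```python
from collections import defaultdict, namedtuple

def top_n(predictions, n=3):
    """Group predictions by user and keep each user's n highest estimated ratings."""
    per_user = defaultdict(list)
    for p in predictions:
        per_user[p.uid].append((p.iid, p.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in per_user.items()
    }

# Stand-in for Surprise's prediction objects (made-up estimates for illustration)
Pred = namedtuple("Pred", ["uid", "iid", "est"])
sample = [
    Pred("196", "302", 4.1),
    Pred("196", "242", 3.6),
    Pred("196", "51", 4.4),
    Pred("22", "302", 2.9),
]

for uid, items in top_n(sample, n=2).items():
    print(uid, items)
```

With the trained model, one way to get candidate predictions is to score the user-item pairs that are absent from the training data, for example with `model.test(trainset.build_anti_testset())`, and pass the result to a helper like this one. On the full dataset that list is large, so expect it to take a while.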
📊 Content-Based Filtering Example (TF-IDF + Cosine Similarity)
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Sample product data
df = pd.DataFrame({
'product_id': [1, 2, 3],
'description': ['wireless mouse', 'wired gaming mouse', 'ergonomic wireless keyboard']
})
# TF-IDF encoding
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(df['description'])
# Cosine similarity
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
# Recommend items similar to the first product (row index 0, product_id 1)
similar_scores = list(enumerate(cos_sim[0]))
sorted_items = sorted(similar_scores, key=lambda x: x[1], reverse=True)
# Skip the query item itself (row 0) and map row positions back to product IDs
recommended = [int(df['product_id'].iloc[i]) for i, _ in sorted_items if i != 0]
print("Recommended Products:", recommended)
🧠 Advanced Techniques
- Matrix Factorization (SVD, NMF)
- Deep Learning Models (Autoencoders, Neural CF)
- Session-Based Models (RNNs for time-aware recommendations)
- Reinforcement Learning (long-term user engagement)
- Contextual Recommendations (based on time, device, location)
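To give a feel for what matrix factorization does under the hood, here is a small sketch that learns user and item factor vectors with stochastic gradient descent in plain Python. It is a simplified illustration, not the implementation used by Surprise's `SVD` (which also learns per-user and per-item bias terms). The toy ratings, the hyperparameter values, and the function names are all assumptions chosen for the example.

```python
import random

def factorize(ratings, n_factors=2, n_epochs=300, lr=0.05, reg=0.02, seed=0):
    """Learn user and item factors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initial factors for every user and item
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(mean, P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mean, P, Q

def predict(mean, P, Q, user, item):
    """Predicted rating = global mean + dot product of user and item factors."""
    return mean + sum(pu * qi for pu, qi in zip(P[user], Q[item]))

def rmse(mean, P, Q, ratings):
    """Root mean squared error of the model on a list of rating triples."""
    sq = sum((r - predict(mean, P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (sq / len(ratings)) ** 0.5

# Hypothetical toy data: (user, item, rating)
ratings = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "D", 2),
    ("u3", "C", 5), ("u3", "D", 4), ("u3", "A", 1),
    ("u4", "A", 5), ("u4", "B", 4),
]

mean, P, Q = factorize(ratings)
print("Training RMSE:", round(rmse(mean, P, Q, ratings), 3))
print("u4 on C:", round(predict(mean, P, Q, "u4", "C"), 2))
```

The learned factors let the model score pairs it has never seen, such as user `u4` and item `C`. The training error printed here says nothing about generalization; for that you would hold out ratings as in the Surprise example.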
✅ Best Practices
- Normalize ratings or scale scores (e.g., subtract each user’s mean rating so harsh and generous raters are comparable)
- Evaluate with RMSE, MAE, Precision@K, Recall@K
- Use implicit data (clicks, views) in addition to explicit ratings
- Handle cold start with hybrid models or item/user metadata
- Consider business constraints (diversity, freshness)
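RMSE and MAE measure how close predicted ratings are, while Precision@K and Recall@K measure whether the top of the ranked list contains items the user actually cares about. Below is a minimal sketch of the two ranking metrics for a single user. The item IDs are made up, and the function divides precision by K itself (some definitions divide by the number of items actually returned).

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for one user's ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = ["A", "B", "C", "D", "E"]  # ranked, best first
relevant = {"B", "D", "F"}               # items the user actually liked

for k in (3, 5):
    p, r = precision_recall_at_k(recommended, relevant, k)
    print(f"Precision@{k} = {p:.2f}, Recall@{k} = {r:.2f}")
```

In practice you would average these values over all test users, and pick K to match how many recommendations your interface actually shows.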
🧪 Real-World Tools & Libraries
| Tool | Use Case |
|---|---|
| Surprise | Classic recommender algorithms |
| LightFM | Hybrid models with metadata |
| Implicit | Matrix factorization on implicit feedback |
| RecBole | Deep learning recommenders |
| TensorFlow Recommenders | Customizable DL pipelines |
🚀 Conclusion
Building a recommendation engine is a rewarding challenge. Whether you start with simple collaborative filtering or scale up to deep learning-powered systems, the goal remains the same: deliver personalized, value-driven experiences to users.
The next post will walk through the steps of building your own engine, so stay tuned.