This project focuses on designing a scalable content discovery engine that suggests personalized movie recommendations using graph-based relationships and semantic similarity scoring. The system connects movies through shared features such as genres, tags, and descriptions, and ranks candidate recommendations with probabilistic traversal and similarity metrics. Movies and genres form the two node sets of a bipartite graph. The user's input movies serve as starting nodes, and the engine performs random walks from them to explore related nodes, mirroring techniques used in large-scale systems such as Pinterest’s Pixie engine. The resulting recommendations aim to balance personalization and diversity.
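A minimal sketch of the movie-genre bipartite graph, assuming an adjacency dict of sets where nodes are tagged tuples such as `("movie", "Alien")` and `("genre", "sci-fi")`. The function name `build_bipartite_graph`, the tuple tagging, and the toy titles are illustrative assumptions, not the project's actual code or data.

```python
def build_bipartite_graph(movie_genres):
    """Return an undirected movie-genre bipartite graph as a dict of sets.

    movie_genres maps each movie title to an iterable of genre names.
    Nodes are tagged tuples, ("movie", title) or ("genre", name), so the two
    node sets stay disjoint even if a title happens to equal a genre name.
    """
    graph = {}
    for movie, genres in movie_genres.items():
        m = ("movie", movie)
        graph.setdefault(m, set())  # keep movies with no genres as isolated nodes
        for genre in genres:
            g = ("genre", genre)
            graph[m].add(g)                    # movie -> genre edge
            graph.setdefault(g, set()).add(m)  # genre -> movie edge (undirected)
    return graph


# Hypothetical toy data for illustration only.
sample = {
    "Alien": ["sci-fi", "horror"],
    "Arrival": ["sci-fi", "drama"],
    "Get Out": ["horror", "thriller"],
}
graph = build_bipartite_graph(sample)
print(sorted(graph[("genre", "sci-fi")]))  # [('movie', 'Alien'), ('movie', 'Arrival')]
```

Dict lookups and set insertions are average O(1), so building the graph is roughly linear in the number of movie-genre pairs.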
- Dynamic Graph Construction: Built a bipartite graph using Python dictionaries (hash maps) to efficiently connect movies and genres.
- Random Walk Traversal: Implemented a probabilistic traversal algorithm that repeatedly steps from a node to a randomly chosen neighbor, treating frequently visited nodes as more relevant.
- Similarity Scoring: Combined Jaccard and cosine similarity to measure content overlap and semantic similarity between recommended items.
- Filtering and Optimization: Introduced subgraph selection to restrict walks to relevant clusters, improving precision and reducing noise.
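The traversal and subgraph restriction described above can be sketched as a random walk with restart that counts visits to movie nodes. This is a simplified, Personalized PageRank-style variant in the spirit of Pixie, not Pinterest's actual implementation. The function name `random_walk_recommend` and the parameters `num_steps`, `restart_prob`, `allowed`, and `top_k` are hypothetical, and the graph is assumed to be an adjacency dict of sets with `("movie" | "genre", name)` nodes.

```python
import random
from collections import Counter


def random_walk_recommend(graph, seed, num_steps=1000, restart_prob=0.3,
                          allowed=None, top_k=5, rng=None):
    """Rank movies by how often a random walk with restart visits them.

    graph: dict mapping each node to a set of neighbors (bipartite movie/genre).
    seed: the user's input movie node, e.g. ("movie", "Alien").
    allowed: optional set of nodes (hypothetical); when given, the walk never
             steps outside it, which restricts traversal to a selected subgraph.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    visits = Counter()
    current = seed
    for _ in range(num_steps):
        # Sort neighbors so the choice does not depend on set iteration order.
        neighbors = sorted(n for n in graph.get(current, ())
                           if allowed is None or n in allowed)
        if not neighbors or rng.random() < restart_prob:
            current = seed  # restart: jump back to the query movie
            continue
        current = rng.choice(neighbors)
        if current[0] == "movie" and current != seed:
            visits[current] += 1  # frequently revisited movies rank higher
    return visits.most_common(top_k)


# Hypothetical toy graph for illustration only.
graph = {
    ("movie", "Alien"): {("genre", "sci-fi"), ("genre", "horror")},
    ("movie", "Arrival"): {("genre", "sci-fi"), ("genre", "drama")},
    ("movie", "Get Out"): {("genre", "horror"), ("genre", "thriller")},
    ("genre", "sci-fi"): {("movie", "Alien"), ("movie", "Arrival")},
    ("genre", "horror"): {("movie", "Alien"), ("movie", "Get Out")},
    ("genre", "drama"): {("movie", "Arrival")},
    ("genre", "thriller"): {("movie", "Get Out")},
}
print(random_walk_recommend(graph, ("movie", "Alien")))

# Restrict the walk to a subgraph that excludes "Get Out".
subgraph = set(graph) - {("movie", "Get Out")}
print(random_walk_recommend(graph, ("movie", "Alien"), allowed=subgraph))
```

The restart probability keeps the walk close to the seed (personalization). Lowering it lets the walk drift further and surface more diverse movies.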
Python
Pandas
sklearn (scikit-learn): processes string data and scores similarity with cosine similarity
nltk (Natural Language Toolkit): pre-processes strings before they are passed to sklearn functions
Random Walk Algorithm
Bipartite Graph
Jaccard Similarity Index
Cosine Similarity
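The similarity measures listed above can be sketched with the standard library alone. In the project, preprocessing uses NLTK and cosine similarity uses sklearn (for example `CountVectorizer` or `TfidfVectorizer` with `sklearn.metrics.pairwise.cosine_similarity`). The regex tokenizer, the short stop-word list, and the raw term-count vectors below are stand-ins for those, and TF-IDF weighting would give different numbers. Applying Jaccard to genre/tag sets, cosine to descriptions, and the linear `combined_score` blend with weight `alpha` are all hypothetical choices.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; NLTK's English stopwords corpus is larger.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "with", "is"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def text_cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words term-count vectors (0.0 if either is empty)."""
    a, b = Counter(preprocess(text_a)), Counter(preprocess(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(features_a, features_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two genre/tag sets (0.0 if both are empty)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(jaccard, cosine, alpha=0.5):
    """Hypothetical linear blend: alpha weights tag overlap against text similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * jaccard + (1 - alpha) * cosine


j = jaccard_similarity({"sci-fi", "horror"}, {"sci-fi", "drama"})  # 1/3
c = text_cosine_similarity("The crew of a ship", "A ship and its crew")  # 2/sqrt(6)
print(round(combined_score(j, c), 3))  # 0.575
```

Jaccard only checks whether features overlap, while cosine also reflects how often terms occur, so blending the two rewards items that share both labels and descriptive vocabulary.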
Built a prototype recommendation engine that generates high-quality, diverse movie suggestions. Improved relevance scores by filtering the traversal graph using similarity thresholds. Enhanced understanding of graph-based personalization strategies inspired by large-scale systems like Pinterest’s Pixie.
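Filtering the traversal graph with a similarity threshold can be sketched as pruning movie nodes whose precomputed similarity to the seed falls below a cutoff before the walk runs. The function name `prune_graph`, the `scores` dict, and the default `threshold=0.2` are hypothetical, and the graph is assumed to be an adjacency dict of sets with `("movie" | "genre", name)` nodes.

```python
def prune_graph(graph, seed, scores, threshold=0.2):
    """Return a copy of the bipartite graph keeping only sufficiently similar movies.

    Kept nodes: the seed, every genre node, and each movie whose score is at
    least `threshold`. Edges pointing at removed movies are dropped as well.
    scores: hypothetical dict of precomputed movie-to-seed similarities.
    """
    keep = {
        node for node in graph
        if node == seed or node[0] == "genre" or scores.get(node, 0.0) >= threshold
    }
    # Set intersection builds new neighbor sets, so the input graph is not mutated.
    return {node: graph[node] & keep for node in keep}


# Hypothetical toy graph and scores for illustration only.
graph = {
    ("movie", "Alien"): {("genre", "sci-fi"), ("genre", "horror")},
    ("movie", "Arrival"): {("genre", "sci-fi"), ("genre", "drama")},
    ("movie", "Get Out"): {("genre", "horror"), ("genre", "thriller")},
    ("genre", "sci-fi"): {("movie", "Alien"), ("movie", "Arrival")},
    ("genre", "horror"): {("movie", "Alien"), ("movie", "Get Out")},
    ("genre", "drama"): {("movie", "Arrival")},
    ("genre", "thriller"): {("movie", "Get Out")},
}
scores = {("movie", "Arrival"): 0.33, ("movie", "Get Out"): 0.1}
pruned = prune_graph(graph, ("movie", "Alien"), scores)
print(sorted(pruned[("genre", "horror")]))  # [('movie', 'Alien')]
```

Random walks over the pruned graph can no longer reach low-scoring movies, which concentrates visits on candidates that already passed the similarity check.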