Slide 1: Movie Match
Cameron Bell, Luke Doolittle, Amod Ghangurde
Slide 2: Overview
Problem statement
- It takes far too long to choose a movie to watch
- Even when you do choose, how do you know you’ll like it?
- Movie Match hasn't been done successfully cross-platform
Slide 3: Solo and Group recommender
Slide 4: What we’ve done
Slide 5: Dataset
Update (near real time): updates made by users in the front-end web application

Dataset            Initial Load                      Update
Source             GroupLens                         Netflix prize
Ratings            24 million                        100 million
Movies and Users   40,000 movies by 260,000 users    17,000 movies by 480,000 users
Timeframe          Jan 1995 to Oct 2016              Oct 1998 to Dec 2005
Size               ~1 GB                             ~2 GB
Format             .csv files                        text files
Slide 6: Dataset
Preprocessing
- Used MovieLens dataset as base
- Exact title and year match
- After merging -> ~70 million Netflix ratings
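The exact title-and-year match can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code. It assumes MovieLens titles carry the year in a trailing "(YYYY)" and that the Netflix movie list provides id, year, and title as separate fields. The function names split_title_year, match_movies, and remap_ratings are hypothetical.

```python
import re

# Assumption: MovieLens titles end with the release year, e.g. "Toy Story (1995)".
_TITLE_YEAR = re.compile(r"^(?P<title>.*)\s\((?P<year>\d{4})\)$")


def split_title_year(movielens_title):
    """Return (title, year) for a 'Title (YYYY)' string, or None if it does not fit."""
    match = _TITLE_YEAR.match(movielens_title.strip())
    if match is None:
        return None
    return match.group("title"), int(match.group("year"))


def match_movies(movielens_movies, netflix_movies):
    """Map Netflix movie ids to MovieLens movie ids on an exact (title, year) match.

    movielens_movies: iterable of (movie_id, "Title (YYYY)")
    netflix_movies:   iterable of (movie_id, year, title)
    """
    by_key = {}
    for ml_id, raw_title in movielens_movies:
        key = split_title_year(raw_title)
        if key is not None:
            # Keep the first MovieLens id seen for a given (title, year).
            by_key.setdefault(key, ml_id)
    return {
        nf_id: by_key[(title, year)]
        for nf_id, year, title in netflix_movies
        if (title, year) in by_key
    }


def remap_ratings(netflix_ratings, id_map):
    """Keep only Netflix ratings whose movie matched, rewritten to MovieLens ids."""
    return [
        (user, id_map[movie], rating)
        for user, movie, rating in netflix_ratings
        if movie in id_map
    ]
```

Because the match is exact, titles that differ only in case, punctuation, or release year are dropped. That is consistent with fewer Netflix ratings surviving the merge than the 100 million in the source.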
Slide 7: Architecture
Hardware Options
- Physical systems
- Virtual environment (EC2, Azure, etc.)
Data Storage Options
- Spark
- Graph Database (Neo4j)
Machine Learning Options
- Collaborative Filtering
- Nearest Neighbors
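To make the nearest-neighbors option concrete, here is a small sketch of user-to-user similarity over sparse rating vectors. The slides do not say which similarity measure or neighborhood size was considered, so cosine similarity, k, and the function names are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors {movie_id: rating}."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(target, users, k=2):
    """Return the ids of the k users most similar to `target`.

    users: {user_id: {movie_id: rating}}
    """
    scored = [
        (cosine(users[target], vector), user_id)
        for user_id, vector in users.items()
        if user_id != target
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [user_id for _score, user_id in scored[:k]]
```

A recommender built on this would then suggest movies that the neighbors rated highly and the target user has not rated yet.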
Slide 8: Architecture (Proposed)
Slide 9: Architecture (Prototype)
Slide 10: Requirements (for the given datasets)
Storage
- > 10 GB storage
Memory
- within Spark requires > 8 GB RAM
CPU
- within Spark requires > 2 virtual cores
Slide 11: Website
Slide 12: Population view - Most often rated movies and average rating
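The numbers behind a view like this can be computed in one pass over the ratings. The sketch below is only an illustration in plain Python. The project itself lists Spark as a storage option, and the function names movie_stats and most_rated are hypothetical.

```python
from collections import defaultdict


def movie_stats(ratings):
    """Return {movie_id: (rating_count, mean_rating)}.

    ratings: iterable of (user_id, movie_id, rating)
    """
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, sum of ratings]
    for _user, movie, rating in ratings:
        totals[movie][0] += 1
        totals[movie][1] += rating
    return {movie: (count, total / count) for movie, (count, total) in totals.items()}


def most_rated(stats, top_n=10):
    """Return the top_n (movie_id, (count, mean)) pairs, most often rated first."""
    return sorted(stats.items(), key=lambda item: item[1][0], reverse=True)[:top_n]
```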
Slide 13: Population view - Count of ratings and Genre distribution
Slide 14: Distribution of my ratings (user id = 186590)
Slide 15: My unusual likes and dislikes
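The slides do not define "unusual". One plausible reading, assumed here, is the gap between a user's rating and the population's mean rating for the same movie. The sketch below ranks a user's ratings by that gap. The function name and the top_n cutoff are hypothetical.

```python
from collections import defaultdict


def unusual_ratings(ratings, user_id, top_n=5):
    """Return (likes, dislikes) for user_id as lists of (movie_id, deviation).

    deviation = the user's rating minus the mean rating of that movie over all
    users (the user's own rating is included in the mean).
    ratings: iterable of (user_id, movie_id, rating)
    """
    ratings = list(ratings)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        sums[movie] += rating
        counts[movie] += 1

    deviations = [
        (movie, rating - sums[movie] / counts[movie])
        for user, movie, rating in ratings
        if user == user_id
    ]
    # Likes: rated well above the population mean, largest gap first.
    likes = sorted((d for d in deviations if d[1] > 0), key=lambda d: -d[1])[:top_n]
    # Dislikes: rated well below the population mean, largest gap first.
    dislikes = sorted((d for d in deviations if d[1] < 0), key=lambda d: d[1])[:top_n]
    return likes, dislikes
```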
Slide 16: Rare movies rated by me
Slide 17: Data Load Performance
Real-time updates should be possible
Slide 18: Machine Learning Performance
Hyperparameter Tuning
- Grid search on small dataset
- Hyperparameters: rank, iterations, lambda
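A grid search over these three hyperparameters can be sketched as below. The names rank, iterations, and lambda match the arguments of Spark MLlib's ALS trainer, but the slides do not say which trainer was used. The sketch therefore takes the training step as a hypothetical callable, train_and_score, that returns a validation error. The grid values are placeholders, not the ones used in the project, and toy_score stands in for a real train-and-validate step.

```python
from itertools import product


def grid_search(train_and_score, grid):
    """Try every combination in `grid`; return (best_params, best_score).

    train_and_score: callable taking the hyperparameters as keyword arguments
                     and returning a validation error (lower is better).
    grid:            {hyperparameter_name: list of candidate values}
    """
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder grid: 3 * 2 * 3 = 18 combinations.
# "lambda_" is used because "lambda" is a reserved word in Python.
GRID = {"rank": [4, 8, 12], "iterations": [5, 10], "lambda_": [0.01, 0.1, 1.0]}


def toy_score(rank, iterations, lambda_):
    """Stand-in for training on the small dataset and measuring validation error."""
    return abs(rank - 8) + abs(iterations - 10) / 10 + abs(lambda_ - 0.1)


best_params, best_score = grid_search(toy_score, GRID)
```

Running the search on a small dataset, as the slide describes, keeps each of the combinations cheap. The winning combination can then be used to train on the full data.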
Slide 19: Machine Learning Performance