Movie Match Cameron Bell - PowerPoint Presentation

Uploaded On 2020-08-07

Presentation Transcript

Slide1

Movie Match

Cameron Bell, Luke Doolittle, Amod Ghangurde

Slide2

Overview

Problem statement
It takes far too long to choose a movie to watch.
Even when you do choose, how do you know you’ll like it?
Movie Match hasn't been done successfully cross-platform.


Slide3

Solo and Group recommender

Slide4

What we’ve done

Slide5

Dataset

Update (near real time): updates made by users in the front-end web application

Dataset            | Initial load                   | Update
-------------------|--------------------------------|-------------------------------
Source             | GroupLens                      | Netflix Prize
Ratings            | 24 million                     | 100 million
Movies and users   | 40,000 movies by 260,000 users | 17,000 movies by 480,000 users
Timeframe          | Jan 1995 to Oct 2016           | Oct 1998 to Dec 2005
Size               | ~1 GB                          | ~2 GB
Format             | .csv files                     | text files
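The two sources use different layouts. A minimal Python sketch of reading both into the same (user, movie, rating) tuples follows. It assumes the MovieLens ratings.csv header `userId,movieId,rating,timestamp` and the Netflix Prize layout in which a `MovieID:` line opens a block of `CustomerID,Rating,Date` lines. The function names are hypothetical, and the sample rows are made up.

```python
import csv
import io


def parse_movielens_ratings(text):
    """Yield (user_id, movie_id, rating) from MovieLens ratings.csv text.

    Assumes the header line userId,movieId,rating,timestamp.
    """
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["userId"]), int(row["movieId"]), float(row["rating"])


def parse_netflix_ratings(text):
    """Yield (customer_id, movie_id, rating) from Netflix Prize-style text.

    Assumes a line "MovieID:" starts a block of "CustomerID,Rating,Date" lines.
    """
    movie_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])  # start of a new movie block
        else:
            customer_id, rating, _date = line.split(",")
            yield int(customer_id), movie_id, float(rating)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    print(list(parse_movielens_ratings("userId,movieId,rating,timestamp\n1,10,4.5,0\n")))
    print(list(parse_netflix_ratings("3:\n7,5,2005-01-01\n8,2,2005-01-02\n")))
```

Both parsers are generators, so a multi-gigabyte file can be streamed rather than loaded whole.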

Slide6

Dataset

Preprocessing
Used the MovieLens dataset as the base
Matched Netflix movies to MovieLens movies on exact title and year
After merging: ~70 million Netflix ratings
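A minimal sketch of the exact title-and-year match. It assumes MovieLens titles carry the year in trailing parentheses (as in `Toy Story (1995)`) and Netflix titles arrive as separate (id, year, title) fields, as in the Netflix Prize movie_titles.txt. The helper names are hypothetical. The slides do not say how duplicate titles or overlapping user ids between the two sources were handled. In this sketch, a later MovieLens title with the same key simply overwrites an earlier one.

```python
import re

# "Title (YYYY)" -> title, year; assumes the year is the last parenthesised token.
_TITLE_YEAR = re.compile(r"^(?P<title>.*) \((?P<year>\d{4})\)\s*$")


def split_movielens_title(raw):
    """'Toy Story (1995)' -> ('Toy Story', 1995); None if there is no trailing year."""
    m = _TITLE_YEAR.match(raw)
    if m is None:
        return None
    return m.group("title"), int(m.group("year"))


def build_id_map(movielens_movies, netflix_movies):
    """Map Netflix movie ids to MovieLens movie ids on exact (title, year).

    movielens_movies: iterable of (movielens_id, raw_title)
    netflix_movies:   iterable of (netflix_id, year, title)
    """
    key_to_ml = {}
    for ml_id, raw in movielens_movies:
        key = split_movielens_title(raw)
        if key is not None:
            key_to_ml[key] = ml_id  # later duplicates overwrite earlier ones
    mapping = {}
    for nf_id, year, title in netflix_movies:
        ml_id = key_to_ml.get((title, year))
        if ml_id is not None:
            mapping[nf_id] = ml_id
    return mapping


def merge_netflix_ratings(netflix_ratings, mapping):
    """Keep only Netflix ratings whose movie matched, re-keyed to MovieLens ids."""
    return [(user, mapping[movie], rating)
            for user, movie, rating in netflix_ratings
            if movie in mapping]
```

Exact matching drops any movie whose title is spelled differently in the two sources. For example, MovieLens writes trailing-article titles such as `American President, The (1995)`, so those titles would not match a Netflix title written in natural word order.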

Slide7

Architecture

Hardware options: physical systems; virtual environment (EC2, Azure, etc.)
Data storage options: Spark; graph database (Neo4j)

Machine Learning Options

Collaborative Filtering

Nearest Neighbors
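Nearest neighbors can be illustrated with item-based cosine similarity: two movies are close when the same users rated them similarly. This is a small in-memory sketch with hypothetical names and toy data. It is not the project's Spark or Neo4j code.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """ratings: iterable of (user, movie, rating) -> {movie: {user: rating}}."""
    vecs = defaultdict(dict)
    for user, movie, rating in ratings:
        vecs[movie][user] = rating
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    common = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_movies(ratings, movie, k=2):
    """Return the k movies most similar to `movie`, most similar first."""
    vecs = item_vectors(ratings)
    target = vecs[movie]
    scored = [(cosine(target, v), m) for m, v in vecs.items() if m != movie]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by movie id
    return [m for _, m in scored[:k]]
```

Building all item vectors on every call keeps the sketch short. A real service would precompute the vectors (or the neighbor lists) once per data load.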

Slide8

Architecture (Proposed)

Slide9

Architecture (Prototype)

Slide10

Requirements (for the given datasets)

Storage: > 10 GB
Memory: > 8 GB RAM within Spark
CPU: > 2 virtual cores within Spark

Slide11

Website

Slide12

Population view - Most often rated movies and average rating
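One way to compute this view, assuming ratings are (user, movie, rating) tuples held in memory. The function name and the tie-break by movie id are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


def most_rated(ratings, n=10):
    """Return [(movie, count, average)] for the n movies with the most ratings."""
    count = defaultdict(int)
    total = defaultdict(float)
    for _user, movie, rating in ratings:
        count[movie] += 1
        total[movie] += rating
    # Most ratings first; ties broken by movie id for a stable order.
    top = sorted(count, key=lambda m: (-count[m], m))[:n]
    return [(m, count[m], total[m] / count[m]) for m in top]
```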

Slide13

Population view - Count of ratings and Genre distribution
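The genre half of this view can be sketched as a per-genre count. This assumes MovieLens-style pipe-separated genre strings such as `Adventure|Comedy` and skips the `(no genres listed)` placeholder. The slide does not say whether it counts movies or ratings per genre; this version counts movies, and the function name is hypothetical.

```python
from collections import Counter


def genre_distribution(movies):
    """Count how many movies carry each genre.

    movies: iterable of (movie_id, genres), where genres is a pipe-separated
    string as in MovieLens movies.csv, e.g. 'Action|Comedy'.
    """
    dist = Counter()
    for _movie_id, genres in movies:
        for genre in genres.split("|"):
            if genre and genre != "(no genres listed)":
                dist[genre] += 1
    return dist
```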

Slide14

Distribution of my ratings (user id = 186590)
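A per-user histogram like this one reduces to counting that user's rating values. In this sketch, the user id comes from the slide, while the helper names and any ratings passed in are illustrative.

```python
from collections import Counter


def user_rating_distribution(ratings, user_id):
    """Return [(rating_value, count)] for one user, sorted by rating value."""
    counts = Counter(r for u, _m, r in ratings if u == user_id)
    return sorted(counts.items())


def text_histogram(distribution):
    """Render [(value, count)] as lines such as '4.0 | ##'."""
    return "\n".join(f"{value:.1f} | {'#' * count}" for value, count in distribution)
```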

Slide15

My unusual likes and dislikes
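The slide does not define "unusual". One plausible reading, assumed here, is the gap between the user's rating and the movie's average rating. Under that reading, the largest positive gaps are unusual likes and the most negative gaps are unusual dislikes. The function name is hypothetical, and the average includes the user's own rating.

```python
from collections import defaultdict


def unusual_ratings(ratings, user_id, n=3):
    """Return (likes, dislikes) as lists of (movie, gap).

    gap = user's rating minus the movie's average rating over all users
    (including this user). Likes are the largest positive gaps, dislikes the
    most negative ones.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    mine = {}
    for u, m, r in ratings:
        total[m] += r
        count[m] += 1
        if u == user_id:
            mine[m] = r
    gaps = [(m, r - total[m] / count[m]) for m, r in mine.items()]
    likes = sorted((g for g in gaps if g[1] > 0), key=lambda g: -g[1])[:n]
    dislikes = sorted((g for g in gaps if g[1] < 0), key=lambda g: g[1])[:n]
    return likes, dislikes
```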

Slide16

Rare movies rated by me
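This sketch assumes "rare" means few ratings across the whole dataset. Under that assumption, the view ranks the user's movies by their global rating count. The function name is hypothetical.

```python
from collections import Counter


def rare_movies_rated_by(ratings, user_id, n=5):
    """Return [(movie, total_ratings)] for the n movies this user rated that
    have the fewest ratings overall, rarest first."""
    ratings = list(ratings)  # iterated twice below
    popularity = Counter(m for _u, m, _r in ratings)
    mine = {m for u, m, _r in ratings if u == user_id}
    return sorted(((m, popularity[m]) for m in mine), key=lambda t: (t[1], t[0]))[:n]
```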

Slide17

Data Load Performance

Real-time updates should be possible
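Near-real-time updates can avoid recomputing population statistics from scratch by keeping a running count and mean per movie, then folding in each change as it arrives. The class below is a hypothetical sketch of that idea. It is not the project's load path.

```python
class MovieStats:
    """Running count and mean rating per movie, updated one event at a time."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def add(self, movie, rating):
        """Fold a brand-new rating into the movie's running mean."""
        n = self.count.get(movie, 0) + 1
        old = self.mean.get(movie, 0.0)
        self.count[movie] = n
        self.mean[movie] = old + (rating - old) / n

    def change(self, movie, old_rating, new_rating):
        """Replace an existing rating; the count stays the same."""
        n = self.count[movie]
        self.mean[movie] += (new_rating - old_rating) / n
```

Each update costs O(1) regardless of how many ratings a movie already has, which is what makes per-event updates cheap.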

Slide18

Machine Learning Performance

Hyperparameter tuning: grid search on a small dataset
Hyperparameters: rank, iterations, lambda

Slide19

Machine Learning Performance