Class Competition: Netflix data

Uploaded On 2016-03-07





Presentation Transcript

Slide1

Class Competition: Netflix data

Statistical Learning Course, Prof. Saharon Rosset, January 2015

Keren Levinstein Hallak

Slide2

Overall:

Linear regression with Ridge regularization
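The overall model is linear regression with a ridge (L2) penalty, i.e. it minimizes $\sum_i (y_i - b - x_i^\top w)^2 + \lambda \|w\|_2^2$. Below is a minimal NumPy sketch of the closed-form fit, not the author's code. It centers the features so the intercept is not penalized, which is a common convention but an assumption here. The names `ridge_fit`, `ridge_predict`, `rmse` and the penalty `lam` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (hypothetical helper).

    Minimizes sum((y - b - X @ w)**2) + lam * ||w||**2.
    X and y are centered first so the intercept b is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + lam I) w = Xc^T yc instead of forming an explicit inverse
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    # Recover the intercept from the means
    b = y_mean - x_mean @ w
    return w, b

def ridge_predict(X, w, b):
    """Linear predictions X @ w + b."""
    return np.asarray(X, dtype=float) @ w + b

def rmse(y_true, y_pred):
    """Root mean squared error, the competition's evaluation metric."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In practice `lam` would be chosen by cross-validation on the training users, scoring each candidate value with `rmse`.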

Main steps:

Matrix completion

Dates insight

Small steps:

Additional parameters

Slide3

Matrix completion

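Used Matlab code available online for matrix completion via soft thresholding. It solves:

$$\min_{X} \|X\|_* \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{for all observed } (i,j) \in \Omega,$$

where $M$ is the partially observed ratings matrix and $\Omega$ is the set of observed entries.

Nuclear norm: $\|X\|_* = \sum_i \sigma_i(X)$, the sum of the singular values of $X$.

Data completion was performed on the training and testing data together.

RMSE = 0.766796 (with some additional parameters)

Slide4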
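A minimal NumPy sketch of matrix completion by soft-thresholding singular values. This is not the Matlab code the slide refers to, which is not identified; it follows the Soft-Impute idea (Mazumder, Hastie and Tibshirani), which solves the penalized relaxation $\min_X \tfrac12 \|P_\Omega(M - X)\|_F^2 + \lambda \|X\|_*$ rather than the hard-constrained problem. Each iteration fills the missing entries with the current estimate, takes an SVD, and shrinks every singular value by `lam`. The function name `soft_impute`, the threshold `lam` and the stopping rule are illustrative assumptions.

```python
import numpy as np

def soft_impute(M, mask, lam, max_iter=500, tol=1e-6):
    """Soft-Impute style matrix completion (hypothetical sketch).

    M    : array of ratings; entries where mask is False are never read
    mask : boolean array, True where M is observed
    lam  : amount subtracted from each singular value (nuclear-norm penalty)
    """
    M = np.asarray(M, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    Z = np.zeros_like(M)
    for _ in range(max_iter):
        # Keep observed entries, fill the rest with the current estimate
        filled = np.where(mask, M, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values: the proximal step for the nuclear norm
        s_shrunk = np.maximum(s - lam, 0.0)
        Z_new = (U * s_shrunk) @ Vt
        # Relative change between iterates as a simple stopping rule
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z
```

Missing cells of `M` may hold any placeholder (for example `np.nan`), since only entries where `mask` is True are read. Completing training and test users together, as on the slide, would correspond to stacking both matrices row-wise before the call; how the two sets were actually combined is an assumption.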

Dates insight

Slide5
Slide6

Dates Insight

Users rate many movies on the same day

~93% of the users rated other movies on the same day they rated Miss Congeniality, in both the training and the testing set

For each user, the mean, median and variance of the ratings given on the day Miss Congeniality was rated, and the number of such ratings, are useful parameters

Slide7
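A minimal sketch of these same-day features for a single user, in plain Python. It assumes a hypothetical layout: the user's ratings of the other movies come as a list with 0 meaning "not rated", aligned with a list of rating dates, and Miss Congeniality's own rating is excluded because it is the value being predicted. The function name `same_day_features` and the choice of population variance are assumptions.

```python
import statistics

def same_day_features(ratings, dates, target_date):
    """Features from the ratings one user gave on the day the target movie was rated.

    ratings     : ratings (1-5) of the other movies, 0 meaning unrated (assumed layout)
    dates       : rating dates aligned with `ratings` (None if unrated)
    target_date : date on which the target movie (Miss Congeniality) was rated
    """
    # Ratings of other movies given on the same day as the target movie
    same_day = [r for r, d in zip(ratings, dates) if r != 0 and d == target_date]
    if not same_day:
        # No other movie rated that day: zero count, statistics left missing
        return {"count": 0, "mean": None, "median": None, "variance": None}
    return {
        "count": len(same_day),
        "mean": statistics.mean(same_day),
        "median": statistics.median(same_day),
        # Population variance; a single rating gives a variance of 0
        "variance": statistics.pvariance(same_day),
    }
```

Users with no other rating that day (about 7% per the slide) return missing statistics, which would then need the same missing-value handling as the other features.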

Additional parameters:

Considering only the ‘true’ ratings, not the ones imputed by matrix completion:

Variance, skewness and quartiles for each user

Number of zeros (unwatched movies)

Percentage of each rating value (1 to 5) out of the number of watched movies

Miss Congeniality dates

85 indicator parameters marking missing values for movies 15 to 99 (the first 14 movies were rated by all users)

Slide8
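A minimal NumPy sketch of the per-user features listed above, computed from observed ratings only. It assumes a hypothetical users-by-movies array in which 0 means "not watched" and the columns follow the original movie order, so movies 15 to 99 are columns 14 to 98. Population variance and population skewness are choices made for the sketch; the names `user_features` and `missing_indicators` are hypothetical.

```python
import numpy as np

def user_features(R):
    """Per-user features from observed ratings only (hypothetical sketch).

    R : (n_users, n_movies) array of ratings, 0 meaning 'not watched'.
        Assumes every user has at least one rating (the first 14 movies
        were rated by all users).
    Returns columns: variance, skewness, Q1, median, Q3, number of zeros,
    share of 1, 2, 3, 4 and 5 ratings.
    """
    R = np.asarray(R, dtype=float)
    rows = []
    for r in R:
        watched = r[r > 0]
        n_watched = watched.size
        mean = watched.mean()
        var = watched.var()  # population variance
        std = np.sqrt(var)
        # Population skewness; set to 0 when all of the user's ratings are equal
        skew = ((watched - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
        q1, q2, q3 = np.percentile(watched, [25, 50, 75])
        n_zeros = r.size - n_watched
        # Share of each star value among the watched movies
        pct = [(watched == k).sum() / n_watched for k in range(1, 6)]
        rows.append([var, skew, q1, q2, q3, n_zeros, *pct])
    return np.array(rows)

def missing_indicators(R, first_col=14):
    """0/1 indicators of unwatched movies for columns first_col onward."""
    R = np.asarray(R)
    return (R[:, first_col:] == 0).astype(int)
```

With 99 movie columns, `missing_indicators` yields the 85 indicator columns described on the slide.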

Some points to ponder

One should be very careful when evaluating the RMSE if the data is divided into subgroups and a separate model is built for each subgroup

The preprocessing phase (choosing parameters, dealing with missing values) seems to be the most important one
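One concrete way subgroup evaluation can go wrong (an interpretation of the warning, not necessarily the author's): the overall RMSE is not the average of the subgroup RMSEs. It has to be recomputed from the pooled squared errors, $\sqrt{\sum_g n_g \,\mathrm{MSE}_g / \sum_g n_g}$. The small sketch below contrasts the two; the function names are hypothetical.

```python
import math

def pooled_rmse(groups):
    """Overall RMSE from per-subgroup (y_true, y_pred) pairs.

    Pools squared errors across all groups, so larger groups weigh more.
    """
    total_sq, total_n = 0.0, 0
    for y_true, y_pred in groups:
        total_sq += sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        total_n += len(y_true)
    return math.sqrt(total_sq / total_n)

def mean_of_group_rmses(groups):
    """Unweighted average of subgroup RMSEs: generally NOT the overall RMSE."""
    rmses = [math.sqrt(sum((t - p) ** 2 for t, p in zip(yt, yp)) / len(yt))
             for yt, yp in groups]
    return sum(rmses) / len(rmses)
```

For two equal-size subgroups with RMSEs 1 and 3, the average is 2 while the pooled RMSE is $\sqrt{5} \approx 2.24$, so averaging per-subgroup scores understates the error.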

Good to know:

Weka: free data mining software written in Java

Slide9

Thank you for listening!