Slide1
Your Reactions Suggest You Liked the Movie: Automatic Content Rating via Reaction Sensing
Xuan Bao, Songchun Fan, Romit Roy Choudhury, Alexander Varshavsky, Kevin A. Li
Slide2
Rating Online Content (Movies)
Manual rating: not incentivized, not easy … does not reflect experience
Slide3
Our Vision
Overall star rating
Reaction tags
Reaction-based highlights
Slide4
Our Vision
Overall star rating
Reaction tags
Reaction-based highlights
Automatically
Slide5
Key Intuition
Multi-modal sensing / learning
Reactions / Ratings
02:43 - Action
09:21 - Hilarious
Overall – 5 Stars
……
12:01 - Suspense
Slide6
Specific Opportunities
Visual: facial expressions, eye movements, lip movements …
Audio: laughter, talking
Motion: device stability
Touch screen activities: fast forward, rewind, checking emails and IM chats …
Cloud: aggregate knowledge from others’ reactions; labeled scores from some users
Slide7
Pulse: System Sketch
Slide8
Applications (Beyond Movie Ratings)
Annotated movie timeline: slide forward to the action scenes
Platform for ad analytics: assess which ads grab attention … customize ads based on scenes that the user reacts to
Personalized replays and automatic highlights: user reacts to a specific tennis shot, TV shows a personalized replay; highlights of all exciting moments in the Super Bowl game
Online video courses (MOOCs): may indicate which parts of a lecture need clarification
Early disease symptom identification: ADHD among young children, and other syndromes
Slide9
First Step: A Sensor Assisted Video Player
Slide10
Developed on Samsung Galaxy tablet (Android)
Sensor meta-data layered on video as output
Sensing threads control
Observe the user from front cam
Media player control functions monitored
Pulse Media Player
Slide11
Basic Design
Features from Raw Sensor Readings
Microphone, Camera, Acc, Gyro, Touch, Clicks
Reactions: Laugh, Giggle, Doze, Still, Music …
Signals to Reactions (S2R)
Reaction to Rating & Adjective (R2RA)
English Adjectives
Numeric Rating
Tag Cloud
Final Rating
Data Distillation Process
Slide12
Basic Design
Features from Raw Sensor Readings
Microphone, Camera, Acc, Gyro, Touch, Clicks
Reactions: Laugh, Giggle, Doze, Still, Music …
Signals to Reactions (S2R)
Reaction to Rating & Adjective (R2RA)
English Adjectives
Numeric Rating
Tag Cloud
Final Rating
Data Distillation Process
Cloud
Slide13
Visual Reactions
Facial expressions (face size, eye size, blink, etc.)
Track viewers’ face through the front camera
Track eye position and size (challenging with spectacles)
Track partial faces (via SURF points matching)
[Figure panels: Face Tracking; Eye Tracking (Green), Blink (Red); Partial Face]
Slide14
Visual Reactions
Facial expressions (face size, eye size, blink, etc.)
Track viewers’ face through the front camera
Track eye position and size (challenging with spectacles)
Track partial faces (via SURF points matching)
Detect blinks, lip size
Look for difference between frames
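The "difference between frames" idea can be sketched in a few lines. This is only an illustration under stated assumptions: frames are small grayscale grids given as nested lists, and the function names and the threshold of 20 are hypothetical, not taken from the Pulse implementation.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count


def detect_changes(frames, threshold=20.0):
    """Indices i where frame i differs from frame i-1 by more than the threshold."""
    return [i for i in range(1, len(frames))
            if frame_difference(frames[i - 1], frames[i]) > threshold]


# Toy 2x2 "eye region": open (bright), open, closed (dark), open again.
open_eye = [[200, 200], [200, 200]]
closed_eye = [[40, 40], [40, 40]]
frames = [open_eye, open_eye, closed_eye, open_eye]
changes = detect_changes(frames)  # a blink shows up as two consecutive changes
```

A close-then-open pair of changes within a few frames is one plausible cue for a blink; a real detector would first need the eye region located by the tracking steps listed above.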
Slide15
Acoustic Reactions
Laughter, Conversation, Shout-outs …
Cancel out (known) movie sound from recorded sound
Laughter detection, conversation detection
Even with knowledge of the original movie audio (Blue), it is hard to identify user conversation (distinguish Red and Green)
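To make the "cancel out known movie sound" step concrete, here is a deliberately simplified sketch. It assumes the recording is perfectly time-aligned with the movie track and differs from it only by one gain factor plus the user's sound. Real recordings also contain echo, delay and speaker distortion, which is part of why the slide calls the problem hard. The function names are hypothetical and this is not presented as the authors' method.

```python
def cancel_known_audio(recorded, movie):
    """Subtract the least-squares best scaled copy of `movie` from `recorded`."""
    movie_energy = sum(m * m for m in movie)
    if movie_energy == 0:
        return list(recorded)
    gain = sum(r * m for r, m in zip(recorded, movie)) / movie_energy
    return [r - gain * m for r, m in zip(recorded, movie)]


def mean_energy(signal):
    """Average squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


# Toy signals: the movie track, and a user sound in the second half only.
movie = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
user = [0.0, 0.0, 0.0, 0.0, 0.8, 0.8, -0.8, -0.8]

recorded_quiet = [0.5 * m for m in movie]                  # movie only, attenuated
recorded_user = [0.5 * m + u for m, u in zip(movie, user)]  # movie plus user sound

residual_quiet = cancel_known_audio(recorded_quiet, movie)
residual_user = cancel_known_audio(recorded_user, movie)
```

In this toy case the residual energy is near zero when only the movie is playing and clearly positive when the user makes a sound, which is the signal a laughter or conversation detector would then work on.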
Slide16
Acoustic Reactions
Separating movie from user’s audio
Spectral energy density comparison not adequate
Different techniques for different volume regimes
[Figure panels: High Volume; Low Volume]
Slide17
Acoustic Reactions
Laughter, Conversation, Shout-outs …
Cancel out (known) movie sound from recorded sound
Laughter detection, conversation detection
Early results demonstrate promise of detecting acoustic reactions
Slide18
Motion Reactions
Reactions also leave footprint on motion dimensions
Motionless during intense scene
Fidget during boredom
[Figure labels: Intense scene; Calm scene; Time to stretch]
Slide19
Motion Reactions
Reactions also leave footprint on motion dimensions
Motionless during intense scene
Fidget during boredom
Slide20
Motion Reactions
Reactions also leave footprint on motion dimensions
Motionless during intense scene
Fidget during boredom
Motion readings correlate with changes in ratings …
Slide21
Motion Reactions
Motion readings correlate with changes in ratings …
Timing of motions also correlates with timing of scene changes
Reactions also leave footprint on motion dimensions
Motionless during intense scene
Fidget during boredom
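One simple way to turn "motionless vs. fidget" into a feature is the variance of motion readings over short windows. The sketch below is a hypothetical illustration: the window length, the threshold and the label names are assumptions, and the readings are made-up one-dimensional values standing in for accelerometer or gyroscope output.

```python
from statistics import pvariance


def label_motion(samples, window=4, threshold=0.05):
    """Label each non-overlapping window as 'still' or 'fidget' by its variance."""
    labels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        labels.append("fidget" if pvariance(chunk) > threshold else "still")
    return labels


readings = [0.01, 0.02, 0.01, 0.02,   # device held steady (e.g. intense scene)
            0.9, -0.7, 0.8, -0.9]     # device moving around (e.g. boredom)
labels = label_motion(readings)
```

Because motion patterns depend heavily on the user and the setting, a fixed threshold like this is only a starting point.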
Slide22
Extract Reaction Features – Player control
Collect users’ player control operations
Pause, fast forward, jump, roll back, …
All slider movement
Seek bar
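Player-control operations can be turned into per-segment features by counting them. The sketch below assumes events arrive as (movie time in seconds, action name) pairs and uses 1-minute segments, the granularity the backup slides mention for segment ratings. The action names and the function are hypothetical.

```python
from collections import Counter


def control_features(events, segment_seconds=60):
    """Count each control action per fixed-length segment of movie time."""
    per_segment = {}
    for timestamp, action in events:
        segment = int(timestamp // segment_seconds)
        per_segment.setdefault(segment, Counter())[action] += 1
    return per_segment


events = [
    (12.0, "pause"),
    (15.5, "rewind"),
    (70.0, "fast_forward"),
    (75.0, "fast_forward"),
    (200.0, "pause"),
]
features = control_features(events)
```

Repeated fast-forwards inside one segment, for example, would then appear as a single count that a later learning stage can weigh alongside the other sensor features.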
Slide23
Challenges in Learning
Slide24
Problem – A Generalized Model Does Not Work
Directly trained model does not capture the rating trend
Why?
Slide25
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different even for same user (home vs. commute)
[Figure panels: home; commute]
Sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home.
Slide26
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different even for same user (home vs. commute)
Gyroscope readings from same user (at home and office)
Slide27
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different even for same user (home vs. commute)
Gyroscope readings from same user (at home and office)
Naïve solution: build specific models one by one
Impossible to acquire data for all <User, Context, Movie> tuples
[Figure labels: Office, Home, Commute, …]
Slide28
Challenges in Learning
Approach: Bootstrap from Reaction Agreements
Slide29
Approach: Bootstrap from Agreement
Thoughts
What behavior means positive/negative for a particular setting?
How do we acquire data without explicitly asking the user every time?
Approach: Utilize reactions that most people agree on
[Figure labels: Time; Climax; Boring; Cloud Knowledge (Other users’ opinions); Sensor Reading; Ratings]
Slide30
Approach: Bootstrap from Agreement
Solution: spawn from consensus
Learn user reactions during the “climax” and the “boring” moments
Generalize this knowledge of positive/negative reactions
Gaussian process regression (ratings) and SVM (labels)
[Figure labels: GPR; SVM]
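The bootstrapping idea can be sketched as follows: pick segments where other users' ratings agree, treat this user's sensor readings in those segments as labeled examples, then generalize to the remaining segments. Everything concrete here is an assumption for illustration: the thresholds, the single "motion" feature, the data values, and above all the nearest-centroid rule, which is a simple stand-in for the GPR and SVM models the slides name.

```python
from statistics import mean, pstdev


def consensus_segments(cloud_ratings, high=4.0, low=2.0, max_spread=0.5):
    """Label segments where other users agree: 'climax' if rated high, 'boring' if low."""
    labels = {}
    for segment, ratings in cloud_ratings.items():
        if pstdev(ratings) > max_spread:
            continue  # other users disagree, so this segment is not used
        avg = mean(ratings)
        if avg >= high:
            labels[segment] = "climax"
        elif avg <= low:
            labels[segment] = "boring"
    return labels


def learn_centroids(labels, user_features):
    """Average this user's sensor feature within each consensus class."""
    groups = {"climax": [], "boring": []}
    for segment, label in labels.items():
        groups[label].append(user_features[segment])
    return {label: mean(values) for label, values in groups.items() if values}


def classify(feature, centroids):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(feature - centroids[label]))


# Hypothetical data: per-segment ratings from other users, and this user's motion level.
cloud_ratings = {0: [5, 5, 4], 1: [1, 2, 1], 2: [1, 5, 3], 3: [3, 3, 3]}
user_motion = {0: 0.02, 1: 0.60, 2: 0.30, 3: 0.45}

labels = consensus_segments(cloud_ratings)       # segments 0 and 1 qualify
centroids = learn_centroids(labels, user_motion)
guess = classify(user_motion[3], centroids)      # generalize to an unlabeled segment
```

The point of the design is that the labels come from the crowd while the features come from this user in this setting, so no explicit question to the user is needed.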
Slide31
Evaluation
Slide32
User Experiment Setting
11 participants watch preloaded movies (~50 movies)
2 comedies, 2 dramas, 1 horror movie, 1 action movie
Users provide manual ratings and labels for ground truth
We compare Pulse’s ratings with manual ratings
Slide33
Preliminary Results – Final (5 Star) Rating
Pulse / Truth    1    2    3    4    5
1                0    1    0    0    0
2                0    4    2    0    1
3                0    1   17    0    1
4                0    0    2    5    2
5                0    0    2    1    7
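If the numbers on this slide are read as a 5×5 table of counts (one rating label followed by five cells per row), a few summary figures follow by simple arithmetic. The sketch below assumes that reading. The three figures it computes are the same whichever axis holds the manual rating, because they depend only on the distance from the diagonal.

```python
# Counts as read from the slide's table; rows and columns are ratings 1..5.
confusion = [
    [0, 1, 0, 0, 0],
    [0, 4, 2, 0, 1],
    [0, 1, 17, 0, 1],
    [0, 0, 2, 5, 2],
    [0, 0, 2, 1, 7],
]

size = len(confusion)
total = sum(sum(row) for row in confusion)

# Cases where the two ratings are identical (the diagonal).
exact = sum(confusion[i][i] for i in range(size))

# Cases where the two ratings differ by at most one star.
within_one = sum(confusion[i][j]
                 for i in range(size) for j in range(size)
                 if abs(i - j) <= 1)

# Mean absolute difference between the two ratings, in stars.
mean_abs_error = sum(abs(i - j) * confusion[i][j]
                     for i in range(size) for j in range(size)) / total
```

Under this reading there are 46 rated viewings, of which 33 match exactly and 42 are within one star, for a mean absolute difference of 18/46, or about 0.39 stars.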
Slide34
Preliminary Results – Final (5 Star) Rating
Difference with true 5 star manual rating
Slide35
Preliminary Results – Myth behind the Error
Final ratings can deviate significantly from the average segment ratings
User-given scores may not be linearly related to quality
Slide36
Preliminary Results – Lower Segment Rating Error
Final ratings come from averaging segment ratings
Our system outperforms other methods
[Chart: mean error (5-point scale) of per-segment ratings for random ratings, collaborative filtering, and our system]
Slide37
Preliminary Results – Better Tag Quality
Tags capture users’ feelings better than SVM alone
[Figure: two tag clouds, each with the tags Happy, Intense, Warm]
Slide38
Preliminary Results – Reasonable Energy Overhead
Reasonable energy overhead compared to without sensing
More tolerable on tablets. May need duty-cycling on smart phones
Slide39
Closing Thoughts
Human reactions are in the mind
However, they manifest in bodily gestures and activities
Rich, multi-modal sensors on mobile devices
A wider net for “catching” these reactions
Pulse is an attempt to realize this opportunity
Distilling semantic meanings from sensor streams
Rating movies … tagging any content with reaction meta data
Enabler for
Recommendation engines
Content/video search
Information retrieval, summarization
Slide40
Thoughts?
Slide41
Backup – potential questions
Privacy concern
Like every technology, Pulse may attract early adopters
If only final ratings are uploaded, the privacy level is similar to current ratings
Why not just emotion sensing / just laughter detection?
Emotion sensing is a broad and challenging problem … but its goal is different from ours (rating) …
Explicit signs like laughter usually account for only a small duration of movie viewing, so we need to explore other opportunities (motion)
Our approach takes advantage of the specific task: 1. we know the user is watching a movie; 2. we can observe the user for a longer duration (than most emotion sensing work); 3. we know other users’ opinions
How is this possible … human mind is too complex
Human thoughts are complicated … but they may produce footprints in behaviors
Collaborative filtering explicitly uses knowledge of other users’ thoughts to bootstrap our algorithm
The sample size is small … only 11 users
The sample size is limited, but each user watched multiple movies (50+ movies viewed) … segment ratings are for 1-minute segments (thousands of points)
Collaborative filtering shows that even within this data set, the ratings can diverge and the naïve solution does not work as well as ours
Slide42
Preliminary Results – Better Retrieval Accuracy
Viewers care more about the highlights of a movie
Find the contribution by using sensing
[Figure labels: Gain; Additional error; Total goal; Overall achieved performance]
Slide43
Challenges in Learning
Slide44
Problem – A Generalized Model Does Not Work
Directly trained model does not capture the rating trend
Why?
Slide45
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
[Figure panels: home; commute]
Sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home.
Slide46
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
Impact on sensor readings histograms
Slide47
The Reason it Does Not Work is …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
Impact on sensor readings histograms
Naïve solution: build specific models one by one
Impossible to acquire data for all <User, Context, Movie> tuples
[Figure labels: Office, Home, Commute, …]
Slide48
Challenges in Learning
Approach: Bootstrap from Reaction Agreements
Slide49
Approach: Bootstrap from Agreement
Thoughts
What behavior means positive/negative for a particular setting?
How do we acquire data without explicitly asking the user every time?
Approach: Utilize reactions that most people agree on
[Figure labels: Time; Climax; Boring; Cloud Knowledge (Other users’ opinions); Sensor Reading; Ratings]
Slide50
Approach: Bootstrap from Agreement
Solution: spawn from consensus
Learn user reactions during the “climax” and the “boring” moments
Generalize this knowledge of positive/negative reactions
Gaussian process regression (ratings) and SVM (labels)
[Figure labels: GPR; SVM]
Slide51
Approach: Bootstrap from Agreement
GPR
A Simple Example of GPR
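The slide's own GPR example was a figure that did not survive extraction. As a stand-in, here is a minimal one-dimensional Gaussian process regression written with only the standard library. The RBF kernel, the length scale, the noise level and the toy data (a hypothetical "motion level" feature mapped to star ratings) are all assumptions chosen for illustration, not values from the Pulse system.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gpr_fit(xs, ys, noise=1e-6):
    """Return weights alpha = (K + noise * I)^-1 y for a zero-mean GP."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return solve(k, ys)


def gpr_predict(x_new, xs, alpha):
    """Posterior mean at x_new: sum of kernel values times the fitted weights."""
    return sum(rbf(x_new, x) * a for x, a in zip(xs, alpha))


# Toy data: higher motion level goes with lower rating (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0]
ratings = [5.0, 4.0, 2.0, 1.0]
mean_rating = sum(ratings) / len(ratings)
alpha = gpr_fit(xs, [r - mean_rating for r in ratings])


def predict_rating(x):
    """Add the mean back, since the GP was fitted on centered ratings."""
    return mean_rating + gpr_predict(x, xs, alpha)
```

With a very small noise term the prediction passes almost exactly through the training points, and between them it varies smoothly, which is the property that makes GPR a natural fit for continuous ratings.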
Slide52
Approach: Bootstrap from Agreement
On GPR and SVM - SVM
SVM is a supervised learning method for classification
Identify hyperplanes in high-dimensional space that can best separate observed samples
For our purpose, we used non-linear SVM with RBF kernel for its wide applicability
Slide53
User Experiment Setting
11 participants watch preloaded movies (46 movies)
Two comedies, two dramas, one horror movie, one action movie
Users give manual ratings and labels
Evaluate by comparing generated ratings with manual ratings
Slide54
Evaluation
Slide55
Preliminary Results – Good Final Rating
Pulse / Truth    1    2    3    4    5
1                0    1    0    0    0
2                0    4    2    0    1
3                0    1   17    0    1
4                0    0    2    5    2
5                0    0    2    1    7
Slide56
Preliminary Results – Myth behind the Error
Final ratings can deviate significantly from segment ratings
User-given scores may not be linearly related to quality
Slide57
Preliminary Results – Lower Segment Rating Error
Final ratings come from averaging ratings for each segment
Our system outperforms other methods
[Chart: mean error (5-point scale) over movie segments for random ratings, collaborative filtering, and our system]
Slide58
Preliminary Results – Better Retrieval Accuracy
Viewers care more about the highlights of a movie
Find the contribution by using sensing
[Figure labels: Gain; Additional error; Total goal; Overall achieved performance]
Slide59
Preliminary Results – Better Tag Quality
Generated tags capture users’ feelings much better than using SVM alone
[Figure: two tag clouds, each with the tags Happy, Intense, Warm]
Slide60
Preliminary Results – Reasonable Energy Overhead
Reasonable energy overhead compared to without sensing
More tolerable on tablets. May need duty-cycling on smart phones
Slide61
Closing Thoughts
Human reactions are in the mind
However, they manifest in bodily gestures and activities
Rich, multi-modal sensors on mobile devices
Opportunity for “catching” these activities
Multi-modal capability – whole is greater than sum of parts
Pulse is an attempt to realize this opportunity
Distilling semantic meanings from sensor streams
Rating movies … tagging any content with reaction meta data
Enabler for
Recommendation engines
Content/video search
Information retrieval, summarization
Slide62
Questions?
Slide63
Extract Reaction Features – Player control
Player control and taps
Pause, fast forward, jump, roll back, …
All slider movement
Seek bar