Slide1
Your Reactions Suggest You Liked the Movie: Automatic Content Rating via Reaction Sensing
Xuan Bao, Songchun Fan, Romit Roy Choudhury, Alexander Varshavsky, Kevin A. Li
Slide2
Rating Online Content (Movies)
Manual rating: not incentivized, not easy … does not reflect the experience
Slide3
Our Vision
Overall star rating
Reaction tags
Reaction-based highlights
Slide4
Our Vision
Overall star rating
Reaction tags
Reaction-based highlights
… Automatically
Slide5
Key Intuition
Multi-modal sensing / learning: map Reactions to Ratings
[Figure: example timeline with reaction tags such as 02:43 - Action, 09:21 - Hilarious, 12:01 - Suspense, … and an overall rating of 5 stars]
Slide6
Specific Opportunities
Visual: facial expressions, eye movements, lip movements …
Audio: laughter, talking
Motion: device stability
Touch screen activities: fast forward, rewind, checking emails and IM chats …
Cloud: aggregate knowledge from others' reactions, labeled scores from some users
Slide7
Pulse: System Sketch
Slide8
Applications (Beyond Movie Ratings)
Annotated movie timeline: slide forward to the action scenes
Platform for ad analytics: assess which ads grab attention … customize ads based on the scenes a user reacts to
Personalized replays and automatic highlights: a user reacts to a specific tennis shot, so the TV shows a personalized replay; highlights of all the exciting moments in the Super Bowl game
Online video courses (MOOCs): may indicate which parts of a lecture need clarification
Early disease symptom identification: ADHD among young children, and other syndromes
Slide9
First Step: A Sensor-Assisted Video Player
Slide10
Pulse Media Player
Developed on a Samsung Galaxy tablet (Android)
Sensor meta-data layered on the video as output
Sensing thread controls
Observes the user through the front camera
Media player control functions are monitored
Slide11
Basic Design: Data Distillation Process
Features from raw sensor readings: microphone, camera, accelerometer, gyroscope, touch, clicks
Signals to Reactions (S2R): laugh, giggle, doze, still, music …
Reaction to Rating & Adjective (R2RA): numeric rating and English adjectives
Output: tag cloud and final rating
Slide12
Basic Design: Data Distillation Process, with the Cloud
Features from raw sensor readings: microphone, camera, accelerometer, gyroscope, touch, clicks
Signals to Reactions (S2R): laugh, giggle, doze, still, music …
Reaction to Rating & Adjective (R2RA): numeric rating and English adjectives
Output: tag cloud and final rating
The cloud (other users' reactions and scores) assists the distillation
Slide13
Visual Reactions
Facial expressions (face size, eye size, blink, etc.)
Track the viewer's face through the front camera
Track eye position and size (challenging with spectacles)
Track partial faces (via SURF point matching)
[Figure panels: face tracking; eye tracking (green) with blinks (red); partial face]
Slide14
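The slides only name the trackers; as a hedged sketch of the kind of per-frame visual features involved, the snippet below uses OpenCV's stock Haar cascades for face and eye detection (the actual Pulse player also matches SURF points to follow partial faces, which is not shown). All parameters here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of per-frame face/eye detection using OpenCV Haar cascades.
# The real Pulse pipeline also tracks partial faces via SURF point matching;
# this only illustrates the kind of per-frame features (face size, eye size) used.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def visual_features(frame_bgr):
    """Return coarse per-frame features: face area and mean eye area (pixels)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return {"face_area": 0, "eye_area": 0}
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detected face
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    eye_area = sum(ew * eh for (_, _, ew, eh) in eyes) / max(len(eyes), 1)
    return {"face_area": w * h, "eye_area": eye_area}
```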
Visual Reactions
Facial expressions (face size, eye size, blink, etc.)
Track the viewer's face through the front camera
Track eye position and size (challenging with spectacles)
Track partial faces (via SURF point matching)
Detect blinks and lip size: look for differences between frames
Slide15
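A minimal sketch of the frame-differencing idea on this slide, assuming grayscale frames and a fixed eye region: a large intensity change between consecutive frames is flagged as a candidate blink. The region and threshold are assumptions, not values from the talk.

```python
# Minimal frame-differencing sketch: a large intensity change inside the eye
# region between consecutive frames is treated as a candidate blink.
# The threshold (15.0) and the fixed region are illustrative assumptions.
import numpy as np

def region_change(prev_gray, curr_gray, box):
    """Mean absolute pixel difference inside a (x, y, w, h) region."""
    x, y, w, h = box
    a = prev_gray[y:y + h, x:x + w].astype(np.float32)
    b = curr_gray[y:y + h, x:x + w].astype(np.float32)
    return float(np.abs(a - b).mean())

def detect_blinks(gray_frames, eye_box, threshold=15.0):
    """Yield frame indices where the eye region changes sharply."""
    for i in range(1, len(gray_frames)):
        if region_change(gray_frames[i - 1], gray_frames[i], eye_box) > threshold:
            yield i
```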
Acoustic Reactions
Laughter, conversation, shout-outs …
Cancel out the (known) movie sound from the recorded sound
Laughter detection, conversation detection
Even with knowledge of the original movie audio (blue), it is hard to identify user conversation (i.e., to distinguish the red and green signals)
Slide16
Acoustic Reactions
Separating the movie's audio from the user's audio
Spectral energy density comparison alone is not adequate
Different techniques for different volume regimes (high volume vs. low volume)
Slide17
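The slides state that spectral energy comparison alone is not adequate and that different volume regimes need different handling; as a hedged illustration of just the basic comparison step, the sketch below flags audio frames whose recorded spectral energy clearly exceeds that of the time-aligned movie soundtrack, suggesting user-generated sound. Frame length, hop size, and the energy margin are assumptions.

```python
# Hedged sketch: flag audio frames where the recorded signal holds noticeably
# more spectral energy than the (time-aligned) movie soundtrack, suggesting
# user-generated sound (laughter, talking). Frame size and the 2x margin are
# illustrative assumptions, not the paper's parameters.
import numpy as np

def frame_energy(signal, frame_len=2048, hop=1024):
    """Per-frame spectral energy via the magnitude spectrum."""
    energies = []
    for start in range(0, len(signal) - frame_len, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len]))
        energies.append(float((spectrum ** 2).sum()))
    return np.array(energies)

def likely_user_sound(recorded, movie_track, margin=2.0):
    """Boolean mask over frames where recorded energy exceeds movie energy."""
    rec_e = frame_energy(recorded)
    mov_e = frame_energy(movie_track)
    n = min(len(rec_e), len(mov_e))
    return rec_e[:n] > margin * (mov_e[:n] + 1e-9)
```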
Acoustic Reactions
Laughter, conversation, shout-outs …
Cancel out the (known) movie sound from the recorded sound
Laughter detection, conversation detection
Early results demonstrate the promise of detecting acoustic reactions
Slide18
Motion Reactions
Reactions also leave a footprint in the motion dimension
Motionless during an intense scene; fidgeting during boredom
[Figure annotations: intense scene, calm scene, time to stretch]
Slide19
Motion Reactions
Reactions also leave a footprint in the motion dimension
Motionless during an intense scene; fidgeting during boredom
Slide20
Motion Reactions
Reactions also leave a footprint in the motion dimension
Motionless during an intense scene; fidgeting during boredom
Motion readings correlate with changes in ratings …
Slide21
Motion Reactions
Reactions also leave a footprint in the motion dimension
Motionless during an intense scene; fidgeting during boredom
Motion readings correlate with changes in ratings …
The timing of motions also correlates with the timing of scene changes
Slide22
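One plausible way to turn the raw motion stream into the stillness-vs-fidgeting signal described above is a windowed variance feature per movie segment; the sketch below is an assumption-laden illustration, not the paper's exact feature set.

```python
# Illustrative motion feature: per-segment variance of gyroscope magnitude.
# Low variance ~ viewer is still (intense scene); high variance ~ fidgeting.
# The 60-second segment length mirrors the 1-minute segments mentioned in the
# deck; the variance feature itself is an assumption for illustration.
import numpy as np

def segment_motion_features(timestamps, gyro_xyz, segment_sec=60.0):
    """Return list of (segment_index, variance of |gyro|) per movie segment."""
    magnitude = np.linalg.norm(np.asarray(gyro_xyz), axis=1)
    seg_ids = (np.asarray(timestamps) // segment_sec).astype(int)
    return [(s, float(magnitude[seg_ids == s].var()))
            for s in np.unique(seg_ids)]
```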
Extract Reaction Features – Player Control
Collect users' player control operations
Pause, fast forward, jump, roll back, …
All slider (seek bar) movements
Slide23
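As a small illustration of how these player-control events could become per-segment features, the sketch below simply counts each operation type per 1-minute segment; the event names and log format are hypothetical, not taken from the Pulse player.

```python
# Hypothetical per-segment counts of player-control events (pause, seek, ...).
# Event names and the (timestamp, event) log format are assumptions.
from collections import Counter, defaultdict

def control_features(event_log, segment_sec=60.0):
    """event_log: iterable of (timestamp_sec, event) pairs, e.g. (125.0, 'pause').
    Returns {segment_index: Counter(event -> count)}."""
    per_segment = defaultdict(Counter)
    for t, event in event_log:
        per_segment[int(t // segment_sec)][event] += 1
    return dict(per_segment)

# Example: control_features([(12.0, "pause"), (61.5, "seek"), (62.0, "seek")])
# -> {0: Counter({'pause': 1}), 1: Counter({'seek': 2})}
```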
Challenges in Learning
Slide24
Problem – A Generalized Model Does Not Work
A directly trained model does not capture the rating trend
Why?
Slide25
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different, even for the same user (home vs. commute)
[Figure: sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home]
Slide26
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different, even for the same user (home vs. commute)
Gyroscope readings from the same user (at home and in the office)
Slide27
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different, even for the same user (home vs. commute)
Gyroscope readings from the same user (at home and in the office)
Naïve solution: build specific models one by one (office, home, commute, …)
Impossible to acquire data for all <User, Context, Movie> tuples
Slide28
Challenges in Learning
Approach: Bootstrap from Reaction Agreements
Slide29
Approach: Bootstrap from Agreement
Thoughts:
What behavior means positive/negative in a particular setting?
How do we acquire data without explicitly asking the user every time?
Approach: utilize reactions that most people agree on
[Figure: cloud knowledge (other users' opinions) marks "climax" and "boring" moments on the timeline; the sensor readings at those moments are tied to ratings]
Slide30
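The figure on this slide shows cloud knowledge picking out the "climax" and "boring" moments. A hedged sketch of that consensus step: from other viewers' per-segment scores, keep the segments where opinions agree strongly at the high or low end and use them as pseudo-labels for this user's sensor data. The thresholds below are illustrative assumptions.

```python
# Hedged sketch of the consensus step: use other viewers' per-segment scores
# (from the cloud) to pick segments most people agree are a "climax" or
# "boring", then use those segments as pseudo-labels for this user's sensor
# data. The score thresholds and agreement criterion are assumptions.
import numpy as np

def consensus_segments(cloud_scores, hi=4.0, lo=2.0, max_std=0.7):
    """cloud_scores: array of shape (num_users, num_segments), 1-5 scale.
    Returns (climax_idx, boring_idx) where viewers agree strongly."""
    scores = np.asarray(cloud_scores, dtype=float)
    mean, std = scores.mean(axis=0), scores.std(axis=0)
    climax = np.where((mean >= hi) & (std <= max_std))[0]
    boring = np.where((mean <= lo) & (std <= max_std))[0]
    return climax, boring
```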
Approach: Bootstrap from Agreement
Solution: spawn from consensus
Learn the user's reactions during the "climax" and the "boring" moments
Generalize this knowledge of positive/negative reactions
Gaussian process regression (ratings) and SVM (labels)
Slide31
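The deck names the two learners but not their configuration; a minimal scikit-learn sketch of this stage, assuming the consensus segments above supply the training pairs, might look like the following. Kernels and hyperparameters are placeholders rather than the authors' settings.

```python
# Minimal sketch of the R2RA stage: Gaussian process regression maps sensor
# features to numeric segment ratings, and an RBF-kernel SVM maps them to
# adjective labels. Kernels/hyperparameters are placeholder assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.svm import SVC

def fit_r2ra(features, ratings, labels):
    """features: (n_segments, n_features); ratings: numeric; labels: adjectives."""
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gpr.fit(features, ratings)
    svm = SVC(kernel="rbf", gamma="scale")
    svm.fit(features, labels)
    return gpr, svm

def rate_segments(gpr, svm, new_features):
    ratings, std = gpr.predict(new_features, return_std=True)  # rating + uncertainty
    tags = svm.predict(new_features)                            # adjective per segment
    return np.clip(ratings, 1, 5), std, tags
```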
Evaluation
Slide32
User Experiment Setting
11 participants watch preloaded movies (~50 movie viewings)
2 comedies, 2 dramas, 1 horror movie, 1 action movie
Users provide manual ratings and labels for ground truth
We compare Pulse's ratings with the manual ratings
Slide33
Preliminary Results – Final (5-Star) Rating
[Table: confusion matrix of Pulse's 5-star ratings vs. the true manual ratings]
Slide34
Preliminary Results – Final (5-Star) Rating
[Figure: difference between Pulse's ratings and the true 5-star manual ratings]
Slide35
Preliminary Results – Myth behind the Error
Final ratings can deviate significantly from the average of the segment ratings
User-given scores may not be linearly related to quality
Slide36
Preliminary Results – Lower Segment Rating Error
Final ratings come from averaging the per-segment ratings
Our system outperforms the other methods
[Bar chart: mean error of per-segment ratings on a 5-point scale, comparing random ratings, collaborative filtering, and our system]
Slide37
Preliminary Results – Better Tag Quality
Tags capture users' feelings better than SVM alone
[Figure: tag clouds for Happy, Intense, and Warm]
Slide38
Preliminary Results – Reasonable Energy Overhead
Reasonable energy overhead compared to playback without sensing
More tolerable on tablets; may need duty-cycling on smartphones
Slide39
Closing Thoughts
Human reactions are in the mind
However, they manifest as bodily gestures and activities
Rich, multi-modal sensors on mobile devices: a wider net for "catching" these reactions
Pulse is an attempt to realize this opportunity
Distilling semantic meaning from sensor streams
Rating movies … tagging any content with reaction meta-data
Enabler for: recommendation engines, content/video search, information retrieval, summarization
Slide40
Thoughts?
Slide41
Backup – Potential Questions
Privacy concerns?
Like every technology, Pulse may first attract early adopters
If only final ratings are uploaded, the privacy exposure is similar to today's ratings
Why not just emotion sensing / just laughter detection?
Emotion sensing is a broad and challenging problem … but its goal is different from ours (rating)
Explicit signs like laughter usually account for only a small fraction of movie viewing; we need to explore other opportunities (motion)
Our approach takes advantage of the specific task: (1) we know the user is watching a movie; (2) we can observe the user for a longer duration than most emotion-sensing work; (3) we know other users' opinions
How is this possible … the human mind is too complex?
Human thoughts are complicated … but they may produce footprints in behavior
Collaborative filtering explicitly uses knowledge of other users' thoughts to bootstrap our algorithm
The sample size is small … only 11 users?
The sample size is limited, but:
Each user watched multiple movies (50+ movies viewed) … segment ratings cover 1-minute segments (thousands of data points)
Collaborative filtering shows that even within this data set the ratings can diverge, and the naïve solution does not work as well as ours
Slide42
Preliminary Results – Better Retrieval Accuracy
Viewers care more about the highlights of a movie
Find the contribution made by sensing
[Chart annotations: gain, additional error, total goal, overall achieved performance]
Slide43
Challenges in Learning
Slide44
Problem – A Generalized Model Does Not Work
A directly trained model does not capture the rating trend
Why?
Slide45
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
[Figure: sensed motion patterns are very different when the same movie is watched during a bus commute vs. in bed at home]
Slide46
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
Impact on the sensor-reading histograms
Slide47
The Reason It Does Not Work …
Human behaviors are heterogeneous
Users are different
Environments are different (e.g., home vs. commute)
Impact on the sensor-reading histograms
Naïve solution: build specific models one by one (office, home, commute, …)
Impossible to acquire data for all <User, Context, Movie> tuples
Slide48
Challenges in Learning
Approach: Bootstrap from Reaction Agreements
Slide49
Approach: Bootstrap from Agreement
Thoughts:
What behavior means positive/negative in a particular setting?
How do we acquire data without explicitly asking the user every time?
Approach: utilize reactions that most people agree on
[Figure: cloud knowledge (other users' opinions) marks "climax" and "boring" moments on the timeline; the sensor readings at those moments are tied to ratings]
Slide50
Approach: Bootstrap from Agreement
Solution: spawn from consensus
Learn the user's reactions during the "climax" and the "boring" moments
Generalize this knowledge of positive/negative reactions
Gaussian process regression (ratings) and SVM (labels)
Slide51
Approach: Bootstrap from Agreement
A Simple Example of GPR
Slide52
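The "simple example of GPR" figure itself is not reproduced in this transcript; the toy sketch below, on made-up data, shows the behavior such an example typically illustrates: GPR interpolates a few observed segment ratings over the movie timeline and reports its uncertainty between observations.

```python
# Toy GPR example (made-up data): interpolate a few observed segment ratings
# over the movie timeline and obtain an uncertainty band between observations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

t_observed = np.array([[1.0], [5.0], [12.0], [20.0]])   # segment times (minutes)
r_observed = np.array([3.0, 5.0, 2.0, 4.0])             # ratings at those times

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=3.0), normalize_y=True)
gpr.fit(t_observed, r_observed)

t_all = np.linspace(0, 22, 45).reshape(-1, 1)
mean, std = gpr.predict(t_all, return_std=True)          # smooth curve + confidence
```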
Approach: Bootstrap from Agreement
On GPR and SVM – SVM
SVM is a supervised learning method for classification
It identifies hyperplanes in a high-dimensional space that best separate the observed samples
For our purposes, we use a non-linear SVM with an RBF kernel for its wide applicability
Slide53
User Experiment Setting
11 participants watch preloaded movies (46 movie viewings)
Two comedies, two dramas, one horror movie, one action movie
Users give manual ratings and labels
Evaluate by comparing the generated ratings with the manual ratings
Slide54
Evaluation
Slide55
Preliminary Results – Good Final Rating
[Table: confusion matrix of Pulse's 5-star ratings vs. the true manual ratings]
Slide56
Preliminary Results – Myth behind the Error
Final ratings can deviate significantly from the segment ratings
User-given scores may not be linearly related to quality
Slide57
Preliminary Results – Lower Segment Rating Error
Final ratings come from averaging the ratings of the individual segments
Our system outperforms the other methods
[Bar chart: mean error per movie segment on a 5-point scale, comparing random ratings, collaborative filtering, and our system]
Slide58
Preliminary Results – Better Retrieval Accuracy
Viewers care more about the highlights of a movie
Find the contribution made by sensing
[Chart annotations: gain, additional error, total goal, overall achieved performance]
Slide59
Preliminary Results – Better Tag Quality
Generated tags capture users' feelings much better than using SVM alone
[Figure: tag clouds for Happy, Intense, and Warm]
Slide60
Preliminary Results – Reasonable Energy Overhead
Reasonable energy overhead compared to playback without sensing
More tolerable on tablets; may need duty-cycling on smartphones
Slide61
Closing Thoughts
Human reactions are in the mind
However, they manifest as bodily gestures and activities
Rich, multi-modal sensors on mobile devices: an opportunity for "catching" these activities
Multi-modal capability – the whole is greater than the sum of its parts
Pulse is an attempt to realize this opportunity
Distilling semantic meaning from sensor streams
Rating movies … tagging any content with reaction meta-data
Enabler for: recommendation engines, content/video search, information retrieval, summarization
Slide62
Questions?
Slide63
Extract Reaction Features – Player Control
Player control and taps
Pause, fast forward, jump, roll back, …
All slider (seek bar) movements
Slide64
Approach: Bootstrap from Agreement
A Simple Example of GPR
Slide65
Approach: Bootstrap from Agreement
On GPR and SVM – SVM
SVM is a supervised learning method for classification
It identifies hyperplanes in a high-dimensional space that best separate the observed samples
For our purposes, we use a non-linear SVM with an RBF kernel for its wide applicability