Actor-Object Relation in Videos Volodymyr Bobyr and Aayushjungbahadur Rana
Task
Input: a video with:
    Actors: adult, child, dog
    Objects: toys, furniture, etc.
    Actions: "holding", "in front", "talking to", etc.
Output: spatial and temporal pixel-perfect localization of actors, objects, and actions
Dataset: VidOR (10,000 video clips)
Approach
Convolutional encoder/decoder network:
    Encoder backbone: I3D pretrained on Kinetics
    Decoder: feature pyramid network with dilated convolutions and side connections
4 stages:
    1. Actor and object spatial segmentation
    2. Centroid detection
    3. Action spatial segmentation
    4. Temporal connection (postprocessing)
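The slides do not describe how the temporal connection stage works. As one possible illustration only, the sketch below greedily links per-frame centroids into tracks by nearest distance. The function name `link_centroids`, the `max_dist` threshold, and the greedy strategy are all assumptions, not the authors' method.

```python
import math

def link_centroids(frames, max_dist=10.0):
    """Greedily link per-frame (row, col) centroids into tracks.

    Each track is a list of (frame_index, (row, col)). A centroid joins the
    nearest track that ended in the previous frame if it lies within
    max_dist (a hypothetical threshold); otherwise it starts a new track.
    """
    tracks = []
    for t, centroids in enumerate(frames):
        # Only tracks that were extended in the previous frame can continue.
        open_tracks = [tr for tr in tracks if tr[-1][0] == t - 1]
        for c in centroids:
            best, best_d = None, max_dist
            for tr in open_tracks:
                d = math.dist(tr[-1][1], c)
                if d <= best_d:
                    best, best_d = tr, d
            if best is None:
                tracks.append([(t, c)])
            else:
                best.append((t, c))
                # A track may absorb at most one centroid per frame.
                open_tracks = [tr for tr in open_tracks if tr is not best]
    return tracks

# Hypothetical centroids for three frames.
frames = [[(10, 10), (40, 40)], [(11, 10), (41, 42)], [(12, 11)]]
tracks = link_centroids(frames)
```

In this example the first centroid is followed through all three frames, while the second track ends after frame 1 because no nearby centroid appears in frame 2.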
Details
Input: (n_frames, 224, 224, 3)
Output:
    Actor/object segmentation: (n_frames, 56, 56, 80)
    Centroid detection: (n_frames, 56, 56, 1)
    Action segmentation: (n_frames, 56, 56, 52)
Class imbalance:
    People: 56% of all objects
    Background: present in every video clip
    Solution: class weights
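The slide names class weights as the remedy for the imbalance but does not say how they are computed. A minimal sketch, assuming inverse-frequency weighting (a common choice, not confirmed by the source) and made-up pixel counts:

```python
import math

def inverse_frequency_weights(pixel_counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(pixel_counts.values())
    n_classes = len(pixel_counts)
    return {c: total / (n_classes * n) for c, n in pixel_counts.items()}

def weighted_cross_entropy(probs, true_class, weights):
    """Categorical cross-entropy for one pixel, scaled by the weight of its true class."""
    eps = 1e-7  # guards against log(0)
    return -weights[true_class] * math.log(max(probs[true_class], eps))

# Hypothetical pixel counts, for illustration only.
counts = {"background": 700, "adult": 250, "toy": 50}
weights = inverse_frequency_weights(counts)

# Same predicted probability (0.5) for the true class, different class weights.
loss_toy = weighted_cross_entropy(
    {"background": 0.2, "adult": 0.3, "toy": 0.5}, "toy", weights)
loss_bg = weighted_cross_entropy(
    {"background": 0.5, "adult": 0.3, "toy": 0.2}, "background", weights)
```

With these weights, an equally confident mistake on a rare "toy" pixel costs more than one on a background pixel, which is the effect class weighting is meant to produce.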
IoU Metrics
Mean Intersection over Union (IoU), computed over the pixels of each frame
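The slide does not spell out how the mean is taken. The sketch below assumes IoU is computed per class within a frame, averaged over the classes present in that frame, and then averaged over frames. The function names are hypothetical.

```python
def frame_mean_iou(pred, truth):
    """Mean IoU over the classes that appear in either label map of one frame."""
    pred_flat = [c for row in pred for c in row]
    truth_flat = [c for row in truth for c in row]
    ious = []
    for cls in set(pred_flat) | set(truth_flat):
        inter = sum(p == cls and t == cls for p, t in zip(pred_flat, truth_flat))
        union = sum(p == cls or t == cls for p, t in zip(pred_flat, truth_flat))
        ious.append(inter / union)  # union > 0 because cls occurs at least once
    return sum(ious) / len(ious)

def video_mean_iou(pred_frames, truth_frames):
    """Average the per-frame mean IoU over all frames of a clip."""
    scores = [frame_mean_iou(p, t) for p, t in zip(pred_frames, truth_frames)]
    return sum(scores) / len(scores)

# Tiny 2x2 label maps: class 0 and class 1.
truth = [[0, 0],
         [1, 1]]
pred = [[0, 1],
        [1, 1]]
score = frame_mean_iou(pred, truth)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the frame score is their average.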
Data Preparation & Output Example
[Figure panels: original image, original centroids, augmented image, augmented centroids, experimental segmentation output]
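Augmenting an image only helps if its centroid labels receive the same transform. The slides do not say which augmentations were used; the sketch below assumes a horizontal flip purely as an example, and also maps centroids from the 224-pixel input onto the 56-pixel output grid (a stride of 4, following the shapes given under Details). All function names are hypothetical.

```python
def hflip_image(image):
    """Flip a row-major image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def hflip_centroids(centroids, width):
    """Mirror (row, col) centroids so they stay on the same pixels after the flip."""
    return [(r, width - 1 - c) for r, c in centroids]

def to_output_grid(centroids, stride=4):
    """Map input-resolution centroids onto the output grid (224 / 56 = 4)."""
    return [(r // stride, c // stride) for r, c in centroids]

# A 2x3 toy "image" with two labelled centroids.
image = [[1, 2, 3],
         [4, 5, 6]]
centroids = [(0, 0), (1, 2)]

flipped_image = hflip_image(image)
flipped_centroids = hflip_centroids(centroids, width=3)
```

After the flip, each centroid still points at the pixel value it pointed at before, which is a quick sanity check for any paired image and label augmentation.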
Experimental Results
    In the past: loss: binary cross-entropy
    Before: loss: categorical cross-entropy
    Now: loss: categorical cross-entropy, plus augmentation tweaks
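The difference between the two losses can be seen on a single pixel. The sketch below assumes the standard definitions: binary cross-entropy scores each class channel independently through a sigmoid, while categorical cross-entropy normalizes all channels together with a softmax. The logits are made up, and nothing here reproduces the authors' training setup.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def categorical_cross_entropy(logits, true_index):
    """Classes compete: only the softmax probability of the true class is scored."""
    return -math.log(softmax(logits)[true_index])

def binary_cross_entropy(logits, true_index):
    """Mean of independent per-channel binary losses against a one-hot target."""
    total = 0.0
    for i, x in enumerate(logits):
        p = sigmoid(x)
        total += -math.log(p) if i == true_index else -math.log(1.0 - p)
    return total / len(logits)

# Hypothetical logits for one pixel with three classes; class 0 is correct.
logits = [2.0, 0.5, -1.0]
cce = categorical_cross_entropy(logits, 0)
bce = binary_cross_entropy(logits, 0)
```

Because the softmax forces the class probabilities to sum to one, categorical cross-entropy matches a setting where each pixel carries exactly one label.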