Deep Learning for Expression Recognition in Image Sequences




Deep Learning for Expression Recognition in Image Sequences
Daniel Natanael García Zapata
Tutors: Dr. Sergio Escalera, Dr. Gholamreza Anbarjafari
April 27, 2018

Introduction and Goals

Introduction
- Facial expressions convey information for transmitting emotions.
- Emotion recognition is a complex task, even for some humans.
- Deep learning algorithms have achieved strong results in computer vision.
D. Hamester et al. "Face Expression Recognition with a 2-Channel Convolutional Neural Network". International Joint Conference on Neural Networks (IJCNN), 2015.
Y. Zhang and Q. Ji. "Facial expression understanding in image sequences using dynamic and active visual information fusion". Ninth IEEE International Conference on Computer Vision, pp. 1297-1304, 2003.

Goals
- Identify the pros and cons of the different deep learning models that are tested.
- Compare computer vision techniques that recognize emotions from facial expressions. The comparison includes both still-image models and image-sequence models on the datasets.

Index
- Basics
- Background
- Evaluated Deep Models
- Results
- Conclusions

Basics

Convolutional Neural Network
Example of a Convolutional Neural Network.
S. Yang, P. Luo, C. C. Loy, K. W. Shum, and X. Tang. "Deep Representation Learning with Target Coding". In AAAI, pp. 3848-3854, 2015.

Convolutional Layer
- Convolution: connects each neuron to a local region of the input.
- Neurons: arranged in three dimensions: width, height, depth.
- Depth: set by the number of kernels, which defines the activation volume.
- Kernel: describes the set of weights learnt.
A. Karpathy. CS231n: Convolutional Neural Networks for Visual Recognition, 2018.
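The local-region connectivity described above can be sketched as a minimal 2D convolution in NumPy (valid padding, stride 1; the function name and array sizes are illustrative, not taken from the slides):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image; each output value depends only on
    a local region of the input (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Weighted sum over the local region covered by the kernel.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0          # a 2x2 averaging kernel
print(conv2d(image, kernel).shape)      # (3, 3): valid padding shrinks the map
```

In a real convolutional layer, a stack of such kernels produces the depth dimension of the activation volume, and the kernel weights are learnt.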

Pooling Layer
- Partitions the input into non-overlapping rectangles.
- For each sub-region, outputs the maximum value of the features in that region.
A. Karpathy. CS231n: Convolutional Neural Networks for Visual Recognition, 2018.
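The non-overlapping max-pooling described above can be sketched in NumPy with a reshape trick (2x2 pool; function name is illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Partition the input into non-overlapping size x size rectangles
    and keep the maximum feature value of each region."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size           # drop ragged edges
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))              # max within each rectangle

fm = np.array([[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 1, 2, 3],
               [4, 5, 6, 7]], dtype=float)
print(max_pool(fm))   # [[4. 8.] [9. 7.]]
```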

Modalities and Features
A. Karpathy. CS231n: Convolutional Neural Networks for Visual Recognition, 2018.
I. Ofodile, K. Kulkarni, C. A. Corneanu, S. Escalera, X. Baro, S. Hyniewska, J. Allik, and G. Anbarjafari. "Automatic Recognition of Deceptive Facial Expressions of Emotion". 2017.

Background

Deep Learning Based Emotion Recognition from Still Images
A pre-trained deep CNN used as a Stacked Convolutional AutoEncoder (SCAE). The model is trained on the Karolinska Directed Emotional Faces (KDEF) dataset.
M. G. Calvo and D. Lundqvist. "Facial expressions of emotion (KDEF): Identification under different display-duration conditions". Behavior Research Methods, 2008.

Deep Learning Based Emotion Recognition from Image Sequences
CNN-RNN architecture for emotion transaction analysis. The model is trained with two datasets: CASIA-WebFace and Emotion Recognition in the Wild (EmotiW).
A. Dhall, O. V. Ramana Murthy, R. Goecke, J. Joshi, and T. Gedeon. "Video and image based emotion recognition challenges in the wild: EmotiW 2015". In Proceedings of the 2015 ACM International Conference on Multimodal Interaction, 2015.
N. Ronghe, S. Nakashe, A. Pawar, and S. Bobde. "Emotion recognition and reaction prediction in videos". 2017.

Evaluated Deep Models

VGG-Face
- Basic CNN
- Two-Stream CNN
- Middle Fusion CNN
O. M. Parkhi, A. Vedaldi, and A. Zisserman. "Deep Face Recognition". In Proceedings of the British Machine Vision Conference, 2015.

C3D
- Frames input
A. Balu et al. "Learning Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes". 2016.

Recurrent Models
- LSTM
- GRU
C. Olah. Understanding LSTM Networks, 2015.
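As a rough sketch of why a GRU carries fewer parameters than an LSTM (three weight blocks, for the update gate, reset gate, and candidate state, instead of the LSTM's four), a single GRU step can be written in plain NumPy. The weight layout and the tiny sizes used here are hypothetical, purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step. W, U, b hold the update (z), reset (r) and
    candidate (n) parameters as dicts of arrays."""
    z = sigmoid(W['z'] @ x + U['z'] @ h + b['z'])        # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h + b['r'])        # reset gate
    n = np.tanh(W['n'] @ x + U['n'] @ (r * h) + b['n'])  # candidate state
    return (1 - z) * h + z * n                           # new hidden state

# Hypothetical sizes: 3-dim input, 2-dim hidden state.
rng = np.random.default_rng(0)
gates = ('z', 'r', 'n')
W = {k: rng.standard_normal((2, 3)) for k in gates}
U = {k: rng.standard_normal((2, 2)) for k in gates}
b = {k: np.zeros(2) for k in gates}
h = np.zeros(2)
for x in rng.standard_normal((5, 3)):   # run over a 5-step sequence
    h = gru_step(x, h, W, U, b)
print(h.shape)   # (2,)
```

Because the new state is a convex combination of the old state and a tanh candidate, the hidden values stay bounded, which is part of what makes gated recurrent models trainable on sequences.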

Results

SASE-FE
- Examples: 29,798
- Classes: 12 emotions
- Data split: training 75%, validation 15%, test 10%
I. Ofodile, K. Kulkarni, C. A. Corneanu, S. Escalera, X. Baro, S. Hyniewska, J. Allik, and G. Anbarjafari. "Automatic Recognition of Deceptive Facial Expressions of Emotion". 2017.

OULU-CASIA
- Examples: 8,384
- Classes: 6 emotions
- Data split: training 75%, validation 15%, test 10%
G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen. "Facial expression recognition from near-infrared videos". Image and Vision Computing, 2011.

Preprocessing
- Extract each frame as an image.
- Only the frames from half of the video duration until 90% of the duration are considered.
- A frontalization process transforms the faces into forward-facing faces.
- Obtain face landmark geometry.
Variants: No Symmetry, Soft Symmetry, Geometry.
I. Ofodile, K. Kulkarni, C. A. Corneanu, S. Escalera, X. Baro, S. Hyniewska, J. Allik, and G. Anbarjafari. "Automatic Recognition of Deceptive Facial Expressions of Emotion". 2017.
T. Hassner, S. Harel, E. Paz, and R. Enbar. "Effective Face Frontalization in Unconstrained Images".
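The frame-selection rule above (keep frames from 50% up to 90% of the clip duration) can be sketched as follows; the function name is illustrative, not from the thesis:

```python
def kept_frames(n_frames, start=0.5, end=0.9):
    """Indices of the frames kept for training: from half of the video
    duration until 90% of the duration, dropping the rest."""
    first = int(n_frames * start)
    last = int(n_frames * end)
    return list(range(first, last))

print(kept_frames(100))   # frames 50..89 of a 100-frame clip
```

The intuition is that the emotion is expressed most clearly in the second half of the clip, while the very last frames often return to a neutral face.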

Model Parameters
- Loss function: sparse categorical cross-entropy
- Optimizer: Adam
  - Learning rate: 0.001
  - Beta 1: 0.9
  - Beta 2: 0.999
- RNN layers: 2
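A single Adam update with the hyper-parameters listed above (learning rate 0.001, beta1 0.9, beta2 0.999) can be sketched in plain NumPy. This is the standard Adam rule, not code from the thesis; the function name is illustrative:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moving averages of the gradient and its square,
    bias-corrected, then a scaled step against the gradient."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

p = np.array([1.0]); m = np.zeros(1); v = np.zeros(1)
p, m, v = adam_step(p, np.array([0.5]), m, v, t=1)
print(p)   # ≈ [0.999]: the first step has magnitude ≈ lr, whatever the gradient scale
```

This per-parameter scaling is why Adam is a common default choice when comparing architectures, as in the experiments here.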

Quantitative Results
Best models for each dataset and model:
- Multi-modality improves the models by about 5% on average.
- Image input is about 4% more accurate on average than feature input on OULU.
- Feature input is about 3% more accurate on average than image input on SASE.
- OULU receives a boost of 17%, compared to 1% for SASE.

Qualitative Results
Still image:
- Frontalization outperforms raw face input.
- All multi-modality models improve over the vanilla CNN.
- Most models are not able to distinguish Disgust at all; Fear has only a handful of correctly classified images.
- Anger, Sadness and Surprise are consistently classified correctly.
Image sequence:
- LSTM and GRU both have higher accuracy than the 3D CNN.
- SASE achieves higher accuracy with feature input; OULU, on the contrary, with image input.
- GRU obtains the highest test accuracy on both datasets. The evidence suggests that the GRU's smaller parameter count is more effective on considerably smaller datasets.
- Although the evidence suggests that temporal features are not as important, given the small size of both datasets there is not sufficient evidence to conclude that temporal models are ineffective.

Conclusions

Conclusions
- The evaluation clearly demonstrated the superiority of face frontalization as a pre-processing step over no pre-processing at all.
- Although both the multi-modal and middle-fusion models improve over the base model, there is no clear consensus on which model improved to a greater extent.
- The GRU models were shown to be superior to the 3D CNN and LSTM on image-sequence inputs.
- It was not possible to conclude definitively whether feature vectors extracted from the CNN are better inputs for the RNNs than image vectors.
- Future work should include more databases and try other pre-trained models. It would also be interesting to incorporate hand-crafted spatio-temporal features.

Thank You! Daniel Natanael García Zapata