Abstract

Human facial expression recognition is one of the most challenging tasks in social communication. It plays a crucial role in computer vision and human-machine interaction, and it is an active research area with wide applications in medicine, crime investigation, marketing, online learning, automobile safety, and video games. The first part of this thesis describes a deep-neural-network-based framework for recognizing the seven main types of facial expression found in all cultures: anger, disgust, fear, happiness, sadness, surprise, and neutrality. The proposed methodology involves four stages: (a) pre-processing the FER2013 dataset by relabeling it to avoid misleading results and removing non-face and non-frontal images; (b) designing an efficient, stable Cycle Generative Adversarial Network (CycleGAN) that performs unsupervised expression-to-expression translation and training it with a new cycle-consistency loss; (c) generating new images to overcome class imbalance, especially for the disgust class; and (d) building the deep neural network architecture for facial expression recognition, using the pre-trained VGG-Face model with VGGFace weights. The model has been tested on both the original FER2013 dataset and the modified, balanced version. The designed model has also been used to identify facial expressions in real-time images after detecting faces with Multi-task Cascaded Convolutional Networks (MTCNN). Results show that the model is robust: it recognizes a facial expression in 0.44 seconds, and the average test accuracy increased from 64% on the original FER2013 dataset to 91.76% on the modified, balanced version using the same transfer-learning model.
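The cycle-consistency term that CycleGAN adds to the adversarial objective can be sketched in a few lines of NumPy. The thesis trains with a new, modified cycle-consistency loss whose exact form is not reproduced in this abstract, so the standard L1 formulation below, with identity generators as stand-ins, is purely illustrative:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """Standard L1 cycle-consistency term of CycleGAN:
    lam * (||F(G(x)) - x||_1 + ||G(F(y)) - y||_1), averaged per element.
    G maps domain X -> Y (e.g. neutral -> disgust), F maps Y -> X."""
    forward = np.mean(np.abs(F(G(x)) - x))   # x -> Y -> back to X
    backward = np.mean(np.abs(G(F(y)) - y))  # y -> X -> back to Y
    return lam * (forward + backward)

# Toy batches of 48x48 grayscale faces (the FER2013 image size) and
# identity mappings standing in for the two trained generators, so the
# reconstruction is perfect and the loss is exactly zero.
x = np.random.rand(4, 48, 48, 1)
y = np.random.rand(4, 48, 48, 1)
identity = lambda t: t
print(cycle_consistency_loss(x, y, identity, identity, lam=10.0))  # 0.0
```

Minimizing this term forces each generator to preserve identity-relevant content while changing only the expression, which is what makes the unpaired expression-to-expression translation usable for augmenting the minority disgust class.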
The second part of the thesis covers the design of a GPU-accelerated system for facial expression recognition in real-time video sequences. Any such system comprises two basic stages: face detection, which localizes faces, and facial expression recognition, which classifies the expression. Unfortunately, face detection algorithms require intensive computation, which makes them a poor fit for real-time video. To capture real-time video streams in Python, the open-source OpenCV library has been used. To overcome the processing limitations, computations should be pushed to the graphics processing unit (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). The pre-built OpenCV packages, however, ship without CUDA support and so cannot achieve optimal performance on GPU-enabled workstations. The OpenCV CUDA module, a set of classes and functions that exposes CUDA's computational capabilities, bridges this gap between the available hardware and Python libraries that use the CPU as backend, and it is a powerful tool for fast implementation of CUDA-accelerated computer vision algorithms. This part of the thesis therefore covers compiling the OpenCV library from source with CUDA and cuDNN support, which is the cornerstone of the real-time facial expression system for video streams. In the facial expression recognition stage, the model trained on the relabeled, balanced FER2013 dataset has been used. The designed scheme was employed in real-time video processing to classify frames into one of the universal facial expressions: anger, disgust, fear, happiness, sadness, surprise, and neutrality. For the face detection stage, Haar cascade and deep-learning detectors were tested using both the CPU and the GPU as backend, and the results were compared.
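A typical CMake configuration for such a from-source build looks roughly like the following. The directory layout and the CUDA_ARCH_BIN compute capability are machine-specific assumptions, not values taken from the thesis:

```shell
# Run from an empty build/ directory next to the opencv and
# opencv_contrib checkouts (paths here are illustrative).
# OPENCV_DNN_CUDA enables the CUDA backend for the cv2.dnn module;
# set CUDA_ARCH_BIN to your GPU's compute capability.
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D WITH_CUDA=ON \
      -D WITH_CUDNN=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D CUDA_ARCH_BIN=7.5 \
      -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
      -D BUILD_opencv_python3=ON \
      ../opencv
make -j"$(nproc)"
sudo make install
```

After installation, a DNN face detector can be moved to the GPU at run time with `net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)` and `net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)`.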
In terms of the frames-per-second (FPS) metric for the overall video pipeline, which includes both face detection and facial expression recognition, using the GPU as backend yielded a large improvement for both the Haar and deep-learning detectors, thanks to the CUDA module of the newly compiled OpenCV. Using OpenCV's Deep Neural Network (DNN) module with NVIDIA GPUs, CUDA, and cuDNN, the feature-based (Haar) approach improved from 7.41 FPS on the CPU to 23.12 FPS, a 3.12x speedup (312.01% of CPU throughput). The deep-learning-based approach improved from 30.30 FPS on the CPU to 51.43 FPS on the GPU (169.74% of CPU throughput). Deep learning is recommended for the face detection stage, as it was found to be both more accurate and faster than the Haar cascade.
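The reported percentages are the GPU throughput expressed relative to the CPU baseline, which is easy to check directly from the FPS figures (the helper name is illustrative):

```python
def throughput_percent(fps_cpu: float, fps_gpu: float) -> float:
    """GPU frame rate as a percentage of the CPU baseline frame rate."""
    return round(fps_gpu / fps_cpu * 100, 2)

# FPS figures reported in the thesis for the two face-detection backends.
print(throughput_percent(7.41, 23.12))   # Haar (feature-based): 312.01
print(throughput_percent(30.30, 51.43))  # DNN (deep-learning):  169.74
```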