Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

  • سال انتشار: 1399
  • محل انتشار: مجله هوش مصنوعی و داده کاوی، دوره: 8، شماره: 2
  • کد COI اختصاصی: JR_JADM-8-2_003
  • زبان مقاله: انگلیسی
  • تعداد مشاهده: 412
دانلود فایل این مقاله

نویسندگان

M. Kurmanji

Human Computer Interaction Lab., Electrical and Computer Engineering Department, Tarbiat Modares University, Tehran, Iran.

F. Ghaderi

Human Computer Interaction Lab., Electrical and Computer Engineering Department, Tarbiat Modares University, Tehran, Iran.

چکیده

Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total frames of a video. So far, both 2D and 3D convolutional neural networks have been used to manipulate the temporal dynamics of the video frames. 3D CNNs can extract the changes in the consecutive frames and tend to be more suitable for the video classification task, however, they usually need more time. On the other hand, by using techniques like tiling it is possible to aggregate all the frames in a single matrix and preserve the temporal and spatial features. This way, using 2D CNNs, which are inherently simpler than 3D CNNs can be used to classify the video instances. In this paper, we compared the application of 2D and 3D CNNs for representing temporal features and classifying hand gesture sequences. Additionally, providing a two-stage two-stream architecture, we efficiently combined color and depth modalities and 2D and 3D CNN predictions. The effect of different types of augmentation techniques is also investigated. Our results confirm that appropriate usage of 2D CNNs outperforms a 3D CNN implementation in this task.

کلیدواژه ها

Convolutional Neural Networks, deep learning, Hand gesture recognition, Video Classification

اطلاعات بیشتر در مورد COI

COI مخفف عبارت CIVILICA Object Identifier به معنی شناسه سیویلیکا برای اسناد است. COI کدی است که مطابق محل انتشار، به مقالات کنفرانسها و ژورنالهای داخل کشور به هنگام نمایه سازی بر روی پایگاه استنادی سیویلیکا اختصاص می یابد.

کد COI به مفهوم کد ملی اسناد نمایه شده در سیویلیکا است و کدی یکتا و ثابت است و به همین دلیل همواره قابلیت استناد و پیگیری دارد.