Improvement in Accuracy and Speed of image Semantic Segmentation via Convolution Neural Network ENCODER-Decoder

Hanieh Zamanian; Hassan Farsi; Sajad Mohamadzadeh

Improvement in Accuracy and Speed of image Semantic Segmentation via Convolution Neural Network ENCODER-Decoder

محل انتشار: فصلنامه سیستم های اطلاعاتی و مخابرات، دوره: 6، شماره: 3

سال انتشار: 1397

نوع سند: مقاله ژورنالی

زبان: انگلیسی

مشاهده: 558

فایل این مقاله در 8 صفحه با فرمت PDF قابل دریافت می باشد

دریافت فایل کامل مقاله

صدور گواهی نمایه سازی
من نویسنده این مقاله هستم

این مقاله در بخشهای موضوعی زیر دسته بندی شده است:

هوش مصنوعی > شبکه عصبی

استخراج به نرم افزارهای پژوهشی:

لینک ثابت به این مقاله:

https://civilica.com/doc/993153

شناسه ملی سند علمی:

JR_JIST-6-3_001

تاریخ نمایه سازی: 6 اسفند 1398

چکیده مقاله:

Recent researches on pixel-wise semantic segmentation use deep neural networks to improve accuracy and speed of these networks in order to increase the efficiency in practical applications such as automatic driving. These approaches have used deep architecture to predict pixel tags, but the obtained results seem to be undesirable. The reason for these unacceptable results is mainly due to the existence of max pooling operators, which reduces the resolution of the feature maps. In this paper, we present a convolutional neural network composed of encoder-decoder segments based on successful SegNet network. The encoder section has a depth of 2, which in the first part has 5 convolutional layers, in which each layer has 64 filters with dimensions of 3×3. In the decoding section, the dimensions of the decoding filters are adjusted according to the convolutions used at each step of the encoding. So, at each step, 64 filters with the size of 3×3 are used for coding where the weights of these filters are adjusted by network training and adapted to the educational data. Due to having the low depth of 2, and the low number of parameters in proposed network, the speed and the accuracy improve compared to the popular networks such as SegNet and DeepLab. For the CamVid dataset, after a total of 60,000 iterations, we obtain the 91% for global accuracy, which indicates improvements in the efficiency of proposed method.Keywords: Semantic Segmentation; Convolutional Neural Networks; Encoder – Decoder; Pixelwise Semantic Interpretation.1. IntroductionSemantic segmentation for 2D images, video and even 3D data is one of the key problems in computer vision [1]. For large images, semantic segmentation is one of the high-level tasks that makes a full scene understanding [2]. The importance of the scene understanding as a major problem in computer vision is due to the fact that a large number of applications is improved or developed by the inference of image information [3,4]. Some of these include independent driving, human-machine engagement, image search engines, and virtual reality [5]. In the past, solutions were developed by using various machine learning techniques for this problem. Despite the popularity of machine learning based methods, deep learning has revolutionized the solution of these problems, so that many computer vision problems, including semantic segmentation, with the use of deep architecture, especially the convolutional neural networks (CNN), perform with even better accuracy than other approaches [6-8]. Semantic segmentation is still challenging task today. Theoretically, semantic segmentation combines two functions [9]; one is the segmentation of the image, and the other is the classification of the objects in which eventually connects parts of the image that belongs to one object class. By semantic segmentation, we can obtain the pixelwise semantic interpretation of the image [10]. Compared to the object detection, semantic segmentation is considered to be a major improvement because the distinction between objects is mentioned based on the distinction between the pixels. However, there are several problems and challenges that are mainly summarized in the following aspects: 1) Object Level: due to differences in lighting, viewing points and distance, an object in the image may be seen in very different ways. 2) Class Level: objects in one class may be different, and objects in different classes may be similar. For example, a pedestrian in front of a car divides the visual view of the car into two parts. 3) Background: a clean background helps to split, but in practice, the background is usually complicated which may be misleading [11].Before the development of the deep learning algorithms, there were

کلیدواژه ها:

Semantic Segmentation ، Convolutional Neural Networks ، Encoder – Decoder ، Pixelwise Semantic Interpretation

نویسندگان

Hanieh Zamanian

Department of Electrical and Computer Engineering., University of Birjand, Birjand, Iran

Hassan Farsi

Department of Electrical and Computer Engineering., University of Birjand, Birjand, Iran

Sajad Mohamadzadeh

Technical faculty of Ferdows, University of Birjand, Birjand, Iran