Authors: Barış Kayalıbay, Grady Jensen, Patrick van der Smagt

Year: 2017


Convolutional neural networks are widely used in computer vision tasks and have also been applied to the segmentation of medical images. This paper presents a convolutional neural network with three-dimensional filters and applies it to both hand and brain magnetic resonance images. An existing convolutional neural network design is examined with two modifications, and the work is validated on data of the central nervous system and of the bones of the hand.


CNNs are a type of deep artificial neural network and are widely used in computer vision. Medical images come in various forms, such as computed tomography (CT), ultrasound, X-ray, and magnetic resonance imaging (MRI). They depict human anatomy, including unhealthy regions such as tumors, lesions, and other disease. The most common goals of segmentation are delineating anatomical structures and detecting unhealthy tissue. This paper validates CNN-based medical image segmentation on MRI images of either the brain or the hand. To this end, a U-Net architecture was used and two modifications were tested: combining multiple segmentation maps, and using element-wise summation to forward feature maps.


Network Architecture: Element-wise summation directly forwards the local details found in the feature maps, and together with the layers between the source and destination of a skip connection, it can be viewed as one large residual block. Prior work on residual blocks holds that they are most useful when the optimal function approximated by a stack of layers in a deep neural network is close to an identity mapping. Here, the feature maps created in the expanding stages act as a refinement of the feature maps created in the contracting stages, which fits that description. The network is also tested without long skip connections to gauge how useful they are. In addition, this work combines segmentation maps created at various points in the network.

3D medical images demand a large amount of memory. This is addressed by splitting each image into smaller sections, and downsampling can be used when memory requirements are still too high. Because zero-padding is used, the feature-map size changes only through strided convolutions or deconvolutions. The network uses ReLU activations, and strided convolutions replace max-pooling because they gave better results in preliminary experiments.
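To make the difference between the two ways of merging a long skip connection concrete, here is a minimal NumPy sketch. It assumes channels-first feature maps of shape (channels, D, H, W) whose spatial sizes already match; the function name `merge_skip` and its `mode` parameter are illustrative and not taken from the paper's code.

```python
import numpy as np


def merge_skip(decoder_features, encoder_features, mode="sum"):
    """Merge a long skip connection into the expanding path.

    Both inputs are channels-first arrays of shape (channels, D, H, W)
    whose spatial sizes already match (an assumption of this sketch).
    """
    if decoder_features.shape[1:] != encoder_features.shape[1:]:
        raise ValueError("spatial sizes must match")
    if mode == "sum":
        # Element-wise summation: channel counts must agree and stay unchanged,
        # so the skip path behaves like the identity branch of a residual block.
        if decoder_features.shape[0] != encoder_features.shape[0]:
            raise ValueError("summation needs equal channel counts")
        return decoder_features + encoder_features
    if mode == "concat":
        # Concatenation: stack along the channel axis; the next convolution
        # learns how to mix decoder and encoder features.
        return np.concatenate([decoder_features, encoder_features], axis=0)
    raise ValueError(f"unknown mode: {mode}")


decoder = np.ones((16, 8, 8, 8))
encoder = np.full((16, 8, 8, 8), 0.5)
print(merge_skip(decoder, encoder, "sum").shape)     # (16, 8, 8, 8)
print(merge_skip(decoder, encoder, "concat").shape)  # (32, 8, 8, 8)
```

Summation keeps the channel count fixed, which is what lets the skip path act like the identity branch of a residual block; concatenation doubles the channels and leaves the mixing to the following convolutions.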

Loss Metric: Class imbalance is a major issue in medical image segmentation and requires special care in the choice of loss function, so this work trains with a loss based on the Dice similarity coefficient. This choice has a limitation: when the ground truth contains no foreground label, such as a brain image with no tumor region, the loss is maximal even if the output contains only a few false positives.
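A minimal NumPy sketch of a soft Dice loss follows, assuming per-class probability maps and a simple average of the per-class losses; the exact formulation used in the paper may differ. The small `eps` only guards the division and its value is an assumption, as are the function names. The example at the end reproduces the limitation described above: an empty ground truth with only two low-probability false positives already yields the maximal loss of 1.0.

```python
import numpy as np


def soft_dice_loss(probs, target, eps=1e-7):
    """1 - soft Dice coefficient for a single class.

    probs:  predicted foreground probabilities in [0, 1]
    target: binary ground-truth mask with the same shape
    eps:    guards against division by zero (assumed value)
    """
    probs = np.asarray(probs, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(probs * target)
    denominator = np.sum(probs) + np.sum(target)
    return 1.0 - 2.0 * intersection / (denominator + eps)


def multiclass_soft_dice_loss(probs, labels, num_classes):
    """Average the per-class losses; probs has shape (num_classes, *spatial)."""
    losses = [soft_dice_loss(probs[c], labels == c) for c in range(num_classes)]
    return float(np.mean(losses))


# Limitation: no foreground in the ground truth, two weak false positives.
empty_truth = np.zeros((4, 4, 4))
weak_fp = np.zeros((4, 4, 4))
weak_fp[0, 0, :2] = 0.1
print(soft_dice_loss(weak_fp, empty_truth))  # 1.0, the maximum
```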


Data: As stated earlier, the network is applied to two tasks. The first dataset consists of 30 MRIs of hands in different postures, labeled with four bone classes: the metacarpals, the proximal phalanges, the middle phalanges, and the distal phalanges. Unlike a tumor, the bones of the hand are constrained in size, shape, and location.

Green marks the metacarpals, yellow the proximal phalanges, orange the middle phalanges, and red the distal phalanges.

The second task is brain tumor segmentation, the objective of the annual BRATS challenge. The training data for this work is the BRATS 2015 training set, which has 274 images in four modalities: FLAIR, T1, T1c, and T2. Of the 274 images, 220 are classified as high-grade gliomas and the remaining 54 as low-grade gliomas. The segmentation classes are necrosis, edema, non-enhancing tumor, and enhancing tumor. Unlike hand bones, tumors appear in different locations and in different sizes and shapes. For both the hand MRI and the brain MRI data, the average frequency of each class was computed over the whole dataset.
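The average class frequencies can be computed roughly as follows. This sketch assumes integer label volumes (0 for background, 1 to K-1 for the foreground classes) and averages the per-volume voxel fractions; whether the paper averaged per volume or pooled all voxels is not stated, so that choice, like the function name, is an assumption.

```python
import numpy as np


def average_class_frequency(label_volumes, num_classes):
    """Mean fraction of voxels belonging to each class.

    label_volumes: iterable of integer label arrays (0 = background).
    Each volume's per-class voxel fractions are computed first and then
    averaged over volumes (an assumed convention).
    """
    fractions = []
    for labels in label_volumes:
        labels = np.asarray(labels)
        counts = np.bincount(labels.ravel(), minlength=num_classes)
        fractions.append(counts[:num_classes] / labels.size)
    return np.mean(fractions, axis=0)


vol_a = np.zeros((2, 2, 2), dtype=int)
vol_a[0, 0, 0] = 1
vol_b = np.zeros((2, 2, 2), dtype=int)
vol_b[0, 0, 0] = 2
vol_b[1, 1, 1] = 1
print(average_class_frequency([vol_a, vol_b], num_classes=3))
```

Even in this toy case the background dominates, which is the imbalance that motivates the Dice-based loss.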

This is a 2D slice from the BRATS dataset, with the raw image on the left and the overlaid ground truth on the right. Red marks necrosis, green edema, yellow non-enhancing tumor, and orange enhancing tumor.

Learning and Evaluation:

Training: Of the 30 hand MRI samples, 20 were used for training, and the remaining 10 were divided equally into validation and test sets of 5 images each. Because data is scarce, random transformations are applied: at each iteration, one of three transformations is applied to the image before it is fed to the network. For the brain data, 220 images are used for training, 27 for validation, and 27 for testing. The brain images use the same random transformations as the hand images, and the same Adam optimizer settings are used for both.
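A sketch of the data split and the per-iteration augmentation follows, under stated assumptions. The split sizes (20/5/5 for the hand data, 220/27/27 for BRATS) come from the text; the three transformations (`random_flip`, `random_rotate90`, `random_intensity_scale`) are hypothetical placeholders, since the summary does not name the ones actually used. The sketch also assumes single-channel 3D volumes, and applies geometric transforms identically to the image and its label map so the two stay aligned.

```python
import numpy as np


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


# Hypothetical placeholder transformations; the text only says three are used.
def random_flip(image, labels, rng):
    axis = int(rng.integers(image.ndim))
    return np.flip(image, axis), np.flip(labels, axis)


def random_rotate90(image, labels, rng):
    k = int(rng.integers(1, 4))
    return np.rot90(image, k, axes=(0, 1)), np.rot90(labels, k, axes=(0, 1))


def random_intensity_scale(image, labels, rng):
    # Intensity change only; the label map is left untouched.
    return image * rng.uniform(0.9, 1.1), labels


TRANSFORMS = (random_flip, random_rotate90, random_intensity_scale)


def augment(image, labels, rng):
    """Apply exactly one randomly chosen transformation per iteration."""
    transform = TRANSFORMS[int(rng.integers(len(TRANSFORMS)))]
    return transform(image, labels, rng)


hand_train, hand_val, hand_test = split_indices(30, 20, 5)         # 20 / 5 / 5
brain_train, brain_val, brain_test = split_indices(274, 220, 27)   # 220 / 27 / 27
```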

Cross-validation: This work compares element-wise summation with concatenation as ways of combining the feature maps from one stage of the network with those of another stage. For the hand dataset, the Dice score of each class is reported over six cross-validation folds; for the BRATS brain data, the average Dice score of each region is reported over five cross-validation folds, together with an official evaluation on 110 additional unlabeled images.
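For evaluation, the Dice score of a hard segmentation can be computed per class and then averaged across cross-validation folds. This NumPy sketch assumes integer label maps; scoring a class that is absent from both prediction and ground truth as 1.0 is a convention chosen here, not something the summary specifies. BRATS regions combine several labels, so region-level scores would first merge the relevant labels into one mask before applying the same formula.

```python
import numpy as np


def dice_per_class(pred_labels, true_labels, num_classes):
    """Hard Dice score 2 * |pred & true| / (|pred| + |true|) for each class label."""
    scores = np.empty(num_classes)
    for c in range(num_classes):
        pred_c = pred_labels == c
        true_c = true_labels == c
        denom = pred_c.sum() + true_c.sum()
        # Convention chosen here: a class absent from both maps scores 1.0.
        scores[c] = 1.0 if denom == 0 else 2.0 * np.sum(pred_c & true_c) / denom
    return scores


def mean_over_folds(fold_scores):
    """Average per-class Dice scores across cross-validation folds."""
    return np.mean(np.stack(fold_scores), axis=0)


fold_1 = dice_per_class(np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2]), 3)
fold_2 = dice_per_class(np.array([0, 1, 2, 2]), np.array([0, 1, 2, 2]), 3)
print(mean_over_folds([fold_1, fold_2]))
```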

Experiments: The hand experiments compared the Dice-based loss with the Jaccard loss and categorical cross-entropy, a network with long skip connections against one without, a network combining multiple segmentation maps against one creating a single segmentation map, and element-wise summation against concatenation. The BRATS experiments examined the performance achieved with each modality on its own, the performance achieved with different combinations of modalities, and the final performance of the best-performing network over five cross-validation folds.
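The summary does not spell out how the multiple segmentation maps are combined. One common approach, sketched here as an assumption rather than the paper's exact method, upsamples the coarser class-score map to the next resolution and adds it element-wise to the finer one before a final softmax. Nearest-neighbour upsampling by a factor of 2 stands in for whatever upsampling the network actually uses, and all names are hypothetical.

```python
import numpy as np


def upsample_nearest(scores, factor=2):
    """Nearest-neighbour upsampling of a (classes, D, H, W) array along spatial axes."""
    for axis in range(1, scores.ndim):
        scores = np.repeat(scores, factor, axis=axis)
    return scores


def softmax(x, axis=0):
    shifted = np.exp(x - x.max(axis=axis, keepdims=True))
    return shifted / shifted.sum(axis=axis, keepdims=True)


def combine_segmentation_maps(score_maps):
    """Combine class-score maps produced at several depths of the network.

    score_maps: list of (classes, D, H, W) arrays ordered coarsest first,
    each assumed to have twice the spatial resolution of the one before.
    """
    combined = score_maps[0]
    for finer in score_maps[1:]:
        combined = upsample_nearest(combined) + finer  # element-wise sum
    return softmax(combined, axis=0)


coarse = np.zeros((3, 2, 2, 2))
coarse[1] = 4.0  # coarse level strongly favours class 1
middle = np.zeros((3, 4, 4, 4))
fine = np.zeros((3, 8, 8, 8))
probs = combine_segmentation_maps([coarse, middle, fine])
print(probs.shape)  # (3, 8, 8, 8)
```

The single-map baseline corresponds to passing only the finest map, so the comparison in the experiments amounts to whether the coarser maps are added in.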

Results: On the hand MRI data, comparing networks with and without long skip connections shows that removing the long skip connections does not unambiguously worsen performance. Across the cross-validation folds, concatenation works better than element-wise summation for the long skip connections; to explain this, the feature maps of the expanding stages of both networks were visualized.

In the visualized feature maps, the concatenation network shows clear segmentation-like maps (M1, M2, P, D1, and D2), while the summation network shows many irrelevant details, except in D3 and D4. Each slice is extracted from a different location.

On the BRATS data, each modality is represented as one input channel, and the network is trained on individual modalities as well as on combinations of the best-performing ones. When each modality is used on its own, T1c gives the best result in every class, followed by T2. The network was then trained on images in the FLAIR, T1c, and T2 modalities; discarding T1 in this way yields a higher Dice score, which suggests that training on a combination of MRI modalities is beneficial. The final BRATS result uses concatenation, which performed better on the hand MRI data.

The Dice score captures the model's behavior on unhealthy tissue; for healthy tissue, specificity, the probability that a healthy voxel is classified as healthy, is a more appropriate metric. Although the network's mean specificity is high, visual inspection of the segmentation maps shows many false positives. Including healthy brain images in the dataset could benefit the output, and such images could be generated synthetically.
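A small synthetic NumPy example shows why a high mean specificity can coexist with many false positives: healthy voxels vastly outnumber tumor voxels, so even thousands of false positives barely move TN / (TN + FP). The volume size, tumor size, and false-positive count below are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np


def specificity(pred_mask, true_mask):
    """TN / (TN + FP): chance that a healthy voxel is predicted healthy."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    tn = np.sum(~pred_mask & ~true_mask)
    fp = np.sum(pred_mask & ~true_mask)
    return tn / (tn + fp)


def dice(pred_mask, true_mask):
    """Hard Dice score for a binary mask."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    return 2.0 * np.sum(pred_mask & true_mask) / (pred_mask.sum() + true_mask.sum())


# Synthetic 100^3 volume with a 1,000-voxel tumor that is found exactly,
# plus 5,000 false-positive voxels elsewhere.
truth = np.zeros((100, 100, 100), dtype=bool)
truth[:10, :10, :10] = True
pred = truth.copy()
pred[50:55, :10, :] = True

print(f"specificity = {specificity(pred, truth):.4f}")  # 0.9950
print(f"dice        = {dice(pred, truth):.4f}")         # 0.2857
```

Five times more false positives than tumor voxels still leaves specificity near 0.995, which is why visual inspection of the maps is needed alongside it.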

Green marks necrosis, yellow edema, red non-enhancing tumor, and orange enhancing tumor.

Related work

Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation"

SegNet is a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation. It consists of an encoder network, a corresponding decoder network, and a final pixel-wise classification layer.


Two modifications to the U-Net design were tested: merging multiple segmentation maps at different scales, and using element-wise summation to forward feature maps from one network stage to another. Merging multiple segmentation maps had no effect on the final performance, whereas element-wise summation produced worse results. A Dice-based loss function was introduced to address class imbalance, and the high memory demands of the MRI images were handled by downsampling. A 3D convolutional neural network design can achieve good results despite the scarcity of labeled images.

Future work

Future work includes associating a weight with each class, which introduces additional hyperparameters.