Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Abstract
- Inception and residual networks yielded outstanding performance in the 2015 ILSVRC challenge.
- This paper shows that adding residual connections to Inception networks significantly accelerates training.
Introduction
- Since Inception networks tend to be very deep, it is natural and effective to replace the filter concatenation stage of the Inception architecture with residual connections.
- In this paper, the authors compare two pure Inception variants, Inception-v3 and Inception-v4, with two similarly expensive hybrid Inception-ResNet variants (Inception-ResNet-v1 and Inception-ResNet-v2).
Architectural Choices
1. Pure Inception blocks
General look
The large-scale structure of Inception-v4 looks as follows:
- Previous Inception networks were very conservative about changing architectural choices, which left them with many non-uniform block structures.
- Inception-v4 addresses this by making uniform, simplified choices for each of its Inception blocks.
Stem
- "V" in the figures indicates valid padding, where the output grid size is reduced relative to the input grid size.
- Layers not marked with "V" use same padding, where the output grid size matches the input grid size.
- The stem is the input part of both the Inception-v4 and Inception-ResNet-v2 networks.
- Its input size is 299x299x3 and its output size is 35x35x384 (see the padding sketch below).
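As a quick illustration of the "V" (valid) vs. same padding convention, here is a minimal Keras sketch; the layer widths are illustrative and this is not the full stem from the paper:

```python
import tensorflow as tf

x = tf.keras.Input(shape=(299, 299, 3))

# 3x3 conv, stride 2, valid padding ("V" in the figures): 299 -> 149
v = tf.keras.layers.Conv2D(32, 3, strides=2, padding="valid")(x)

# 3x3 conv, stride 1, same padding: grid size stays 149x149
s = tf.keras.layers.Conv2D(32, 3, strides=1, padding="same")(v)

print(v.shape)  # (None, 149, 149, 32)
print(s.shape)  # (None, 149, 149, 32)
```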
Inception-A
Inception-B
Inception-C
- k, l, m, n indicate filter counts that vary between the network variants (they are specified per network for the Reduction-A block).
Reduction-A
Reduction-B
2. Residual Inception Blocks
The Inception-ResNet-v1 and Inception-ResNet-v2 networks look as follows:
- Each Inception block is followed by a 1x1 filter-expansion convolution (without activation), which scales the dimensionality of the filter bank back up to match the input depth. This compensates for the dimensionality reduction caused by the Inception block, so the residual addition is possible.
- Batch normalization is applied only on top of the traditional (convolutional) layers, not on top of the summations. This uses significantly less GPU memory than applying BN to every layer (see the block sketch below).
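A hedged sketch of how such a residual Inception block could be wired up in Keras; the branch widths are illustrative, not the exact numbers from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn(x, filters, kernel_size):
    # BN only on the "traditional" conv layers, followed by ReLU
    x = layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def residual_inception_block(x):
    in_channels = x.shape[-1]

    # Cheaper Inception-style branches (widths are illustrative)
    branch_0 = conv_bn(x, 32, 1)
    branch_1 = conv_bn(conv_bn(x, 32, 1), 32, 3)
    mixed = layers.Concatenate()([branch_0, branch_1])

    # 1x1 filter-expansion conv, linear (no activation), restores the input depth
    up = layers.Conv2D(in_channels, 1, activation=None)(mixed)

    # Residual addition; no BN on top of the summation
    out = layers.Add()([x, up])
    return layers.Activation("relu")(out)
```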
Inception-ResNet-A
Reduction-A
Inception-ResNet-B
Reduction-B
Inception-ResNet-C
Scaling of the residuals
- If the number of filters exceeds 1000, the residual variants become unstable and the network "dies" early in training, meaning the last layer before average pooling starts producing only zeros.
- This problem is solved by scaling down the residuals (by a factor between about 0.1 and 0.3) before adding them to the accumulated layer activations, as sketched below.
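A minimal sketch of this residual scaling in Keras; the 0.1 factor is just one value from the range mentioned above:

```python
from tensorflow.keras import layers

def scaled_residual_add(shortcut, residual, scale=0.1):
    # Scale the residual branch by a small constant before the addition
    scaled = layers.Lambda(lambda t: t * scale)(residual)
    return layers.Activation("relu")(layers.Add()([shortcut, scaled]))
```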
Training Methodology
- The best models were trained with RMSProp using a learning rate of 0.045, decayed every two epochs using an exponential rate of 0.94 (a sketch of this setup follows).
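A hedged sketch of that optimizer setup in Keras; the decay of 0.9 and epsilon of 1.0 come from the paper, while steps_per_epoch is a placeholder:

```python
import tensorflow as tf

steps_per_epoch = 10000  # illustrative value, depends on dataset and batch size

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.045,
    decay_steps=2 * steps_per_epoch,  # decay every two epochs
    decay_rate=0.94,
    staircase=True,
)
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=lr_schedule, rho=0.9, epsilon=1.0
)
```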
Experimental Results
- The graph below shows the top-5 error of all four models during training. The Inception variants with residual connections converge much faster than the pure Inception networks, while reaching similar final performance.
- Evaluation with 144 crops per image was also conducted. An ensemble consisting of one pure Inception-v4 model and three Inception-ResNet-v2 models was used for evaluation. The table on the left shows the error rates of single models, while the table on the right shows the error rate of the ensemble. The ensemble performed slightly better than any single model.
Conclusions
- Inception networks with residual connections, such as Inception-ResNet-v2, trained fastest while achieving performance similar to their pure Inception counterparts.
- Residual connections contribute heavily to improving the training speed of Inception networks.
Comments