์ƒˆ์†Œ์‹

๐Ÿ“‘ ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ/CV

[AlexNet] ImageNet Classification with Deep Convolutional Neural Networks

2022. 7. 13. 10:55


CNN์˜ ์‹œ์ž‘์ธ AlexNet
ImageNet Classification with Deep Convolutional Neural Networks




Abstract

  • Achieved a top-1 error rate of 37.5% and a top-5 error rate of 17.0% on the ImageNet LSVRC-2010 test set
  • On ILSVRC-2012, achieved a top-5 test error rate of 15.3%
  • The network has 60 million parameters and 650,000 neurons, and consists of
    5 convolutional layers (some followed by max-pooling layers) and 3 fully-connected layers ending in a final 1000-way softmax
  • To make training faster, it uses non-saturating neurons and a very efficient GPU implementation of the convolution operation
  • To reduce overfitting in the fully-connected layers, it uses dropout




Dataset

  • ILSVRC-2010์—์„œ๋งŒ test์…‹ label์ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•ด ๋Œ€๋ถ€๋ถ„์˜ ์‹คํ—˜์„ ์ด ๋ฒ„์ „์œผ๋กœ ์‚ฌ์šฉ
  • ILSVRC-2012์—๋„ ๋ชจ๋ธ์„ ๋“ฑ๋กํ•ด ์ด ๋ฒ„์ „์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ๋„ ์žˆ์ง€๋งŒ test์…‹ label์€ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์—ˆ์Œ
  • ImageNet์—์„œ๋Š” ๋‘๊ฐ€์ง€ error rate ์‚ฌ์šฉ
    • 1) top-1 error rate: (์ •๋‹ต label) == (๊ฐ€์žฅ probable ํ•œ label)
    • 2) top-5 error rate: (์ •๋‹ต label) in (์ƒ์œ„ 5 probableํ•œ label)
    • error rate ์ฐธ๊ณ 
  • ImageNet์€ ๋‹ค์–‘ํ•œ ํ•ด์ƒ๋„์˜ ์ด๋ฏธ์ง€๋“ค๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์ง€๋งŒ ๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ์€ ์ผ์ •ํ•œ input ์ฐจ์› ์ˆ˜๊ฐ€ ํ•„์š”ํ•ด
    ํ•ด์ƒ๋„๊ฐ€ ๊ณ ์ •๋œ 256 × 256์œผ๋กœ down-sample ์ง„ํ–‰
    • ์ง์‚ฌ๊ฐํ˜• ์ด๋ฏธ์ง€๊ฐ€ ์ฃผ์–ด์ง€๋ฉด ๋จผ์ € ์งง์€ ๋ณ€์˜ ๊ธธ์ด๊ฐ€ 256์ด ๋˜๋„๋ก ์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•œ ๋‹ค์Œ,
      ๊ทธ ์กฐ์ •๋œ ์ด๋ฏธ์ง€์—์„œ ์ค‘์•™์˜ 256x256 ํŒจ์น˜๋ฅผ ์ž˜๋ผ๋ƒ„
  • train์…‹์˜ ๊ฐ ํ”ฝ์…€์— mean activity๋ฅผ ๋นผ๋Š” ๊ฒƒ์„ ์ œ์™ธํ•œ ๋‹ค๋ฅธ ์ „์ฒ˜๋ฆฌ๋Š” ์ง„ํ–‰ํ•˜์ง€ ์•Š์Œ
  • ํ”ฝ์…€์˜ raw RGB value๋ฅผ ๊ฐ€์ง€๊ณ  network๋ฅผ ํ›ˆ๋ จ์‹œํ‚ด




Architecture

5๊ฐœ์˜ convolutional layer์™€ 3๊ฐœ์˜ fully-connected layer๋กœ ์ด 8๊ฐœ์˜ layer๋กœ ๊ตฌ์„ฑ



ReLU Nonlinearity

  • ๋ณดํ†ต ๋‰ด๋Ÿฐ์˜ output์€ tanh๋‚˜ sigmoid๋ฅผ ๊ฑฐ์น˜๋Š”๋ฐ
    • tanh: $f(x) = tanh(x)$
    • sigmoid: $f(x) = (1 + e^−x)^−1$
  • ์ด๋Ÿฌํ•œ saturating nonlinearity๋Š” gradient descent๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ํ•™์Šต ์†๋„๊ฐ€ non-saturating nonlinearity๋ณด๋‹ค ๋งค์šฐ ๋Š๋ฆผ

 

  • ๋”ฐ๋ผ์„œ ๋…ผ๋ฌธ์—์„œ๋Š” non-saturating nonlinearity๋กœ ReLU๋ฅผ ์‚ฌ์šฉ
    • $f(x) = max(0, x)$

  • tanh(์ ์„ )๋ฅผ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ๋ณด๋‹ค ReLU๋ฅผ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ 6๋ฐฐ ๋นจ๋ฆฌ training error rate 0.25 ๋‹ฌ์„ฑ
    -> ์ˆ˜๋ ด ์†๋„๊ฐ€ ๊ฐœ์„ ๋๋‹ค




Training on Multiple GPUs

  • ํ•˜๋‚˜์˜ GTX 580 GPU๋Š” ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ 3GB์ด๊ธฐ ๋•Œ๋ฌธ์— ๋…ผ๋ฌธ์—์„œ๋Š” net์„ 2๊ฐœ์˜ GPU์— ๋‚˜๋ˆ ์„œ ์ง„ํ–‰
  • ๋…ผ๋ฌธ์—์„œ ์ ์šฉํ•œ ๋ณ‘๋ ฌํ™” ๋ฐฉ์‹์€ ๊ธฐ๋ณธ์ ์œผ๋กœ kernel(๋˜๋Š” ๋‰ด๋Ÿฐ)์˜ ์ ˆ๋ฐ˜์„ ๊ฐ GPU์— ๋ฐฐ์น˜ํ•˜๊ณ , ์ถ”๊ฐ€์ ์œผ๋กœ GPU๋“ค๋ผ๋ฆฌ ํŠน์ • layer์—์„œ๋งŒ ์ „๋‹ฌํ•˜๋„๋ก ํ•จ
    • ์˜ˆ๋ฅผ ๋“ค์–ด, layer 3์˜ kernel๋“ค์€ layer 2์˜ ๋ชจ๋“  kernel map์œผ๋กœ๋ถ€ํ„ฐ ์ž…๋ ฅ(input)์„ ๋ฐ›์•„์˜ค์ง€๋งŒ, layer 4์˜ kernel๋“ค์€ ๊ฐ™์€ GPU์— ์žˆ๋Š” layer 3์˜ kernel map์œผ๋กœ๋ถ€ํ„ฐ๋งŒ ์ž…๋ ฅ ๋ฐ›์Œ
  • ์—ฐ๊ฒฐ ํŒจํ„ด์„ ์„ ํƒํ•˜๋Š” ๊ฒƒ์€ cross-validation์˜ ๋ฌธ์ œ์ด์ง€๋งŒ, ์ด๋ฅผ ํ†ตํ•ด ์ „๋‹ฌ๋Ÿ‰์ด ๊ณ„์‚ฐ๋Ÿ‰์˜ ํ—ˆ์šฉ ๊ฐ€๋Šฅํ•œ ๋ถ€๋ถ„์ด ๋  ๋•Œ๊นŒ์ง€ ์ „๋‹ฌ๋Ÿ‰์„ ์ •๋ฐ€ํ•˜๊ฒŒ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Œ
  • ๊ทธ์— ๋”ฐ๋ฅธ ์•„ํ‚คํ…์ฒ˜๊ฐ€ columnar CNN์˜ ์•„ํ‚คํ…์ฒ˜์™€ ๋‹ค์†Œ ์œ ์‚ฌํ•˜์ง€๋งŒ ๋…ผ๋ฌธ์˜ column๋“ค์€ ๋…๋ฆฝ์ ์ด์ง€ ์•Š๋‹ค๋Š” ์ ์ด ๋‹ค๋ฆ„
  • ํ•˜๋‚˜์˜ GPU์—์„œ ํ›ˆ๋ จ๋œ ๊ฐ convolutional layer์˜ kernel ์ˆ˜๊ฐ€ ์ ˆ๋ฐ˜์ธ net์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ, top-1๊ณผ top-5 error rate๋ฅผ ๊ฐ๊ฐ 1.7%, 1.2% ์ค„์ž„
  • 2๊ฐœ์˜ GPU๋ฅผ ์‚ฌ์šฉํ•œ net์ด ํ•˜๋‚˜์˜ GPU๋ฅผ ์‚ฌ์šฉํ•œ net๋ณด๋‹ค ํ•™์Šต ์‹œ๊ฐ„์ด ์กฐ๊ธˆ ๋” ์งง์•˜์Œ




Local Response Normalization

  • ReLU๋Š” saturating์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ input normalization์ด ํ•„์š”ํ•˜์ง€ ์•Š๋‹ค๋Š” ์†์„ฑ์„ ๊ฐ€์ง
  • ๊ทธ๋Ÿฌ๋‚˜ ๋…ผ๋ฌธ์—์„œ๋Š” local normalization์ด generalization์— ๋„์›€์ด ๋œ๋‹ค๋Š” ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•จ

 

 

  • $a^i_{x,y}$: (x, y) ์œ„์น˜์— ์ปค๋„ i kernel $i$๋ฅผ ์ ์šฉํ•œ ๋‹ค์Œ ReLU nonlinearity (๋น„์„ ํ˜•์„ฑ)์„ ์ ์šฉํ•ด ๊ณ„์‚ฐ๋œ ๋‰ด๋Ÿฐ์˜ ํ™œ๋™
  • response-normalized ํ™œ๋™์ธ $b^i_{x,y}$ ๋Š” ์œ„์˜ ์‹์œผ๋กœ ํ‘œํ˜„๋จ
  • sum์€ ๋™์ผํ•œ ๊ณต๊ฐ„ ์œ„์น˜์— ์žˆ๋Š” n๊ฐœ์˜ ์ธ์ ‘ํ•œ kernel map๋“ค์—์„œ ์ด๋ฃจ์–ด์ง
  • $N$: layer์˜ ์ด kernel ์ˆ˜
  • kernel map์˜ ์ˆœ์„œ๋Š” ์ž„์˜์ ์ด๋ฉฐ ํ•™์Šต์ด ์‹œ์ž‘๋˜๊ธฐ ์ „์— ์ •ํ•ด์ง
  • ์ด๋Ÿฌํ•œ ์ข…๋ฅ˜์˜ response normalization์€ ์‹ค์ œ ๋‰ด๋Ÿฐ์—์„œ ๋ฐœ๊ฒฌ๋˜๋Š” ์œ ํ˜•์—์„œ ์˜๊ฐ์„ ๋ฐ›์€ lateral inhibition์„ ๊ตฌํ˜„ํ•œ ํ˜•ํƒœ
    • ๋‹ค๋ฅธ kernel์„ ์‚ฌ์šฉํ•ด ๊ณ„์‚ฐ๋œ ๋‰ด๋Ÿฐ output๋“ค ๊ฐ„์— ๊ฒฝ์Ÿ์„ ์ผ์œผํ‚ด
  • ์ƒ์ˆ˜ $k$, $n$, $α$, $β$๋Š” validation์…‹์„ ์‚ฌ์šฉํ•ด ๊ฒฐ์ •๋˜๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ, ๋…ผ๋ฌธ์—์„œ๋Š” $k = 2$, $n = 5$, $α = 10^−4$ (=0.0001), $β = 0.75$ ๋กœ ์„ค์ •
  • ํŠน์ • layer๋“ค์—์„œ ReLU nonlinearity๋ฅผ ์ ์šฉํ•œ ํ›„ ์ด normalization์„ ์ ์šฉํ•จ
  • Response normalization์€ top-1๊ณผ top-5 error rate๋ฅผ ๊ฐ๊ฐ 1.4%, 1.2% ๊ฐ์†Œ์‹œํ‚ด
  • CIFAR-10 ๋ฐ์ดํ„ฐ ์…‹์—์„œ๋„ ์ด ๋ฐฉ์‹์˜ ํšจ๊ณผ๋ฅผ ํ™•์ธํ•จ
    • four-layer CNN์˜ test error rate์ด normalization์„ ์ ์šฉํ•˜์ง€ ์•Š์•˜์„ ๋•Œ๋Š” 13%, ์ ์šฉํ–ˆ์„ ๋•Œ๋Š” 11%๋ฅผ ๋‹ฌ์„ฑ




Overlapping Pooling

  • ์ „ํ†ต์ ์œผ๋กœ, pooling unit์€ ๊ฒน์น˜์ง€ ์•Š์Œ
  • pooling layer์˜ kernel ์‚ฌ์ด์ฆˆ๊ฐ€ $z$, stride๋Š” $s$๋ผ๊ณ  ํ•œ๋‹ค๋ฉด
    • $s = z$ ๋กœ ์„ค์ • -> CNN์— ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ์ „ํ†ต์ ์ธ local pooling
    • $s < z$ ๋กœ ์„ค์ • -> overlapping pooling
    • => stride๋ฅผ kernel ์‚ฌ์ด์ฆˆ์™€ ๊ฐ™๊ฒŒ ์„ค์ •ํ•˜๋ฉด ๋ณดํ†ต์˜ local pooling, stride๋ฅผ kernel ์‚ฌ์ด์ฆˆ๋ณด๋‹ค ์ž‘๊ฒŒ ์„ค์ •ํ•˜๋ฉด overlapping pooling
  • ๋…ผ๋ฌธ์—์„œ๋Š” $s = 2$, $z = 3$์œผ๋กœ ์‚ฌ์šฉ
    • => stride = 2, kernel size = $3 × 3$

 

์ถœ์ฒ˜: https://bskyvision.com/421

 

  • ๋™์ผํ•œ ์ฐจ์›์˜ output์„ ์ƒ์„ฑํ•˜๋Š” $s = 2, z = 2$์˜ non-overlapping pooling๊ณผ ๋น„๊ตํ–ˆ์„ ๋•Œ, overlapping pooling์„ ์‚ฌ์šฉํ•œ ๋ฐฉ์‹์ด top-1 error rate: 0.4%, top-5 error rate: 0.3%๋กœ ๊ฐ์†Œ์‹œํ‚ด
  • ํ•™์Šตํ•˜๋Š” ๋™์•ˆ overlapping pooling์„ ์‚ฌ์šฉํ•œ ๋ชจ๋ธ๋“ค์—์„œ overfit์ด ๋œ ๋ฐœ์ƒํ•จ




Overall Architecture

  • AlexNet์˜ ๊ตฌ์กฐ๋Š” ์œ„/์•„๋ž˜๋กœ ๊ตฌ๋ถ„๋˜์–ด ์žˆ๋Š”๋ฐ ์ด๋Š” 2๊ฐœ์˜ GPU๋ฅผ ๋ณ‘๋ ฌ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•จ
  • ๋งˆ์ง€๋ง‰ fully-connected layer์˜ output์€ 1000๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” 1000-way softmax๋กœ ์ „๋‹ฌ๋จ

 

  • 2, 4, 5๋ฒˆ์งธ conv layer์˜ kernel์€ ๋™์ผํ•œ GPU์— ์žˆ๋Š” ์ด์ „ layer์˜ kernel map์—๋งŒ ์—ฐ๊ฒฐ๋จ

 

  • 3๋ฒˆ์งธ conv layer์˜ kernel์€ 2๋ฒˆ์งธ layer์˜ ๋ชจ๋“  kernel map์— ์—ฐ๊ฒฐ๋จ

 

  • fully-connected layer์˜ ๋‰ด๋Ÿฐ์€ ์ด์ „ layer์˜ ๋ชจ๋“  ๋‰ด๋Ÿฐ๊ณผ ์—ฐ๊ฒฐ๋จ
  • Local Response Normalization์ด 1๋ฒˆ์งธ์™€ 2๋ฒˆ์งธ conv layer์—์„œ ์ผ์–ด๋‚จ
  • Max-pooling layer๋Š” Local Response Normalization์„ ์ ์šฉํ–ˆ๋˜ 1๋ฒˆ์งธ์™€ 2๋ฒˆ์งธ conv layer๊ณผ 5๋ฒˆ์งธ conv layer์—์„œ ์ผ์–ด๋‚จ
  • ReLU non-linearity๋Š” ๋ชจ๋“  conv layer์™€ fully-connected layer์˜ output์— ์ ์šฉ๋จ

 

 

Conv layer 1

  • input image = $224×224×3$
    • AlexNet takes RGB images with 3 color channels as input, so the image depth is 3, and to convolve it the filter depth is also 3
  • kernels = 96
  • kernel size = $11×11×3$
  • stride = 4

 

 

Conv layer 2

  • input = the output of conv layer 1 after Local Response Normalization and pooling
  • kernels = 256
  • kernel size = $5 × 5 × 48$

 

 

Conv layer 3

  • input = the output of conv layer 2 after Local Response Normalization and pooling
  • kernels = 384
  • kernel size = $3 × 3 × 256$

 

 

Conv layer 4

  • kernels = 384
  • kernel size = $3 × 3 × 192$

 

 

Conv layer 5

  • kernels = 256
  • kernel size = $3 × 3 × 192$

 

  • 3, 4, 5๋ฒˆ์งธ conv layer๋Š” pooling์ด๋‚˜ normalization ์—†์ด ์—ฐ๊ฒฐ๋จ

 

 

Fully-connected layers

  • ๊ฐ๊ฐ 4096 ๊ฐœ์˜ ๋‰ด๋Ÿฐ์„ ๊ฐ€์ง

 

 

 

cf) input image = $224×224×3$ or input image = $227×227×3$

์ถœ์ฒ˜: https://learnopencv.com/understanding-alexnet/

 

 

์ถœ์ฒ˜: https://www.youtube.com/watch?v=fe2Vn0mwALI&t=170s

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” input์ด $224×224$์œผ๋กœ ๋‚˜์˜ค๋Š”๋ฐ ์œ„์˜ ๋‘ ์ด๋ฏธ์ง€์—์„œ๋Š” $227×227$์ด๋ผ๊ณ  ์„ค๋ช…๋˜์–ด ์žˆ์Œ
  • ๋…ผ๋ฌธ์— ์„ค์ •๋œ stride์™€ padding์„ ๊ณ ๋ คํ–ˆ์„ ๋•Œ, ์—ฌ๋Ÿฌ layer๋ฅผ ๊ฑฐ์ณ $13×13$์œผ๋กœ ๋„๋‹ฌํ•˜๋ ค๋ฉด $227×227$์ด ๋งž๋‹ค๊ณ  ํ•จ
  • => ๋…ผ๋ฌธ์˜ ๊ทธ๋ฆผ์ด ์ž˜๋ชป๋œ ๊ฒƒ

 

 

 

Reducing Overfitting

overfitting์„ ์ค„์ด๊ธฐ ์œ„ํ•ด Data Augmentation๊ณผ Dropout ์‚ฌ์šฉ

 

 

 

Data Augmentation

  • ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ overfitting์„ ์ค„์ด๋Š” ๊ฐ€์žฅ ์‰ฝ๊ณ  ์ผ๋ฐ˜์ ์ธ ๋ฐฉ๋ฒ•์€ label-preserving transformations๋ฅผ ์‚ฌ์šฉํ•ด ๋ฐ์ดํ„ฐ์…‹์„ ์ธ์œ„์ ์œผ๋กœ ๋Š˜๋ฆฌ๋Š” ๊ฒƒ
    • label-preserving transformation
    • : ์›๋ณธ ๋ฐ์ดํ„ฐ(label)์˜ ํŠน์„ฑ์„ ๊ทธ๋Œ€๋กœ ๋ณด์กด(preserving)ํ•˜๋ฉด์„œ ๋ณ€ํ™˜(transformation)ํ•˜๋Š” ๊ฒƒ
    • ์ƒํ•˜๋ฐ˜์ „ ๊ฐ™์€ ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•  ๋•Œ, ์˜๋ฏธ๊ฐ€ ์™„์ „ํžˆ ๋ฐ”๋€” ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” ๋‘ ๊ฐ€์ง€ ํ˜•ํƒœ์˜ data augmentation์„ ์‚ฌ์šฉํ–ˆ๋Š”๋ฐ ๋‘ ๊ฐ€์ง€ ๋ชจ๋‘ ์•„์ฃผ ์ ์€ ๊ณ„์‚ฐ์œผ๋กœ ์›๋ณธ ์ด๋ฏธ์ง€์—์„œ ๋ณ€ํ™˜๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ๋ณ€ํ™˜๋œ ์ด๋ฏธ์ง€๋ฅผ ๋””์Šคํฌ์— ์ €์žฅํ•  ํ•„์š”๊ฐ€ ์—†๊ฒŒ ํ•จ
  • ๋ณ€ํ™˜๋œ ์ด๋ฏธ์ง€๋“ค์€ GPU๊ฐ€ ์ด๋ฏธ์ง€์˜ ์ด์ „ batch์—์„œ ํ•™์Šตํ•˜๋Š” ๋™์•ˆ CPU์—์„œ Python ์ฝ”๋“œ๋กœ ์ƒ์„ฑ๋˜๊ฒŒ ํ•จ. ๋”ฐ๋ผ์„œ ์ด๋Ÿฌํ•œ data augmentation ๋ฐฉ์‹์€ ์‚ฌ์‹ค์ƒ ๊ณ„์‚ฐ์ด ํ•„์š”ํ•˜์ง€ ์•Š์Œ

 

1) ์ขŒ์šฐ ๋ฐ˜์ „ (horizontal reflection)

  • Training
    • A $224 × 224$ patch is randomly extracted from each $256 × 256$ original image, each extracted patch is also flipped horizontally, and the network is trained on these extracted patches

 

์ถœ์ฒ˜:https://89douner.tistory.com/60?category=873854

  • ์ด ๋ฐฉ๋ฒ•์œผ๋กœ 1์žฅ์˜ ์ด๋ฏธ์ง€์—์„œ $32 × 32 × 2 = 1024 × 2 = 2048$ ๊ฐœ์˜ ๋‹ค๋ฅธ ์ด๋ฏธ์ง€๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๊ฒŒ ๋จ

 

  • Test
    • Five $224 × 224$ patches are extracted (the four corner patches and the center patch), together with their horizontal reflections
    • => 10 images in total

 

 

์ถœ์ฒ˜: https://oi.readthedocs.io/en/latest/computer_vision/cnn/alexnet.html

 

4๊ฐœ์˜ ๋ชจ์„œ๋ฆฌ์™€ ์ค‘์•™์—์„œ 5๊ฐœ์˜ 224x224 ์ด๋ฏธ์ง€ ์ถ”์ถœ

  • ์ด๋ ‡๊ฒŒ ์ถ”์ถœํ•œ 10๊ฐœ์˜ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด softmax layer๊ฐ€ ๋งŒ๋“  ์˜ˆ์ธก์„ ํ‰๊ท ํ™”ํ•ด์„œ ์˜ˆ์ธกํ•จ

 

 

2) ํ›ˆ๋ จ ์ด๋ฏธ์ง€์˜ RGB ์ฑ„๋„ ๊ฐ’ ๋ณ€๊ฒฝ

  • ์›๋ž˜ ํ”ฝ์…€ ๊ฐ’ + ์ด๋ฏธ์ง€์˜ RGB ํ”ฝ์…€์— ๋Œ€ํ•œ ์ฃผ์„ฑ๋ถ„ ๋ถ„์„(PCA)ํ•œ ๊ฐ’ X (ํ‰๊ท : 0, ํ‘œ์ค€ํŽธ์ฐจ: 0.1์ธ Gaussian์—์„œ ์ถ”์ถœํ•œ) ๋žœ๋ค ๋ณ€์ˆ˜
  • ์ด ๋ฐฉ์‹์€ ์›๋ณธ ์ด๋ฏธ์ง€์˜ ์ค‘์š”ํ•œ ์†์„ฑ, ์ฆ‰ ๋ฌผ์ฒด์˜ identity๊ฐ€ ๋น›์˜ ๊ฐ•๋„์™€ ์ƒ‰์ƒ์˜ ๋ณ€ํ™”์— ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ์ ์„ ๋Œ€๋žต์ ์œผ๋กœ ํฌ์ฐฉํ•จ
  • ์ด ๋ฐฉ์‹์œผ๋กœ top-1 error rate๋ฅผ 1% ์ด์ƒ ๊ฐ์†Œ์‹œํ‚ด

 

 

 

Dropout

  • dropout: a training method that drops a subset of a layer's nodes at every iteration

 

์ถœ์ฒ˜: https://blog.naver.com/PostView.nhn?isHttpsRedirect=true&blogId=laonple&logNo=220818841217

 

  • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๋Œ€์‹ , ๋ชจ๋ธ ๊ฒฐํ•ฉ(model combination)์— ์˜ํ•œ ํˆฌํ‘œ ํšจ๊ณผ(Voting)๊ณผ ๋น„์Šทํ•œ ํšจ๊ณผ๋ฅผ ๋‚ด๊ธฐ ์œ„ํ•ด ํ•™์Šต์ด ์ง„ํ–‰๋˜๋Š” ๋™์•ˆ ๋ฌด์ž‘์œ„๋กœ ์ผ๋ถ€ ๋‰ด๋Ÿฐ์„ ์ƒ๋žตํ•จ
  • ์ƒ๋žต๋œ ๋‰ด๋Ÿฐ์˜ ์กฐํ•ฉ๋งŒํผ ์ง€์ˆ˜ํ•จ์ˆ˜์ ์œผ๋กœ ๋‹ค์–‘ํ•œ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€์ด๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋ธ ๊ฒฐํ•ฉ์˜ ํšจ๊ณผ๋ฅผ ๋ˆ„๋ฆด ์ˆ˜ ์žˆ์Œ
  • ์ƒ๋žต๋œ ๋ชจ๋ธ๋“ค์ด ๋ชจ๋‘ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ณต์œ ํ•˜๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋‘ ๊ฐ๊ฐ์˜ ๋‰ด๋Ÿฐ๋“ค์ด ์กด์†ํ• (dropout ํ•˜์ง€ ์•Š์„) ํ™•๋ฅ ์„ ๊ฐ๊ฐ์˜ ๊ฐ€์ค‘์น˜์— ๊ณฑํ•ด์ฃผ๋Š” ํ˜•ํƒœ๊ฐ€ ๋จ
  • 1, 2๋ฒˆ์งธ fully-connected layer์— dropout์„ ์ ์šฉ์‹œํ‚ด
  • Dropout(0.5) -> 50%๋งŒ ์‚ฌ์šฉํ•œ๋‹ค
  • dropout์€ ์ˆ˜๋ ด์— ํ•„์š”ํ•œ iteration์˜ ์ˆ˜๋ฅผ ๋Œ€๋žต 2๋ฐฐ ์ฆ๊ฐ€์‹œํ‚ด

 

 

 

Details of learning

  • ํ•™์Šต ์‹œ, batch size = 128, momentum = 0.9, weight decay = 0.0005๋กœ ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(SGD, stochastic gradient descent) ์ ์šฉ
  • ์—ฌ๊ธฐ์„œ weight decay(๊ฐ€์ค‘์น˜ ๊ฐ์†Œ)๋Š” ๊ทธ์ € regularizer๊ฐ€ ์•„๋‹ˆ๋ผ ๋ชจ๋ธ์˜ training error๋ฅผ ์ค„์ž„
  • ํ‘œ์ค€ ํŽธ์ฐจ๊ฐ€ 0.01์ธ zero-mean Gaussian distribution์œผ๋กœ๋ถ€ํ„ฐ ๊ฐ layer์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ดˆ๊ธฐํ™”ํ•จ
  • 2๋ฒˆ์งธ, 4๋ฒˆ์งธ, 5๋ฒˆ์งธ conv layer์™€ fully-connected hidden layer์—์„œ ๋‰ด๋Ÿฐ ํŽธํ–ฅ์„ ์ƒ์ˆ˜ 1๋กœ ์ดˆ๊ธฐํ™”ํ•จ
  • ์ด ์ดˆ๊ธฐํ™”๋Š” ReLU์— positive input์„ ์ œ๊ณตํ•ด ํ•™์Šต์˜ ์ดˆ๊ธฐ ๋‹จ๊ณ„๋ฅผ ๊ฐ€์†ํ™”ํ•จ
  • ๋‚จ์•„์žˆ๋Š” layer์—๋Š” ๋‰ด๋Ÿฐ ํŽธํ–ฅ์„ ์ƒ์ˆ˜ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•จ
  • ๋…ผ๋ฌธ์—์„œ๋Š” ๋ชจ๋“  layer์— ๋Œ€ํ•ด ๋™์ผํ•œ ํ•™์Šต๋ฅ ์„ ์ ์šฉํ–ˆ๊ณ , ํ•™์Šต ๋‚ด๋‚ด ์ˆ˜๋™์œผ๋กœ ์กฐ์ •ํ•จ
  • validation error rate๊ฐ€ ํ˜„์žฌ์˜ ํ•™์Šต๋ฅ ๋กœ ๊ฐœ์„ ๋˜์ง€ ์•Š์„ ๋•Œ, 10์œผ๋กœ ๋‚˜๋ˆˆ ํ•™์Šต๋ฅ ์„ ์ ์šฉํ•จ
  • ํ•™์Šต๋ฅ ์€ 0.01๋กœ ์ดˆ๊ธฐํ™”ํ–ˆ๊ณ , ํ•™์Šต ์ข…๋ฃŒ ์ „๊นŒ์ง€ 3๋ฒˆ ๊ฐ์†Œํ•จ
  • 120๋งŒ ๊ฐœ์˜ ์ด๋ฏธ์ง€ train์…‹์„ ๋Œ€๋žต 90 cycle ๋™์•ˆ ํ•™์Šต์‹œ์ผฐ๊ณ , ์ด๋Š” 2๊ฐœ์˜ NVIDIA GTX 580 3GB GPU๋กœ 5~6์ผ์ด ์†Œ์š”๋จ

 

 

 

Results

  • ILSVRC-2010์˜ test์…‹์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ

 

 

  • ILSVRC-2012์˜ val์…‹๊ณผ test์…‹์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ

 

 

 

Qualitative Evaluations

 

 

  • 96๊ฐœ์˜ kernel ์ค‘ ์œ„์ชฝ์˜ 48๊ฐœ์˜ kernel์€ GPU-1์—์„œ, ์•„๋ž˜์ชฝ์˜ kernel 48๊ฐœ๋Š” GPU-2์—์„œ ํ•™์Šต๋จ
  • GPU-1์—์„œ๋Š” ์ฃผ๋กœ ์ปฌ๋Ÿฌ์™€ ์ƒ๊ด€ ์—†๋Š” ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•œ kernel์ด ํ•™์Šต๋˜๊ณ , GPU-2์—์„œ๋Š” ์ฃผ๋กœ ์ปฌ๋Ÿฌ์™€ ๊ด€๋ จ๋œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•œ kernel์ด ํ•™์Šต๋จ

 

 

 

  • 8๊ฐœ์˜ ILSVRC-2010 test ์ด๋ฏธ์ง€๋“ค๊ณผ ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ top-5 ๋ ˆ์ด๋ธ”์„ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ
  • ์˜ฌ๋ฐ”๋ฅธ ๋ ˆ์ด๋ธ”์ด ๊ฐ ์ด๋ฏธ์ง€ ์•„๋ž˜์— ์ ํ˜€์žˆ๊ณ , ์˜ฌ๋ฐ”๋ฅธ ๋ ˆ์ด๋ธ”์— ํ• ๋‹น๋œ ํ™•๋ฅ ๋„ ๋นจ๊ฐ„์ƒ‰ ๋ง‰๋Œ€๋กœ ํ‘œ์‹œ๋˜์–ด์žˆ์Œ (top 5์— ์žˆ๋Š” ๊ฒฝ์šฐ)
  • ์™ผ์ชฝ ์ƒ๋‹จ์˜ mite(์ง„๋“œ๊ธฐ) ์ด๋ฏธ์ง€์™€ ๊ฐ™์ด ์ค‘์‹ฌ์—์„œ ๋ฒ—์–ด๋‚œ ๋ฌผ์ฒด๋„ ์ž˜ ์ธ์‹๋œ ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚จ
  • grille๊ณผ cherry์˜ ๊ฒฝ์šฐ, ์‚ฌ์ง„์˜ ์˜๋„๋œ ์ดˆ์ ์— ๋Œ€ํ•œ ๋ชจํ˜ธ์„ฑ์ด ์กด์žฌํ•จ
  • ํ•˜์ง€๋งŒ ์˜ˆ์ธก์ด ํ‹€๋ฆฐ ๊ฒฝ์šฐ์—๋„ ๋ณด๊ธฐ์— ๋”ฐ๋ผ ์˜ˆ์ธก์ด ๊ฐ€๋Šฅํ•œ ๋‹ต๋ณ€์„ ๋ฐ˜ํ™˜ํ–ˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Œ

 

 

 

 

  • ๋งจ ์™ผ์ชฝ์˜ ์ฒซ๋ฒˆ์งธ ์นผ๋Ÿผ์€ 5๊ฐœ์˜ ILSVRC-2010 test ์ด๋ฏธ์ง€
  • ๋‚˜๋จธ์ง€ ์นผ๋Ÿผ๋“ค์€ 5๊ฐœ์˜ test ์ด๋ฏธ์ง€์— ๊ฐ๊ฐ ๊ฐ€์žฅ ๋น„์Šทํ•˜๋‹ค๊ณ  ์˜ˆ์ธก๋˜๋Š” 6๊ฐœ์˜ training ์ด๋ฏธ์ง€
  • ๊ฐ•์•„์ง€๋‚˜ ์ฝ”๋ผ๋ฆฌ์˜ ๊ฒฝ์šฐ, ๋‹ค์–‘ํ•œ ํฌ์ฆˆ๋กœ ๋‚˜ํƒ€๋‚จ

 

 

 

Discussion

  • ํ•˜๋‚˜์˜ conv layer๋ผ๋„ ์ œ๊ฑฐ๋˜์—ˆ์„ ๋•Œ, ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ €ํ•˜๋œ๋‹ค๋Š” ์ ์ด ์ฃผ๋ชฉํ•  ๋งŒํ•จ

 

 

 

 

 

์ฐธ๊ณ  ์ž๋ฃŒ 

 

https://jjuon.tistory.com/22

 

[DL - ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] ImageNet Classification with Deep Convolutional Neural Networks(AlexNet)

 ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” AlexNet์ด๋ผ๊ณ  ์•Œ๋ ค์ ธ ์žˆ๋Š” Alex Krizhevsky๊ฐ€ 2012๋…„์— ์†Œ๊ฐœํ•œ "ImageNet Classification with Deep Convolutional Neural Networks"๋ฅผ ์ฝ๊ณ  ์ •๋ฆฌํ•ด ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.  AlexNet์€ ILSVRC-..

jjuon.tistory.com

 

https://bskyvision.com/421

 

[CNN ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค] AlexNet์˜ ๊ตฌ์กฐ

LeNet-5 => https://bskyvision.com/418 AlexNet => https://bskyvision.com/421 VGG-F, VGG-M, VGG-S => https://bskyvision.com/420 VGG-16, VGG-19 => https://bskyvision.com/504 GoogLeNet(inception v1) =>..

bskyvision.com

 

https://learnopencv.com/understanding-alexnet/

 

Understanding AlexNet | LearnOpenCV #

Understand the AlexNet architecture that won the ImageNet Visual Recognition Challenge in 2012 and started the Deep Learning revolution.

learnopencv.com

 

https://oi.readthedocs.io/en/latest/computer_vision/cnn/alexnet.html

 

AlexNet — Organize everything I know documentation

AlexNet์€ Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton์˜ ๋…ผ๋ฌธ “ImageNet classification with deep convolution neural network”์—์„œ ์ œ์•ˆํ•œ ๋ชจ๋ธ์ด๋‹ค. ๋˜ํ•œ, AlexNet์€ ImageNet ILVRC-2010์˜ 120๋งŒ ๊ฐœ ์ด๋ฏธ์ง€๋ฅผ 1000๊ฐœ์˜ Class๋กœ

oi.readthedocs.io

 

https://89douner.tistory.com/60?category=873854 

 

6. AlexNet

์•ˆ๋…•ํ•˜์„ธ์š”~ ์ด์ œ๋ถ€ํ„ฐ๋Š” CNN์ด ๋ฐœ์ „ํ•ด์™”๋˜ ๊ณผ์ •์„ ์—ฌ๋Ÿฌ ๋ชจ๋ธ์„ ํ†ตํ•ด ์•Œ๋ ค๋“œ๋ฆด๊นŒํ•ด์š”. ๊ทธ๋ž˜์„œ ์ด๋ฒˆ์žฅ์—์„œ๋Š” ๊ทธ ์ฒซ ๋ฒˆ์งธ ๋ชจ๋ธ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋Š” AlexNet์— ๋Œ€ํ•ด์„œ ์†Œ๊ฐœ์‹œ์ผœ๋“œ๋ฆด๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค! AlexNet์˜ ๋…ผ

89douner.tistory.com

 

https://blog.naver.com/PostView.nhn?isHttpsRedirect=true&blogId=laonple&logNo=220818841217 

 

[Part โ…ฅ. CNN ํ•ต์‹ฌ ์š”์†Œ ๊ธฐ์ˆ ] 2. Dropout [1] - ๋ผ์˜จํ”ผํ”Œ ๋จธ์‹ ๋Ÿฌ๋‹ ์•„์นด๋ฐ๋ฏธ -

Part I. Machine Learning Part V. Best CNN Architecture Part VII. Semantic Segmentat...

blog.naver.com

 

 

 

 

 

 

 

 
