Lecture 3: Loss Function and Optimization


What is a Loss Function?

์†์‹คํ•จ์ˆ˜(loss function) ์€ ๋ถ„๋ฅ˜๊ธฐ (classifier)๊ฐ€ ์–ผ๋งˆ๋‚˜ ์ž˜ ์ž‘๋™ํ•˜๋Š”์ง€ ์•Œ๋ ค์ฃผ๋Š” ์ค‘์š”ํ•œ ์ง€ํ‘œ์ด๋‹ค. ์†์‹คํ•จ์ˆ˜์˜ ๊ณต์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$L=\frac{1}{N}\sum_i L_i(f(x_i,W),y_i)$

์—ฌ๊ธฐ์„œ $x_i$๋Š” ์ด๋ฏธ์ง€, $y_i$๋Š” ๋ ˆ์ด๋ธ”์ด๋‹ค.

Multiclass SVM(Support Vector Machine) loss

Multiclass SVM loss ๋Š” ์˜ˆ์ธกํ•œ ๋ ˆ์ด๋ธ”์˜ ์ ์ˆ˜ ์™€ ์นดํ…Œ๊ณ ๋ฆฌ๋ณ„ ์ ์ˆ˜ ์˜ ์ฐจ๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ ๋งŒ์•ฝ 0๋ณด๋‹ค ์ž‘์œผ๋ฉด 0์ด, 0๋ณด๋‹ค ํฌ๋ฉด ๊ทธ ์ฐจ์ด๊ฐ’์ด ์ฑ„ํƒ์ด ๋œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ฑ„ํƒ์ด ๋œ ๊ฐ’์„ ๋ชจ๋‘ ๋”ํ•œ๋‹ค. ์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

$L_i=\sum_{j\neq y_i} \max(0,s_j-s_{y_i}+1)$

์—ฌ๊ธฐ์„œ $s_j$๋Š” ์˜ˆ์ธกํ•œ ๋ ˆ์ด๋ธ”์˜ ์ ์ˆ˜, $s_{y_i}$๋Š” ํด๋ž˜์Šค๋ณ„ ์ ์ˆ˜ ์ด๋‹ค. ๋’ค์— ๋ถ™๋Š” ์ˆซ์ž 1์€ ๋‹จ์ง€ ์ž„์˜๋กœ ์„ค์ •ํ•œ ๊ฐ’์œผ๋กœ, $s_j-s_{y_i}$ ๊ฐ’์ด 0์ด ๋  ๋•Œ 0๊ณผ ๋น„๊ตํ–ˆ์„ ๋•Œ ๋” ํฐ ๊ฐ’์ด ๋‚˜์˜ค๋„๋ก ํ•˜๋Š” ์—ญํ• ์„ ํ•œ๋‹ค.

์˜ˆ์‹œ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ 3๊ฐœ์˜ ๋‹ค๋ฅธ ์นดํ…Œ๊ณ ๋ฆฌ์˜ ์‚ฌ์ง„๊ณผ ํด๋ž˜์Šค๋ณ„ ์ ์ˆ˜๊ฐ€ ์žˆ๋‹ค.

์—ฌ๊ธฐ์„œ ๊ณ ์–‘์ด ์‚ฌ์ง„์˜ ์†์‹ค, ์ฆ‰ $L_1$ ์˜ ๊ฐ’์€ $max(0,5.1-3.2+1)+max(0,-1.7-3.2+1) = 2.9+0 = 2.9$ ์ด๋‹ค. ์ฐจ ์‚ฌ์ง„์˜ ์†์‹ค๊ณผ ๊ฐœ๊ตฌ๋ฆฌ ์‚ฌ์ง„์˜ ์†์‹ค๋„ ๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

$L_2=\max(0,1.3-4.9+1)+\max(0,2.0-4.9+1) = 0+0 = 0$

$L_3=\max(0,2.2-(-3.1)+1)+\max(0,2.5-(-3.1)+1) = 6.3+6.6 = 12.9$

$L_1, L_2, L_3$ ์˜ ํ‰๊ท ์€ $\frac{2.9+0+12.9}{3} = 5.27$ ์ด๊ณ  ์ด ์ ์ˆ˜๊ฐ€ ์ด ๋ถ„๋ฅ˜๊ธฐ์˜ ์†์‹ค๊ฐ’ ์ด๋‹ค.
Multiclass SVM loss์˜ ๋ช‡ ๊ฐ€์ง€ ํŠน์ง•์„ ์‚ดํŽด๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

Features of Multiclass SVM loss

  1. ์˜ˆ์ธกํ•œ ๋ ˆ์ด๋ธ”์˜ ์ ์ˆ˜๋Š” ๊ทธ๋Œ€๋กœ์ด๊ณ  ์‹ค์ œ ๋ ˆ์ด๋ธ”์˜ ์ ์ˆ˜๊ฐ€ ์ฆ๊ฐ€ํ•˜๋ฉด ์†์‹ค ๊ฐ’์˜ ํฌ๊ธฐ๊ฐ€ ๊ฐ์†Œํ•œ๋‹ค. ์•„๋ž˜ ๊ทธ๋ž˜ํ”„๋Š” ์ด๋ฅผ ๋‚˜ํƒ€๋‚ด์—ˆ๋‹ค.

  1. ์†์‹ค๊ฐ’์€ ์ตœ์†Œ 0 (์™„๋ฒฝํ•˜๊ฒŒ ๋ถ„๋ฅ˜ํ•จ), ์ตœ๋Œ€ ๋ฌดํ•œ๋Œ€ (ํ•˜๋‚˜๋„ ๋ถ„๋ฅ˜๋ฅผ ๋ชปํ•จ)์˜ ๊ฐ’์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋‹ค.

  2. ์†์‹ค๊ฐ’์„ 0์œผ๋กœ ๋งŒ๋“œ๋Š” ๊ฐ€์ค‘์น˜ $W$ ์ด ์žˆ๋‹ค๊ณ  ํ•  ๋•Œ $2W$ ๋˜ํ•œ ์†์‹ค๊ฐ’์„ 0์œผ๋กœ ๋งŒ๋“ ๋‹ค.

  3. Multiclass SVM loss ๋ฅผ ํŒŒ์ด์ฌ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

import numpy as np

def L_i_vectorized(x, y, W):
    # x: input vector, y: index of the correct class, W: weight matrix
    scores = W.dot(x)
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0          # the correct class does not contribute to the loss
    loss_i = np.sum(margins)
    return loss_i

์•ž์„œ ์„ค๋ช…ํ•œ ํ•จ์ˆ˜๋ฅผ ๊ทธ๋Œ€๋กœ ํ‘œํ˜„ํ•œ ๊ฒƒ์ด๋ฏ€๋กœ ์ฝ”๋“œ์— ๋Œ€ํ•œ ๋ณ„๋„์˜ ์„ค๋ช…์€ ์ƒ๋žตํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค.

Regularization

Regularization, ์ฆ‰ ์ •์น™ํ™”๋Š” ๋ชจ๋ธ์ด ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ๊ณผ์ ํ•ฉํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ ํ•˜๊ณ  ๋‹จ์ˆœํ•˜๊ฒŒ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋œ๋‹ค. ์ •์น™ํ™” ํ•จ์ˆ˜๋Š” ์†์‹คํ•จ์ˆ˜ ๋’ค์— ๋ถ™๊ฒŒ๋œ๋‹ค. Normalization์ด ์ •๊ทœํ™”, Regularization์ด ์ •์น™ํ™” ์ด๋ฏ€๋กœ ์ด ๋‘๊ฐœ์˜ ๋‹จ์–ด๋ฅผ ์ž˜ ๊ตฌ๋ถ„ํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋„๋ก ํ•˜์ž.

$L=\frac{1}{N}\sum_i L_i(f(x_i,W),y_i)+\lambda R(W)$

์ •์น™ํ™”์—๋Š” ๋Œ€ํ‘œ์ ์œผ๋กœ L2 ์ •์น™ํ™”, L1 ์ •์น™ํ™” ๊ฐ€ ์žˆ๋‹ค. ๊ฐ๊ฐ์˜ ๊ณต์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • L2 ์ •์น™ํ™”: $R(W)=\sum_k\sum _l{W^2}$
  • L1 ์ •์น™ํ™”: $R(W)=\sum_k\sum _l\vert W \vert$

L1 ์ •์น™ํ™”๋Š” ์ค‘์š”ํ•˜์ง€ ์•Š์€ ํŠน์ง•์„ 0์œผ๋กœ ๋งŒ๋“ค์–ด์ฃผ๊ณ , L2 ์ •์น™ํ™”๋Š” ์ค‘์š”ํ•˜์ง€ ์•Š์€ ํŠน์ง•์„ 0์— ๊ฐ€๊น๊ฒŒ ๋งŒ๋“ค์–ด์ฃผ๋‚˜ ์‹ค์ œ๋กœ 0์ด ๋˜์ง€๋Š” ์•Š๋Š”๋‹ค.

๋‹ค์Œ๊ณผ ๊ฐ™์ด 2๊ฐœ์˜ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ์ด ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜์ž.

$W1=[1,0,0,0]$
$W2=[0.25,0.25,0.25,0.25]$

$[1,1,1,1]$ ์˜ ๊ฐ’์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ X ์™€ $W1, W2$ ์„ ๊ฐ๊ฐ ๊ณฑํ•˜๋ฉด ์†์‹ค๊ฐ’์ด ๋™์ผํ•˜๊ฒŒ 1๋กœ ๋‚˜์˜ค์ง€๋งŒ $W1$ ์ฒ˜๋Ÿผ ๊ฐ€์ค‘์น˜๊ฐ€ ํ•œ ํŠน์ง•์—๋งŒ ์ง‘์ค‘๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ L1 ์ •์น™ํ™” ๋ฅผ, $W2$ ์ฒ˜๋Ÿผ ๊ฐ€์ค‘์น˜๊ฐ€ ๋ชจ๋“  ํŠน์ง•์— ๊ณจ๊ณ ๋ฃจ ๋ถ„๋ฐฐ๋˜์–ด ์žˆ์œผ๋ฉด L2 ์ •์น™ํ™” ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

Softmax Classifier (Multinomial Logistic Regression)

Softmax Classifier ์˜ ์†์‹ค ๊ฐ’์€ ๊ฐ ํด๋ž˜์Šค์— ์ •๊ทœํ™” ๋œ ๋กœ๊ทธ ํ™•๋ฅ  ์„ ์ ์šฉํ•œ ๊ฐ’์ด๋‹ค. Softmax Classifier ์˜ ์†์‹คํ•จ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„ํ•œ๋‹ค.

$L_i=-\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

์‹์ด ์ดํ•ดํ•˜๊ธฐ ์–ด๋ ค์šฐ๋ฏ€๋กœ ์•„๊นŒ์™€ ๊ฐ™์€ ์‚ฌ์ง„์„ ์˜ˆ์‹œ๋กœ ๋“ค์–ด ์„ค๋ช…ํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค.

์•„๊นŒ์™€ ๋‹ค๋ฅธ์ ์€ ํด๋ž˜์Šค๋ณ„ ์ ์ˆ˜๊ฐ€ ์ •๊ทœํ™” ๋˜์ง€ ์•Š์€ ๋กœ๊ทธ ํ™•๋ฅ  ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ๋จผ์ € ๊ณ ์–‘์ด ์‚ฌ์ง„๋ถ€ํ„ฐ ์‚ดํŽด๋ณด์ž. ๊ณ ์–‘์ด์˜ ์†์‹คํ•จ์ˆ˜ $L_1$ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ˆœ์„œ๋กœ ๊ตฌํ•œ๋‹ค.

  • ํด๋ž˜์Šค๋ณ„ ์ ์ˆ˜์— ์ž์—ฐํ•จ์ˆ˜$e$ ๋ฅผ ์ทจํ•œ๋‹ค.
  • ์ž์—ฐํ•จ์ˆ˜๋ฅผ ์ทจํ•œ ๊ฐ’์„ ์ •๊ทœํ™” ์‹œํ‚จ๋‹ค.
  • ์ •๊ทœํ™” ์‹œํ‚จ ๊ฐ’์— ๋กœ๊ทธ๋ฅผ ์ทจํ•œ๋‹ค.
  • ๋กœ๊ทธ๋ฅผ ์ทจํ•œ ๊ฐ’ ์•ž์— ๋งˆ์ด๋„ˆ์Šค ๋ถ€ํ˜ธ๋ฅผ ๋ถ™์ด๋ฉด ์ด๊ฒƒ์ด ์ •๊ทœํ™” ๋œ ๋กœ๊ทธ ํ™•๋ฅ  ์ด๋‹ค.

์•„๋ž˜ ์‚ฌ์ง„์ด ์œ„ ๊ณผ์ •์„ ํ•œ๋ฒˆ์— ๋ณด์—ฌ์ค€๋‹ค.

์œ„ ๊ณผ์ •์„ ์ฐจ ์‚ฌ์ง„, ๊ฐœ๊ตฌ๋ฆฌ ์‚ฌ์ง„์— ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ ์šฉํ•˜๊ณ  ๊ฐ ์†์‹ค ๊ฐ’์˜ ํ‰๊ท ์— ์ •์น™ํ™” ํ•จ์ˆ˜๋ฅผ ๋”ํ•œ ๊ฐ’์ด ๊ณง ์ด ๋ถ„๋ฅ˜๊ธฐ์˜ ์†์‹ค ๊ฐ’์ด ๋œ๋‹ค.

$L=\frac{1}{N}\sum_{i=1}^{N} L_i+\lambda R(W)$

Softmax Classifier ์˜ ๋ช‡๊ฐ€์ง€ ํŠน์ง•์„ ์‚ดํŽด๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

Features of Softmax Classifier

  1. ์†์‹ค ๊ฐ’์˜ ์ตœ์†Œ ๊ฐ’์€ 0, ์ตœ๋Œ€ ๊ฐ’์€ ๋ฌดํ•œ๋Œ€๊ฐ€ ๋œ๋‹ค. ๋‹จ, ์†์‹ค ๊ฐ’์ด ์‹ค์ œ๋กœ 0์ด ๋˜์ง€๋Š” ์•Š๋Š”๋‹ค.
  2. ๋ชจ๋“  ํด๋ž˜์Šค๋ณ„ ์ ์ˆ˜๋Š” ์ดˆ๊ธฐ์—๋Š” 0์— ๊ฐ€๊นŒ์šฐ๋ฏ€๋กœ ์ดˆ๊ธฐ ์†์‹ค ๊ฐ’์€ $log(C)$ ์˜ ํ˜•ํƒœ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

Differences and Similarities between the Multiclass SVM and the Softmax Classifier

  • Multiclass SVM์€ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ์ด๋ฏธ์ง€๋ฅผ ์•Œ๋งž๊ฒŒ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์— ์„ฑ๊ณตํ–ˆ์œผ๋ฉด ๋” ์ข‹์€ ๊ฐ€์ค‘์น˜์˜ ๊ฐ’์„ ์ฐพ๋Š” ๊ณ„์‚ฐ์„ ์ค‘๋‹จํ•œ๋‹ค.

  • ๋ฐ˜๋ฉด Softmax Classifier์€ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ์ด๋ฏธ์ง€๋ฅผ ์•Œ๋งž๊ฒŒ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์— ์„ฑ๊ณตํ–ˆ์–ด๋„ ๋” ์ข‹์€ ๊ฐ€์ค‘์น˜์˜ ๊ฐ’์„ ์ฐพ๊ธฐ ์œ„ํ•ด ๊ณ„์‚ฐ์„ ๋ฐ˜๋ณตํ•ด ๋‚˜๊ฐ„๋‹ค.

  • ๊ทธ๋Ÿฌ๋‚˜ ๋‘ ๋ถ„๋ฅ˜๊ธฐ ๋ชจ๋‘ ์„ฑ๋Šฅ์€ ๋น„์Šทํ•˜๋‹ค.

Optimization

๋“ฑ์‚ฐ์„ ํ•˜๊ณ  ์žˆ๋Š” ์‚ฌ๋žŒ์ด ์žˆ๋‹ค๊ณ  ํ•˜์ž. ๊ทธ ์‚ฌ๋žŒ์€ ์‚ฐ์—์„œ ๊ฐ€์žฅ ๊ณ ๋„๊ฐ€ ๋‚ฎ์€ ์ง€์ ์„ ์ฐพ์•„ ๋‚ด๋ ค๊ฐ€๋Š” ์‹œ๋„๋ฅผ ํ•˜๊ณ  ์žˆ๋‹ค. ์‚ฐ์„ ๊ทธ๋ž˜ํ”„, ์‚ฌ๋žŒ์„ ์†์‹ค๊ฐ’ ์œผ๋กœ ์น˜ํ™˜ํ•˜๋ฉด ์ด๊ฒƒ์ด ๋ฐ”๋กœ ์ตœ์ ํ™” (optimization) ๊ณผ์ •์ด๋‹ค. ์ฆ‰, ๊ฐ€์žฅ ์ž‘์€ ์†์‹ค๊ฐ’์„ ์ฐพ๋Š” ๊ณผ์ •์ด ์ตœ์ ํ™” ๊ณผ์ •์ด๋‹ค.

์ตœ์ ํ™” ๊ณผ์ •์—๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์žˆ๋‹ค.

  1. Random search: as the name suggests, this algorithm samples random points and hopes one of them is the minimum. It is a poor, naive algorithm and is rarely used.

  2. Gradient descent: this algorithm computes the derivative of the loss function at a point to find its slope. Knowing the slope tells us whether the function is increasing or decreasing, so by repeatedly following the slope downhill we eventually reach the minimum loss. There are two ways to compute the gradient.

    • Numerical gradient: plug numbers directly into the definition of the derivative, $\lim_{h\rightarrow0 }\frac{f(w+h)-f(w)}{h}$, using a small finite $h$. It is slow and only approximate, but it makes the optimization process easy to understand.
    • Analytic gradient: use the derivative $f'(w)$ obtained with calculus. It is fast and exact, but the derivation is harder to follow.

    Gradient descent ๋ฅผ ํŒŒ์ด์ฌ ์ฝ”๋“œ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค.

     while True:
         weights_grad = evaluate_gradient(loss_fun, data, weights)
         weights += -step_size * weights_grad    # step_size controls how far each update moves
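To make the numerical-versus-analytic distinction concrete, here is a tiny check on $f(w)=w^2$, whose analytic derivative is $2w$ (my own toy function, not from the lecture):

```python
def f(w):
    return w * w  # simple test function with known derivative f'(w) = 2w

def numerical_gradient(f, w, h=1e-5):
    # finite-difference approximation of the derivative at w
    return (f(w + h) - f(w)) / h

w = 3.0
num = numerical_gradient(f, w)  # approximate, close to 6
ana = 2 * w                     # exact analytic gradient, 6.0
```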
    
  3. Stochastic Gradient Descent (SGD): computing the gradient over the entire training set is far too expensive and slow. Instead, we sample a small subset of training examples and compute the gradient on that subset only. The sampled examples are called a minibatch. Applying gradient descent to minibatches is Stochastic Gradient Descent.
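As a minimal sketch of SGD on a toy problem (entirely my own setup, not from the lecture), fitting a single weight with minibatches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3*x, so the best weight is 3.0
x = rng.normal(size=1000)
y = 3.0 * x

w = 0.0
for _ in range(200):
    batch = rng.choice(1000, size=32, replace=False)          # sample a minibatch of examples
    grad = np.mean(2 * (w * x[batch] - y[batch]) * x[batch])  # MSE gradient on the batch only
    w -= 0.1 * grad                                           # step downhill
# w ends up very close to 3.0
```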

์ด์ „์— ๋‚ด๊ฐ€ ์ˆ˜๊ฐ•ํ•˜์˜€๋˜ Mathmatics for Machine Learning์— Gradient descent์˜ ์›๋ฆฌ๋ฅผ ์ดํ•ดํ•˜๊ธฐ ์‰ฝ๊ฒŒ interactive page๋กœ ๋‚˜ํƒ€๋‚˜ ์žˆ๋‹ค.

Sandpit game

Image Features

์ธ๊ณต ์‹ ๊ฒฝ๋ง์ด ๊ฐœ๋ฐœ๋˜๊ธฐ ์ „์—๋Š” ์ด๋ฏธ์ง€์˜ ํŠน์ •ํ•œ ํŠน์ง•๋งŒ ์„ ํƒํ•ด์„œ ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š”๋ฐ ๊ทธ ํŠน์ง•๋งŒ์ด ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์•˜๋‹ค. ์‚ฌ์šฉ๋œ ๋Œ€ํ‘œ์ ์ธ ํŠน์ง•์€ Color Spectrum(RGB), Histogram of Oriented Gradient, Bag of Words ๊ฐ€ ์žˆ๋‹ค

๋˜ํ•œ ์™ผ์ชฝ ๊ทธ๋ฆผ์— ํ‘œํ˜„๋œ ์ง๊ต ์ขŒํ‘œ๊ณ„์— ๋‚˜ํƒ€๋‚œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค์˜ ๊ฒฝ์šฐ ๋นจ๊ฐ• ํฌ์ธํŠธ์™€ ํŒŒ๋ž‘ ํฌ์ธํŠธ๋ฅผ ๊ตฌ๋ถ„ํ•˜๋Š” ์„ ํ˜• ๊ตฌ๋ถ„์„ ์„ ๊ธ‹๊ธฐ๋ž€ ์‰ฝ์ง€ ์•Š๋‹ค. ์ด๋•Œ, ์˜ค๋ฅธ์ชฝ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค์„ ๊ทน์ขŒํ‘œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ์‰ฝ๊ฒŒ ์„ ํ˜• ๊ตฌ๋ถ„์„ ์„ ๊ทธ์–ด ๋นจ๊ฐ• ํฌ์ธํŠธ์™€ ํŒŒ๋ž‘ ํฌ์ธํŠธ๋ฅผ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๋‹ค.
