<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>NLP &#8211; KGG Studio</title>
	<atom:link href="https://blog.kggstudio.com/category/dev/nlp/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.kggstudio.com</link>
	<description>Developer tech blog</description>
	<lastBuildDate>Tue, 05 May 2026 22:52:00 +0000</lastBuildDate>
	<language>ko-KR</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.kggstudio.com/wp-content/uploads/2025/05/cropped-K-1-32x32.png</url>
	<title>NLP &#8211; KGG Studio</title>
	<link>https://blog.kggstudio.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">244941309</site>	<item>
		<title>NLP (1) &#8211; seq2seq</title>
		<link>https://blog.kggstudio.com/nlp-1-seq2seq/</link>
					<comments>https://blog.kggstudio.com/nlp-1-seq2seq/#respond</comments>
		
		<dc:creator><![CDATA[TimTam]]></dc:creator>
		<pubDate>Tue, 05 May 2026 22:51:59 +0000</pubDate>
				<category><![CDATA[Dev]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[seq2seq]]></category>
		<guid isPermaLink="false">https://blog.kggstudio.com/?p=530</guid>

					<description><![CDATA[seq2seq Do you remember this equation? p(y_1, …, y_T′ ∣ x_1, …, x_T) = ∏_{t=1}^{T′} p(y_t ∣ v, y_1, …, y_{t−1}) The context vector v produced by the Encoder is concatenated with the y values after they pass through the Embedding layer, and with that the equation above is finally satisfied. We have completed Seq2seq! LSTM Encoder After declaring the Embedding layer with the vocabulary size and the embedding dimension, we define the LSTM with torch.nn.LSTM(emb_dim, hidden_dim, batch_first=True), as introduced in the paper. PyTorch's LSTM module always returns both the output sequence and the final state, so there is no return_sequences ... <a title="NLP (1) &#8211; seq2seq" class="read-more" href="https://blog.kggstudio.com/nlp-1-seq2seq/" aria-label="Read more about NLP (1) &#8211; seq2seq">Read more</a>]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">seq2seq</h2>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="490" height="600" data-attachment-id="531" data-permalink="https://blog.kggstudio.com/nlp-1-seq2seq/image-91/#main" data-orig-file="https://blog.kggstudio.com/wp-content/uploads/2026/05/image.png" data-orig-size="490,600" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-large-file="https://blog.kggstudio.com/wp-content/uploads/2026/05/image.png" src="https://blog.kggstudio.com/wp-content/uploads/2026/05/image.png" alt="" class="wp-image-531" srcset="https://blog.kggstudio.com/wp-content/uploads/2026/05/image.png 490w, https://blog.kggstudio.com/wp-content/uploads/2026/05/image-245x300.png 245w" sizes="(max-width: 490px) 100vw, 490px" /></figure>



<p>Do you remember this equation?</p>



<p><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mrow><mo fence="true">(</mo><msub><mi>y</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>y</mi><msup><mi>T</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup></msub><mo>∣</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>x</mi><mi>T</mi></msub><mo fence="true">)</mo></mrow><mo>=</mo><msubsup><mi mathvariant="normal">Π</mi><mrow><mi>t</mi><mo>=</mo><mn>1</mn></mrow><msup><mi>T</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup></msubsup><mi>p</mi><mrow><mo fence="true">(</mo><msub><mi>y</mi><mi>t</mi></msub><mo>∣</mo><mi>v</mi><mo separator="true">,</mo><msub><mi>y</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>y</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo fence="true">)</mo></mrow></mrow></semantics></math></p>



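<p>The context vector v produced by the <code>Encoder</code> is concatenated with the y values after they pass through the Embedding layer, and with that the equation above is finally satisfied. We have completed Seq2seq!</p>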
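<p>To make the factorization concrete, here is a minimal sketch in plain Python. The per-step probabilities in <code>step_probs</code> are made-up numbers, not model outputs; they stand in for each p(y_t ∣ v, y_1, …, y_{t−1}). The sketch only shows that the product form of the equation and the sum of log-probabilities describe the same quantity.</p>

```python
import math

# Hypothetical conditional probabilities p(y_t | v, y_1..y_{t-1}) for a
# 3-token target sequence. These values are invented for illustration.
step_probs = [0.5, 0.4, 0.25]

# Product form, exactly as written in the equation.
seq_prob = math.prod(step_probs)

# Equivalent log form: summing log-probabilities avoids numerical
# underflow when the target sequence is long.
seq_log_prob = sum(math.log(p) for p in step_probs)

print("Sequence probability:", seq_prob)
print("Recovered from log form:", math.exp(seq_log_prob))
```

<p>In practice the log form is the one that gets optimized, because training losses are sums of per-step log-probabilities.</p>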



<h3 class="wp-block-heading">LSTM Encoder</h3>



<pre class="wp-block-code"><code>import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hidden_dim):
        super().__init__()

        # input_dim is the vocabulary size, emb_dim the embedding dimension.
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        print("Input shape:", src.size())

        embedded = self.embedding(src)
        print("Shape after the Embedding layer:", embedded.size())

        # nn.LSTM returns the per-step outputs and the final (hidden, cell) states.
        outputs, (h_n, c_n) = self.rnn(embedded)
        print("LSTM layer output shape:", outputs.size())
        print("LSTM layer hidden state shape:", h_n.size())
        print("LSTM layer cell state shape:", c_n.size())

        return outputs, h_n, c_n</code></pre>



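<p>After declaring the Embedding layer with the vocabulary size and the embedding dimension, we define the LSTM with <code>torch.nn.LSTM(emb_dim, hidden_dim, batch_first=True)</code>, as introduced in the paper. <em>PyTorch</em>'s LSTM module always returns both the per-step output sequence and the <strong>final state</strong> (hidden and cell), so unlike Keras there is no <code>return_sequences</code> or <code>return_state</code> option to adjust. In other words, what our <code>Encoder</code> class returns serves as the <strong>context vector</strong>. If you are curious about additional options, the official PyTorch LSTM documentation is a good reference.</p>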
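<p>The sketch below illustrates what <code>nn.LSTM</code> returns, using small made-up sizes (<code>emb_dim = 8</code>, <code>hidden_dim = 16</code>) that are not taken from this post. It assumes a single-layer, unidirectional LSTM like the one in <code>Encoder</code>; under that assumption the last step of the output sequence and the final hidden state hold the same values.</p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small sizes chosen only for this illustration.
emb_dim, hidden_dim = 8, 16
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

x = torch.randn(2, 5, emb_dim)  # (batch, seq_len, emb_dim)
outputs, (h_n, c_n) = lstm(x)

print(outputs.shape)  # (batch, seq_len, hidden_dim): one hidden state per step
print(h_n.shape)      # (num_layers, batch, hidden_dim): final hidden state
print(c_n.shape)      # (num_layers, batch, hidden_dim): final cell state

# With one layer and one direction, the last output step is the final hidden state.
same = torch.allclose(outputs[:, -1, :], h_n[-1])
print("Last output step equals final hidden state:", same)
```

<p>Note that <code>batch_first=True</code> only changes the layout of the input and of <code>outputs</code>; the state tensors keep the layer dimension first.</p>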



<pre class="wp-block-code"><code>vocab_size = 30000
emb_size = 256
lstm_size = 512
batch_size = 1
sample_seq_len = 3

print("Vocab Size: {0}".format(vocab_size))
print("Embedding Size: {0}".format(emb_size))
print("LSTM Size: {0}".format(lstm_size))
print("Batch Size: {0}".format(batch_size))
print("Sample Sequence Length: {0}\n".format(sample_seq_len))</code></pre>



<pre class="wp-block-code"><code>Vocab Size: 30000
Embedding Size: 256
LSTM Size: 512
Batch Size: 1
Sample Sequence Length: 3</code></pre>



<pre class="wp-block-code"><code>import torch

encoder = Encoder(vocab_size, emb_size, lstm_size)
sample_input = torch.randint(0, vocab_size, (batch_size, sample_seq_len))

sample_output, hidden, cell = encoder(sample_input)</code></pre>



<h3 class="wp-block-heading">LSTM Decoder</h3>



<pre class="wp-block-code"><code>class Decoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # The LSTM input is the embedding concatenated with the context vector.
        self.lstm = nn.LSTM(embedding_dim + hidden_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden, cell, context):
        print("Input shape:", x.size())

        embedded = self.embedding(x)
        print("Shape after the Embedding layer:", embedded.size())

        # Concatenate along the feature dimension; batch and step sizes must match.
        embedded = torch.cat((embedded, context), dim=2)
        print("Shape after concatenating the context vector:", embedded.size())

        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        print("LSTM layer output shape:", output.size())

        # Project every step onto the vocabulary (raw logits, no softmax).
        output = self.fc(output)
        print("Decoder final output shape:", output.size())

        return output, hidden, cell</code></pre>



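<p>The <code>Decoder</code> is structurally similar to the <code>Encoder</code>, but because it has to generate a result, a Fully Connected layer is added to project each LSTM output onto the vocabulary. A Softmax function then converts those outputs into probabilities; it is not included inside the model here, and it can instead be applied during training (for example, <code>nn.CrossEntropyLoss</code> expects raw logits and applies log-softmax internally). Because the output the <code>Decoder</code> produces at every step corresponds to the translation we want, we use the output sequence returned by the LSTM layer rather than only its final state. PyTorch returns that sequence by default, so there is no <code>return_sequences</code> flag to set.</p>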
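<p>As a rough sketch of the "softmax during training" option, the snippet below uses a random tensor as a stand-in for the <code>Decoder</code> output. The sizes and the names <code>logits</code> and <code>targets</code> are assumptions made for this example. It compares passing raw logits to <code>F.cross_entropy</code> with applying <code>F.log_softmax</code> explicitly and then <code>F.nll_loss</code>.</p>

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up sizes; logits stands in for the decoder's final output.
batch_size, seq_len, vocab_size = 2, 3, 10
logits = torch.randn(batch_size, seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# Option A: cross_entropy takes raw logits and applies log-softmax internally.
loss_a = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

# Option B: explicit log-softmax followed by negative log-likelihood.
log_probs = F.log_softmax(logits, dim=-1)
loss_b = F.nll_loss(log_probs.reshape(-1, vocab_size), targets.reshape(-1))

# Probabilities for inspection or sampling; each step's distribution sums to 1.
probs = F.softmax(logits, dim=-1)

print("Loss from logits:", loss_a.item())
print("Loss from explicit log-softmax:", loss_b.item())
```

<p>Both options give the same loss, which is why the model itself can return logits and leave the softmax to the loss function.</p>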



<pre class="wp-block-code"><code>print("Vocab Size: {0}".format(vocab_size))
print("Embedding Size: {0}".format(emb_size))
print("LSTM Size: {0}".format(lstm_size))
print("Batch Size: {0}".format(batch_size))
print("Sample Sequence Length: {0}\n".format(sample_seq_len))</code></pre>



<pre class="wp-block-code"><code>decoder_input = torch.randint(0, vocab_size, (batch_size, sample_seq_len))  # (batch_size, seq_length)

decoder = Decoder(vocab_size, emb_size, lstm_size)

dec_output, hidden, cell = decoder(decoder_input, hidden, cell, sample_output)</code></pre>
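<p>The call above passes the encoder's full output sequence as the context, which concatenates cleanly only because the source and target samples have the same length. One possible alternative, sketched here under assumptions of my own (made-up sizes, a single embedding shared by source and target for brevity), is to take the encoder's final hidden state as the context vector v and repeat it across every decoder step.</p>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up sizes; src_len and tgt_len differ on purpose.
vocab_size, emb_dim, hidden_dim = 100, 8, 16
batch_size, src_len, tgt_len = 2, 5, 3

# Simplification for this sketch: one embedding shared by source and target.
embedding = nn.Embedding(vocab_size, emb_dim)
encoder_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

src = torch.randint(0, vocab_size, (batch_size, src_len))
_, (h_n, c_n) = encoder_lstm(embedding(src))

# Treat the final hidden state of the last layer as the context vector v.
v = h_n[-1]                                       # (batch, hidden_dim)
context = v.unsqueeze(1).expand(-1, tgt_len, -1)  # (batch, tgt_len, hidden_dim)

tgt = torch.randint(0, vocab_size, (batch_size, tgt_len))
decoder_in = torch.cat((embedding(tgt), context), dim=2)

print("Decoder LSTM input shape:", decoder_in.shape)
```

<p>The resulting tensor has <code>emb_dim + hidden_dim</code> features per step, which matches the input size the <code>Decoder</code>'s LSTM is declared with.</p>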



]]></content:encoded>
					
					<wfw:commentRss>https://blog.kggstudio.com/nlp-1-seq2seq/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">530</post-id>	</item>
	</channel>
</rss>
