[2018] Deep contextualized word representations(ELMo)

0. Abstract
1. Introduction
2. ELMo: Embeddings from Language Models

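Several papers introducing important concepts in natural language processing were published in 2018. This post briefly summarizes one of them, ELMo. Refer to the original paper for the full details.

Key concept: contextualized word representation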

0. Abstract

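Of the many methods for embedding natural language at the token level, Word2Vec and BoW became especially popular. Word2Vec in particular makes use of 'information from surrounding words' by reflecting co-occurrence information between words. Its drawback is that the embedding of a token is identical regardless of context and meaning, which causes problems when handling homonyms.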
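The homonym problem can be illustrated with a minimal sketch. The vocabulary, the two example sentences, and the randomly initialised `embedding_table` below are all hypothetical stand-ins (a real Word2Vec table would be learned from co-occurrence statistics); the point is only that a lookup table returns one fixed vector per token.

```python
import numpy as np

# Hypothetical toy vocabulary; the table is random, not a trained Word2Vec model.
vocab = {w: i for i, w in enumerate(
    ["i", "sat", "by", "the", "river", "bank", "opened", "a", "account"])}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))


def static_embed(tokens):
    """Look up one fixed vector per token, ignoring the surrounding words."""
    return np.stack([embedding_table[vocab[t]] for t in tokens])


sent_a = ["i", "sat", "by", "the", "river", "bank"]    # "bank" as a riverside
sent_b = ["i", "opened", "a", "bank", "account"]       # "bank" as an institution

vec_a = static_embed(sent_a)[sent_a.index("bank")]
vec_b = static_embed(sent_b)[sent_b.index("bank")]

# Both senses of "bank" receive exactly the same vector.
same_vector = bool(np.array_equal(vec_a, vec_b))
print(same_vector)  # True
```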

The ELMo paper proposes 'embeddings that reflect contextual information' to overcome this limitation.

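deep contextualized word representation: reflects linguistic context as well as the varied and complex ways in which words are used
word vectors are computed with a bidirectional LM that is pre-trained on a large corpus and then adapted to each task (in the paper, by learning task-specific weights over the biLM layers, optionally after fine-tuning the biLM on in-domain data)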

1. Introduction

Pre-trained word representations are widely regarded as a very good idea. Obtaining a 'good representation', however, is difficult. The authors say they thought carefully about what such a 'good representation' actually is.

What is a high-quality word representation?

- it models complex characteristics of word use (e.g., syntax and semantics)

- it models how these uses vary across linguistic contexts

Reflecting how the meaning of a word changes with context amounts to computing the word representation as a function of the entire input sentence. To achieve this, the authors use vectors derived from a bi-directional LSTM trained with an LM objective on a very large corpus, which is why the result is called the ELMo (Embeddings from Language Models) representation.
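The idea of a representation that is a function of the whole sentence can be sketched as follows. This is not the paper's model: it assumes a plain tanh RNN as a stand-in for the LSTM, uses untrained random weights, and every name (`run_rnn`, `contextual_embed`, `DIM`, the toy vocabulary) is hypothetical. It only shows that reading the sentence in both directions makes the vector for the same token depend on its neighbours.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # hypothetical hidden size, chosen only for illustration

vocab = {w: i for i, w in enumerate(
    ["i", "sat", "by", "the", "river", "bank", "opened", "a", "account"])}
token_table = rng.normal(size=(len(vocab), DIM))

# Untrained, randomly initialised weights for the forward and backward passes.
W_fwd, U_fwd = rng.normal(size=(DIM, DIM)), rng.normal(size=(DIM, DIM))
W_bwd, U_bwd = rng.normal(size=(DIM, DIM)), rng.normal(size=(DIM, DIM))


def run_rnn(inputs, W, U):
    """Plain tanh RNN (a stand-in for an LSTM): one hidden state per step."""
    h = np.zeros(DIM)
    states = []
    for x in inputs:
        h = np.tanh(W @ x + U @ h)
        states.append(h)
    return np.stack(states)


def contextual_embed(tokens):
    """Map a whole sentence to one vector per token, shape (T, 2 * DIM)."""
    x = token_table[[vocab[t] for t in tokens]]
    fwd = run_rnn(x, W_fwd, U_fwd)              # reads left to right
    bwd = run_rnn(x[::-1], W_bwd, U_bwd)[::-1]  # reads right to left, then realigned
    return np.concatenate([fwd, bwd], axis=1)


sent_a = ["i", "sat", "by", "the", "river", "bank"]
sent_b = ["i", "opened", "a", "bank", "account"]

bank_a = contextual_embed(sent_a)[sent_a.index("bank")]
bank_b = contextual_embed(sent_b)[sent_b.index("bank")]

# Unlike a static lookup, the two "bank" vectors now depend on their sentences.
print(np.allclose(bank_a, bank_b))
```

Because the weights are random, the vectors carry no linguistic meaning here; training with an LM objective on a large corpus is what would make them useful.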

2. ELMo: Embeddings from Language Models

Obtaining an ELMo word representation requires taking context into account, and the authors chose a bidirectional LSTM for this purpose.

- For each token $t_{k}$, an $L$-layer biLM computes a set of $2 L + 1$ representations

$R_{k} = \{x_{k}^{LM}, \overrightarrow{h}_{k, j}^{LM}, \overleftarrow{h}_{k, j}^{LM} \mid j = 1, \ldots, L\} = \{h_{k, j}^{LM} \mid j = 0, \ldots, L\}$

$h_{k, 0}^{LM}$ = token layer

$h_{k, j}^{LM} = [\overrightarrow{h}_{k, j}^{LM}; \overleftarrow{h}_{k, j}^{LM}]$ for each bi-LSTM layer
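Assembling the $L + 1$ layer vectors per token can be sketched with array concatenation. The function name `stack_bilm_layers` and the shapes are hypothetical, and the inputs are random placeholders rather than real biLM outputs. One labeled assumption: the token vector is duplicated so that layer 0 has the same width as the concatenated bi-LSTM layers, which is needed if the layers are later summed.

```python
import numpy as np


def stack_bilm_layers(token_vecs, fwd_states, bwd_states):
    """Build h_{k,j} for j = 0..L.

    token_vecs: (T, D) token-layer vectors x_k.
    fwd_states, bwd_states: (L, T, D) forward / backward LSTM states.
    Returns an array of shape (L + 1, T, 2 * D).
    """
    # Assumption: duplicate x_k so layer 0 matches the width of the other layers.
    layer0 = np.concatenate([token_vecs, token_vecs], axis=-1)
    # h_{k,j} = [forward; backward] for each bi-LSTM layer j = 1..L.
    upper = np.concatenate([fwd_states, bwd_states], axis=-1)
    return np.concatenate([layer0[None], upper], axis=0)


L, T, D = 2, 5, 3  # hypothetical sizes: layers, tokens, hidden width
rng = np.random.default_rng(0)
x = rng.normal(size=(T, D))
fwd = rng.normal(size=(L, T, D))
bwd = rng.normal(size=(L, T, D))

layers = stack_bilm_layers(x, fwd, bwd)
print(layers.shape)  # (3, 5, 6)
```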

- These are combined into a single word representation

$\mathrm{ELMo}_{k} = E(R_{k}; \Theta_{e})$

$\mathrm{ELMo}_{k}^{task} = E(R_{k}; \Theta^{task}) = \gamma^{task} \sum_{j = 0}^{L} s_{j}^{task} h_{k, j}^{LM}$
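The task-specific combination is a scaled, weighted sum over the layer axis. In the paper the weights $s_{j}^{task}$ are softmax-normalized, so the sketch below takes unnormalized logits and applies a softmax. The names `elmo_combine` and `scalar_logits` are hypothetical, and the layer activations are random placeholders rather than real biLM outputs.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def elmo_combine(layers, scalar_logits, gamma):
    """Compute gamma * sum_j s_j * h_{k,j} for every token k.

    layers: (L + 1, T, dim) stacked layer vectors h_{k,j}.
    scalar_logits: (L + 1,) unnormalized layer weights; softmax gives s_j.
    gamma: scalar that rescales the whole ELMo vector.
    Returns an array of shape (T, dim).
    """
    s = softmax(np.asarray(scalar_logits, dtype=float))
    # Contract the layer axis of `layers` against the weights s.
    return gamma * np.tensordot(s, layers, axes=1)


rng = np.random.default_rng(0)
layers = rng.normal(size=(3, 5, 6))  # hypothetical: L = 2, T = 5 tokens, dim = 6

# With equal logits and gamma = 1 the result is the plain average of the layers.
uniform = elmo_combine(layers, np.zeros(3), gamma=1.0)
print(np.allclose(uniform, layers.mean(axis=0)))  # True
```

In a downstream model `scalar_logits` and `gamma` would be trainable parameters, so each task can learn which biLM layers to emphasize.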
