์ƒˆ์†Œ์‹

๐Ÿ“‘ ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ/NLP

Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing

2022. 7. 23. 00:36

  • -



Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing

 

 

๐Ÿ’ก๋“ค์–ด๊ฐ€๊ธฐ ์ „ ๊ฐœ๋… ์ •๋ฆฌ

 

  • Semantic parsing
    • Converting natural language utterances (NLU) into a formal meaning representation (MR) that a machine can understand

Source: https://kilian.evang.name/sp/lectures/intro.pdf

(In the figure from the source above, a–c are natural language utterances and d is the MR.)

 

  • MR (meaning representation)
    • A formal structure that captures the meaning of a linguistic input
    • It can be seen as a bridge between subtle linguistic nuance and non-linguistic common-sense knowledge about the world
    • e.g., how you know whether someone's remark is a compliment or an insult
    • -> By decomposing the linguistic input (what the other person said) into a meaningful structure and linking it to knowledge about the real world (information about the speaker, your relationship with them, prior experience, etc.), you can infer the speaker's intent
    • Common meaning representation formalisms
      • First-Order Logic
      • Abstract Meaning Representation (AMR) as a directed graph
      • Abstract Meaning Representation (AMR) in textual form
      • Frame-based or slot-filler representation
    • → All four formalisms share the point that a meaning representation consists of structures corresponding to objects, their properties, and the relations among them
     

์ถœ์ฒ˜: https://towardsdatascience.com/meaning-representation-and-srl-assuming-there-is-some-meaning-741f35bfdd6

 

 

  • Representation
    • The form into which raw text is converted so that a language model can compute with it
    • Broadly divided into count-based and distribution-based representations

 

 

 

Abstract

 

  • An open-text semantic parser is designed to interpret any sentence of natural language by inferring its MR (meaning representation)
  • Such large-scale systems are hard to machine-learn because of the scarcity of supervised training data

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” WordNet๊ณผ ๊ฐ™์€ knowledge base learning๊ณผ ์›์‹œ ํ…์ŠคํŠธ(raw text)๋ฅผ ์‚ฌ์šฉํ•œ learning์„ ๊ฒฐํ•ฉํ•œ training scheme ๋•์— ๊ด‘๋ฒ”์œ„ํ•œ ํ…์ŠคํŠธ(40,000๊ฐœ ์ด์ƒ์˜ entity์— ๋งคํ•‘๋œ 70,000๊ฐœ ์ด์ƒ์˜ ๋‹จ์–ด ์‚ฌ์ „ ์‚ฌ์šฉ)์— MR์„ ํ• ๋‹นํ•˜๋Š” ๋ฐฉ์‹์„ ์ œ์•ˆ
    • WordNet
      • ์˜์–ด์˜ ์˜๋ฏธ ์–ดํœ˜ ๋ชฉ๋ก
      • ์˜์–ด ๋‹จ์–ด๋ฅผ 'synset'์ด๋ผ๋Š” ์œ ์˜์–ด ์ง‘๋‹จ์œผ๋กœ ๋ถ„๋ฅ˜ํ•ด ๊ฐ„๋žตํ•˜๊ณ  ์ผ๋ฐ˜์ ์ธ ์ •์˜๋ฅผ ์ œ๊ณตํ•˜๊ณ , ์ด๋Ÿฌํ•œ ์–ดํœ˜ ๋ชฉ๋ก ์‚ฌ์ด์˜ ๋‹ค์–‘ํ•œ ์˜๋ฏธ ๊ด€๊ณ„๋ฅผ ๊ธฐ๋ก
      • => ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•ด ํŠนํ™”๋œ ์‚ฌ์ „

 

  • ๋…ผ๋ฌธ์˜ ๋ชจ๋ธ์€ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์†Œ์Šค์—์„œ ์ž‘๋™ํ•˜๋Š” multi-task training process๋ฅผ ํ†ตํ•ด ๋‹จ์–ด, entity, MR์˜ ํ‘œํ˜„์„ ๊ณต๋™์œผ๋กœ(jointly) ํ•™์Šต
    • Multi-Task Learning
      • ์—ฐ๊ด€์žˆ๋Š” task๋“ค์„ ์—ฐ๊ฒฐ์‹œ์ผœ ๋™์‹œ์— ํ•™์Šต์‹œํ‚ด์œผ๋กœ์จ ๋ชจ๋“  task์—์„œ์˜ ์„ฑ๋Šฅ์„ ์ „๋ฐ˜์ ์œผ๋กœ ํ–ฅ์ƒ์‹œํ‚ค๋ ค๋Š” ํ•™์Šต ํŒจ๋Ÿฌ๋‹ค์ž„
      • ๋งŽ์€ labeled data๊ฐ€ ํ•„์š”ํ•œ๋ฐ ๋ฐ์ดํ„ฐ๋ฅผ ํ™•๋ณดํ•˜๊ธฐ ์–ด๋ ค์šด ๊ฒฝ์šฐ Multi-task learning์ด ์ข‹์€ ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•์ด ๋  ์ˆ˜ ์žˆ์Œ
      • ์ธ๊ฐ„์ด ์ƒˆ๋กœ์šด ๊ฒƒ์„ ํ•™์Šตํ•  ๋•Œ ์ด์ „์— ํ•™์Šตํ–ˆ๋˜ ์œ ์‚ฌ๊ฒฝํ—˜์— ์ ‘๋ชฉ์‹œ์ผœ ๋” ๋นจ๋ฆฌ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์—์„œ ์˜๊ฐ์„ ์–ป์€ ๋ฐฉ์‹
      • https://velog.io/@riverdeer/Multi-task-Learning
  • ํ•˜๋‚˜์˜ ํ”„๋ ˆ์ž„์›Œํฌ์—์„œ semantic parsing์˜ ๋งฅ๋ฝ ๋‚ด์—์„œ knowledge acquisition๊ณผ word-sense disambiguation๋ฅผ ์œ„ํ•œ ๋ฐฉ๋ฒ•๋“ค์„ ์ œ๊ณต
    • knowledge acquisition
      • ์ง€์‹ ์Šต๋“
      • ์ง€์‹ ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ์— ํ•„์š”ํ•œ ๊ทœ์น™๊ณผ ์˜จํ†จ๋กœ์ง€๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋กœ์„ธ์Šค
      • https://en.wikipedia.org/wiki/Knowledge_acquisition
    • word-sense disambiguation(WSD): ๋‹จ์–ด ์˜๋ฏธ ์ค‘์˜์„ฑ ํ•ด์†Œ
      • ํ•ด๋‹น ๋ฌธ๋งฅ์—์„œ ํŠน์ • ๋‹จ์–ด๊ฐ€ ์‚ฌ์ „์  ์˜๋ฏธ ์ค‘ ์–ด๋””์— ํ•ด๋‹นํ•˜๋Š”์ง€ ์ฐพ์•„๋‚ด๋Š” ์ž‘์—…
      • ex) 1๋ฒˆ ๋ฐค->๋ฐค01, 2๋ฒˆ ๋ฐค->๋ฐค02, 3๋ฒˆ ๋ฐค->๋ฐค01 ์ฒ˜๋Ÿผ ๊ฐ๊ฐ์˜ ๋‹จ์–ด์— ๋Œ€ํ•ด ์‚ฌ์ „ ์ƒ์˜ ์˜๋ฏธ์™€ ์—ฐ๊ฒฐ ์ง€์Œ
      • ํ•ด๋‹น ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ์‚ฌ์ „์˜ ๊ฐ ์˜๋ฏธ์™€ ์—ฐ๊ฒฐํ•˜๋Š” ์ž‘์—…์ด ํ•„์ˆ˜์ ์ด๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์ „ ์ž๋ฃŒ๋‚˜ ๊ธฐํƒ€ ์ง€์‹ ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ฅผ ํ•„์š”๋กœ ํ•จ
      • ๋”ฐ๋ผ์„œ ๋Œ€๊ฒŒ ์ง€์‹ ๊ธฐ๋ฐ˜ ๋ฐฉ๋ฒ•(Knowledge-based approach)์ด๋‚˜ ์ง€๋„ ํ•™์Šต ๋ฐฉ๋ฒ•(Supervised approach) ์‚ฌ์šฉ
      • https://bab2min.tistory.com/576

 

 

 

Introduction

 

semantic parsing์— ๊ด€ํ•œ ์—ฐ๊ตฌ๋Š” ๋Œ€๋žต 2๊ฐœ์˜ ํŠธ๋ž™์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Œ

 

  • 1) In-domain
    • The goal is to learn highly evolved and comprehensive MRs
    • Because this requires heavily annotated training data and/or an MR built for a single domain, such approaches usually have a restricted vocabulary (a few hundred words) and a correspondingly restricted MR representation
  • 2) Open-domain (open-text)
    • The goal is to learn to associate an MR with any kind of natural language sentence
    • Since it is infeasible to label large amounts of free text with MRs that capture deep semantic structure, supervision is much weaker
    • As a result, the models infer simpler MRs → this is also called shallow semantic parsing
  • This paper addresses the open-domain setting

 

  • ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์— ๋Œ€ํ•ด 2๋‹จ๊ณ„๋กœ MR์„ ์ถ”๋ก ํ•จ
    • (1) semantic role labeling step → ์˜๋ฏธ ๊ตฌ์กฐ ์˜ˆ์ธก
    • (2) disambiguation step → ํ•™์Šต๋œ energy function์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ๊ฐ์˜ ๊ด€๋ จ ๋‹จ์–ด์— ํ•ด๋‹น entity ํ• ๋‹น

 

  • To cope with the lack of strong supervision, the system consists of an energy-based model trained on data from multiple sources
    • Energy-based model
      • A generative model that learns an energy function assigning low energy to inputs X lying within the data distribution and high energy to everything else — it not only raises the probability mass where data exist but also pushes it down where data do not
      • https://post.naver.com/viewer/postView.naver?volumeNo=31743752&memberNo=52249799
  • The paper's energy-based model is trained to jointly capture semantic information between words, entities, and their combinations

 

  • ๊ฐ symbol์— ๋Œ€ํ•ด ์ €์ฐจ์› ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ ํ•™์Šต๋˜๋Š” distributed representation์œผ๋กœ ์ธ์ฝ”๋”ฉ
    • distributed representation: ๋ถ„ํฌ ๊ธฐ๋ฐ˜์˜ ๋‹จ์–ด ํ‘œํ˜„
      • ํƒ€๊ฒŸ ๋‹จ์–ด ์ฃผ๋ณ€์— ์žˆ๋Š” ๋‹จ์–ด ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฒกํ„ฐํ™”
      • '๋น„์Šทํ•œ ์œ„์น˜์—์„œ ๋“ฑ์žฅํ•˜๋Š” ๋‹จ์–ด๋“ค์€ ๋น„์Šทํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„๋‹ค'๋ผ๋Š” ๋ถ„ํฌ ๊ฐ€์„ค์— ๊ธฐ๋ฐ˜ํ•ด ์ฃผ๋ณ€ ๋‹จ์–ด ๋ถ„ํฌ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋‹จ์–ด์˜ ๋ฒกํ„ฐ ํ‘œํ˜„์ด ๊ฒฐ์ •๋˜๊ธฐ ๋•Œ๋ฌธ์— ๋ถ„์‚ฐ ํ‘œํ˜„(distributed representation)์ด๋ผ๊ณ  ๋ถ€๋ฆ„
      • ex) Word2Vec, fastText
  • ๋…ผ๋ฌธ์˜ semantic matching energy function์€ ๊ทธ๋Ÿด๋“ฏํ•œ ์กฐํ•ฉ์— ๋‚ฎ์€ ์—๋„ˆ์ง€ ๊ฐ’์„ ํ• ๋‹นํ•˜๊ธฐ ์œ„ํ•ด ์ด๋Ÿฌํ•œ ์ž„๋ฒ ๋”ฉ์„ blend ํ•˜๋„๋ก ์„ค๊ณ„๋จ

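As a quick illustration of the distributed-representation idea above — and only of that background idea, since the paper learns its own embeddings for lemmas and synsets with the energy-based model described later, not with Word2Vec — here is a minimal gensim sketch on a toy corpus:

```python
# A minimal sketch of a distributed word representation using gensim's Word2Vec.
# This only illustrates the background concept above; the paper's embeddings are
# learned jointly with the semantic matching energy function, not with Word2Vec.
from gensim.models import Word2Vec

toy_corpus = [
    ["the", "army", "attacked", "the", "town"],
    ["the", "band", "played", "a", "musical", "score"],
    ["she", "got", "a", "perfect", "score", "on", "the", "exam"],
]

model = Word2Vec(sentences=toy_corpus, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=50)
print(model.wv["score"][:5])           # 50-dim embedding vector (first 5 components)
print(model.wv.most_similar("score"))  # neighbors by distributional similarity
```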
 

  • WordNet, ConceptNet๊ณผ ๊ฐ™์€ ๋ฆฌ์†Œ์Šค๋Š” entity ๊ฐ„์˜ ๊ด€๊ณ„ ํ˜•ํƒœ๋กœ ์ƒ์‹(common-sense knowledge)์„ ์ธ์ฝ”๋”ฉ ํ•˜์ง€๋งŒ(ex: ~has ~part( ~car, ~wheel) ) ์ด ์ง€์‹์„ ์›์‹œ ํ…์ŠคํŠธ (๋ฌธ์žฅ๋“ค)์— ์—ฐ๊ฒฐํ•˜์ง€ ์•Š์Œ
  • ๋ฐ˜๋ฉด, Wikipedia์™€ ๊ฐ™์€ ํ…์ŠคํŠธ ๋ฆฌ์†Œ์Šค๋Š” entity๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜์ง€ ์•Š์Œ
  • ๋…ผ๋ฌธ์˜ ํ•™์Šต ์ ˆ์ฐจ๋Š” ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ multi-task learning์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•จ

 

  • ์ด๋Ÿฐ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜๋ฉด ํ…์ŠคํŠธ์™€ entity ๊ฐ„์˜ ๊ด€๊ณ„์—์„œ ์œ ๋„๋œ MR์€ ๋™์ผํ•œ ๊ณต๊ฐ„์— embedded(๊ทธ๋ฆฌ๊ณ  integrated)๋จ
  • ์ด๋ฅผ ํ†ตํ•ด ๋งŽ์€ ์–‘์˜ indirect supervision๊ณผ ์ ์€ ์–‘์˜ direct supervision์„ ์‚ฌ์šฉํ•ด์„œ ์›์‹œ ํ…์ŠคํŠธ์— ๋Œ€ํ•ด disambiguation(๋ช…ํ™•ํ™”)์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ์‹์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์—ˆ์Œ
  • ๋ชจ๋ธ์€ ๋‹จ์–ด์— ๋Œ€ํ•œ ์˜ฌ๋ฐ”๋ฅธ WordNet sense(์˜๋ฏธ)๋ฅผ ์„ ํƒํ•˜๋„๋ก ์ƒ์‹(common-sense knowledge) (ex. entity ๊ฐ„์˜ WordNet relation)์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ํ•™์Šตํ•จ

 

  • Since no standard evaluation exists for open-text semantic parsing, the model is evaluated in other ways
  • Results are reported on two benchmarks: WSD (word sense disambiguation) and (WordNet) knowledge acquisition
  • The paper also demonstrates the possibility of knowledge extraction, i.e., learning new common-sense relations that do not exist in WordNet by multi-tasking with raw text

 

 

 

Semantic Parsing Framework

 

2.1 WordNet-based Representations (MRs)

 

  • semantic parsing์„ ์œ„ํ•ด ๊ณ ๋ คํ•œ MR์€ $REL(A_0, . . . , A_n)$ ํ˜•์‹์˜ ๊ฐ„๋‹จํ•œ ๋…ผ๋ฆฌ์‹
    • $REL$: relation symbol
    • $A_0, ..., A_n$: arguments

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” open-domain ์›์‹œ ํ…์ŠคํŠธ๋ฅผ ๊ตฌ๋ฌธ ๋ถ„์„ํ•˜๊ธฐ๋ฅผ ์›ํ•˜๋ฏ€๋กœ ๋งŽ์€ relation types์™€ arguments๋ฅผ ๊ณ ๋ คํ•ด์•ผ ํ–ˆ์Œ
  • $REL$๊ณผ $A_i$ arguments๋ฅผ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด WordNet ์‚ฌ์šฉ

 

  • WordNet์€ synset๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” node๊ฐ€ ์˜๋ฏธ(sense)์— ํ•ด๋‹นํ•˜๊ณ , edge๊ฐ€ ์ด๋Ÿฌํ•œ ์˜๋ฏธ๋“ค ์‚ฌ์ด์˜ ๊ด€๊ณ„๋ฅผ ์ •์˜ํ•˜๋Š” ๊ทธ๋ž˜ํ”„ ๊ตฌ์กฐ ์•ˆ์—์„œ comprehensive knowledge(ํฌ๊ด„์ ์ธ ์ง€์‹)์„ ํฌํ•จํ•จ
    • synset: ์œ ์˜์–ด ์ง‘๋‹จ

 

  • synset์€ ์ผ๋ฐ˜์ ์œผ๋กœ 8-digits codes๋กœ ์‹๋ณ„๋˜์ง€๋งŒ ๋ช…ํ™•์„ฑ์„ ์œ„ํ•ด ๋…ผ๋ฌธ์—์„œ๋Š” synset์„ ๋‹จ์–ด + ํ’ˆ์‚ฌ ํƒœ๊ทธ(POS tag - NN: ๋ช…์‚ฌ, VB: ๋™์‚ฌ, JJ: ํ˜•์šฉ์‚ฌ, RB: ๋ถ€์‚ฌ) + ์ˆซ์ž( ๋ช‡ ๋ฒˆ์งธ ์˜๋ฏธ์ธ์ง€)๋กœ ํ‘œํ˜„ํ•จ
    • ex)
      • _score_NN_1: ๋ช…์‚ฌ "score"์˜ ์ฒซ ๋ฒˆ์งธ ์˜๋ฏธ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” synset. "mark"์™€ "grade"๋ผ๋Š” ๋‹จ์–ด๋„ ํฌํ•จ ⇒ ์ ์ˆ˜
      • _score_NN_2: ๋ช…์‚ฌ "score"์˜ ๋‘ ๋ฒˆ์งธ ์˜๋ฏธ ⇒ ์•…๋ณด

 

  • Relation instances in WordNet are represented as triplets $(lhs, rel, rhs)$ (a small NLTK sketch follows below)
    • $lhs$: the left-hand side of the relation
    • $rel$: the relation type
    • $rhs$: the right-hand side of the relation
  • e.g.
    • (_score_NN_1, _hypernym, _evaluation_NN_1)
    • (_score_NN_2, _has_part, _musical_notation_NN_1)
  • hypernym: a superordinate word
    • A word whose meaning includes the meaning of another word
    • <-> hyponym: a word whose meaning is included in that of its hypernym
    • e.g., hypernym: musical instrument, hyponym: piano
  • has_part: a whole-to-part relation

Source: https://www.semanticscholar.org/paper/Exploiting-links-in-WordNet-hierarchy-for-word-of-Kolte-Bhirud/6bfe45b416900ebba6957c7be4d41233047bcb59

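To make the synset and triplet notation concrete, here is a small sketch that reads the same kind of $(lhs, rel, rhs)$ triplets out of WordNet with NLTK. The `to_paper_name` helper and the exact senses printed are illustrative assumptions (they depend on the installed WordNet version), not something specified by the paper:

```python
# Small sketch (assumes NLTK and its WordNet corpus are installed) showing how
# synsets and relation triplets like (_score_NN_1, _hypernym, _evaluation_NN_1)
# can be read off WordNet. The naming helper mimics the paper's notation.
from nltk.corpus import wordnet as wn   # may require: nltk.download("wordnet")

def to_paper_name(synset):
    """Render an NLTK synset such as Synset('score.n.01') as _score_NN_1."""
    pos_map = {"n": "NN", "v": "VB", "a": "JJ", "s": "JJ", "r": "RB"}
    word, pos, num = synset.name().rsplit(".", 2)
    return f"_{word}_{pos_map[pos]}_{int(num)}"

for syn in wn.synsets("score", pos=wn.NOUN)[:2]:
    print(to_paper_name(syn), "-", syn.definition())
    for hyper in syn.hypernyms():        # (synset, _hypernym, hypernym_synset)
        print("  triplet:", (to_paper_name(syn), "_hypernym", to_paper_name(hyper)))
    for part in syn.part_meronyms():     # part-meronym edges give _has_part triplets
        print("  triplet:", (to_paper_name(syn), "_has_part", to_paper_name(part)))
```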
 

  • ์ตœ์ข… MR์˜ ๊ฒฝ์šฐ, $REL$ ๊ณผ $A_i$ arguments๋ฅผ WordNet synsets์˜ ํŠœํ”Œ๋กœ ํ‘œ์‹œ
  • → $REL$์€ ์•„๋ฌด ๋™์‚ฌ๋‚˜ ๋‹ค ๋  ์ˆ˜ ์žˆ๊ณ , 18๊ฐœ์˜ WordNet relations ์ค‘ ํ•˜๋‚˜๋กœ ์ œํ•œ๋˜์ง€ ์•Š์Œ

 

 

 

2.2 Inference Procedure

 

 

Open-text semantic parsing

  • step 0) input
  • step 1) preprocessing (lemmatization, POS tagging, chunking, SRL)
  • step 2) each lemma is assigned to a corresponding WordNet synset
  • step 3) the complete MR (meaning representation) is defined

 

ํ…์ŠคํŠธ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •(lemmatization, POS, chunking, SRL)

 

  • Lemmatization
    • Converting words into their base dictionary form
    • e.g., plural → singular, inflected verb forms → the base form

Source: https://medium.com/sciforce/text-preprocessing-for-nlp-and-machine-learning-tasks-3e077aa4946e

 

  • POS tagging: part-of-speech tagging

Source: https://byteiota.com/pos-tagging/

 

  • Chunking (= shallow parsing)
    • Grouping several POS-tagged tokens into phrases
    • Segmenting a sentence into POS tags and then into phrases via chunking makes its meaning easier to grasp

Source: https://jynee.github.io/NLP%EA%B8%B0%EC%B4%88_3/

 

  • SRL: Semantic Role Labeling

Source: https://paperswithcode.com/task/semantic-role-labeling

 

 

 

  • ๋…ผ๋ฌธ์—์„œ์˜ semantic parsing์€ ๋‘ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋จ → step 1)๊ณผ step2)

 

Step (1): MR structure inference

 

  • ํ…์ŠคํŠธ๋ฅผ ์ „์ฒ˜๋ฆฌํ•˜๊ณ , MR์˜ ๊ตฌ์กฐ๋ฅผ ์ถ”๋ก ํ•˜๋Š” ๋‹จ๊ณ„
  • ์ด ๋‹จ๊ณ„์—์„œ๋Š” ์ด๋ฏธ ์กด์žฌํ•˜๋Š” ํ‘œ์ค€ ๋ฐฉ์‹ ์‚ฌ์šฉ
  • SENNA software๋ฅผ ์‚ฌ์šฉํ•ด์„œ POS tagging, chunking, lemmatization, semantic role labeling(SRL) ์ˆ˜ํ–‰

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” ํ‘œ์ œ์–ด ์ถ”์ถœ๋œ(lemmatized) ๋‹จ์–ด์™€ POS tag์˜ ์—ฐ๊ฒฐ์„ ‘lemma’๋ผ๊ณ  ํ‘œํ˜„ํ•จ
    • lemma: ํ‘œ์ œ์–ด
  • lemma์™€ synset์„ ๊ตฌ๋ณ„ํ•˜๋Š” ์ •์ˆ˜ ์ ‘๋ฏธ์‚ฌ๊ฐ€ ์—†๋Š” ๊ฒƒ์— ์ฃผ์˜ → lemma๊ฐ€ ์˜๋ฏธ์ƒ ๋ชจํ˜ธํ•  ์ˆ˜ ์žˆ์Œ

 

  • SRL assigns a semantic role label to each grammatical argument associated with the verb of each proposition
  • → This is important because it is used to infer the structure of the MR

 

  • ๋…ผ๋ฌธ์—์„œ๋Š” (subject_์ฃผ์–ด, verb_๋™์‚ฌ, direct object_์ง์ ‘๋ชฉ์ ์–ด)์˜ ํ…œํ”Œ๋ฆฟ๊ณผ ์ผ์น˜ํ•˜๋Š” ๋ฌธ์žฅ๋“ค๋งŒ ๊ณ ๋ คํ•จ
  • ์ด 3๊ฐ€์ง€ ์š”์†Œ๋“ค์€ ํ‘œ์ œ์–ด ์ถ”์ถœ๋œ ๋‹จ์–ด๋“ค์˜ ํŠœํ”Œ(→ multi-word phrase)๊ณผ ๊ด€๋ จ๋จ
  • SRL์€ ๋ฌธ์žฅ์„ ($lhs$ = subject, $rel$ = verb, $rhs$ = object)์˜ ํ…œํ”Œ๋ฆฟ์œผ๋กœ ๊ตฌ์กฐํ™”ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋จ
  • ์›์‹œ ํ…์ŠคํŠธ์—์„œ ์ˆœ์„œ๊ฐ€ ๋ฐ˜๋“œ์‹œ ์ฃผ์–ด/๋™์‚ฌ/์ง์ ‘ ๋ชฉ์ ์–ด์ผ ํ•„์š”๋Š” ์—†์Œ → ex) ์ˆ˜๋™ํƒœ ๋ฌธ์žฅ
  • semantic parse(๋˜๋Š” MR)์„ ์™„๋ฃŒํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” lemma๊ฐ€ ๋ฐ˜๋“œ์‹œ synset์œผ๋กœ ๋ณ€ํ™˜๋˜์–ด์•ผ ํ•จ -> step (2)์˜ disambiguation

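A toy sketch of the preprocessing that leads to a lemma triplet is shown below. The paper uses SENNA for POS tagging, chunking, lemmatization, and SRL; here NLTK handles lemmatization and POS tagging, and the SRL decision (which tokens form the subject, verb, and direct object) is simply hard-coded, so the sketch only illustrates the resulting lemma-triplet format:

```python
# Toy sketch of Step (1), assuming NLTK is installed. The hard-coded "SRL" below
# is a stand-in for SENNA's output; only the lemma triplet format is illustrated.
import nltk
from nltk.stem import WordNetLemmatizer
# may require: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger"); nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

def to_lemma(word, treebank_tag):
    """Build a paper-style lemma string such as _army_NN from a word and its POS tag."""
    pos_prefix = {"N": ("n", "NN"), "V": ("v", "VB"), "J": ("a", "JJ"), "R": ("r", "RB")}
    wn_pos, coarse = pos_prefix.get(treebank_tag[0], ("n", "NN"))
    return f"_{lemmatizer.lemmatize(word.lower(), pos=wn_pos)}_{coarse}"

sentence = "The army attacked the town"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
# pretend SRL told us: subject = token 1, verb = token 2, direct object = token 4
lhs_lem = (to_lemma(*tagged[1]),)   # ('_army_NN',)
rel_lem = (to_lemma(*tagged[2]),)   # ('_attack_VB',)
rhs_lem = (to_lemma(*tagged[4]),)   # ('_town_NN',)
print(lhs_lem, rel_lem, rhs_lem)    # lemma triplet, still ambiguous (no sense index)
```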
 

 

Step (2): Detection of MR entities

 

  • ๋‘ ๋ฒˆ์งธ ๋‹จ๊ณ„์˜ ๋ชฉํ‘œ๋Š” ๋ฌธ์žฅ์— ํ‘œํ˜„๋œ ๊ฐ๊ฐ์˜ semantic entity๋ฅผ ์‹๋ณ„ํ•˜๋Š” ๊ฒƒ
  • ๊ฐ ์š”์†Œ๊ฐ€ lemma์˜ ํŠœํ”Œ๊ณผ ๊ด€๋ จ๋œ relation triplet $(lhs^{lem}, rel^{lem}, rhs^{lem})$์ด ์ฃผ์–ด์ง€๋ฉด lemma๊ฐ€ synset๋กœ ๋Œ€์ฒด๋œ corresponding triplet $(lhs^{syn}, rel^{syn}, rhs^{syn})$์ด ์ƒ์„ฑ๋จ
  • lemma์— ๋”ฐ๋ผ ๊ฐ„๋‹จํ•˜๊ฑฐ๋‚˜
    • _television_program_NN ๋˜๋Š” _world_war_ii_NN๊ณผ ๊ฐ™์€ ์ผ๋ถ€ lemma๋Š” ๋‹จ์ผ synset์— ํ•ด๋‹น
  • ๋งค์šฐ ์–ด๋ ค์šธ ์ˆ˜ ์žˆ์Œ
    • _run_VB๋Š” 33๊ฐœ์˜ ๋‹ค๋ฅธ synset์—, _run_NN์€ 10๊ฐœ์˜ synset์— ๋งคํ•‘๋  ์ˆ˜ ์žˆ์Œ
  • ๊ทธ๋ž˜์„œ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ semantic parsing framework์—์„œ๋Š” MR์ด $rel^{syn} (lhs^{syn}, rhs^{syn})$ ํ˜•์‹์œผ๋กœ ์žฌ๊ตฌ์„ฑ๋  ์ˆ˜ ์žˆ๋Š” synsets์˜ triplets์ธ $(lhs^{syn}, rel^{syn}, rhs^{syn})$ ์— ํ•ด๋‹นํ•จ
  • ๋ชจ๋ธ์ด relation triplets๋ฅผ ์ค‘์‹ฌ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— MR๊ณผ WordNet relations๋Š” ๋™์ผํ•œ scheme์œผ๋กœ ๋ณด๋‚ด์ง 
    • ex) WordNet relation ( _score_NN_2 , _has_part, _musical_notation_NN_1) ๋Š” WordNet relation type _has_part ๊ฐ€ ๋™์‚ฌ์˜ ์—ญํ• ์„ ํ•˜๋Š” MR๊ณผ ๋™์ผํ•œ ํŒจํ„ด์— fitํ•จ

 

 

 

Semantic Matching Energy

 

  • ์ด ๋…ผ๋ฌธ์˜ main contribution
  • -> lemma์™€ WordNet entity๋“ค์„ ๋™์ผํ•œ ๋ฒกํ„ฐ ๊ณต๊ฐ„์— ์ž„๋ฒ ๋“œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•œ energy function
  • semantic matching energy function์€ lemma๊ฐ€ ์ฃผ์–ด์ง„ ์ ์ ˆํ•œ synset์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋จ

 

 

 

3.1 Framework

 

key concepts

 

  • 1) All symbolic entities (synsets, relation types, lemmas) are associated with a shared d-dimensional vector space, the "embedding space", following earlier work on neural language models
    • These vectors are parameters of the model and are learned jointly so that the model performs well on the semantic parsing task
  • 2) The semantic matching energy value associated with a particular triplet $(lhs, rel, rhs)$ is computed by a parameterized function $ε$ that starts by mapping every symbol to its embedding
    • $ε$ must also be able to handle variable-size arguments
  • 3) The energy function $ε$ is optimized to be lower for training examples than for other possible configurations of symbols
    • As a result, the semantic matching energy function can distinguish plausible from implausible combinations of entities, which is what allows it to pick the most plausible sense for a lemma

 

 

 

3.2 Parametrization

 

 

Semantic matching energy function

 

  • 1) ํŠœํ”Œ $(lhs, rel, rhs)$์˜ triplet์€ ๋จผ์ € ๊ฐ๊ฐ์˜ ์ž„๋ฒ ๋”ฉ์ธ $E_{lhs}$, $E_{rel}$, $E_{rhs}$์— ๋งคํ•‘๋จ 
    • ํ•˜๋‚˜ ์ด์ƒ์˜ symbol์„ ํฌํ•จํ•˜๋Š” ํŠœํ”Œ์— ๋Œ€ํ•ด ์ง‘๊ณ„ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ด์„œ
  • 2) $E_{lhs}$์™€ $E_{rel}$๋Š” $g_{left}(.)$๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๊ฒฐํ•ฉ๋˜์–ด output์œผ๋กœ $E_{lhs(rel)}$๋ฅผ ์ถœ๋ ฅ
    • $E_{rhs(rel)} = g_{right}(E_{rhs}, E_{rel})$
  • 3) $ε((lhs, rel, rhs))$ ์—๋„ˆ์ง€๋Š” $E_{lhs(rel)}$์™€ $E_{rhs(rel)}$๋ฅผ $h(.)$ ํ•จ์ˆ˜์™€ ํ•ฉ์ณ์„œ ์–ป์–ด์ง

 

  • semantic matching energy function์€ ๋ณ‘๋ ฌ ๊ตฌ์กฐ(parallel structure)๋ฅผ ๊ฐ€์ง
    • ๋จผ์ €, $(lhs, rel)$๊ณผ $(rel, rhs)$ ์Œ์ด ๋”ฐ๋กœ๋”ฐ๋กœ ๊ฒฐํ•ฉ
    • ๊ทธ๋Ÿฐ ๋‹ค์Œ, ์ด๋Ÿฌํ•œ semantic combinations๊ฐ€ ๋งค์น˜๋จ

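Below is a minimal numpy sketch of this parallel structure. The concrete choices (linear $g$ functions, a negative dot product for $h$, mean pooling for multi-symbol tuples) are simplifications made for illustration; the paper explores its own specific parametrizations of $g_{left}$, $g_{right}$, and $h$:

```python
# Minimal sketch of the parallel structure of the semantic matching energy
# function. The linear g functions and dot-product h are illustrative
# simplifications, not the paper's exact parametrization.
import numpy as np

d = 50                                  # embedding dimension
rng = np.random.default_rng(0)

# embedding table E: one d-dim vector per symbol (synset, relation type, lemma)
vocab = ["_army_NN_1", "_attack_VB_1", "_town_NN_1"]
E = {sym: rng.normal(size=d) for sym in vocab}

# parameters of g_left / g_right (here: simple linear combinations)
Wl1, Wl2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wr1, Wr2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def aggregate(symbols):
    """Tuples with several symbols are pooled into one vector (here: the mean)."""
    return np.mean([E[s] for s in symbols], axis=0)

def energy(lhs, rel, rhs):
    E_lhs, E_rel, E_rhs = aggregate(lhs), aggregate(rel), aggregate(rhs)
    E_lhs_rel = Wl1 @ E_lhs + Wl2 @ E_rel        # g_left(E_lhs, E_rel)
    E_rhs_rel = Wr1 @ E_rhs + Wr2 @ E_rel        # g_right(E_rhs, E_rel)
    return -float(E_lhs_rel @ E_rhs_rel)         # h(.,.): low energy = plausible

print(energy(("_army_NN_1",), ("_attack_VB_1",), ("_town_NN_1",)))
```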
 

 

 

3.3 Training Objective

 

  • $C$: the dictionary containing all entities (relation types, lemmas, synsets)
  • $C^∗$: the set of tuples (or sequences) whose elements are taken from $C$

 

 

3.4 Disambiguation of Lemma Triplets

 

  • disambiguation: resolving which sense of a word is meant
  • The semantic matching energy function is applied to raw text to carry out Step (2): Detection of MR entities
  • → i.e., it is what performs the word-sense disambiguation step
  • A triplet of lemmas $((lhs_1^{lem}, lhs_2^{lem}, . . .),(rel_1^{lem}, . . .),(rhs_1^{lem}, . . .))$ is labeled with synsets in a greedy fashion, one lemma at a time
  • For example, to label $lhs_2^{lem}$, all the other elements of the triplet are kept fixed as lemmas, and the synset leading to the lowest energy is selected

  • $C(syn|lem)$: the set of allowed synsets that $lhs_2^{lem}$ can be mapped to
  • This is repeated for every lemma of the triplet
  • The paper always uses lemmas as context (synsets that have already been assigned are never reused)

 

  • ์ด ๋ฐฉ์‹์€ ๋ฌธ์žฅ์˜ ๊ฐ ์œ„์น˜์— ๋Œ€ํ•ด์„œ lemma์˜ ์˜๋ฏธ๋“ค์˜ ๊ฐœ์ˆ˜์™€ ๋™์ผํ•œ ์ ์€ ์ˆ˜์˜ ์—๋„ˆ์ง€๋งŒ ๊ณ„์‚ฐํ•˜๋ฉด ๋˜๋ฏ€๋กœ ํšจ์œจ์ ์ธ ํ”„๋กœ์„ธ์Šค์ž„
  • ํ•˜์ง€๋งŒ ์ด ๋ฐฉ์‹์€ ์ด ์ค‘์š”ํ•œ ๋‹จ๊ณ„๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ ๊ณต๋™์œผ๋กœ ํ•จ๊ป˜ ์‚ฌ์šฉ๋˜๊ธฐ ๋•Œ๋ฌธ์— synset๊ณผ lemma์— ๋Œ€ํ•œ good representations( = ์ข‹์€ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ $E_i$)๊ฐ€ ์š”๊ตฌ๋จ
  • ๊ทธ๋ž˜์„œ multi-tasking training์ด synset๊ณผ lemma(๊ทธ๋ฆฌ๊ณ  $g$ functions๋ฅผ ์œ„ํ•œ ์ข‹์€ parameters)์— ๋Œ€ํ•ด ๊ณต๋™์œผ๋กœ ์ข‹์€ ์ž„๋ฒ ๋”ฉ์„ ํ•™์Šตํ•˜๋ ค๊ณ  ์‹œ๋„ํ•จ

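A self-contained sketch of the greedy loop is shown below. The random embeddings and the simple dot-product energy are stand-ins for the trained model, and the small candidate dictionary stands in for $C(syn|lem)$, which in the real system comes from WordNet:

```python
# Greedy disambiguation sketch (Step (2) / Section 3.4): label one lemma at a
# time with the synset that yields the lowest energy, while every other slot
# stays a lemma. Random embeddings and a toy energy stand in for the trained model.
import numpy as np

rng = np.random.default_rng(0)
d = 50
symbols = ["_army_NN", "_army_NN_1", "_army_NN_2",
           "_attack_VB", "_attack_VB_1", "_attack_VB_3",
           "_town_NN", "_town_NN_1"]
E = {s: rng.normal(size=d) for s in symbols}          # embedding per symbol

def energy(lhs, rel, rhs):
    # stand-in for the trained semantic matching energy function
    return -float((E[lhs] + E[rel]) @ (E[rhs] + E[rel]))

candidates = {"_army_NN": ["_army_NN_1", "_army_NN_2"],      # C(syn|lem)
              "_attack_VB": ["_attack_VB_1", "_attack_VB_3"],
              "_town_NN": ["_town_NN_1"]}

triplet = ["_army_NN", "_attack_VB", "_town_NN"]              # (lhs, rel, rhs) lemmas
assignment = {}
for i, lemma in enumerate(triplet):
    best_syn = min(candidates[lemma],
                   key=lambda syn: energy(*(triplet[:i] + [syn] + triplet[i + 1:])))
    assignment[lemma] = best_syn                              # context stays lemmas
print(assignment)
```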
 

 

 

Multi-Task Training

 

4.1 Multiple Data Resources

 

๊ฐ€๋Šฅํ•œ ํ•œ ๋งŽ์€ ์ƒ์‹(common-sense knowledge)์„ ๋ชจ๋ธ์— ๋ถ€์—ฌํ•˜๊ธฐ ์œ„ํ•ด์„œ ์—ฌ๋Ÿฌ ๋‹ค๋ฅธ ์ข…๋ฅ˜๋“ค๋กœ ์ด๋ค„์ง„ ๋ฐ์ดํ„ฐ ์†Œ์Šค๋“ค์„ ๊ฒฐํ•ฉํ•ด์„œ ์‚ฌ์šฉํ•จ

 

  • 1) WordNet v3.0 (WN)
    • The main resource
    • WordNet only contains relations between synsets, but the disambiguation process needs embeddings for both synsets and lemmas
    • So, to also train lemma embeddings, two modified versions of the dataset are created (a toy construction sketch appears after this list of data sources)
      • "Ambiguated" WN
        • In each triplet, the synset entities are replaced by one of their corresponding lemmas
        • This trains the model on many examples similar to replacing a lemma by one of its synonyms
      • "Bridge" WN
        • Designed to teach the model about the connection between synset and lemma embeddings
        • In a relation tuple, either the $lhs$ or the $rhs$ synset is replaced by a corresponding lemma (the other argument is kept as a synset)
    • 221,017 triplets
    • → validation set: 5,000 triplets / test set: 5,000 triplets

 

  • 2) ConceptNet v2.1 (CN)
    • A common-sense knowledge base
    • Lemmas, or groups of lemmas, are connected by a rich set of semantic relations
    • Because it is based on lemmas rather than synsets, it does not distinguish between different senses of a word
    • Only triplets whose lemmas appear in the WN dictionary are kept
    • 11,332 training triplets

 

  • 3) Wikipedia (Wk)
    • Used simply as raw text to provide knowledge to the model in an unsupervised way
    • More than 3 million examples generated from 50,000 articles

 

  • 4) EXtended WordNet (XWN)
    • Built from the WordNet glosses (→ definitions), which are syntactically parsed and whose content words are semantically linked to WN synsets
    • 776,105 training triplets
    • validation set: 10,000 triplets

 

  • 5) Unambiguous Wikipedia (Wku)
    • If one of the lemmas in a triplet unambiguously corresponds to a single synset, and that synset also maps to other, ambiguous lemmas, a new triplet is created by replacing the unambiguous lemma with an ambiguous one
    • -> An additional training set is built from triplets extracted from the Wikipedia corpus and modified in this way
    • This trick gives examples of ambiguous contexts for which the true synset is known
    • 981,841 supervision triplets

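As referenced in the WordNet item above, here is a toy sketch of how an "Ambiguated" and a "Bridge" version of a single WordNet triplet could be generated with NLTK. The random lemma choice is an illustrative stand-in; the paper's exact construction procedure is not reproduced:

```python
# Toy construction of "Ambiguated" WN and "Bridge" WN examples from one WordNet
# triplet, using NLTK (assumes the WordNet corpus is installed).
import random
from nltk.corpus import wordnet as wn

POS_MAP = {"n": "NN", "v": "VB", "a": "JJ", "s": "JJ", "r": "RB"}

def paper_name(synset):
    word, pos, num = synset.name().rsplit(".", 2)
    return f"_{word}_{POS_MAP[pos]}_{int(num)}"

def random_lemma(synset):
    # pick one of the synset's lemmas; the sense suffix is dropped -> ambiguous
    return f"_{random.choice(synset.lemma_names())}_{POS_MAP[synset.pos()]}"

score = wn.synsets("score", pos=wn.NOUN)[0]    # first noun sense of "score"
hyper = score.hypernyms()[0]                   # its hypernym synset

original   = (paper_name(score), "_hypernym", paper_name(hyper))
ambiguated = (random_lemma(score), "_hypernym", random_lemma(hyper))  # both synsets -> lemmas
bridge     = (paper_name(score), "_hypernym", random_lemma(hyper))    # only one side -> lemma
print(original, ambiguated, bridge, sep="\n")
```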
 

 

 

4.2 Training Algorithm

 

  • $ε$์˜ parameter๋ฅผ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•ด์„œ ๋ชจ๋“  ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ ๋ฆฌ์†Œ์Šค๋ฅผ ๋ฐ˜๋ณตํ–ˆ๊ณ , ํ™•๋ฅ ์  ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•(stochastic gradient descent)์„ ์‚ฌ์šฉํ–ˆ์Œ
    • ํ™•๋ฅ ์  ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•(stochastic gradient descent, SGD)
      • ์กฐ๊ธˆ๋งŒ ํ›‘์–ด๋ณด๊ณ (Mini batch) ๋น ๋ฅด๊ฒŒ ๊ฐ€๋ณด์ž

์ถœ์ฒ˜:&nbsp;https://seamless.tistory.com/38

 

์ถœ์ฒ˜:&nbsp;https://seamless.tistory.com/38

 

 

๋‹ค์Œ ๋‹จ๊ณ„์— ๋”ฐ๋ผ ํ•™์Šต์„ ๋ฐ˜๋ณต์‹œํ‚ด

  • 1. ์œ„์˜ ์˜ˆ์ œ ์†Œ์Šค ์ค‘ ํ•˜๋‚˜์—์„œ ๋ฌด์ž‘์œ„๋กœ positive training triplet $x_i$๋ฅผ ์„ ํƒ (synset, lemma ๋˜๋Š” ๋‘˜ ๋‹ค๋กœ ๊ตฌ์„ฑ๋œ triplet)
  • 2. ์ œ์•ฝ ์กฐ๊ฑด(constraint) (1), (2), (3) ์ค‘ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒ

constraint (1), (2), (3)

  • 3. $lhs_{xi}$, $rel_{xi}$ ๋˜๋Š” $rhs_{xi}$๋ฅผ ๊ฐ๊ฐ ๋Œ€์ฒดํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  entity $C$ ์…‹์—์„œ entity๋ฅผ ์ƒ˜ํ”Œ๋งํ•ด์„œ negative triplet $\tilde{x}$๋ฅผ ๋งŒ๋“ฆ
  • 4. $ε(x_i) > ε(\tilde{x}) − 1$ ์ด๋ฉด ๊ธฐ์ค€(criterion) (4)๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด์„œ ํ™•๋ฅ ์  ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•(stochastic gradient descent, SGD) ๋‹จ๊ณ„๋ฅผ ์ˆ˜ํ–‰

criterion (4)

  • 5. ๊ฐ๊ฐ์˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ ์ •๊ทœํ™”๋œ๋‹ค๋Š” ์ œ์•ฝ ์กฐ๊ฑด(constraint)์„ ์ ์šฉ. $||E_i|| = 1$, $∀i$

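A compact sketch of this loop is shown below. It assumes the ranking criterion behind step 4 is the usual margin-1 hinge $\max(0,\ 1 + ε(x_i) − ε(\tilde{x}))$ (a reading of step 4, not a formula copied from the paper) and uses a simplified linear energy similar in spirit to the Section 3.2 sketch; PyTorch autograd supplies the gradients:

```python
# Sketch of the training loop with negative sampling, a margin-1 ranking loss,
# and embedding renormalization. The energy parametrization is a simplification.
import random
import torch

d, n_entities = 50, 1000                       # toy sizes
E = torch.nn.Embedding(n_entities, d)          # one embedding row per entity in C
Wl = torch.nn.Linear(2 * d, d, bias=False)     # stand-in for g_left
Wr = torch.nn.Linear(2 * d, d, bias=False)     # stand-in for g_right
params = list(E.parameters()) + list(Wl.parameters()) + list(Wr.parameters())
opt = torch.optim.SGD(params, lr=0.01)         # learning rate lambda

def energy(lhs, rel, rhs):
    e_l, e_r, e_h = E(lhs), E(rel), E(rhs)
    return -(Wl(torch.cat([e_l, e_r])) * Wr(torch.cat([e_h, e_r]))).sum()

train = [(0, 1, 2), (3, 4, 5)]                 # toy positive triplets (entity ids)

for step in range(100):
    lhs, rel, rhs = random.choice(train)                        # 1. positive triplet
    slot = random.randrange(3)                                   # 2. which slot to corrupt
    corrupted = [lhs, rel, rhs]
    corrupted[slot] = random.randrange(n_entities)               # 3. negative triplet
    pos = energy(*(torch.tensor(i) for i in (lhs, rel, rhs)))
    neg = energy(*(torch.tensor(i) for i in corrupted))
    loss = torch.clamp(1.0 + pos - neg, min=0.0)                 # 4. margin-1 ranking loss
    if loss.item() > 0:
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                        # 5. ||E_i|| = 1 for all i
        E.weight.data = torch.nn.functional.normalize(E.weight.data, dim=1)
```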
 

  • ๊ฒฝ์‚ฌํ•˜๊ฐ• ๋‹จ๊ณ„์—์„œ๋Š” $λ$์˜ ํ•™์Šต๋ฅ ์ด ์š”๊ตฌ๋จ
  • ์œ„์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ XWN์™€ Wku ๋ฐ์ดํ„ฐ๋ฅผ ์ œ์™ธํ•œ ๋ชจ๋“  ๋ฐ์ดํ„ฐ์— ์ ์šฉํ•จ
  • entity์˜ ๋ชจ๋“  representation์„ ํฌํ•จํ•˜๋Š” ํ–‰๋ ฌ $E$๋Š” ๋ณต์žกํ•œ multi-task learning ์ ˆ์ฐจ๋ฅผ ํ†ตํ•ด ํ•™์Šต๋จ
  • -> ๋ชจ๋“  relation๊ณผ ๋ชจ๋“  ๋ฐ์ดํ„ฐ ์†Œ์Šค์— ๋Œ€ํ•ด ๋‹จ์ผ ์ž„๋ฒ ๋”ฉ ํ–‰๋ ฌ์ด ์‚ฌ์šฉ๋˜๊ธฐ ๋•Œ๋ฌธ
  • ๊ทธ ๊ฒฐ๊ณผ, entity์˜ ์ž„๋ฒ ๋”ฉ์—๋Š” entity๊ฐ€ $lhs$, $rhs$ ๋˜๋Š” $rel$ (๋™์‚ฌ์˜ ๊ฒฝ์šฐ)๋กœ ํฌํ•จ๋˜์–ด ์žˆ๋Š” ๋ชจ๋“  relation๊ณผ ๋ฐ์ดํ„ฐ ์†Œ์Šค์—์„œ ์˜ค๋Š” ์ธ์ˆ˜๋ถ„ํ•ด๋œ(factorized) ์ •๋ณด๊ฐ€ ํฌํ•จ๋จ
  • ๋ชจ๋ธ์€ ๊ฐ entity์— ๋Œ€ํ•ด ๋‹ค๋ฅธ entity๋“ค๊ณผ ๋‹ค์–‘ํ•œ ๋ฐฉ์‹์œผ๋กœ ์ƒํ˜ธ ์ž‘์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ•™์Šตํ•˜๋„๋ก ๊ฐ•์š”๋จ

 

 

 

Experiments

 

6.1 Benchmarks

 

  • Benchmark
    • A standard against which the performance of several experiments or models can be compared
    • https://ifdean.tistory.com/3
  • To evaluate how well the multi-task joint training over the various data sources works, models trained on several combinations of data sources are evaluated on two benchmark tasks
    • WordNet knowledge encoding
    • WSD (Word Sense Disambiguation)

 

 

WordNet Knowledge Acquisition (cols. 2-3) and Word Sense Disambiguation (cols. 4-5)

  • WN: the model trained only on WordNet → the "Ambiguated" and "Bridge" WordNet versions
  • WN+CN+Wk: the model trained on WordNet, ConceptNet, and Wikipedia data
  • All: the model trained on all data sources
  • MFS: predicts the Most Frequent Sense, based on WordNet frequencies
  • All+MFS: the best-performing model
  • SE (Bordes et al., 2011): Structured Embeddings

 

 

1) Knowledge Acquisition

 

  • ์ฃผ์–ด์ง„ ์ง€์‹(knowledge → training relations)์—์„œ ์ƒˆ๋กœ์šด relation์„ ์ผ๋ฐ˜ํ™”(generalize)ํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ์€ ๋‹ค์Œ ์ ˆ์ฐจ๋กœ ์ธก์ •๋จ
    • ๊ฐ๊ฐ์˜ test WordNet triplet์— ๋Œ€ํ•ด ์™ผ์ชฝ ๋˜๋Š” ์˜ค๋ฅธ์ชฝ entity๊ฐ€ ์ œ๊ฑฐ๋˜๊ณ , ๊ฐ๊ฐ ์ฐจ๋ก€์ฐจ๋ก€ ์‚ฌ์ „(dictionary)์˜ 41,024๊ฐœ์˜ synset์œผ๋กœ ๋Œ€์ฒด๋จ
    • ์ด triplet๋“ค์˜ ์—๋„ˆ์ง€๋Š” ๋ชจ๋ธ์— ์˜ํ•ด ๊ณ„์‚ฐ๋˜๊ณ , ์˜ค๋ฆ„์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌ๋˜๋ฉฐ ์˜ฌ๋ฐ”๋ฅธ synset์˜ ์ˆœ์œ„(rank)๊ฐ€ ์ €์žฅ๋จ
    • ๊ทธ๋Ÿฐ ๋‹ค์Œ ํ‰๊ท  ์˜ˆ์ธก ์ˆœ์œ„(→ ํ•ด๋‹น ์ˆœ์œ„๋“ค์˜ ํ‰๊ท ), WordNet ์ˆœ์œ„์™€ precision@10( = p@10 → 1๊ณผ 10 ๋‚ด์— ์žˆ๋Š” ์ˆœ์œ„์˜ ๋น„์œจ์„ 10์œผ๋กœ ๋‚˜๋ˆˆ ๊ฐ’), WordNet p@10์„ ์ธก์ •
      • P@10 = Precision at 10
      • precision: ์ •๋ฐ€๋„
      • -> ๋ชจ๋ธ์ด True๋ผ๊ณ  ๋ถ„๋ฅ˜ํ•œ ๊ฒƒ ์ค‘ ์‹ค์ œ True์ธ ๊ฒƒ์˜ ๋น„์œจ
      • Precision at K
      • -> Top K๊ฐœ์˜ ๊ฒฐ๊ณผ๋กœ Precision(์ •๋ฐ€๋„)๋ฅผ ๊ณ„์‚ฐ
  • generalize: ์ผ๋ฐ˜ํ™”

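A small sketch of this ranking protocol (mean rank and p@10 over candidate replacements) is shown below, with a random scorer standing in for the trained energy function:

```python
# Sketch of the ranking evaluation: for each test triplet, replace the right-hand
# entity by every candidate synset, sort by energy, and record the rank of the
# correct one. A random scorer stands in for the trained energy function.
import numpy as np

rng = np.random.default_rng(0)
n_synsets = 41024                              # size of the candidate dictionary

def energy(lhs, rel, rhs):                     # stand-in for the trained model
    return rng.random()

test_triplets = [(12, 3, 205), (40, 7, 911)]   # toy (lhs, rel, rhs) synset ids

ranks = []
for lhs, rel, true_rhs in test_triplets:
    scores = np.array([energy(lhs, rel, cand) for cand in range(n_synsets)])
    order = np.argsort(scores)                 # ascending energy = most plausible first
    rank = int(np.where(order == true_rhs)[0][0]) + 1
    ranks.append(rank)

mean_rank = float(np.mean(ranks))
p_at_10 = float(np.mean([r <= 10 for r in ranks]))
print(f"mean rank = {mean_rank:.1f}, p@10 = {p_at_10:.3f}")
```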
 

 

WordNet Knowledge Acquisition

  • WordNet์œผ๋กœ๋งŒ ํ•™์Šต๋œ ๋ชจ๋ธ(WN)์˜ ์„ฑ๋Šฅ์€ SE๋ณด๋‹ค ์‚ด์ง ๋‚ฎ์Œ
  • SE (Bordes et al. (2011))๋Š” ์˜ˆ์ธก์„ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด์„œ structured embeddings ์œ„์— KDE(Kernel Density Estimator)๋ฅผ ์Œ“์Œ

์ถœ์ฒ˜:&nbsp;https://en.wikipedia.org/wiki/Kernel_density_estimation

 

  • KDE๊ฐ€ ์—†๋Š” SE (no KDE) (Bordes et al., 2011)์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ๋Š” WN์˜ ์„ฑ๋Šฅ์ด ๋” ๋†’์Œ
  • ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ์™€ multi-taskingํ•œ WN+CN+Wk ๋ชจ๋ธ๊ณผ All ๋ชจ๋ธ์€ WordNet๋งŒ ํ•™์Šต์‹œํ‚จ WN ๋ชจ๋ธ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ์กฐ๊ธˆ ๋–จ์–ด์ง€์ง€๋งŒ ๊ทธ๋ž˜๋„ WordNet knowledge๋ฅผ ์ž˜ ์ธ์ฝ”๋”ฉํ•จ
  • ์›์‹œ ํ…์ŠคํŠธ๋กœ multi-taskingํ–ˆ์„ ๋•Œ, relation type์˜ ๊ฐœ์ˆ˜๋Š” 18๊ฐœ์—์„œ ์ˆ˜์ฒœ ๊ฐœ๋กœ ๋Š˜์–ด๋‚จ
  • ๋ชจ๋ธ์€ ๋„ˆ๋ฌด ๋งŽ์€ relation์œผ๋กœ ์ธํ•ด์„œ ๋” ๋ณต์žกํ•œ ์œ ์‚ฌ์„ฑ(similarity)์„ ํ•™์Šตํ•จ
  • → text relation์„ ์ถ”๊ฐ€ํ•˜๋ฉด WordNet์—์„œ ์ง€์‹(knowledge)์„ ์ถ”์ถœํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋” ์–ด๋ ค์›Œ์ง
  • ์ด๋Ÿฌํ•œ ์ €ํ•˜ ํšจ๊ณผ๋Š” ์œ„ ์ด๋ฏธ์ง€์— ๋‚˜์™€์žˆ๋Š” ์ˆœ์œ„๊ฐ€ 41,024๊ฐœ ์ด์ƒ์˜ entity์— ๋Œ€ํ•œ ๊ฒƒ์ด๋ผ๋Š” ์ ์„ ์—ผ๋‘์— ๋‘๊ณ  ๋ณด๋ฉด ์„ฑ๋Šฅ์ด ์—ฌ์ „ํžˆ ๋งค์šฐ ์šฐ์ˆ˜ํ•œ ํŽธ์ด๋”๋ผ๋„ multi-tasking process์˜ ์ œํ•œ ์‚ฌํ•ญ(limitation)์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Œ
  • ๊ฒŒ๋‹ค๊ฐ€ ์ด๋Š” WSD์™€ semantic parsing์— ์ค‘์š”ํ•œ ์—ฌ๋Ÿฌ training ์†Œ์Šค๋“ค์„ ๊ฒฐํ•ฉํ•˜๋Š” ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•จ

 

 

 

2) Word Sense Disambiguation(WSD)

 

  • For the SensEval-3 data, the original data is processed with the Inference Procedure described earlier, keeping only the (subject, verb, direct object) triplets in which every lemma belongs to the vocabulary defined by WordNet

 

  • F1 score๋กœ ์ธก์ •
  • WN ๋ชจ๋ธ๊ณผ WN+CN+Wk ๋ชจ๋ธ์˜ ์ฐจ์ด์ ์€ direct supervision ์—†์ด๋„ ๋ชจ๋ธ์ด ํ…์ŠคํŠธ์—์„œ ์˜๋ฏธ ์žˆ๋Š” ์ •๋ณด๋ฅผ ์ถ”์ถœํ•ด ์ผ๋ถ€ ๋‹จ์–ด๋ฅผ disambiguate ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ (WN+CN+Wk ๋ชจ๋ธ์ด Random ๋ชจ๋ธ๊ณผ WN ๋ชจ๋ธ๋ณด๋‹ค ์„ฑ๋Šฅ์ด ํ›จ์”ฌ ๋†’์Œ)
  • All+MFS ๋ชจ๋ธ์ด ์‹œ๋„ํ–ˆ๋˜ ๋ชจ๋“  ๋ฐฉ๋ฒ•๋“ค ์ค‘์—์„œ ์ œ์ผ ๋†’์€ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑ

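For reference, here is a tiny sketch of the usual WSD F1 computation (precision over attempted instances, recall over all instances, their harmonic mean); the toy predictions are made up purely to illustrate the metric:

```python
# F1 for WSD: precision = correct / attempted, recall = correct / total instances,
# F1 = harmonic mean of the two. Toy predictions are used for illustration only.
gold = {"w1": "_run_VB_3", "w2": "_score_NN_2", "w3": "_bank_NN_1", "w4": "_run_NN_7"}
pred = {"w1": "_run_VB_3", "w2": "_score_NN_1", "w3": "_bank_NN_1"}   # w4 left unattempted

correct = sum(1 for k, s in pred.items() if gold.get(k) == s)
precision = correct / len(pred)
recall = correct / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```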
 

 

 

6.2 Representations

 

1) Entity Embeddings

 

Embeddings

  • -> All ๋ชจ๋ธ์— ์˜ํ•ด ์ •์˜๋œ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์—์„œ ๋ช‡๋ช‡ entity์— ๋Œ€ํ•œ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ด์›ƒ๋“ค
  • ์˜ˆ์ƒํ–ˆ๋˜ ๋Œ€๋กœ, ์ด์›ƒ๋“ค์€ lemma์™€ synset์˜ ํ˜ผํ•ฉ์œผ๋กœ ๊ตฌ์„ฑ๋จ
  • lemma์— ํ•ด๋‹นํ•˜๋Š” ์ด์›ƒ์€ ๋‹ค๋ฅธ generic(ํฌ๊ด„์ ์ธ) lemma๋“ค๋กœ ๊ตฌ์„ฑ๋˜๋Š” ๋ฐ˜๋ฉด, ๋‘ ๊ฐœ์˜ ๋‹ค๋ฅธ synsets์— ๋Œ€ํ•œ ์ด์›ƒ์€ ์ฃผ๋กœ ๋ถ„๋ช…ํžˆ ๋‹ค๋ฅธ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ synsets๋กœ ๊ตฌ์„ฑ๋จ
  • ๋‘ ๋ฒˆ์งธ ํ–‰์€ common lemmas (์ฒซ ๋ฒˆ์งธ ์—ด)์˜ ๊ฒฝ์šฐ ์ด์›ƒ ๋˜ํ•œ generic(ํฌ๊ด„์ ์ธ) lemma์ด์ง€๋งŒ, precise ones (๋‘ ๋ฒˆ์งธ ์—ด)๋Š” ์˜ˆ๋ฆฌํ•œ ์˜๋ฏธ๋ฅผ ์ •์˜ํ•˜๋Š” synset์— ๊ฐ€๊นŒ์›€
  • _different_JJ_1์— ๋Œ€ํ•œ ์ด์›ƒ ๋ฆฌ์ŠคํŠธ(์„ธ ๋ฒˆ์งธ ์—ด)๋Š” ํ•™์Šต๋œ ์ž„๋ฒ ๋”ฉ์ด antonymy(๋ฐ˜์˜์„ฑ → ๋ฐ˜์˜์–ด)์„ ์ธ์ฝ”๋”ฉํ•˜์ง€ ์•Š์Œ์„ ๋‚˜ํƒ€๋ƒ„

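A quick sketch of how such nearest-neighbor lists can be read out of an embedding matrix via cosine similarity (using a random toy matrix in place of the trained $E$):

```python
# Nearest neighbors in an embedding space via cosine similarity. A random toy
# embedding matrix stands in for the trained matrix E of the All model.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["_mark_NN", "_score_NN_1", "_score_NN_2", "_sheet_music_NN_1", "_grade_NN_1"]
E = rng.normal(size=(len(vocab), 50))
E /= np.linalg.norm(E, axis=1, keepdims=True)      # embeddings are normalized (||E_i|| = 1)

def neighbors(symbol, k=3):
    i = vocab.index(symbol)
    sims = E @ E[i]                                # cosine similarity to every entity
    order = np.argsort(-sims)
    return [vocab[j] for j in order if j != i][:k]

print(neighbors("_score_NN_2"))
```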
 

 

2) WordNet Enrichment

 

  • WordNet๊ณผ ConceptNet์€ ์ œํ•œ๋œ ๊ฐœ์ˆ˜์˜ relation type์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— (→ 20๊ฐœ ๋ฏธ๋งŒ, ex. _has_part, _hypernym) ๋Œ€๋ถ€๋ถ„์˜ ๋™์‚ฌ๋ฅผ relation์œผ๋กœ ๊ฐ„์ฃผํ•˜์ง€ ์•Š์Œ
  • multi-task training๊ณผ MR, WordNet/ConceptNet์˜ relation์— ๋Œ€ํ•œ ํ†ตํ•ฉ๋œ representation ๋•๋ถ„์— ๋ชจ๋ธ์ด ์ž ์žฌ์ ์œผ๋กœ WordNet์— ์กด์žฌํ•˜์ง€ ์•Š๋Š” ๊ทธ๋Ÿฌํ•œ relation๋กœ ์ผ๋ฐ˜ํ™”๊ฐ€ ๊ฐ€๋Šฅํ•จ

 

Predicted relations

  • -> ๋‘ knowledge bases(WordNet๊ณผ ConceptNet)์— ์กด์žฌํ•˜์ง€ ์•Š๋Š” relation type์— ๋Œ€ํ•œ ์˜ˆ์ธก๋œ synset ๋ฆฌ์ŠคํŠธ
  • TextRunner (Yates et al., 2007) : ๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉํ•œ 50,000๊ฐœ์˜ Wikipedia ๊ธฐ์‚ฌ์™€ ๋น„๊ตํ•˜๊ธฐ ์œ„ํ•ด 1์–ต ๊ฐœ์˜ ์›นํŽ˜์ด์ง€์—์„œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•œ ์ •๋ณด ์ถ”์ถœ ๋„๊ตฌ
  • ๋…ผ๋ฌธ์˜ All ๋ชจ๋ธ๊ณผ TextRunner์˜ ๊ฒฐ๊ณผ ๋ชจ๋‘ ์ƒ์‹์„ ๋ฐ˜์˜ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ž„
  • ํ•˜์ง€๋งŒ ๋…ผ๋ฌธ์˜ All ๋ชจ๋ธ๊ณผ ๋‹ฌ๋ฆฌ TextRunner๋Š” lemma์˜ ๋‹ค๋ฅธ ์˜๋ฏธ๋ฅผ disambiguateํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ ๊ทธ ์ง€์‹์„ ๊ธฐ์กด ๋ฆฌ์†Œ์Šค์— ์—ฐ๊ฒฐํ•ด์„œ ํ’๋ถ€ํ•˜๊ฒŒ(enrich) ๋งŒ๋“ค์ง€ ๋ชปํ•จ

 

 

 

Conclusion

 

  • ์ด ๋…ผ๋ฌธ์€ ์›์‹œ ํ…์ŠคํŠธ๋ฅผ ๋ช…ํ™•ํ•œ(disambiguated) MR์— ๋งคํ•‘ํ•˜๋Š” semantic parsing์„ ์œ„ํ•œ ๋Œ€๊ทœ๋ชจ ์‹œ์Šคํ…œ์„ ์ œ์‹œํ•จ
  • key contributions
    1. ๋ชจํ˜ธํ•œ lemma์™€ ๋ชจํ˜ธํ•˜์ง€ ์•Š์€ entities(synsets) ์‚ฌ์ด์˜ ๊ด€๊ณ„๋“ค(relation)์˜ triplet์„ ํ‰๊ฐ€ํ•˜๋Š” energy-based model
    2. ์ƒ๋Œ€์ ์œผ๋กœ ์ œํ•œ๋œ supervision์œผ๋กœ ์›์‹œ ํ…์ŠคํŠธ์—์„œ ๋ช…ํ™•ํ•œ(disambiguated) MRs๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ์—ฌ๋Ÿฌ ๋ฆฌ์†Œ์Šค๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์˜ ํ•™์Šต์„ multi-taskingํ•œ ๊ฒƒ
  • ์ตœ์ข… ์‹œ์Šคํ…œ์€ ์—ฌ๋Ÿฌ ๋ฆฌ์†Œ์Šค์— ๋Œ€ํ•œ ์ง€์‹์„ ์ผ๋ฐ˜ํ™”ํ•˜๊ณ  ์ด๋ฅผ ์›์‹œ ํ…์ŠคํŠธ์— ์—ฐ๊ฒฐํ•˜๋Š” ๊ฒƒ์„ ํ†ตํ•ด์„œ energy function ์•ˆ์—์„œ ๋ฌธ์žฅ์˜ ๊นŠ์€ ์˜๋ฏธ๋ฅผ ์ž ์žฌ์ ์œผ๋กœ ํฌ์ฐฉํ•  ์ˆ˜ ์žˆ์Œ

 

 

 

์ฐธ๊ณ  ์ž๋ฃŒ 

 

https://kilian.evang.name/sp/lectures/intro.pdf

 

https://towardsdatascience.com/meaning-representation-and-srl-assuming-there-is-some-meaning-741f35bfdd6

 

Meaning Representation and SRLโ€‹: assuming there is some meaning

What is meaning Representation

towardsdatascience.com

 

https://excelsior-cjh.tistory.com/64

 

Chap01-2 : WordNet, Part-Of-Speech(POS)

1. Looking up Synsets for a word in WordNet WordNet(์›Œ๋“œ๋„ท)์€ ์˜์–ด์˜ ์˜๋ฏธ ์–ดํœ˜๋ชฉ๋ก์ด๋‹ค. WordNet์€ ์˜์–ด ๋‹จ์–ด๋ฅผ 'synset'์ด๋ผ๋Š” ์œ ์˜์–ด ์ง‘๋‹จ(๋™์˜์–ด ์ง‘ํ•ฉ)์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜์—ฌ ๊ฐ„๋žตํ•˜๊ณ  ์ผ๋ฐ˜์ ์ธ ์ •์˜๋ฅผ ์ œ๊ณตํ•˜๊ณ ,..

excelsior-cjh.tistory.com

 

https://velog.io/@riverdeer/Multi-task-Learning

 

Multi-task Learning

Multi-task Learning์— ๋Œ€ํ•œ ์—ฌ๋Ÿฌ ์ž๋ฃŒ๋ฅผ ๋ชจ์•„๋†“์€ ํฌ์ŠคํŒ…์ž…๋‹ˆ๋‹ค.

velog.io

 

https://en.wikipedia.org/wiki/Knowledge_acquisition

 

Knowledge acquisition - Wikipedia

Process used to define the rules and ontologies required for a knowledge-based system Knowledge acquisition is the process used to define the rules and ontologies required for a knowledge-based system. The phrase was first used in conjunction with expert s

en.wikipedia.org

 

https://bab2min.tistory.com/576

 

๋‹จ์–ด ์˜๋ฏธ ์ค‘์˜์„ฑ ํ•ด์†Œ(Word Sense Disambiguation) ๊ธฐ์ˆ ๋“ค

์–ธ์–ด์—๋Š” ๋‹ค๋ฅธ ๋‹จ์–ด์ด์ง€๋งŒ ํ˜•ํƒœ๊ฐ€ ๊ฐ™์€ ๋™์ฒ ์ด์˜์–ด(๋˜๋Š” ์†Œ๋ฆฌ๊ฐ€ ๊ฐ™์ง€๋งŒ ๋‹ค๋ฅธ ๋‹จ์–ด์ธ ๋™์Œ์ด์˜์–ด)๋„ ๋งŽ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๊ฐ™์€ ๋‹จ์–ด๋ผ ํ• ์ง€๋ผ๋„ ๋งฅ๋ฝ์— ๋”ฐ๋ผ ์“ฐ์ด๋Š” ์˜๋ฏธ๊ฐ€ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ํ•ด๋‹น

bab2min.tistory.com

 

https://post.naver.com/viewer/postView.naver?volumeNo=31743752&memberNo=52249799

 

[ICLR 2021] 2ํŽธ: ICLR 2021 ์† Generative model ํŠธ๋ Œ๋“œ

[BY LG AI์—ฐ๊ตฌ์›] ๋ฉ”ํƒ€๋ฒ„์Šค๋ฅผ ์ด์šฉํ•œ ์˜จ๋ผ์ธ ํ•™ํšŒ์ด๋ฒˆ ICLR 2021์€ ์ฝ”๋กœ๋‚˜๋กœ ์ธํ•ด ์ž‘๋…„๊ณผ ๊ฐ™์ด virtual ...

m.post.naver.com

 

https://www.semanticscholar.org/paper/Exploiting-links-in-WordNet-hierarchy-for-word-of-Kolte-Bhirud/6bfe45b416900ebba6957c7be4d41233047bcb59

 

Exploiting links in WordNet hierarchy for word sense disambiguation of nouns | Semantic Scholar

Sense's definitions of the specific word, "Synset" definitions, the "Hypernymy" relation, and definitions ofThe context features (words in the same sentence) are retrieved from the WordNet database and used as an input of the Disambiguation algorithm. Word

www.semanticscholar.org

 

https://medium.com/sciforce/text-preprocessing-for-nlp-and-machine-learning-tasks-3e077aa4946e

 

Text Preprocessing for NLP and Machine Learning Tasks

We go into detail of text preprocessing for NLP. We talk about such steps as segmentation, cleaning, normalization, annotation and analysis.

medium.com

 

https://byteiota.com/pos-tagging/

 

Part Of Speech Tagging – POS Tagging in NLP | byteiota

Part of Speech Tagging deals with automatic assignment of POS tag to the words in a given sentence. POS tagging is achieved using NLP techniques.

byteiota.com

 

https://jynee.github.io/NLP%EA%B8%B0%EC%B4%88_3/

 

(NLP ๊ธฐ์ดˆ) ๋ฌธ์„œ ์ •๋ณด ์ถ”์ถœ

NLP ์ •๊ทœํ‘œํ˜„์‹ ์ฒญํ‚น ์นญํ‚น ๋ฌธ์„œ ์ •๋ณด ์ถ”์ถœ ์ •ํ•ด์ง„ ํŒจํ„ด์„ ์‚ฌ์šฉํ•ด์„œ ํŒจํ„ด์— ์ผ์น˜ํ•˜๋Š” ๋ฐ์ดํ„ฐ ๊ฒ€์ƒ‰์„ ์ง€์›ํ•˜๋Š” ํ‘œํ˜„์‹ ์ •๊ทœํ‘œํ˜„์‹์— ์“ฐ์ด๋Š” ํŠน์ˆ˜๋ฌธ์ž : ์•„๋ฌด ๋ฌธ์ž๋‚˜ ์—ฌ๋Ÿฌ ๊ฐœ : } { ์•ˆ์˜ ๋‚ด์šฉ ์ œ์™ธ =

jynee.github.io

 

https://paperswithcode.com/task/semantic-role-labeling

 

Papers with Code - Semantic Role Labeling

Semantic role labeling aims to model the predicate-argument structure of a sentence and is often described as answering "Who did what to whom". BIO notation is typically used for semantic role labeling. Example: | Housing | starts | are | expected | to | q

paperswithcode.com

 

https://velog.io/@contea95/%ED%83%90%EC%9A%95%EB%B2%95%EA%B7%B8%EB%A6%AC%EB%94%94-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98

 

ํƒ์š•๋ฒ•(๊ทธ๋ฆฌ๋””) ์•Œ๊ณ ๋ฆฌ์ฆ˜

ํƒ์š•๋ฒ•(์ดํ•˜ '๊ทธ๋ฆฌ๋””') ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋ž€ ํ˜„์žฌ ์ƒํ™ฉ์—์„œ ๊ฐ€์žฅ ์ข‹์€ ๊ฒƒ(์ตœ์„ ์˜ ์„ ํƒ)์„ ๊ณ ๋ฅด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๋งํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๋”” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋™์  ํ”„๋กœ๊ทธ๋ž˜๋ฐ์„ ๊ฐ„๋‹จํ•œ ๋ฌธ์ œ ํ•ด๊ฒฐ์— ์‚ฌ์šฉํ•˜๋ฉด ์ง€๋‚˜์น˜๊ฒŒ ๋งŽ

velog.io

 

https://seamless.tistory.com/38

 

๋”ฅ๋Ÿฌ๋‹(Deep learning) ์‚ดํŽด๋ณด๊ธฐ 2ํƒ„

์ง€๋‚œ ํฌ์ŠคํŠธ์— Deep learning ์‚ดํŽด๋ณด๊ธฐ 1ํƒ„์„ ํ†ตํ•ด ๋”ฅ๋Ÿฌ๋‹์˜ ๊ฐœ์š”์™€ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ, ๊ทธ๋ฆฌ๊ณ  Underfitting์˜ ๋ฌธ์ œ์ ๊ณผ ํ•ด๊ฒฐ๋ฐฉ๋ฒ•์— ๊ด€ํ•ด ์•Œ์•„๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ ์˜ค๋Š˜์€ ์ด์–ด์„œ Deep learning์—์„œ ํ•™์Šต์ด ๋Š๋ฆฐ

seamless.tistory.com

 

https://ifdean.tistory.com/3

 

NLP ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ์…‹ - ์˜์–ด์™€ ํ•œ๊ตญ์–ด

์ž์—ฐ์–ด์ฒ˜๋ฆฌ ํƒœ์Šคํฌ์— ํ™œ์šฉ๋˜๋Š” ์ฃผ์š” ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ์…‹์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ ์–ธ์–ด๊ถŒ์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ตฌ์ถ•๋˜๊ณ  ์žˆ๋Š”๋ฐ, ๊ทธ ์ค‘์—์„œ๋„ ์˜์–ด์™€ ํ•œ๊ตญ์–ด๋ฅผ ๋Œ€์ƒ์œผ๋กœ ์•Œ์•„๋ด…๋‹ˆ๋‹ค. ##์ฐธ๊ณ  ์•„๋ž˜๋Š” ๊ตฌ๋ฌธ๋ถ„์„

ifdean.tistory.com

 

https://velog.io/@raqoon886/StructuredEmbeddings

 

SE ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ - Learning Structured Embeddings of Knowledge Bases

ํ•ด๋‹น ๋…ผ๋ฌธ์€ Knowledge Base์˜ Structured Embedding ๋ฐฉ๋ฒ•์— ๊ด€ํ•œ ๊ธ€์ด๋‹ค.

velog.io

 

https://glanceyes.tistory.com/entry/Deep-Learning-%EC%B5%9C%EC%A0%81%ED%99%94Optimization

 

๋”ฅ ๋Ÿฌ๋‹์—์„œ์˜ ์ผ๋ฐ˜ํ™”(Generalization)์™€ ์ตœ์ ํ™”(Optimization)

2022๋…„ 2์›” 7์ผ(์›”)๋ถ€ํ„ฐ 11์ผ(๊ธˆ)๊นŒ์ง€ ๋„ค์ด๋ฒ„ ๋ถ€์ŠคํŠธ์บ ํ”„(boostcamp) AI Tech ๊ฐ•์˜๋ฅผ ๋“ค์œผ๋ฉด์„œ ๊ฐœ์ธ์ ์œผ๋กœ ์ค‘์š”ํ•˜๋‹ค๊ณ  ์ƒ๊ฐ๋˜๊ฑฐ๋‚˜ ์งš๊ณ  ๋„˜์–ด๊ฐ€์•ผ ํ•  ํ•ต์‹ฌ ๋‚ด์šฉ๋“ค๋งŒ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋ฉ”๋ชจํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค. ํ‹€๋ฆฌ๊ฑฐ

glanceyes.tistory.com

 

https://ddiri01.tistory.com/321

 

precision at K, MAP, recall at K

ranking system ๋˜๋Š” recommander ์‹œ์Šคํ…œ์—์„œ ์ข‹์€ ์ถ”์ฒœ(๋žญํฌ)๋ฅผ ํ–ˆ๋Š”์ง€ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ precision at K, recall at K ์„ ์‚ดํŽด๋ณด์ž. Top K๊ฐœ์˜ ๊ฒฐ๊ณผ๋กœ Precision(์ •๋ฐ€๋„)๋ฅผ ๊ณ„์‚ฐ -> Precision at K ์ถ”์ฒœ ๋œ ๊ฒฐ..

ddiri01.tistory.com

 

https://seongkyun.github.io/study/2019/02/03/KDE/

 

Kernel Density Estimation (์ปค๋„ ๋ฐ€๋„ ์ถ”์ •) · Seongkyun Han's blog

Kernel Density Estimation (์ปค๋„ ๋ฐ€๋„ ์ถ”์ •) 03 Feb 2019 | kernel density estimation KDE ์ปค๋„ ๋ฐ€๋„ ์ถ”์ • Kernel Density Estimation (์ปค๋„ ๋ฐ€๋„ ์ถ”์ •) CNN์„ ์ด์šฉํ•œ ์‹คํ—˜์„ ํ–ˆ๋Š”๋ฐ ์ง๊ด€์ ์œผ๋กœ๋Š” ๊ฒฐ๊ณผ๊ฐ€ ์ข‹์•„์กŒ์ง€๋งŒ ์™œ ์ข‹์•„

seongkyun.github.io

 

https://wdprogrammer.tistory.com/35

 

[NLP] ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ํ•„์ˆ˜ ๊ฐœ๋… ์ •๋ฆฌ: Language model, Representation

2018-01-20-nlp-1 Language Model(์–ธ์–ด ๋ชจ๋ธ) [์ •์˜] ๋‹จ์–ด ์‹œํ€€์Šค์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋กœ, ์‹œํ€€์Šค1 ๋‚ด ๋‹จ์–ด ํ† ํฐ๋“ค์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ ํ• ๋‹นํ•˜๋Š” ๋ชจ๋ธ์ด๋‹ค. m๊ฐœ์˜ ๋‹จ์–ด๊ฐ€ ์ฃผ์–ด์งˆ ๋•Œ, m๊ฐœ์˜ ๋‹จ์–ด ์‹œํ€€์Šค๊ฐ€ ๋‚˜ํƒ€๋‚  ํ™•

wdprogrammer.tistory.com

 

 

'๐Ÿ“‘ ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ > NLP' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Word2Vec] Distributed Representations of Words and Phrases and their Compositionality  (0) 2022.08.11
Contents

ํฌ์ŠคํŒ… ์ฃผ์†Œ๋ฅผ ๋ณต์‚ฌํ–ˆ์Šต๋‹ˆ๋‹ค

์ด ๊ธ€์ด ๋„์›€์ด ๋˜์—ˆ๋‹ค๋ฉด ๊ณต๊ฐ ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค!