---
layout: hub_detail
background-class: hub-background
body-class: hub
title: PyTorch-Transformers
summary: PyTorch implementations of popular NLP Transformers
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link:
github-id: huggingface/transformers
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
demo-model-link:
---

๋ชจ๋ธ ์„ค๋ช…

PyTorch-Transformers (์ด์ „์—” pytorch-pretrained-bert์œผ๋กœ ์•Œ๋ ค์ง) ๋Š” ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ(NLP)๋ฅผ ์œ„ํ•œ ์ตœ์‹ ์‹ ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ๋“ค์„ ๋ชจ์•„๋†“์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค.

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” ํ˜„์žฌ ๋‹ค์Œ ๋ชจ๋ธ๋“ค์— ๋Œ€ํ•œ ํŒŒ์ดํ† ์น˜ ๊ตฌํ˜„๊ณผ ์‚ฌ์ „ ํ•™์Šต๋œ ๊ฐ€์ค‘์น˜, ์‚ฌ์šฉ ์Šคํฌ๋ฆฝํŠธ, ๋ณ€ํ™˜ ์œ ํ‹ธ๋ฆฌํ‹ฐ๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

  1. BERT ๋Š” Google์—์„œ ๋ฐœํ‘œํ•œ BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova)
  2. GPT ๋Š” OpenAI์—์„œ ๋ฐœํ‘œํ•œ Improving Language Understanding by Generative Pre-Training ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever)
  3. GPT-2 ๋Š” OpenAI์—์„œ ๋ฐœํ‘œํ•œ Language Models are Unsupervised Multitask Learners ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei**, Ilya Sutskever**)
  4. Transformer-XL ๋Š” Google/CMU์—์„œ ๋ฐœํ‘œํ•œ Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov)
  5. XLNet ๋Š” Google/CMU์—์„œ ๋ฐœํ‘œํ•œ โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le)
  6. XLM ๋Š” Facebook์—์„œ ๋ฐœํ‘œํ•œ Cross-lingual Language Model Pretraining ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Guillaume Lample, Alexis Conneau)
  7. RoBERTa ๋Š” Facebook์—์„œ ๋ฐœํ‘œํ•œ Robustly Optimized BERT Pretraining Approach ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov)
  8. DistilBERT ๋Š” HuggingFace์—์„œ ๊ฒŒ์‹œํ•œ Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version ofย BERT ๋ธ”๋กœ๊ทธ ํฌ์ŠคํŒ…๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. (์ €์ž: Victor Sanh, Lysandre Debut, Thomas Wolf)

์—ฌ๊ธฐ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ตฌ์„ฑ์š”์†Œ๋“ค์€ pytorch-transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— ์žˆ๋Š” AutoModel ๊ณผ AutoTokenizer ํด๋ž˜์Šค๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Requirements

Unlike most other models on the PyTorch Hub, BERT requires a few additional Python packages to be installed:

pip install tqdm boto3 requests regex sentencepiece sacremoses

Usage

The available methods are the following:

  • config: returns a configuration object corresponding to the specified model or path.
  • tokenizer: returns a tokenizer corresponding to the specified model or path.
  • model: returns a model corresponding to the specified model or path.
  • modelForCausalLM: returns a model with a language modeling head, corresponding to the specified model or path.
  • modelForSequenceClassification: returns a model with a sequence classifier, corresponding to the specified model or path.
  • modelForQuestionAnswering: returns a model with a question answering head, corresponding to the specified model or path.

All of these methods share the following argument: pretrained_model_or_path, a string identifying the pre-trained model or path from which the instance will be returned. Several checkpoints are available for each model, which are detailed below:

The available models are listed in the pre-trained models section of the pytorch-transformers documentation.

Documentation

Here are a few examples detailing the usage of each available method.

Tokenizer

The tokenizer object allows conversion from character strings to tokens understood by the model. Each model has its own tokenizer, and some tokenizing methods differ across tokenizers. The complete documentation can be found here.

import torch
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-uncased')    # Download the vocabulary from S3 and cache it.
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', './test/bert_saved_model/')  # E.g., load a tokenizer that was saved with `save_pretrained('./test/saved_model/')`.
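As a minimal usage sketch (assuming the 'bert-base-uncased' tokenizer loaded above; the example sentence is arbitrary), the returned object can split text into tokens, map them to vocabulary ids, and convert ids back to text:

```python
# Minimal sketch of typical tokenizer usage (assumes the `tokenizer` loaded above).
text = "Who was Jim Henson ?"                           # arbitrary example sentence
tokens = tokenizer.tokenize(text)                       # list of wordpiece tokens
ids = tokenizer.encode(text, add_special_tokens=True)   # token ids, with [CLS]/[SEP] added for BERT
print(tokens)
print(tokenizer.decode(ids))                            # string reconstructed from the ids
```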

๋ชจ๋ธ

๋ชจ๋ธ ๊ฐ์ฒด๋Š” nn.Module ๋ฅผ ์ƒ์†ํ•˜๋Š” ๋ชจ๋ธ์˜ ์ธ์Šคํ„ด์Šค์ž…๋‹ˆ๋‹ค. ๊ฐ ๋ชจ๋ธ์€ ๋กœ์ปฌ ํŒŒ์ผ ํ˜น์€ ๋””๋ ‰ํ„ฐ๋ฆฌ๋‚˜ ์‚ฌ์ „ ํ•™์Šตํ•  ๋•Œ ์‚ฌ์šฉ๋œ ์„ค์ •๊ฐ’(์•ž์„œ ์„ค๋ช…ํ•œ config)์œผ๋กœ๋ถ€ํ„ฐ ์ €์žฅ/๋กœ๋”ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ํ•จ๊ป˜ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ชจ๋ธ์€ ๋‹ค๋ฅด๊ฒŒ ๋™์ž‘ํ•˜๋ฉฐ, ์—ฌ๋Ÿฌ ๋‹ค๋ฅธ ๋ชจ๋ธ๋“ค์˜ ์ „์ฒด ๊ฐœ์š”๋Š” ์—ฌ๊ธฐ์—์„œ ํ™•์ธํ•ด๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

import torch
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')    # Download the model and configuration from S3 and cache them.
model = torch.hub.load('huggingface/pytorch-transformers', 'model', './test/bert_model/')  # E.g., load a model that was saved with `save_pretrained('./test/saved_model/')`.
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased', output_attentions=True)  # Update the configuration while loading.
assert model.config.output_attentions == True
# Load from a TensorFlow checkpoint file instead of a PyTorch model (slower).
from transformers import AutoConfig  # assumes the transformers package is installed alongside the hub entry points
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
model = torch.hub.load('huggingface/pytorch-transformers', 'model', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
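For the local-directory case above, here is a minimal save/load round-trip sketch (the directory name './test/saved_model/' is only illustrative):

```python
import torch

# Minimal save/load round trip (directory name is illustrative).
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')
model.save_pretrained('./test/saved_model/')   # writes the configuration and weights to the directory
reloaded = torch.hub.load('huggingface/pytorch-transformers', 'model', './test/saved_model/')
```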

Models with a language modeling head

Previously mentioned model instance with an additional language modeling head.

import torch
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'gpt2')    # Download the model and configuration from huggingface.co and cache them.
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', './test/saved_model/')  # E.g., load a model that was saved with `save_pretrained('./test/saved_model/')`.
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'gpt2', output_attentions=True)  # Update the configuration while loading.
assert model.config.output_attentions == True
# Load from a TensorFlow checkpoint file instead of a PyTorch model (slower).
config = AutoConfig.from_pretrained('./tf_model/gpt_tf_model_config.json')
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', './tf_model/gpt_tf_checkpoint.ckpt.index', from_tf=True, config=config)
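As a quick illustration of what the language modeling head adds, here is a minimal sketch that predicts the next token with the GPT-2 model loaded above (the prompt text is arbitrary):

```python
import torch

# Minimal next-token prediction sketch with the GPT-2 causal LM (prompt is arbitrary).
tokenizer = torch.hub.load('huggingface/transformers', 'tokenizer', 'gpt2')
model = torch.hub.load('huggingface/transformers', 'modelForCausalLM', 'gpt2')
model.eval()

input_ids = torch.tensor([tokenizer.encode("The Manhattan Bridge is")])
with torch.no_grad():
    outputs = model(input_ids)

next_token_logits = outputs[0][0, -1, :]                # logits for the last position
next_token_id = torch.argmax(next_token_logits).item()
print(tokenizer.decode([next_token_id]))                # most likely next token
```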

Models with a sequence classification head

Previously mentioned model instance with an additional sequence classification head.

import torch
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-uncased')    # Download the model and configuration from S3 and cache them.
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', './test/bert_model/')  # E.g., load a model that was saved with `save_pretrained('./test/saved_model/')`.
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-uncased', output_attentions=True)  # Update the configuration while loading.
assert model.config.output_attentions == True
# Load from a TensorFlow checkpoint file instead of a PyTorch model (slower).
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)

์งˆ์˜ ์‘๋‹ต ํ—ค๋“œ๊ฐ€ ์ถ”๊ฐ€๋œ ๋ชจ๋ธ

์•ž์„œ ์–ธ๊ธ‰ํ•œ, ์งˆ์˜ ์‘๋‹ต ํ—ค๋“œ๊ฐ€ ์ถ”๊ฐ€๋œ model ์ธ์Šคํ„ด์Šค์ž…๋‹ˆ๋‹ค.

import torch
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-base-uncased')    # Download the model and configuration from S3 and cache them.
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', './test/bert_model/')  # E.g., load a model that was saved with `save_pretrained('./test/saved_model/')`.
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-base-uncased', output_attentions=True)  # Update the configuration while loading.
assert model.config.output_attentions == True
# Load from a TensorFlow checkpoint file instead of a PyTorch model (slower).
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', './tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)

Configuration

The configuration is optional. The configuration object holds information about the model, such as the number of heads and layers, whether the model should output attentions or hidden states, and whether it should be adapted for TorchScript. Many parameters are available, some specific to each model. The complete documentation can be found here.

import torch
config = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased')  # Download the configuration from S3 and cache it.
config = torch.hub.load('huggingface/pytorch-transformers', 'config', './test/bert_saved_model/')  # E.g., load a configuration for a model that was saved with `save_pretrained('./test/saved_model/')`.
config = torch.hub.load('huggingface/pytorch-transformers', 'config', './test/bert_saved_model/my_configuration.json')
config = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased', output_attentions=True, foo=False)
assert config.output_attentions == True
config, unused_kwargs = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased', output_attentions=True, foo=False, return_unused_kwargs=True)
assert config.output_attentions == True
assert unused_kwargs == {'foo': False}

# Use the configuration to load a model.
config = torch.hub.load('huggingface/pytorch-transformers', 'config', 'bert-base-uncased')
config.output_attentions = True
config.output_hidden_states = True
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased', config=config)
# The model is now configured to also output attentions and hidden states.

Example Usage

Here is an example of how to tokenize an input text and feed it to a BERT model to get the hidden states it computes, and how to predict masked tokens with a language modeling BERT model.

First, tokenize the input

import torch
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased')

text_1 = "Who was Jim Henson ?"
text_2 = "Jim Henson was a puppeteer"

# Tokenize the input with the special tokens around it (for BERT: [CLS] at the beginning and [SEP] at the end).
indexed_tokens = tokenizer.encode(text_1, text_2, add_special_tokens=True)
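An optional sanity check (a small sketch, not part of the original example): converting the ids back to tokens makes the [CLS]/[SEP] placement visible and shows why segments_ids below has 16 entries, 8 for each sentence.

```python
# Optional check: the wordpiece tokens, with [CLS]/[SEP] in place.
print(tokenizer.convert_ids_to_tokens(indexed_tokens))
print(len(indexed_tokens))  # 16 tokens here, matching the 16 segment ids defined below
```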

Using BertModel to encode the input sentence in a sequence of last-layer hidden states

# Define the segment indices associated with the first sentence A (0) and the second sentence B (1) (see the paper).
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Convert the inputs to PyTorch tensors.
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-cased')

with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, token_type_ids=segments_tensors)

Using modelForMaskedLM to predict a masked token with BERT

# Mask a token that we will try to predict back with `BertForMaskedLM`.
masked_index = 8
indexed_tokens[masked_index] = tokenizer.mask_token_id
tokens_tensor = torch.tensor([indexed_tokens])

masked_lm_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForMaskedLM', 'bert-base-cased')

with torch.no_grad():
    predictions = masked_lm_model(tokens_tensor, token_type_ids=segments_tensors)

# Get the predicted token.
predicted_index = torch.argmax(predictions[0][0], dim=1)[masked_index].item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'Jim'

Using modelForQuestionAnswering to do question answering with BERT

question_answering_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForQuestionAnswering', 'bert-large-uncased-whole-word-masking-finetuned-squad')
question_answering_tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-large-uncased-whole-word-masking-finetuned-squad')

# The format is: the paragraph first, then the question.
text_1 = "Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ?"
indexed_tokens = question_answering_tokenizer.encode(text_1, text_2, add_special_tokens=True)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

# Predict the logits for the start and end positions.
with torch.no_grad():
    out = question_answering_model(tokens_tensor, token_type_ids=segments_tensors)

# Get the prediction with the highest logits.
answer = question_answering_tokenizer.decode(indexed_tokens[torch.argmax(out.start_logits):torch.argmax(out.end_logits)+1])
assert answer == "puppeteer"

# Or get the total loss, which is the sum of the cross-entropy losses for the start and end positions (set the model to train mode first if this is used for training).
start_positions, end_positions = torch.tensor([12]), torch.tensor([14])
multiple_choice_loss = question_answering_model(tokens_tensor, token_type_ids=segments_tensors, start_positions=start_positions, end_positions=end_positions)

Using modelForSequenceClassification to do paraphrase classification with BERT

sequence_classification_model = torch.hub.load('huggingface/pytorch-transformers', 'modelForSequenceClassification', 'bert-base-cased-finetuned-mrpc')
sequence_classification_tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased-finetuned-mrpc')

text_1 = "Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ?"
indexed_tokens = sequence_classification_tokenizer.encode(text_1, text_2, add_special_tokens=True)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

# Predict the sequence classification logits.
with torch.no_grad():
    seq_classif_logits = sequence_classification_model(tokens_tensor, token_type_ids=segments_tensors)

predicted_labels = torch.argmax(seq_classif_logits[0]).item()

assert predicted_labels == 0  # In the MRPC dataset, this means the two sentences are not paraphrases of each other.

# Or get the sequence classification loss (set the model to train mode first if this is used for training).
labels = torch.tensor([1])
seq_classif_loss = sequence_classification_model(tokens_tensor, token_type_ids=segments_tensors, labels=labels)