LBox Open

A multi-task benchmark for Korean legal language understanding and judgement prediction by LBox

Authors

Updates

  • Dec 2, 2022: We release an additional 1,024 examples of drunk-driving cases for the ljp_criminal task. Compared with the ljp_criminal data, these examples also include fields parsed from the facts (blood alcohol level, driving distance, type of car, and previous criminal history) as well as the period of suspension of execution. See also this issue. The data will be integrated into ljp_criminal in the next release.

  • Dec 2, 2022: We will present our recent work "Data-efficient End-to-end Information Extraction for Statistical Legal Analysis" at NLLP workshop @ EMNLP22!

  • Nov 8, 2022: We release [legal-mt5-small], an mt5-small model domain-adapted on precedent_corpus. We also release legal-mt5-small fine-tuned on the summarization dataset. Both models can be downloaded from here! To use the models, run cd [project-dir]; tar xvfz legal-mt5-small.tar.gz.

  • Oct 25, 2022: The act_on_special_cases_concerning_the_settlement_of_traffic_accidents_corpus (교통사고처리특례법위반(치상)) has been released. The corpus consists of 768 criminal cases. It will be integrated into precedent_corpus in the future (at the moment, there can be some overlap between precedent_corpus and this corpus). See also this issue.

  • Oct 18, 2022: We release three new datasets casename_classification_plus, statute_classification_plus, and summarization_plus!

  • Oct 2, 2022: defamation corpus-v0.1 has been added. The corpus consists of 1,536 criminal cases related to "defamation (명예훼손)". The corpus will be integrated into precedent corpus in the future (at the moment, there can be some overlap between precedent corpus and defamation corpus-v0.1). See also this issue.

  • Sep 2022: Our paper has been accepted for publication in the NeurIPS 2022 Datasets and Benchmarks track! There will be major updates to the paper, the datasets, and the models soon! Meanwhile, the most recent version of our paper is available on OpenReview.

  • Jun 2022: We release lbox-open-v0.2!

    • Two legal judgement prediction tasks, ljp_criminal and ljp_civil, are added to LBox Open.
    • LCube-base, an LBox legal language model with 124M parameters, is added.
    • The baseline scores and their training/test scripts are added.
    • Other updates
      • Some missing values in facts fields of casename_classification and statute_classification are updated.
      • case_corpus is renamed to precedent_corpus
  • Mar 2022: We release lbox-open-v0.1!

Paper

A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction

Benchmarks

  • Last updated at Oct 18 2022
| Model | casename (EM) | statute (EM) | ljp-criminal (F1-fine) | ljp-criminal (F1-imprisonment w/ labor) | ljp-criminal (F1-imprisonment w/o labor) | ljp-civil (EM) | summarization (R1) | summarization (R2) | summarization (RL) |
|---|---|---|---|---|---|---|---|---|---|
| KoGPT2 | $78.5 \pm 0.3$ | $85.7 \pm 0.8$ | $49.9 \pm 1.7$ | $67.5 \pm 1.1$ | $69.2 \pm 1.6$ | $66.0 \pm 0.5$ | $47.2$ | $39.1$ | $45.7$ |
| KoGPT2 + d.a. | $81.9 \pm 0.2$ | $89.4 \pm 0.5$ | $49.8$ | $65.4$ | $70.1$ | $64.7 \pm 1.1$ | $49.2$ | $40.9$ | $47.7$ |
| LCube-base (ours) | $81.1 \pm 0.3$ | $87.6 \pm 0.5$ | $46.4 \pm 2.8$ | $69.3 \pm 0.3$ | $70.3 \pm 0.7$ | $67.6 \pm 1.3$ | $46.0$ | $37.7$ | $44.5$ |
| LCube-base + d.a. (ours) | $82.7 \pm 0.6$ | $89.3 \pm 0.4$ | $48.1 \pm 1.2$ | $67.4 \pm 1.5$ | $69.9 \pm 1.1$ | $60.9 \pm 1.1$ | $47.8$ | $39.5$ | $46.4$ |
| mt5-small | $81.0 \pm 1.3$ | $87.2 \pm 0.3$ | $49.1 \pm 1.3$ | $66.6 \pm 0.6$ | $69.8 \pm 1.0$ | $68.9 \pm 0.8$ | $56.2$ | $47.8$ | $54.7$ |
| mt5-small + d.a. | $82.2 \pm 0.2$ | $88.8 \pm 0.5$ | $51.8 \pm 0.7$ | $68.9 \pm 0.3$ | $70.3 \pm 0.7$ | $69.1 \pm 0.1$ | $56.2$ | $47.7$ | $54.8$ |
  • The errors are estimated from three independent experiments performed with different random seeds.
  • ROUGE scores are computed at word level.
  • d.a. stands for domain adaptation, i.e., additional pre-training on precedent_corpus only.
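The ROUGE scores above are reported at word level. As a rough illustration only, the sketch below computes word-level ROUGE-1/2/L F1 using plain whitespace tokenization. The tokenization, the choice of F1, and the helper names are assumptions; this is not necessarily the scorer used to produce the table.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Return a Counter of n-gram tuples from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """Word-level ROUGE-N F1; words are whitespace-separated tokens (assumption)."""
    pred = _ngrams(prediction.split(), n)
    ref = _ngrams(reference.split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """Word-level ROUGE-L F1 based on the longest common subsequence."""
    pred, ref = prediction.split(), reference.split()
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))
```

For Korean text, whitespace splitting yields eojeol-level units, so scores can differ from those of a morpheme-level or subword tokenizer.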

Dataset

How to use the dataset

We use the datasets library from Hugging Face.

# !pip install datasets
from datasets import load_dataset

# casename classification task
data_cn = load_dataset("lbox/lbox_open", "casename_classification")
data_cn_plus = load_dataset("lbox/lbox_open", "casename_classification_plus")

# statute classification task
data_st = load_dataset("lbox/lbox_open", "statute_classification")
data_st_plus = load_dataset("lbox/lbox_open", "statute_classification_plus")

# Legal judgement prediction tasks
data_ljp_criminal = load_dataset("lbox/lbox_open", "ljp_criminal")
data_ljp_civil = load_dataset("lbox/lbox_open", "ljp_civil")

# case summarization task
data_summ = load_dataset("lbox/lbox_open", "summarization")
data_summ_plus = load_dataset("lbox/lbox_open", "summarization_plus")

# precedent corpus
data_corpus = load_dataset("lbox/lbox_open", "precedent_corpus")

Dataset Description

precedent_corpus

  • Korean legal precedent corpus.

  • The corpus consists of 150k cases.

  • About 80k cases are from LAW OPEN DATA and 70k are from the LBox database.

  • Example

{
  "id": 99990,
  "precedent": "์ฃผ๋ฌธ\nํ”ผ๊ณ ์ธ์„ ์ง•์—ญ 6๊ฐœ์›”์— ์ฒ˜ํ•œ๋‹ค.\n๋‹ค๋งŒ, ์ด ํŒ๊ฒฐ ํ™•์ •์ผ๋กœ๋ถ€ํ„ฐ 1๋…„๊ฐ„ ์œ„ ํ˜•์˜ ์ง‘ํ–‰์„ ์œ ์˜ˆํ•œ๋‹ค.\n\n์ด์œ \n๋ฒ” ์ฃ„ ์‚ฌ ์‹ค\n1. ์‚ฌ๊ธฐ\nํ”ผ๊ณ ์ธ์€ 2020. 12. 15. 16:00๊ฒฝ ๊ฒฝ๋ถ ์น ๊ณก๊ตฐ B์— ์žˆ๋Š” ํ”ผํ•ด์ž C์ด ์šด์˜ํ•˜๋Š” โ€˜Dโ€™์—์„œ, ๋งˆ์น˜ ์ •์ƒ์ ์œผ๋กœ ๋Œ€๊ธˆ์„ ์ง€๊ธ‰ํ•  ๊ฒƒ์ฒ˜๋Ÿผ ํ–‰์„ธํ•˜๋ฉด์„œ ํ”ผํ•ด์ž์—๊ฒŒ ์ˆ ์„ ์ฃผ๋ฌธํ•˜์˜€๋‹ค.\n๊ทธ๋Ÿฌ๋‚˜ ์‚ฌ์‹ค ํ”ผ๊ณ ์ธ์€ ์ˆ˜์ค‘์— ์ถฉ๋ถ„ํ•œ ํ˜„๊ธˆ์ด๋‚˜ ์‹ ์šฉ์นด๋“œ ๋“ฑ ๊ฒฐ์ œ ์ˆ˜๋‹จ์„ ๊ฐ€์ง€๊ณ  ์žˆ์ง€ ์•Š์•„ ์ •์ƒ์ ์œผ๋กœ ๋Œ€๊ธˆ์„ ์ง€๊ธ‰ํ•  ์˜์‚ฌ๋‚˜ ๋Šฅ๋ ฅ์ด ์—†์—ˆ๋‹ค.\n๊ทธ๋Ÿผ์—๋„ ํ”ผ๊ณ ์ธ์€ ์œ„์™€ ๊ฐ™์ด ํ”ผํ•ด์ž๋ฅผ ๊ธฐ๋งํ•˜์—ฌ ์ด์— ์†์€ ํ”ผํ•ด์ž๋กœ๋ถ€ํ„ฐ ์ฆ‰์„์—์„œ ํ•ฉ๊ณ„ 8,000์› ์ƒ๋‹น์˜ ์ˆ ์„ ๊ต๋ถ€๋ฐ›์•˜๋‹ค.\n2. ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด\nํ”ผ๊ณ ์ธ์€ ์ œ1ํ•ญ ๊ธฐ์žฌ ์ผ์‹œยท์žฅ์†Œ์—์„œ, โ€˜์†๋‹˜์ด ์ˆ ๊ฐ’์„ ์ง€๋ถˆํ•˜์ง€ ์•Š๊ณ  ์žˆ๋‹คโ€™๋Š” ๋‚ด์šฉ์˜ 112์‹ ๊ณ ๋ฅผ ์ ‘์ˆ˜ํ•˜๊ณ  ํ˜„์žฅ์— ์ถœ๋™ํ•œ ์น ๊ณก๊ฒฝ์ฐฐ์„œ E์ง€๊ตฌ๋Œ€ ์†Œ์† ๊ฒฝ์ฐฐ๊ด€ F๋กœ๋ถ€ํ„ฐ ์ˆ ๊ฐ’์„ ์ง€๋ถˆํ•˜๊ณ  ๊ท€๊ฐ€ํ•  ๊ฒƒ์„ ๊ถŒ์œ ๋ฐ›์ž, โ€œ์ง•์—ญ๊ฐ€๊ณ  ์‹ถ์€๋ฐ ๋ฌด์ „์ทจ์‹ํ–ˆ์œผ๋‹ˆ ์œ ์น˜์žฅ์— ๋„ฃ์–ด ๋‹ฌ๋ผโ€๊ณ  ๋งํ•˜๋ฉด์„œ ์ˆœ์ฐฐ์ฐจ์— ํƒ€๋ ค๊ณ  ํ•˜์˜€๋‹ค. ์ด์— ๊ฒฝ์ฐฐ๊ด€๋“ค์ด ์ˆ˜ํšŒ ๊ท€๊ฐ€ ํ•  ๊ฒƒ์„ ์žฌ์ฐจ ์ข…์šฉํ•˜์˜€์œผ๋‚˜, ํ”ผ๊ณ ์ธ์€ ๊ฒฝ์ฐฐ๊ด€๋“ค์„ ํ–ฅํ•ด โ€œ๋‚ด๊ฐ€ ๋Œ๋กœ ์ˆœ์ฐฐ์ฐจ๋ฅผ ์ฐ์œผ๋ฉด ์ง•์—ญ๊ฐ‘๋‹ˆ๊นŒ?, ๋‚ด์—ฌ๊ฒฝ ์—‰๋ฉ์ด ๋ฐœ๋กœ ์ฐจ๋ฉด ๋“ค์–ด๊ฐˆ ์ˆ˜ ์žˆ๋‚˜?โ€๋ผ๊ณ  ๋งํ•˜๊ณ , ์ด๋ฅผ ์ œ์ง€ํ•˜๋Š” F์˜ ๊ฐ€์Šด์„ ํŒ”๊ฟˆ์น˜๋กœ ์ˆ˜ํšŒ ๋ฐ€์ณ ํญํ–‰ํ•˜์˜€๋‹ค.\n์ด๋กœ์จ ํ”ผ๊ณ ์ธ์€ ๊ฒฝ์ฐฐ๊ด€์˜ 112์‹ ๊ณ ์‚ฌ๊ฑด ์ฒ˜๋ฆฌ์— ๊ด€ํ•œ ์ •๋‹นํ•œ ์ง๋ฌด์ง‘ํ–‰์„ ๋ฐฉํ•ดํ•˜์˜€๋‹ค. ์ฆ๊ฑฐ์˜ ์š”์ง€\n1. ํ”ผ๊ณ ์ธ์˜ ํŒ์‹œ ์ œ1์˜ ์‚ฌ์‹ค์— ๋ถ€ํ•ฉํ•˜๋Š” ๋ฒ•์ •์ง„์ˆ \n1. ์ฆ์ธ G, F์— ๋Œ€ํ•œ ๊ฐ ์ฆ์ธ์‹ ๋ฌธ์กฐ์„œ\n1. ์˜์ˆ˜์ฆ\n1. ํ˜„์žฅ ์‚ฌ์ง„\n๋ฒ•๋ น์˜ ์ ์šฉ\n1. ๋ฒ”์ฃ„์‚ฌ์‹ค์— ๋Œ€ํ•œ ํ•ด๋‹น๋ฒ•์กฐ ๋ฐ ํ˜•์˜ ์„ ํƒ\nํ˜•๋ฒ• ์ œ347์กฐ ์ œ1ํ•ญ, ์ œ136์กฐ ์ œ1ํ•ญ, ๊ฐ ์ง•์—ญํ˜• ์„ ํƒ\n1. 
๊ฒฝํ•ฉ๋ฒ”๊ฐ€์ค‘\nํ˜•๋ฒ• ์ œ37์กฐ ์ „๋‹จ, ์ œ38์กฐ ์ œ1ํ•ญ ์ œ2ํ˜ธ, ์ œ50์กฐ\n1. ์ง‘ํ–‰์œ ์˜ˆ\nํ˜•๋ฒ• ์ œ62์กฐ ์ œ1ํ•ญ\n์–‘ํ˜•์˜ ์ด์œ \n1. ๋ฒ•๋ฅ ์ƒ ์ฒ˜๋‹จํ˜•์˜ ๋ฒ”์œ„: ์ง•์—ญ 1์›”โˆผ15๋…„\n2. ์–‘ํ˜•๊ธฐ์ค€์— ๋”ฐ๋ฅธ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„\n๊ฐ€. ์ œ1๋ฒ”์ฃ„(์‚ฌ๊ธฐ)\n[์œ ํ˜•์˜ ๊ฒฐ์ •]\n์‚ฌ๊ธฐ๋ฒ”์ฃ„ > 01. ์ผ๋ฐ˜์‚ฌ๊ธฐ > [์ œ1์œ ํ˜•] 1์–ต ์› ๋ฏธ๋งŒ\n[ํŠน๋ณ„์–‘ํ˜•์ธ์ž]\n- ๊ฐ๊ฒฝ์š”์†Œ: ๋ฏธํ•„์  ๊ณ ์˜๋กœ ๊ธฐ๋งํ–‰์œ„๋ฅผ ์ €์ง€๋ฅธ ๊ฒฝ์šฐ ๋˜๋Š” ๊ธฐ๋งํ–‰์œ„์˜ ์ •๋„๊ฐ€ ์•ฝํ•œ ๊ฒฝ์šฐ, ์ฒ˜๋ฒŒ๋ถˆ์›\n[๊ถŒ๊ณ ์˜์—ญ ๋ฐ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„]\nํŠน๋ณ„๊ฐ๊ฒฝ์˜์—ญ, ์ง•์—ญ 1์›”โˆผ1๋…„\n[์ผ๋ฐ˜์–‘ํ˜•์ธ์ž] ์—†์Œ\n๋‚˜. ์ œ2๋ฒ”์ฃ„(๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด)\n[์œ ํ˜•์˜ ๊ฒฐ์ •]\n๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด๋ฒ”์ฃ„ > 01. ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด > [์ œ1์œ ํ˜•] ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด/์ง๋ฌด๊ฐ•์š”\n[ํŠน๋ณ„์–‘ํ˜•์ธ์ž]\n- ๊ฐ๊ฒฝ์š”์†Œ: ํญํ–‰ยทํ˜‘๋ฐ•ยท์œ„๊ณ„์˜ ์ •๋„๊ฐ€ ๊ฒฝ๋ฏธํ•œ ๊ฒฝ์šฐ\n[๊ถŒ๊ณ ์˜์—ญ ๋ฐ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„]\n๊ฐ๊ฒฝ์˜์—ญ, ์ง•์—ญ 1์›”โˆผ8์›”\n[์ผ๋ฐ˜์–‘ํ˜•์ธ์ž]\n- ๊ฐ๊ฒฝ์š”์†Œ: ์‹ฌ์‹ ๋ฏธ์•ฝ(๋ณธ์ธ ์ฑ…์ž„ ์žˆ์Œ)\n๋‹ค. ๋‹ค์ˆ˜๋ฒ”์ฃ„ ์ฒ˜๋ฆฌ๊ธฐ์ค€์— ๋”ฐ๋ฅธ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„: ์ง•์—ญ 1์›”โˆผ1๋…„4์›”(์ œ1๋ฒ”์ฃ„ ์ƒํ•œ + ์ œ2๋ฒ”์ฃ„ ์ƒํ•œ์˜ 1/2)\n3. 
์„ ๊ณ ํ˜•์˜ ๊ฒฐ์ •: ์ง•์—ญ 6์›”์— ์ง‘ํ–‰์œ ์˜ˆ 1๋…„\n๋งŒ์ทจ์ƒํƒœ์—์„œ ์‹๋‹น์—์„œ ์†Œ๋ž€์„ ํ”ผ์› ๊ณ , 112์‹ ๊ณ ๋กœ ์ถœ๋™ํ•œ ๊ฒฝ์ฐฐ๊ด€์ด ์—ฌ๋Ÿฌ ์ฐจ๋ก€ ๊ท€๊ฐ€๋ฅผ ์ข…์šฉํ•˜์˜€์Œ์—๋„ ์ด๋ฅผ ๊ฑฐ๋ถ€ํ•˜๊ณ  ๊ฒฝ์ฐฐ๊ด€์˜ ๊ฐ€์Šด์„ ๋ฐ€์นœ ์  ๋“ฑ์„ ์ข…ํ•ฉํ•˜๋ฉด ์ฃ„์ฑ…์„ ๊ฐ€๋ณ๊ฒŒ ๋ณผ ์ˆ˜ ์—†์œผ๋ฏ€๋กœ ์ง•์—ญํ˜•์„ ์„ ํƒํ•˜๋˜, ํ‰์†Œ ์ฃผ๋Ÿ‰๋ณด๋‹ค ํ›จ์”ฌ ๋งŽ์€ ์ˆ ์„ ๋งˆ์‹  ํƒ“์— ์ œ์ •์‹ ์„ ๊ฐ€๋ˆ„์ง€ ๋ชปํ•ด ์ €์ง€๋ฅธ ๋ฒ”ํ–‰์œผ๋กœ ๋ณด์ด๊ณ  ํญํ–‰ ์ •๋„๊ฐ€ ๋งค์šฐ ๊ฒฝ๋ฏธํ•œ ์ , ํ”ผ๊ณ ์ธ์ด ์ˆ ์ด ๊นฌ ํ›„ ์ž์‹ ์˜ ๊ฒฝ์†”ํ•œ ์–ธ๋™์„ ๊นŠ์ด ๋ฐ˜์„ฑํ•˜๋ฉด์„œ ์žฌ๋ฒ”ํ•˜์ง€ ์•Š๊ธฐ ์œ„ํ•ด ์ •์‹ ๊ฑด๊ฐ•์˜ํ•™๊ณผ์˜ ์น˜๋ฃŒ ๋ฐ ์ƒ๋‹ด์„ ๋ฐ›๊ณ  ์žˆ๋Š” ์ , ์‹๋‹น ์—…์ฃผ์—๊ฒŒ ํ”ผํ•ด๋ฅผ ๋ณ€์ƒํ•˜์—ฌ ์šฉ์„œ๋ฅผ ๋ฐ›์€ ์ , ํ”ผ๊ณ ์ธ์˜ ๋‚˜์ด์™€ ๊ฐ€์กฑ๊ด€๊ณ„ ๋“ฑ์˜ ์‚ฌ์ •์„ ์ฐธ์ž‘ํ•˜์—ฌ ํ˜•์˜ ์ง‘ํ–‰์„ ์œ ์˜ˆํ•˜๊ณ , ๋ฒ”ํ–‰ ๊ฒฝ์œ„์™€ ๋ฒ”ํ–‰ ํ›„ ํ”ผ๊ณ ์ธ์˜ ํƒœ๋„ ๋“ฑ์— ๋น„์ถ”์–ด ๋ณผ ๋•Œ ์žฌ๋ฒ”์˜ ์œ„ํ—˜์„ฑ์€ ๊ทธ๋‹ค์ง€ ์šฐ๋ คํ•˜์ง€ ์•Š์•„๋„ ๋  ๊ฒƒ์œผ๋กœ ๋ณด์—ฌ ๋ณดํ˜ธ๊ด€์ฐฐ ๋“ฑ ๋ถ€์ˆ˜์ฒ˜๋ถ„์€ ๋ถ€๊ณผํ•˜์ง€ ์•Š์Œ.\n์ด์ƒ์˜ ์ด์œ ๋กœ ์ฃผ๋ฌธ๊ณผ ๊ฐ™์ด ํŒ๊ฒฐํ•œ๋‹ค."
}
  • id: a data id.
  • precedent: a case from a Korean court. It consists of sections such as the ruling (주문), the gist of claim (청구취지), the claim of appeal (항소취지), and the reasoning (이유).
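In the example above, section titles such as 주문 and 이유 appear on their own lines inside the single precedent string. A minimal sketch of splitting a precedent into sections follows. The helper name and the header spellings in SECTION_HEADERS are assumptions; real documents may format headers differently (for example, with spaces between characters), so treat this as a starting point.

```python
# Header spellings are assumptions; each is expected on its own line.
SECTION_HEADERS = ("주문", "청구취지", "항소취지", "이유")


def split_sections(precedent):
    """Split a precedent string into {header: body} (hypothetical helper)."""
    sections = {}
    current = None
    for line in precedent.split("\n"):
        stripped = line.strip()
        if stripped in SECTION_HEADERS:
            # A header line starts a new section.
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Join each section body and drop surrounding blank lines.
    return {header: "\n".join(lines).strip() for header, lines in sections.items()}
```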

casename_classification

  • Task: for the given facts (사실관계), a model is asked to predict the case name.
  • The dataset consists of 10k (facts, case name) pairs extracted from Korean precedents.
  • There are 100 classes (case categories) and each class contains 100 corresponding examples.
  • 8,000 training, 1,000 validation, 1,000 test, and 1,294 test2 examples. The test2 set consists of examples that do not overlap with the precedents in precedent_corpus.
  • We also provide casename_classification_plus, a dataset that extends casename_classification by including infrequent case categories. casename_classification_plus consists of 31,283 examples with 603 case categories in total. See our paper for details.
  • Example
{
  "id": 80,
  "casetype": "criminal",
  "casename": "감염병의예방및관리에관한법률위반",
  "facts": "์งˆ๋ณ‘๊ด€๋ฆฌ์ฒญ์žฅ, ์‹œยท๋„์ง€์‚ฌ ๋˜๋Š” ์‹œ์žฅยท๊ตฐ์ˆ˜ยท๊ตฌ์ฒญ์žฅ์€ ์ œ1๊ธ‰ ๊ฐ์—ผ๋ณ‘์ด ๋ฐœ์ƒํ•œ ๊ฒฝ์šฐ ๊ฐ์—ผ๋ณ‘์˜ ์ „ํŒŒ๋ฐฉ์ง€ ๋ฐ ์˜ˆ๋ฐฉ์„ ์œ„ํ•˜์—ฌ ๊ฐ์—ผ๋ณ‘์˜์‹ฌ์ž๋ฅผ ์ ๋‹นํ•œ ์žฅ์†Œ์— ์ผ์ •ํ•œ ๊ธฐ๊ฐ„ ๊ฒฉ๋ฆฌ์‹œํ‚ค๋Š” ์กฐ์น˜๋ฅผ ํ•˜์—ฌ์•ผ ํ•˜๊ณ , ๊ทธ ๊ฒฉ๋ฆฌ์กฐ์น˜๋ฅผ ๋ฐ›์€ ์‚ฌ๋žŒ์€ ์ด๋ฅผ ์œ„๋ฐ˜ํ•˜์—ฌ์„œ๋Š” ์•„๋‹ˆ ๋œ๋‹ค. ํ”ผ๊ณ ์ธ์€ ํ•ด์™ธ์—์„œ ๊ตญ๋‚ด๋กœ ์ž…๊ตญํ•˜์˜€์Œ์„ ์ด์œ ๋กœ 2021. 4. 21.๊ฒฝ ๊ฐ์—ผ๋ณ‘์˜์‹ฌ์ž๋กœ ๋ถ„๋ฅ˜๋˜์—ˆ๊ณ , ๊ฐ™์€ ๋‚  ์ฐฝ๋…•๊ตฐ์ˆ˜๋กœ๋ถ€ํ„ฐ โ€˜2021. 4. 21.๋ถ€ํ„ฐ 2021. 5. 5. 12:00๊ฒฝ๊นŒ์ง€ ํ”ผ๊ณ ์ธ์˜ ์ฃผ๊ฑฐ์ง€์ธ ๊ฒฝ๋‚จ ์ฐฝ๋…•๊ตฐ B์—์„œ ๊ฒฉ๋ฆฌํ•ด์•ผ ํ•œ๋‹คโ€™๋Š” ๋‚ด์šฉ์˜ ์ž๊ฐ€๊ฒฉ๋ฆฌ ํ†ต์ง€์„œ๋ฅผ ์ˆ˜๋ นํ•˜์˜€๋‹ค. 1. 2021. 4. 27.์ž ๋ฒ”ํ–‰ ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ํ”ผ๊ณ ์ธ์€ 2021. 4. 27. 11:20๊ฒฝ์—์„œ ๊ฐ™์€ ๋‚  11:59๊ฒฝ๊นŒ์ง€ ์‚ฌ์ด์— ์œ„ ๊ฒฉ๋ฆฌ์žฅ์†Œ๋ฅผ ๋ฌด๋‹จ์œผ๋กœ ์ดํƒˆํ•˜์—ฌ ์ž์‹ ์˜ ์Šน์šฉ์ฐจ๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ฒฝ๋‚จ ์ฐฝ๋…•๊ตฐ C์— ์žˆ๋Š” โ€˜Dโ€™ ์‹๋‹น์— ๋‹ค๋…€์˜ค๋Š” ๋“ฑ ์ž๊ฐ€๊ฒฉ๋ฆฌ ์กฐ์น˜๋ฅผ ์œ„๋ฐ˜ํ•˜์˜€๋‹ค. 2. 2021. 5. 3.์ž ๋ฒ”ํ–‰ ํ”ผ๊ณ ์ธ์€ 2021. 5. 3. 10:00๊ฒฝ์—์„œ ๊ฐ™์€ ๋‚  11:35๊ฒฝ๊นŒ์ง€ ์‚ฌ์ด์— ์œ„ ๊ฒฉ๋ฆฌ์žฅ์†Œ๋ฅผ ๋ฌด๋‹จ์œผ๋กœ ์ดํƒˆํ•˜์—ฌ ์ž์‹ ์˜ ์Šน์šฉ์ฐจ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ถˆ์ƒ์˜ ์žฅ์†Œ๋ฅผ ๋‹ค๋…€์˜ค๋Š” ๋“ฑ ์ž๊ฐ€๊ฒฉ๋ฆฌ ์กฐ์น˜๋ฅผ ์œ„๋ฐ˜ํ•˜์˜€๋‹ค."
}
  • id: a data id.
  • casetype: a case type. The value is either civil (민사) or criminal (형사).
  • casename: a case name.
  • facts: facts (사실관계) extracted from the reasoning (이유) section of individual cases.
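The casename column of the benchmark table reports exact match (EM). Assuming EM means the fraction of predicted case names that are identical to the gold casename string, with surrounding whitespace trimmed (the trimming is an assumption), a minimal sketch is:

```python
def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```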

statute_classification

  • Task: for the given facts (사실관계), a model is asked to predict the related statutes (법령).
  • The dataset consists of 2,760 (facts, statutes) pairs extracted from individual Korean legal cases.
  • There are 46 classes (case categories) and each class has 60 examples.
  • 2,208 training, 276 validation, 276 test, and 538 test2 examples. The test2 set consists of examples that do not overlap with the precedents in precedent_corpus.
  • We also release statute_classification_plus, a dataset that extends statute_classification by including less frequent case categories. statute_classification_plus includes 17,730 examples with 434 case categories and 1,015 statutes in total.
  • Example
{
  "id": 5180,
  "casetype": "criminal",
  "casename": "사문서위조, 위조사문서행사",
  "statutes": [
    "형법 제231조",
    "형법 제234조"
  ],
  "facts": "1. ์‚ฌ๋ฌธ์„œ์œ„์กฐ ํ”ผ๊ณ ์ธ์€ 2014. 5. 10.๊ฒฝ ์„œ์šธ ์†กํŒŒ๊ตฌ ๋˜๋Š” ํ•˜๋‚จ์‹œ ์ดํ•˜ ์•Œ ์ˆ˜ ์—†๋Š” ์žฅ์†Œ์—์„œ ์˜์ˆ˜์ฆ๋ฌธ๊ตฌ์šฉ์ง€์— ๊ฒ€์ •์ƒ‰ ๋ณผํŽœ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ˆ˜์‹ ์ธ๋ž€์— โ€˜Aโ€™, ์ผ๊ธˆ๋ž€์— โ€˜์˜ค์ฒœ์˜ค๋ฐฑ์œก์‹ญ๋งŒ์›์ •โ€™, ๋‚ด์—ญ ๋ž€์— โ€˜2010๊ฐ€ํ•ฉ7485์‚ฌ๊ฑด์˜ ํ•ฉ์˜๊ธˆ ๋ฐ ํ”ผํ•ด ๋ณด์ƒ๊ธˆ ์™„๊ฒฐ์กฐโ€™, ๋ฐœํ–‰์ผ๋ž€์— โ€˜2014๋…„ 5์›” 10์ผโ€™์ด๋ผ๊ณ  ๊ธฐ์žฌํ•œ ๋’ค, ๋ฐœํ–‰์ธ ์˜†์— ํ”ผ๊ณ ์ธ์ด ์ž„์˜๋กœ ๋งŒ๋“ค์—ˆ๋˜ B์˜ ๋„์žฅ์„ ์ฐ์—ˆ๋‹ค. ์ด๋กœ์จ ํ”ผ๊ณ ์ธ์€ ํ–‰์‚ฌํ•  ๋ชฉ์ ์œผ๋กœ ์‚ฌ์‹ค์ฆ๋ช…์— ๊ด€ํ•œ ์‚ฌ๋ฌธ์„œ์ธ B ๋ช…์˜์˜ ์˜์ˆ˜์ฆ 1์žฅ์„ ์œ„์กฐํ•˜์˜€๋‹ค. 2. ์œ„์กฐ์‚ฌ๋ฌธ์„œํ–‰์‚ฌ ํ”ผ๊ณ ์ธ์€ 2014. 10. 16.๊ฒฝ ํ•˜๋‚จ์‹œ ์ดํ•˜ ์•Œ ์ˆ˜ ์—†๋Š” ์žฅ์†Œ์—์„œ ํ”ผ๊ณ ์ธ์ด B์— ๋Œ€ํ•œ ์ฑ„๋ฌด๋ฅผ ๋ชจ๋‘ ๋ณ€์ œํ•˜์˜€๊ธฐ ๋•Œ๋ฌธ์— B๊ฐ€ CํšŒ์‚ฌ์— ์ฑ„๊ถŒ์„ ์–‘๋„ํ•œ ๊ฒƒ์„ ์ธ์ •ํ•  ์ˆ˜ ์—†๋‹ค๋Š” ์ทจ์ง€์˜ ๋‚ด์šฉ์ฆ๋ช…์›๊ณผ ํ•จ๊ป˜ ์œ„์™€ ๊ฐ™์ด ์œ„์กฐํ•œ ์˜์ˆ˜์ฆ ์‚ฌ๋ณธ์„ ๋งˆ์น˜ ์ง„์ •ํ•˜๊ฒŒ ์„ฑ๋ฆฝํ•œ ๋ฌธ์„œ์ธ ๊ฒƒ์ฒ˜๋Ÿผ B์—๊ฒŒ ์šฐํŽธ์œผ๋กœ ๋ณด๋ƒˆ๋‹ค. ์ด๋กœ์จ ํ”ผ๊ณ ์ธ์€ ์œ„์กฐํ•œ ์‚ฌ๋ฌธ์„œ๋ฅผ ํ–‰์‚ฌํ•˜์˜€๋‹ค."
}
  • id: a data id.
  • casetype: a case type. The value is always criminal.
  • casename: a case name.
  • statutes: related statutes.
  • facts: facts (사실관계) extracted from the reasoning (이유) section of individual cases.
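Because statutes is a list, one reasonable way to score a predicted statute list is an order-insensitive exact match over sets. The sketch below assumes the model emits statutes as a single comma-separated string. The separator and both helper names are hypothetical, and this is not necessarily how the statute EM in the benchmark table was computed.

```python
def parse_statutes(generated: str, sep: str = ",") -> list:
    """Split a generated statute string into a list (the separator is an assumption)."""
    return [s.strip() for s in generated.split(sep) if s.strip()]


def statute_exact_match(predicted: list, gold: list) -> float:
    """Share of examples whose predicted statute set equals the gold set, ignoring order."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return hits / len(gold)
```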

ljp_criminal

  • Task: a model needs to predict the ranges of the fine (벌금), imprisonment with labor (징역), and imprisonment without labor (금고).
  • 10,500 facts and the corresponding punishments are extracted from cases in the following case categories: “indecent act by compulsion” (강제추행), “obstruction of performance of official duties” (공무집행방해), “bodily injuries from traffic accident” (교통사고처리특례법위반(치상)), “drunk driving” (도로교통법위반(음주운전)), “fraud” (사기), “inflicting bodily injuries” (상해), and “violence” (폭행).
  • 8,400 training, 1,050 validation, 1,050 test, 928 test2 examples. The test2 set consists of the examples from the test set that do not overlap with the precedents in precedent_corpus.
  • Example
{
  "casename": "공무집행방해",
  "casetype": "criminal",
  "facts": "ํ”ผ๊ณ ์ธ์€ 2020. 3. 13. 18:57๊ฒฝ ์ˆ˜์›์‹œ ์žฅ์•ˆ๊ตฌ B ์•ž ๋…ธ์ƒ์—์„œ ์ง€์ธ์ธ C์™€ ์ˆ ์„ ๋งˆ์‹œ๋˜ ์ค‘ C๋ฅผ ๋•Œ๋ ค 112์‹ ๊ณ ๋ฅผ ๋ฐ›๊ณ  ์ถœ๋™ํ•œ ์ˆ˜์›์ค‘๋ถ€๊ฒฝ์ฐฐ์„œ D์ง€๊ตฌ๋Œ€ ์†Œ์† ๊ฒฝ์œ„ E๊ฐ€ C์˜ ์ง„์ˆ ์„ ์ฒญ์ทจํ•˜๊ณ  ์žˆ๋Š” ๋ชจ์Šต์„ ๋ณด๊ณ  ํ™”๊ฐ€ ๋‚˜ '์”จ๋ฐœ,๊ฐœ์ƒˆ๋ผ'๋ผ๋ฉฐ ์š•์„ค์„ ํ•˜๊ณ , ์œ„ E๊ฐ€ ์ด๋ฅผ ์ œ์ง€ํ•˜๋ฉฐ ๊ท€๊ฐ€๋ฅผ ์ข…์šฉํ•˜์ž ๊ทธ์˜ ์™ผ์ชฝ ๋บจ์„ ์˜ค๋ฅธ ์ฃผ๋จน์œผ๋กœ 1ํšŒ ๋•Œ๋ ค ํญํ–‰ํ•˜์˜€๋‹ค.\n์ด๋กœ์จ ํ”ผ๊ณ ์ธ์€ ๊ฒฝ์ฐฐ๊ด€์˜ 112์‹ ๊ณ ์‚ฌ๊ฑด ์ฒ˜๋ฆฌ์— ๊ด€ํ•œ ์ •๋‹นํ•œ ์ง๋ฌด์ง‘ํ–‰์„ ๋ฐฉํ•ดํ•˜์˜€๋‹ค. ์ฆ๊ฑฐ์˜ ์š”์ง€\n1. ํ”ผ๊ณ ์ธ์˜ ๋ฒ•์ •์ง„์ˆ \n1. ํ”ผ๊ณ ์ธ์— ๋Œ€ํ•œ ๊ฒฝ์ฐฐ ํ”ผ์˜์ž์‹ ๋ฌธ์กฐ์„œ\n1. E์— ๋Œ€ํ•œ ๊ฒฝ์ฐฐ ์ง„์ˆ ์กฐ์„œ\n1. ํ˜„์žฅ์‚ฌ์ง„ ๋“ฑ, ๋ฐ”๋””์บ ์˜์ƒ",
  "id": 2300,
  "label": {
    "fine_lv": 0,
    "imprisonment_with_labor_lv": 2,
    "imprisonment_without_labor_lv": 0,
    "text": "징역 6월"
  },
  "reason": "์–‘ํ˜•์˜ ์ด์œ \n1. ๋ฒ•๋ฅ ์ƒ ์ฒ˜๋‹จํ˜•์˜ ๋ฒ”์œ„: ์ง•์—ญ 1์›”โˆผ5๋…„\n2. ์–‘ํ˜•๊ธฐ์ค€์— ๋”ฐ๋ฅธ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„\n[์œ ํ˜•์˜ ๊ฒฐ์ •]\n๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด๋ฒ”์ฃ„ > 01. ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด > [์ œ1์œ ํ˜•] ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด/์ง๋ฌด๊ฐ•์š”\n[ํŠน๋ณ„์–‘ํ˜•์ธ์ž] ์—†์Œ\n[๊ถŒ๊ณ ์˜์—ญ ๋ฐ ๊ถŒ๊ณ ํ˜•์˜ ๋ฒ”์œ„] ๊ธฐ๋ณธ์˜์—ญ, ์ง•์—ญ 6์›”โˆผ1๋…„6์›”\n3. ์„ ๊ณ ํ˜•์˜ ๊ฒฐ์ •\nํ”ผ๊ณ ์ธ์ด ์‹ธ์›€ ๋ฐœ์ƒ ์‹ ๊ณ ๋ฅผ ๋ฐ›๊ณ  ์ถœ๋™ํ•œ ๊ฒฝ์ฐฐ๊ด€์—๊ฒŒ ์š•์„ค์„ ํผ๋ถ“๊ณ  ๊ท€๊ฐ€๋ฅผ ์ข…์šฉํ•œ๋‹ค๋Š” ์ด์œ ๋กœ ๊ฒฝ์ฐฐ๊ด€์˜ ๋บจ์„ ๋•Œ๋ฆฌ๋Š” ๋“ฑ ํญํ–‰์„ ํ–‰์‚ฌํ•˜์—ฌ ๊ฒฝ์ฐฐ๊ด€์˜ ์ •๋‹นํ•œ ๊ณต๋ฌด์ง‘ํ–‰์„ ๋ฐฉํ•ดํ•œ ์ ์—์„œ ๊ทธ ์ฃ„์ฑ…์ด ๋งค์šฐ ๋ฌด๊ฒ๋‹ค. ํ”ผ๊ณ ์ธ์˜ ๋ฒ”์ฃ„ ์ „๋ ฅ๋„ ์ƒ๋‹นํžˆ ๋งŽ๋‹ค.\n๋‹ค๋งŒ, ํ”ผ๊ณ ์ธ์ด ๋ฒ”ํ–‰์„ ์ธ์ •ํ•˜๋ฉด์„œ ๋ฐ˜์„ฑํ•˜๊ณ  ์žˆ๋Š” ์ , ๊ณต๋ฌด์ง‘ํ–‰๋ฐฉํ•ด ๋ฒ”์ฃ„๋กœ ์ฒ˜๋ฒŒ๋ฐ›์€ ์ „๋ ฅ์ด ์—†๋Š” ์  ๋“ฑ์€ ํ”ผ๊ณ ์ธ์—๊ฒŒ ์œ ๋ฆฌํ•œ ์ •์ƒ์œผ๋กœ ์ฐธ์ž‘ํ•œ๋‹ค.\n๊ทธ ๋ฐ–์— ํ”ผ๊ณ ์ธ์˜ ์—ฐ๋ น, ์„ฑํ–‰, ํ™˜๊ฒฝ, ๊ฐ€์กฑ๊ด€๊ณ„, ๊ฑด๊ฐ•์ƒํƒœ, ๋ฒ”ํ–‰์˜ ๋™๊ธฐ์™€ ์ˆ˜๋‹จ ๋ฐ ๊ฒฐ๊ณผ, ๋ฒ”ํ–‰ ํ›„์˜ ์ •ํ™ฉ ๋“ฑ ์ด ์‚ฌ๊ฑด ๊ธฐ๋ก ๋ฐ ๋ณ€๋ก ์— ๋‚˜ํƒ€๋‚œ ๋ชจ๋“  ์–‘ํ˜•์š”์†Œ๋ฅผ ์ข…ํ•ฉํ•˜์—ฌ, ์ฃผ๋ฌธ๊ณผ ๊ฐ™์ด ํ˜•์„ ์ •ํ•œ๋‹ค.",
  "ruling": {
    "parse": {
      "fine": {
        "type": "",
        "unit": "",
        "value": -1
      },
      "imprisonment": {
        "type": "징역",
        "unit": "mo",
        "value": 6
      }
    },
    "text": "피고인을 징역 6월에 처한다.\n다만 이 판결 확정일로부터 2년간 위 형의 집행을 유예한다."
  }
}
  • id: a data id.
  • casetype: a case type. The value is always criminal.
  • casename: a case name.
  • facts: facts (사실관계) extracted from the reasoning (이유) section of individual cases.
  • label
    • fine_lv: a label representing the range of the fine amount. See our paper for details.
    • imprisonment_with_labor_lv: a label representing the range of imprisonment with labor.
    • imprisonment_without_labor_lv: a label representing the range of imprisonment without labor.
  • reason: the reasons for sentencing (양형의 이유).
  • ruling: the ruling (주문) and its parsing result. "" and -1 indicate null values.
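Because "" and -1 mark null values in the ruling parse, code that consumes the parse has to check for them before using the numbers. A minimal sketch with hypothetical helper names; it only converts the "mo" (month) unit that appears in the example above and raises on any other unit, since other unit strings are not documented here.

```python
from typing import Optional


def _is_null(field: dict) -> bool:
    """Per the dataset description, "" and -1 mark null values."""
    return field.get("value", -1) == -1 or field.get("unit", "") == ""


def fine_amount(parse: dict) -> Optional[int]:
    """Fine value from the parse, or None when the fine is null."""
    fine = parse["fine"]
    return None if _is_null(fine) else fine["value"]


def imprisonment_months(parse: dict) -> Optional[int]:
    """Imprisonment term in months, or None when imprisonment is null."""
    imp = parse["imprisonment"]
    if _is_null(imp):
        return None
    if imp["unit"] != "mo":
        # Only "mo" appears in the example; other units are left unhandled.
        raise ValueError(f"unhandled unit: {imp['unit']!r}")
    return imp["value"]


# The ruling parse from the ljp_criminal example above.
example_parse = {
    "fine": {"type": "", "unit": "", "value": -1},
    "imprisonment": {"type": "징역", "unit": "mo", "value": 6},
}
print(fine_amount(example_parse), imprisonment_months(example_parse))  # None 6
```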

ljp_civil

  • Task: a model is asked to predict the claim acceptance level, which is based on the ratio of the approved amount to the claimed amount (approved money / claimed money).
  • 4,678 facts and the corresponding acceptance levels from 4 case categories: 929 examples from “price of indemnification” (구상금), 745 examples from “loan” (대여금), 1,004 examples from “unfair profits” (부당이득금), and 2,000 examples from “lawsuit for damages (etc)” (손해배상(기)).
  • 3,742 training, 467 validation, 467 test, and 403 test2 examples. The test2 set consists of the test set examples that do not overlap with the precedents in precedent_corpus.
  • Example
{
  "id": 99,
  "casetype": "civil",
  "casename": "구상금",
  "claim_acceptance_lv": 1,
  "facts": "๊ฐ€. C๋Š” 2017. 7. 21. D์œผ๋กœ๋ถ€ํ„ฐ 100,000,000์›์„ ์ด์œจ ์—ฐ 25%, ๋ณ€์ œ๊ธฐ 2017. 8. 20.๋กœ ์ •ํ•˜์—ฌ ์ฐจ์šฉํ•˜์˜€๊ณ (์ดํ•˜ โ€˜์ด ์‚ฌ๊ฑด ์ฐจ์šฉ๊ธˆ์ฑ„๋ฌด'๋ผ๊ณ  ํ•œ๋‹ค), ํ”ผ๊ณ ๋Š” ์ด ์‚ฌ๊ฑด ์ฐจ์šฉ๊ธˆ ์ฑ„๋ฌด๋ฅผ ๋ณด์ฆํ•œ๋„์•ก 140,000,000์›, ๋ณด์ฆ๊ธฐํ•œ 10๋…„์œผ๋กœ ์ •ํ•˜์—ฌ ์—ฐ๋Œ€๋ณด์ฆํ•˜์˜€์œผ๋ฉฐ, ๊ฐ™์€ ๋‚  ์ด ์‚ฌ๊ฑด ์ฐจ์šฉ๊ธˆ์ฑ„๋ฌด์— ๊ด€ํ•œ ๊ณต์ •์ฆ์„œ๋ฅผ ์ž‘์„ฑํ•˜์˜€๋‹ค(๊ณต์ฆ์ธ๊ฐ€ ๋ฒ•๋ฌด๋ฒ•์ธ E ์ฆ์„œ 2017๋…„ ์ œ392ํ˜ธ, ์ดํ•˜ โ€˜์ด ์‚ฌ๊ฑด ๊ณต์ •์ฆ์„œ'๋ผ๊ณ  ํ•œ๋‹ค).\n๋‚˜. ์›๊ณ ๋Š” ์ด ์‚ฌ๊ฑด ์ฐจ์šฉ๊ธˆ์ฑ„๋ฌด์™€ ๊ด€๋ จํ•˜์—ฌ ์›๊ณ  ์†Œ์œ ์˜ ์•ˆ์‚ฐ์‹œ ์ƒ๋ก๊ตฌ F, G, H ๋ฐ ๊ทธ ์ง€์ƒ ๊ฑด๋ฌผ(์ดํ•˜ โ€˜์ด ์‚ฌ๊ฑด ๋ถ€๋™์‚ฐ'์ด๋ผ๊ณ  ํ•œ๋‹ค)์„ ๋‹ด๋ณด๋กœ ์ œ๊ณตํ•˜๊ธฐ๋กœ ํ•˜์—ฌ 2017. 7. 21. ์ˆ˜์›์ง€๋ฐฉ๋ฒ•์› ์•ˆ์‚ฐ์ง€์› ์ ‘์ˆ˜ ์ œ53820ํ˜ธ๋กœ ์ฑ„๊ถŒ์ตœ๊ณ ์•ก 140,000,000์›, ์ฑ„๋ฌด์ž C, ๊ทผ์ €๋‹น๊ถŒ์ž D์œผ๋กœ ํ•œ ๊ทผ์ €๋‹น๊ถŒ์„ค์ •๋“ฑ๊ธฐ๋ฅผ ๊ฒฝ๋ฃŒํ•˜๋Š” ํ•œํŽธ, 2018. 7. 13. D์—๊ฒŒ ์ด ์‚ฌ๊ฑด ๊ณต์ •์ฆ์„œ์— ๊ธฐํ•œ ์ฑ„๋ฌด๋ฅผ 2018. 7. 31.๊นŒ์ง€ ๋ณ€์ œํ•˜๊ณ , ๋ณ€์ œ๊ธฐ ์ดํ›„ ์—ฐ 24%์˜ ๋น„์œจ๋กœ ๊ณ„์‚ฐํ•œ ์ง€์—ฐ์†ํ•ด๊ธˆ์„ ์ง€๊ธ‰ํ•˜๊ธฐ๋กœ ํ•˜๋Š” ์ฐจ์šฉ์ฆ์„ ์ž‘์„ฑํ•˜์—ฌ ์ฃผ์—ˆ๋‹ค(์ดํ•˜ โ€˜์ด ์‚ฌ๊ฑด ์ฐจ์šฉ์ฆ'์ด๋ผ๊ณ  ํ•œ๋‹ค).\n๋‹ค. ์›๊ณ ๋Š” 2019. 11. 29. D์—๊ฒŒ ์ด ์‚ฌ๊ฑด ์ฐจ์šฉ๊ธˆ์ฑ„๋ฌด ์›๋ฆฌ๊ธˆ์œผ๋กœ ํ•ฉ๊ณ„ 157,500,000์›์„ ๋ณ€์ œํ•˜์˜€๋‹ค.",
  "gist_of_claim": {
    "money": {
      "provider": "피고",
      "taker": "원고",
      "unit": "won",
      "value": 140000000
    },
    "text": "피고는 원고에게 140,000,000원 및 이에 대한 2019. 11. 30.부터 이 사건 소장 부본 송달일까지는 연 5%의, 그 다음날부터 다 갚는 날까지는 연 12%의 각 비율로 계산한 돈을 지급하라."
  },
  "ruling": {
    "litigation_cost": 0.5,
    "money": {
      "provider": "피고",
      "taker": "원고",
      "unit": "won",
      "value": 78750000
    },
    "text": "1. 피고는 원고에게 78,750,000원 및 이에 대한 2019. 11. 30.부터 2021. 11. 26.까지는 연 5%의, 그 다음날부터 다 갚는 날까지는 연 12%의 각 비율로 계산한 돈을 지급하라.\n2. 원고의 나머지 청구를 기각한다.\n3. 소송비용 중 1/2은 원고가 나머지는 피고가 각 부담한다.\n4. 제1항은 가집행할 수 있다."
  }
}
  • id: a data id.
  • casetype: a case type. The value is always civil.
  • casename: a case name.
  • facts: facts (사실관계) extracted from the reasoning (이유) section of individual cases.
  • claim_acceptance_lv: the claim acceptance level. 0, 1, and 2 indicate rejection, partial approval, and full approval, respectively.
  • gist_of_claim: the gist of the plaintiff's claim (청구 취지) and its parsing result.
  • ruling: the ruling (주문) and its parsing result.
    • litigation_cost: the fraction of the litigation cost to be paid by the plaintiff.
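With the parsed money fields, the acceptance ratio from the task definition can be computed directly. The sketch below buckets the ratio into the three levels with a simple rule (0 when nothing is approved, 2 when the full claimed amount is approved, 1 otherwise). This bucketing and the helper names are assumptions; see the paper for the exact definition. For the example above it gives 78,750,000 / 140,000,000 = 0.5625, i.e. level 1, which agrees with the example's claim_acceptance_lv.

```python
def claim_acceptance_ratio(example: dict) -> float:
    """Approved amount divided by claimed amount, from the parsed money fields."""
    claimed = example["gist_of_claim"]["money"]["value"]
    approved = example["ruling"]["money"]["value"]
    return approved / claimed


def acceptance_level(ratio: float) -> int:
    """Hypothetical bucketing: 0 = rejection, 1 = partial, 2 = full approval."""
    if ratio <= 0:
        return 0  # a non-positive ratio is treated as rejection
    if ratio >= 1:
        return 2
    return 1


# Money fields from the ljp_civil example above.
example = {
    "gist_of_claim": {"money": {"value": 140000000}},
    "ruling": {"money": {"value": 78750000}},
}
ratio = claim_acceptance_ratio(example)
print(ratio, acceptance_level(ratio))  # 0.5625 1
```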

summarization

  • Task: a model is asked to summarize precedents from the Supreme Court of Korea.

  • The dataset is obtained from LAW OPEN DATA.

  • The dataset consists of 20k (precedent, summary) pairs.

  • 16,000 training, 2,000 validation, and 2,000 test examples.

  • We also provide summarization_plus, which extends summarization with longer precedents, making the task more challenging and realistic. The extended dataset contains 51,114 examples in total. The average numbers of tokens in the precedents and in the corresponding summaries are 1,516 and 248, respectively, and the maximum numbers are 93,420 and 6,536, respectively.

  • Example

{
  "id": 16454,
  "summary": "[1] ํ”ผ๊ณ ์™€ ์ œ3์ž ์‚ฌ์ด์— ์žˆ์—ˆ๋˜ ๋ฏผ์‚ฌ์†Œ์†ก์˜ ํ™•์ •ํŒ๊ฒฐ์˜ ์กด์žฌ๋ฅผ ๋„˜์–ด์„œ ๊ทธ ํŒ๊ฒฐ์˜ ์ด์œ ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ์‚ฌ์‹ค๊ด€๊ณ„๋“ค๊นŒ์ง€ ๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹ค๋กœ ๋ณผ ์ˆ˜๋Š” ์—†๋‹ค. ๋ฏผ์‚ฌ์žฌํŒ์— ์žˆ์–ด์„œ ์ด๋ฏธ ํ™•์ •๋œ ๊ด€๋ จ ๋ฏผ์‚ฌ์‚ฌ๊ฑด์˜ ํŒ๊ฒฐ์—์„œ ์ธ์ •๋œ ์‚ฌ์‹ค์€ ํŠน๋ณ„ํ•œ ์‚ฌ์ •์ด ์—†๋Š” ํ•œ ์œ ๋ ฅํ•œ ์ฆ๊ฑฐ๊ฐ€ ๋˜์ง€๋งŒ, ๋‹นํ•ด ๋ฏผ์‚ฌ์žฌํŒ์—์„œ ์ œ์ถœ๋œ ๋‹ค๋ฅธ ์ฆ๊ฑฐ ๋‚ด์šฉ์— ๋น„์ถ”์–ด ํ™•์ •๋œ ๊ด€๋ จ ๋ฏผ์‚ฌ์‚ฌ๊ฑด ํŒ๊ฒฐ์˜ ์‚ฌ์‹ค์ธ์ •์„ ๊ทธ๋Œ€๋กœ ์ฑ„์šฉํ•˜๊ธฐ ์–ด๋ ค์šด ๊ฒฝ์šฐ์—๋Š” ํ•ฉ๋ฆฌ์ ์ธ ์ด์œ ๋ฅผ ์„ค์‹œํ•˜์—ฌ ์ด๋ฅผ ๋ฐฐ์ฒ™ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ฒ•๋ฆฌ๋„ ๊ทธ์™€ ๊ฐ™์ด ํ™•์ •๋œ ๋ฏผ์‚ฌํŒ๊ฒฐ ์ด์œ  ์ค‘์˜ ์‚ฌ์‹ค๊ด€๊ณ„๊ฐ€ ํ˜„์ €ํ•œ ์‚ฌ์‹ค์— ํ•ด๋‹นํ•˜์ง€ ์•Š์Œ์„ ์ „์ œ๋กœ ํ•œ ๊ฒƒ์ด๋‹ค.\n\n\n[2] ์›์‹ฌ์ด ๋‹ค๋ฅธ ํ•˜๊ธ‰์‹ฌํŒ๊ฒฐ์˜ ์ด์œ  ์ค‘ ์ผ๋ถ€ ์‚ฌ์‹ค๊ด€๊ณ„์— ๊ด€ํ•œ ์ธ์ • ์‚ฌ์‹ค์„ ๊ทธ๋Œ€๋กœ ์ธ์ •ํ•˜๋ฉด์„œ, ์œ„ ์‚ฌ์ •๋“ค์ด โ€˜์ด ๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹คโ€™์ด๋ผ๊ณ  ๋ณธ ์‚ฌ์•ˆ์—์„œ, ๋‹นํ•ด ์žฌํŒ์˜ ์ œ1์‹ฌ ๋ฐ ์›์‹ฌ์—์„œ ๋‹ค๋ฅธ ํ•˜๊ธ‰์‹ฌํŒ๊ฒฐ์˜ ํŒ๊ฒฐ๋ฌธ ๋“ฑ์ด ์ฆ๊ฑฐ๋กœ ์ œ์ถœ๋œ ์ ์ด ์—†๊ณ , ๋‹น์‚ฌ์ž๋“ค๋„ ์ด์— ๊ด€ํ•˜์—ฌ ์ฃผ์žฅํ•œ ๋ฐ”๊ฐ€ ์—†์Œ์—๋„ ์ด๋ฅผ โ€˜๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹คโ€™๋กœ ๋ณธ ์›์‹ฌํŒ๋‹จ์— ๋ฒ•๋ฆฌ์˜คํ•ด์˜ ์ž˜๋ชป์ด ์žˆ๋‹ค๊ณ  ํ•œ ์‚ฌ๋ก€.",
  "precedent": "์ฃผ๋ฌธ\n์›์‹ฌํŒ๊ฒฐ์„ ํŒŒ๊ธฐํ•˜๊ณ , ์‚ฌ๊ฑด์„ ๊ด‘์ฃผ์ง€๋ฐฉ๋ฒ•์› ๋ณธ์› ํ•ฉ์˜๋ถ€์— ํ™˜์†กํ•œ๋‹ค.\n\n์ด์œ \n์ƒ๊ณ ์ด์œ ๋ฅผ ํŒ๋‹จํ•œ๋‹ค.\n1. ํ”ผ๊ณ ์™€ ์ œ3์ž ์‚ฌ์ด์— ์žˆ์—ˆ๋˜ ๋ฏผ์‚ฌ์†Œ์†ก์˜ ํ™•์ •ํŒ๊ฒฐ์˜ ์กด์žฌ๋ฅผ ๋„˜์–ด์„œ ๊ทธ ํŒ๊ฒฐ์˜ ์ด์œ ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ์‚ฌ์‹ค๊ด€๊ณ„๋“ค๊นŒ์ง€ ๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹ค๋กœ ๋ณผ ์ˆ˜๋Š” ์—†๋‹ค(๋Œ€๋ฒ•์› 2010. 1. 14. ์„ ๊ณ  2009๋‹ค69531 ํŒ๊ฒฐ ์ฐธ์กฐ). ๋ฏผ์‚ฌ์žฌํŒ์— ์žˆ์–ด์„œ ์ด๋ฏธ ํ™•์ •๋œ ๊ด€๋ จ ๋ฏผ์‚ฌ์‚ฌ๊ฑด์˜ ํŒ๊ฒฐ์—์„œ ์ธ์ •๋œ ์‚ฌ์‹ค์€ ํŠน๋ณ„ํ•œ ์‚ฌ์ •์ด ์—†๋Š” ํ•œ ์œ ๋ ฅํ•œ ์ฆ๊ฑฐ๊ฐ€ ๋˜์ง€๋งŒ, ๋‹นํ•ด ๋ฏผ์‚ฌ์žฌํŒ์—์„œ ์ œ์ถœ๋œ ๋‹ค๋ฅธ ์ฆ๊ฑฐ ๋‚ด์šฉ์— ๋น„์ถ”์–ด ํ™•์ •๋œ ๊ด€๋ จ ๋ฏผ์‚ฌ์‚ฌ๊ฑด ํŒ๊ฒฐ์˜ ์‚ฌ์‹ค์ธ์ •์„ ๊ทธ๋Œ€๋กœ ์ฑ„์šฉํ•˜๊ธฐ ์–ด๋ ค์šด ๊ฒฝ์šฐ์—๋Š” ํ•ฉ๋ฆฌ์ ์ธ ์ด์œ ๋ฅผ ์„ค์‹œํ•˜์—ฌ ์ด๋ฅผ ๋ฐฐ์ฒ™ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ฒ•๋ฆฌ(๋Œ€๋ฒ•์› 2018. 8. 30. ์„ ๊ณ  2016๋‹ค46338, 46345 ํŒ๊ฒฐ ๋“ฑ ์ฐธ์กฐ)๋„ ๊ทธ์™€ ๊ฐ™์ด ํ™•์ •๋œ ๋ฏผ์‚ฌํŒ๊ฒฐ ์ด์œ  ์ค‘์˜ ์‚ฌ์‹ค๊ด€๊ณ„๊ฐ€ ํ˜„์ €ํ•œ ์‚ฌ์‹ค์— ํ•ด๋‹นํ•˜์ง€ ์•Š์Œ์„ ์ „์ œ๋กœ ํ•œ ๊ฒƒ์ด๋‹ค.\n2. 
์›์‹ฌ์€ ๊ด‘์ฃผ๊ณ ๋“ฑ๋ฒ•์› 2003๋‚˜8816 ํŒ๊ฒฐ ์ด์œ  ์ค‘ โ€˜์†Œ์™ธ์ธ์ด ํ”ผ๊ณ  ํšŒ์‚ฌ๋ฅผ ์„ค๋ฆฝํ•œ ๊ฒฝ์œ„โ€™์— ๊ด€ํ•œ ์ธ์ • ์‚ฌ์‹ค, ๊ด‘์ฃผ์ง€๋ฐฉ๋ฒ•์› ๋ชฉํฌ์ง€์› 2001๊ฐ€ํ•ฉ1664 ํŒ๊ฒฐ๊ณผ ๊ด‘์ฃผ๊ณ ๋“ฑ๋ฒ•์› 2003๋‚˜416 ํŒ๊ฒฐ ์ด์œ  ์ค‘ โ€˜ํ”ผ๊ณ  ํšŒ์‚ฌ ์ด์‚ฌํšŒ์˜ ๊ฐœ์ตœ ์—ฌ๋ถ€โ€™์— ๊ด€ํ•œ ์ธ์ • ์‚ฌ์‹ค์„ ๊ทธ๋Œ€๋กœ ์ธ์ •ํ•˜๋ฉด์„œ, ์œ„ ์‚ฌ์ •๋“ค์ด โ€˜์ด ๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹คโ€™์ด๋ผ๊ณ  ๋ณด์•˜๋‹ค.\n๊ทธ๋Ÿฐ๋ฐ ์ด ์‚ฌ๊ฑด ๊ธฐ๋ก์— ์˜ํ•˜๋ฉด, ๊ด‘์ฃผ๊ณ ๋“ฑ๋ฒ•์› 2003๋‚˜8816 ํŒ๊ฒฐ, ๊ด‘์ฃผ์ง€๋ฐฉ๋ฒ•์› ๋ชฉํฌ์ง€์› 2001๊ฐ€ํ•ฉ1664 ํŒ๊ฒฐ, ๊ด‘์ฃผ๊ณ ๋“ฑ๋ฒ•์› 2003๋‚˜416 ํŒ๊ฒฐ์€ ์ œ1์‹ฌ ๋ฐ ์›์‹ฌ์—์„œ ํŒ๊ฒฐ๋ฌธ ๋“ฑ์ด ์ฆ๊ฑฐ๋กœ ์ œ์ถœ๋œ ์ ์ด ์—†๊ณ , ๋‹น์‚ฌ์ž๋“ค๋„ ์ด์— ๊ด€ํ•˜์—ฌ ์ฃผ์žฅํ•œ ๋ฐ”๊ฐ€ ์—†๋‹ค.\n๊ทธ๋ ‡๋‹ค๋ฉด ์›์‹ฌ์€ โ€˜๋ฒ•์›์— ํ˜„์ €ํ•œ ์‚ฌ์‹คโ€™์— ๊ด€ํ•œ ๋ฒ•๋ฆฌ๋ฅผ ์˜คํ•ดํ•œ ๋‚˜๋จธ์ง€ ํ•„์š”ํ•œ ์‹ฌ๋ฆฌ๋ฅผ ๋‹คํ•˜์ง€ ์•„๋‹ˆํ•œ ์ฑ„, ๋‹น์‚ฌ์ž๊ฐ€ ์ฆ๊ฑฐ๋กœ ์ œ์ถœํ•˜์ง€ ์•Š๊ณ  ์‹ฌ๋ฆฌ๊ฐ€ ๋˜์ง€ ์•Š์•˜๋˜ ์œ„ ๊ฐ ํŒ๊ฒฐ๋“ค์—์„œ ์ธ์ •๋œ ์‚ฌ์‹ค๊ด€๊ณ„์— ๊ธฐํ•˜์—ฌ ํŒ๋‹จํ•œ ์ž˜๋ชป์ด ์žˆ๋‹ค. ์ด ์ ์„ ์ง€์ ํ•˜๋Š” ์ƒ๊ณ ์ด์œ  ์ฃผ์žฅ์€ ์ด์œ  ์žˆ๋‹ค.\n3. ๊ทธ๋Ÿฌ๋ฏ€๋กœ ๋‚˜๋จธ์ง€ ์ƒ๊ณ ์ด์œ ์— ๋Œ€ํ•œ ํŒ๋‹จ์„ ์ƒ๋žตํ•œ ์ฑ„ ์›์‹ฌํŒ๊ฒฐ์„ ํŒŒ๊ธฐํ•˜๊ณ , ์‚ฌ๊ฑด์„ ๋‹ค์‹œ ์‹ฌ๋ฆฌยทํŒ๋‹จํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์›์‹ฌ๋ฒ•์›์— ํ™˜์†กํ•˜๊ธฐ๋กœ ํ•˜์—ฌ, ๊ด€์—ฌ ๋Œ€๋ฒ•๊ด€์˜ ์ผ์น˜๋œ ์˜๊ฒฌ์œผ๋กœ ์ฃผ๋ฌธ๊ณผ ๊ฐ™์ด ํŒ๊ฒฐํ•œ๋‹ค."
}
  • id: a data id.
  • summary: a summary (판결요지) of the given precedent (판결문).
  • precedent: a case from the Supreme Court of Korea.
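The token counts reported for summarization_plus depend on a tokenizer that is not specified here. As an illustration only, the sketch below computes mean and maximum lengths with whitespace tokens as a stand-in, and shows naive head truncation. Head truncation is one simple way (an assumption, not necessarily what the baselines do) to fit inputs as long as 93,420 tokens into a fixed-length model context.

```python
from statistics import mean


def token_stats(texts):
    """Return (mean, max) token counts using whitespace tokens as a stand-in tokenizer."""
    lengths = [len(text.split()) for text in texts]
    return mean(lengths), max(lengths)


def truncate_tokens(text, max_tokens):
    """Keep only the first max_tokens whitespace tokens of a long precedent."""
    return " ".join(text.split()[:max_tokens])
```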

Models

How to use the language model lcube-base

# !pip install transformers==4.19.4
import transformers

model = transformers.GPT2LMHeadModel.from_pretrained("lbox/lcube-base")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "lbox/lcube-base",
    bos_token="[BOS]",
    unk_token="[UNK]",
    pad_token="[PAD]",
    mask_token="[MASK]",
)

text = "피고인은 불상지에 있는 커피숍에서, 피해자 B으로부터"
# Tokenize the prompt, truncating it to at most 1,024 tokens.
model_inputs = tokenizer(
    text,
    max_length=1024,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
# Note: top_k, top_p and temperature only take effect when do_sample=True;
# with the default do_sample=False and num_beams=2, generate() runs beam search.
out = model.generate(
    model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],
    max_new_tokens=150,
    pad_token_id=tokenizer.pad_token_id,
    use_cache=True,
    repetition_penalty=1.2,
    top_k=5,
    top_p=0.9,
    temperature=1,
    num_beams=2,
)
print(tokenizer.batch_decode(out))

Fine-tuning

Setup

conda create -n lbox-open python=3.8.11
conda activate lbox-open
conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt

Training

python run_model.py [TRAINING_CONFIG_FILE_PATH] --mode train

See also scripts/train_[TASK].sh

Test

  1. Create the test config file by copying the training config file and changing the values of the trained and path fields as shown below.
train:
  weights:
    trained: true 
    path: ./models/[THE NAME OF THE TRAINING CONFIG FILE]/epoch=[XX]-step=[XX].ckpt
  2. Run the test with the new config file:
python run_model.py [TEST_CONFIG_FILE_PATH] --mode test

See also scripts/test_[TASK].sh
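The config edit in the test step (copy the training config, then set train.weights.trained and train.weights.path) can also be scripted. A minimal sketch operating on an already-loaded config dict: the helper name is hypothetical, reading and writing the YAML file (for example with PyYAML's yaml.safe_load and yaml.safe_dump) is left out, and all other config keys are copied unchanged.

```python
import copy


def make_test_config(train_config: dict, checkpoint_path: str) -> dict:
    """Return a copy of the training config with train.weights pointing at a checkpoint."""
    test_config = copy.deepcopy(train_config)  # leave the training config untouched
    weights = test_config.setdefault("train", {}).setdefault("weights", {})
    weights["trained"] = True
    weights["path"] = checkpoint_path
    return test_config
```

The checkpoint path follows the pattern shown in the config above, with the bracketed placeholders filled in for your own run.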

Licensing Information

Copyright 2022-present LBox Co. Ltd.

Licensed under the CC BY-NC 4.0