There are different areas of application for NLP methods.
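Model goals: semantic logic, reasoning
Would using purely Chinese text hurt the project application?
There is one ultimate research goal (semantic logic, reasoning), and a series of studies to carry out in order to reach it (text classification, etc.)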
question classification in text classification (hierarchical classification)
law issue classification based on multi-task classification
build a framework for retrieving legal cases and statutes
Most legal QA systems (some systems are developed for bar exams)
predict the law area for cases (given the data available on previous court rulings: which area of law applies to a given case, what the ruling might be, which laws apply to the case, etc.)
For search, instead of keyword-based search, we use the full “draft” case description and text classification methods
document classification of legal court opinions
assigning classifications to cases automatically
domain-specific classification technique, classify pieces of laws into different categories
legal document management system
finding relevant cases given the query
providing applicable law articles for a given case
QA but focus on word features and embedding
hierarchical crime classification
legal question answering information retrieval (IR) task
capture the relationship between legal question and civil law articles
Analyze the inner logic of legal documents via joint context and topic attention, using the plaintiff allegation and defendant argument to address the Dispute Generation (DG) problem (a text generation problem)
the proposed event tree construction method combined with the deep forest algorithm can greatly improve the logical interpretation and accuracy of judicial text.
Joint similarity training for semantic and structural information of data
legal bar examination QA system
detect the arguments presented in a text document
the relations between them
the internal structure of each individual argument
legal concepts may not be as accurate as we expected
legal judgment prediction
generate court view from the fact description in a criminal case (natural language generation)
Penalty prediction
automation, semi-automation of the legal domain
In order to make full use of precedent judgments, there are still several key issues that need to be addressed:
i) how to effectively extract features well representing a case from legal texts,
ii) how to quickly and accurately retrieve the most similar cases from the vast case-base
iii) how to evaluate the rationality of a judgment with its similar cases.
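Issues i) and ii) can be sketched in a few lines. The following is only a minimal illustration, assuming TF-IDF bag-of-words features and cosine similarity as stand-ins for the feature extraction and retrieval steps; the whitespace tokenizer, the function names, and the three-case toy case base are all hypothetical and not taken from any cited system.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real legal-text tokenizer)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """TF-IDF weights for one token list; terms unseen in the case base are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, top_k=2):
    """Return (document index, similarity) pairs for the top_k most similar cases."""
    vectors, idf = build_tfidf(docs)
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical toy case base, invented for illustration only.
CASE_BASE = [
    "defendant stole a car from the parking lot",
    "plaintiff seeks divorce and custody of the child",
    "tenant failed to pay rent under the lease contract",
]

results = retrieve("dispute over custody of a child after divorce", CASE_BASE)
for index, score in results:
    print(index, round(score, 3), CASE_BASE[index])
```

An exhaustive scan like this does not address the "quickly" part of issue ii) on a vast case base; an approximate index would be needed there. Issue iii) is not covered by this sketch.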
Legal document summarization is an emerging subtopic of text summarization.
At present, a lot of effort goes into manually drafting case summaries.
an automated text summarization system based on semantic analysis
Deep learning in law: early adaptation and legal word embeddings trained on large corpora
text classification, information extraction, and information retrieval
LQDS dataset (Chinese legal question dataset)
Wikipedia word pages
crawl text and questions from legal websites, such as China Judgments Online (中国裁判文书网)
The most popular databases for Chinese commercial cases are PKULaw (北大法宝, rendered literally as "Magic Weapon of Peking University"), Tiantong, and Jurist, which emphasize the management of legal documents.
European Court of Human Rights (ECHR)
Washington University School of Law Supreme Court Database (SCDB)
Pre-training corpus for word vectors (Google News dataset)
Emmanuel, S.L.: Strategies and Tactics for the MBE (Multistate Bar Exam), 2nd edn. Wolters Kluwer, Maryland (2011)
Herring, J.: Criminal Law: Text, Cases, and Materials. Oxford University Press USA, New York (2014)
Martin, J., Storey, T.: Unlocking criminal law, 4th edn. Routledge, New York (2013)
New York State Board of Law Examiners: Course Materials for the New York Law Course and New York Law Examination. https://www.newyorklawcourse.org/CourseMaterials/NewYorkCourseMaterials.pdf. Accessed 15 July 2018
Law2Vec: Legal Word Embeddings
Europarl: A Parallel Corpus for Statistical Machine Translation
DCEP - Digital Corpus of the European Parliament
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages
International Court Cases: Law2Vec: Legal Word Embeddings
HUDOC - European Court of Human Rights
142,394 legal judgments on all kinds of matrimonial disputes from China Judgments Online, from which PA (plaintiff allegation), DA (defendant argument), and disputes are extracted (contact the author: bisheng@seu.edu.cn)
Chinese judicial reading comprehension (CJRC) dataset, from the SMP2019 "中国法研杯" (CAIL) Chinese legal reading comprehension competition
CNN, multi-task learning
genetic algorithm, KNN
Chinese word segmentation systems: SCWS, FudanNLP, HTTPCWS, IKAnalyzer 3.x, CKIP, ICTCLAS
Chinese word segmentation library: Jieba; SOTA (the most recent at present)
SVM for classification
ranking SVM and deep CNN
LSTM model, locality-sensitive hashing (LSH) scheme
semi-supervised learning, N-Grams model
Latent Semantic Analysis (LSA) Model
multi-task deep learning, transfer learning
fastText, TextCNN
BERT, LDA, GloVe
CapsNet, GRU
ROUGE metrics
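As a reminder of what the ROUGE metrics measure, here is a minimal sketch of ROUGE-N as clipped n-gram overlap between a candidate summary and one reference. It assumes whitespace tokenization and a single reference, and it omits stemming and ROUGE-L; the two example sentences are invented. A maintained ROUGE package should be used for reported results.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 against a single reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences, invented for illustration.
reference = "the court dismissed the appeal"
candidate = "the court rejected the appeal"

rouge_1 = rouge_n(candidate, reference, n=1)
rouge_2 = rouge_n(candidate, reference, n=2)
print(rouge_1)
print(rouge_2)
```

Here four of the five unigrams match, while only two of the four bigrams do, which shows how higher-order ROUGE-N penalizes a single substituted word more heavily.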
-
methods evaluation and classification model implementation
-
To predict the legal article and document, the previous model should be improved. Learn decision-making rules to assist legal judgment, such as recommendation of similar cases and legal document verification.
-
Make the model interpretable and merge the logic of the text into the model. Try to combine and compare the manual logical decision process with the machine learning model.
predicting the legal area and what laws might apply to a case are very important applications for NLP
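Quantitative text analysis: using text data for causal inference
Latent relationship mining: machine learning techniques are good at uncovering hidden relationships in existing data that are hard to detect, untangling the complex relationships in the current data, and highlighting potentially relevant documents that lawyers should pay closer attention to
The evaluation metrics should include both automatic evaluation and human annotation.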
Natural Language Inference
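text entailment: given a premise text, infer the relationship between a hypothesis text and that premise. The relationship is generally classified as entailment or contradiction: entailment means the hypothesis can be inferred from the premise; contradiction means the hypothesis contradicts the premise. The output of text entailment is the probability of each of these relations.
Premise text: previously decided legal judgment documents
Hypothesis text: a specific case given as input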
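To make the "probability of each relation" output concrete, the sketch below turns raw model scores for one (premise, hypothesis) pair into a probability per relation with a softmax. It is only an illustration: the function names and the logit values are hypothetical, no actual entailment model is involved, and the label set is limited to the two relations named in these notes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entailment_probabilities(logits, labels=("entailment", "contradiction")):
    """Map one raw score per relation to a probability per relation."""
    if len(logits) != len(labels):
        raise ValueError("one logit per label is required")
    return dict(zip(labels, softmax(logits)))


# Hypothetical raw scores that an entailment model might emit for one
# (premise, hypothesis) pair; the values are invented for illustration.
probs = entailment_probabilities([2.0, 0.0])
predicted = max(probs, key=probs.get)
print(probs, predicted)
```

In the setting described here, the premise would be a previously decided judgment and the hypothesis the input case; passing a longer `labels` tuple extends the same code to more relations.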
Text entailment has three parts:
recognizing text entailment
Textual Entailment Knowledge Acquisition
generating text entailment pairs
Can legal documents be analyzed through template-level entailment knowledge acquisition?