6 changes: 3 additions & 3 deletions data/xml/2020.emnlp.xml
@@ -1382,7 +1382,7 @@
<author><first>Yuchen</first><last>Zhuang</last></author>
<author><first>Jie</first><last>Lyu</last></author>
<author><first>Tuo</first><last>Zhao</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<pages>1326–1340</pages>
<abstract>Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization. To mitigate this issue, we propose a regularized fine-tuning method. Our method introduces two types of regularization for better calibration: (1) On-manifold regularization, which generates pseudo on-manifold samples through interpolation within the data manifold. Augmented training with these pseudo samples imposes a smoothness regularization to improve in-distribution calibration. (2) Off-manifold regularization, which encourages the model to output uniform distributions for pseudo off-manifold samples to address the over-confidence issue for OOD data. Our experiments demonstrate that the proposed method outperforms existing calibration methods for text classification in terms of expectation calibration error, misclassification detection, and OOD detection on six datasets. Our code can be found at <url>https://github.com/Lingkai-Kong/Calibrated-BERT-Fine-Tuning</url>.</abstract>
<url hash="54641d64">2020.emnlp-main.102</url>
@@ -9269,7 +9269,7 @@
<title><fixed-case>S</fixed-case>eq<fixed-case>M</fixed-case>ix: Augmenting Active Sequence Labeling via Sequence Mixup</title>
<author><first>Rongzhi</first><last>Zhang</last></author>
<author><first>Yue</first><last>Yu</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<pages>8566–8579</pages>
<abstract>Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by 2.27%–3.75% in terms of <tex-math>F_1</tex-math> scores. The code and data for SeqMix can be found at <url>https://github.com/rz-zhang/SeqMix</url>.</abstract>
<url hash="aa50ebdf">2020.emnlp-main.691</url>
@@ -9717,7 +9717,7 @@
<author><first>Jiaxin</first><last>Huang</last></author>
<author><first>Chenyan</first><last>Xiong</last></author>
<author><first>Heng</first><last>Ji</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<author><first>Jiawei</first><last>Han</last></author>
<pages>9006–9017</pages>
<abstract>Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.</abstract>
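Every hunk in this file applies the same change: the bare `<author>` element for Chao Zhang gains an explicit `id` attribute (`chao-zhang-uiuc`), which is presumably resolved against the Anthology's canonical person records so that same-named authors can be kept apart. As a minimal sketch of how such an attribute can be read back out of the data — assuming a local checkout of one edited file and using only the Python standard library; the path below is illustrative — the following lists every Chao Zhang entry in a file together with its id, if any:

```python
import xml.etree.ElementTree as ET

# Illustrative path; point this at a local copy of one of the edited files.
XML_PATH = "data/xml/2020.emnlp.xml"

root = ET.parse(XML_PATH).getroot()
for paper in root.iter("paper"):
    for author in paper.findall("author"):
        first = author.findtext("first", default="")
        last = author.findtext("last", default="")
        if (first, last) == ("Chao", "Zhang"):
            # The id attribute is the disambiguation key added in this change;
            # entries without one fall back to name-based matching.
            print(paper.get("id"), author.get("id", "<no id>"))
```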
2 changes: 1 addition & 1 deletion data/xml/2020.findings.xml
@@ -4354,7 +4354,7 @@
<author><first>Hanting</first><last>Su</last></author>
<author><first>David</first><last>Kartchner</last></author>
<author><first>Cassie</first><last>Mitchell</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<pages>3739–3754</pages>
<abstract>We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources. This problem is challenging because rule-induced weak labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates the source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predicts soft labels for unmatched samples, which address the rule coverage issue. We evaluate our model on five benchmarks for sentiment, topic, and relation classifications. The results show that our model outperforms state-of-the-art weakly-supervised and semi-supervised methods consistently, and achieves comparable performance with fully-supervised methods even without any labeled data. Our code can be found at <url>https://github.com/weakrules/Denoise-multi-weak-sources</url>.</abstract>
<url hash="04e407a3">2020.findings-emnlp.334</url>
2 changes: 1 addition & 1 deletion data/xml/2021.acl.xml
@@ -6762,7 +6762,7 @@ The source code has been made available at \url{https://github.com/liam0949/DCLO
<author><first>Yinghao</first><last>Li</last></author>
<author><first>Pranav</first><last>Shetty</last></author>
<author><first>Lucas</first><last>Liu</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<author><first>Le</first><last>Song</last></author>
<pages>6178–6190</pages>
<abstract>We study the problem of learning a named entity recognition (NER) tagger using noisy labels from multiple weak supervision sources. Though cheap to obtain, the labels from weak supervision sources are often incomplete, inaccurate, and contradictory, making it difficult to learn an accurate NER model. To address this challenge, we propose a conditional hidden Markov model (CHMM), which can effectively infer true labels from multi-source noisy labels in an unsupervised way. CHMM enhances the classic hidden Markov model with the contextual representation power of pre-trained language models. Specifically, CHMM learns token-wise transition and emission probabilities from the BERT embeddings of the input tokens to infer the latent true labels from noisy observations. We further refine CHMM with an alternate-training approach (CHMM-ALT). It fine-tunes a BERT-NER model with the labels inferred by CHMM, and this BERT-NER’s output is regarded as an additional weak source to train the CHMM in return. Experiments on four NER benchmarks from various domains show that our method outperforms state-of-the-art weakly supervised NER models by wide margins.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2021.findings.xml
@@ -7536,7 +7536,7 @@
<title>Learning from Language Description: Low-shot Named Entity Recognition via Decomposed Framework</title>
<author><first>Yaqing</first><last>Wang</last></author>
<author><first>Haoda</first><last>Chu</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<author><first>Jing</first><last>Gao</last></author>
<pages>1618–1630</pages>
<abstract>In this work, we study the problem of named entity recognition (NER) in a low resource scenario, focusing on few-shot and zero-shot settings. Built upon large-scale pre-trained language models, we propose a novel NER framework, namely SpanNER, which learns from natural language supervision and enables the identification of never-seen entity classes without using in-domain labeled data. We perform extensive experiments on 5 benchmark datasets and evaluate the proposed method in the few-shot learning, domain transfer and zero-shot learning settings. The experimental results show that the proposed method can bring 10%, 23% and 26% improvements in average over the best baselines in few-shot learning, domain transfer and zero-shot learning settings respectively.</abstract>
2 changes: 1 addition & 1 deletion data/xml/2021.naacl.xml
@@ -1151,7 +1151,7 @@
<author><first>Haoming</first><last>Jiang</last></author>
<author><first>Wendi</first><last>Ren</last></author>
<author><first>Tuo</first><last>Zhao</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<pages>1063–1077</pages>
<abstract>Fine-tuned pre-trained language models (LMs) have achieved enormous success in many natural language processing (NLP) tasks, but they still require excessive labeled data in the fine-tuning stage. We study the problem of fine-tuning pre-trained LMs using only weak supervision, without any labeled data. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision. To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision. Underpinned by contrastive regularization and confidence-based reweighting, our framework gradually improves model fitting while effectively suppressing error propagation. Experiments on sequence, token, and sentence pair classification tasks show that our model outperforms the strongest baseline by large margins and achieves competitive performance with fully-supervised fine-tuning methods. Our implementation is available on <url>https://github.com/yueyu1030/COSINE</url>.</abstract>
<url hash="e055f2f3">2021.naacl-main.84</url>
4 changes: 2 additions & 2 deletions data/xml/2022.acl.xml
@@ -772,12 +772,12 @@
<video href="2022.acl-long.54.mp4"/>
</paper>
<paper id="55">
-<title>Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning</title>
+<title><fixed-case>PRB</fixed-case>oost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning</title>
<author><first>Rongzhi</first><last>Zhang</last></author>
<author orcid="0000-0002-3683-5208"><first>Yue</first><last>Yu</last></author>
<author><first>Pranav</first><last>Shetty</last></author>
<author><first>Le</first><last>Song</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<pages>745-758</pages>
<abstract>Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity on many NLP tasks, but manually designing a comprehensive, high-quality labeling rule set is tedious and difficult. We study interactive weakly-supervised learning—the problem of iteratively and automatically discovering novel labeling rules from data to improve the WSL model. Our proposed model, named PRBoost, achieves this goal via iterative prompt-based rule discovery and model boosting. It uses boosting to identify large-error instances and discovers candidate rules from them by prompting pre-trained LMs with rule templates. The candidate rules are judged by human experts, and the accepted rules are used to generate complementary weak labels and strengthen the current model. Experiments on four tasks show PRBoost outperforms state-of-the-art WSL baselines up to 7.1%, and bridges the gaps with fully supervised models.</abstract>
<url hash="d16eff1b">2022.acl-long.55</url>
6 changes: 3 additions & 3 deletions data/xml/2022.emnlp.xml
@@ -651,7 +651,7 @@
<author><first>Yingjun</first><last>Mou</last><affiliation>Georgia Institute of Technology</affiliation></author>
<author><first>Xiang</first><last>Chen</last><affiliation>Adobe Research</affiliation></author>
<author><first>Le</first><last>Song</last><affiliation>MBZUAI</affiliation></author>
-<author><first>Chao</first><last>Zhang</last><affiliation>Georgia Tech</affiliation></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last><affiliation>Georgia Tech</affiliation></author>
<pages>730-744</pages>
<abstract>We study the problem of extracting N-ary relation tuples from scientific articles. This task is challenging because the target knowledge tuples can reside in multiple parts and modalities of the document. Our proposed method ReSel decomposes this task into a two-stage procedure that first retrieves the most relevant paragraph/table and then selects the target entity from the retrieved component. For the high-level retrieval stage, ReSel designs a simple and effective feature set, which captures multi-level lexical and semantic similarities between the query and components. For the low-level selection stage, ReSel designs a cross-modal entity correlation graph along with a multi-view architecture, which models both semantic and document-structural relations between entities. Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.</abstract>
<url hash="ace5e963">2022.emnlp-main.46</url>
@@ -1362,7 +1362,7 @@
<author><first>Yue</first><last>Yu</last><affiliation>Georgia Institute of Technology</affiliation></author>
<author><first>Chenyan</first><last>Xiong</last><affiliation>Microsoft Research</affiliation></author>
<author><first>Si</first><last>Sun</last><affiliation>Tsinghua University</affiliation></author>
-<author><first>Chao</first><last>Zhang</last><affiliation>Georgia Tech</affiliation></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last><affiliation>Georgia Tech</affiliation></author>
<author><first>Arnold</first><last>Overwijk</last><affiliation>Microsoft</affiliation></author>
<pages>1462-1479</pages>
<abstract>We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios. To mitigate the impact of document differences, COCO-DR continues pretraining the language model on the target corpora to adapt the model to target distributions via COtinuous COtrastive learning. To prepare for unseen target queries, COCO-DR leverages implicit Distributionally Robust Optimization (iDRO) to reweight samples from different source query clusters for improving model robustness over rare queries during fine-tuning. COCO-DR achieves superior average performance on BEIR, the zero-shot retrieval benchmark. At BERT_Base scale, COCO-DR Base outperforms other ZeroDR models with 60x larger size. At BERT_Large scale, COCO-DR Large outperforms the giant GPT-3 embedding model which has 500x more parameters. Our analysis shows the correlation between COCO-DR’s effectiveness in combating distribution shifts and improving zero-shot accuracy. Our code and model can be found at <url>https://github.com/OpenMatch/COCO-DR</url>.</abstract>
@@ -12921,7 +12921,7 @@
<paper id="52">
<title><fixed-case>PLATO</fixed-case>-Ad: A Unified Advertisement Text Generation Framework with Multi-Task Prompt Learning</title>
<author><first>Zeyang</first><last>Lei</last><affiliation>Baidu Inc.</affiliation></author>
-<author><first>Chao</first><last>Zhang</last><affiliation>Baidu Inc.</affiliation></author>
+<author id="chao-zhang"><first>Chao</first><last>Zhang</last><affiliation>Baidu Inc.</affiliation></author>
<author><first>Xinchao</first><last>Xu</last><affiliation>Baidu</affiliation></author>
<author><first>Wenquan</first><last>Wu</last><affiliation>Baidu</affiliation></author>
<author><first>Zheng-yu</first><last>Niu</last><affiliation>Baidu Inc.</affiliation></author>
4 changes: 2 additions & 2 deletions data/xml/2022.findings.xml
@@ -5467,7 +5467,7 @@
<author><first>Chen</first><last>Liang</last></author>
<author><first>Haoming</first><last>Jiang</last></author>
<author><first>Siawpeng</first><last>Er</last></author>
-<author><first>Chao</first><last>Zhang</last></author>
+<author id="chao-zhang-uiuc"><first>Chao</first><last>Zhang</last></author>
<author><first>Tuo</first><last>Zhao</last></author>
<author><first>Hongyuan</first><last>Zha</last></author>
<pages>933-949</pages>
@@ -11514,7 +11514,7 @@ Faster and Smaller Speech Translation without Quality Compromise</title>
<author><first>Rong</first><last>Zhang</last><affiliation>Alibaba Group</affiliation></author>
<author><first>Hui</first><last>Xue</last><affiliation>alibaba</affiliation></author>
<author><first>Donghong</first><last>Sun</last><affiliation>China</affiliation></author>
-<author><first>Chao</first><last>Zhang</last><affiliation>Tsinghua University</affiliation></author>
+<author id="chao-zhang-pku"><first>Chao</first><last>Zhang</last><affiliation>Tsinghua University</affiliation></author>
<pages>3502-3516</pages>
<abstract>Despite of the superb performance on a wide range of tasks, pre-trained language models (e.g., BERT) have been proved vulnerable to adversarial texts. In this paper, we present RoChBERT, a framework to build more Robust BERT-based models by utilizing a more comprehensive adversarial graph to fuse Chinese phonetic and glyph features into pre-trained representations during fine-tuning. Inspired by curriculum learning, we further propose to augment the training dataset with adversarial texts in combination with intermediate samples. Extensive experiments demonstrate that RoChBERT outperforms previous methods in significant ways: (i) robust – RoChBERT greatly improves the model robustness without sacrificing accuracy on benign texts. Specifically, the defense lowers the success rates of unlimited and limited attacks by 59.43% and 39.33% respectively, while remaining accuracy of 93.30%; (ii) flexible – RoChBERT can easily extend to various language models to solve different downstream tasks with excellent performance; and (iii) efficient – RoChBERT can be directly applied to the fine-tuning stage without pre-training language model from scratch, and the proposed data augmentation method is also low-cost.</abstract>
<url hash="4c797f40">2022.findings-emnlp.256</url>
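The last two files show why the id suffixes differ: 2022.emnlp.xml and 2022.findings.xml each contain more than one author named Chao Zhang (`chao-zhang-uiuc` at Georgia Tech, `chao-zhang` at Baidu, `chao-zhang-pku` at Tsinghua), and only the `id` attribute keeps their publication records separate. Below is a sketch of a post-change sanity check — the file list and paths are illustrative, not part of this change — that groups Chao Zhang entries by id across edited files and would surface any entry left without one:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative subset of the files touched above.
FILES = [
    "data/xml/2022.emnlp.xml",
    "data/xml/2022.findings.xml",
]

by_id = defaultdict(list)  # id -> list of (file, paper id)
for path in FILES:
    root = ET.parse(path).getroot()
    for paper in root.iter("paper"):
        for author in paper.findall("author"):
            name = (author.findtext("first"), author.findtext("last"))
            if name == ("Chao", "Zhang"):
                # A missing id would be an entry this change overlooked.
                by_id[author.get("id", "<missing>")].append((path, paper.get("id")))

for author_id, papers in sorted(by_id.items()):
    print(f"{author_id}: {len(papers)} paper(s)")
```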