Thank you for your contribution to NLP. I have a few questions. I encountered some issues while trying to train a model: I used the OpenWebText dataset as the training set and WikiText-103 as the validation set, but after training, the lowest validation perplexity was over 100, which does not match the results reported in your paper. Here are the parameters in my configuration file:
```yaml
defaults:
  - _self_
  - model: small
  - override hydra/launcher: submitit_slurm

ngpus: 1
tokens: 50257

training:
  batch_size: 16
  accum: 1
  n_iters: 400000
  snapshot_freq: 50000
  log_freq: 50
  eval_freq: 100
  snapshot_freq_for_preemption: 10000
  weight: standard
  snapshot_sampling: True
  ema: 0.9999

data:
  train: /home/m125656330/input/Score-Entropy-Discrete-Diffusion-main/date/openwebtext
  valid: /home/m125656330/input/Score-Entropy-Discrete-Diffusion-main/date/wikitext-103
  cache_dir: /home/m125656330/input/catch

graph:
  type: absorb
  file: data
  report_all: False

noise:
  type: loglinear
  sigma_min: 1e-4
  sigma_max: 20

sampling:
  predictor: euler
  steps: 128
  noise_removal: True

eval:
  batch_size: 2
  perplexity: True
  perplexity_batch_size: 1

optim:
  weight_decay: 0
  optimizer: AdamW
  lr: 3e-4
  beta1: 0.9
  beta2: 0.999
  eps: 1e-8
  warmup: 2500
  grad_clip: 1.

hydra:
  run:
    dir: exp_local/${data.train}/${now:%Y.%m.%d}/${now:%H%M%S}
  sweep:
    dir: exp/${data.train}/${now:%Y.%m.%d}/${now:%H%M%S}
    subdir: ${hydra.job.num}
  launcher:
    max_num_timeout: 100000
    # timeout_min: 10079
    partition: g40x
    account: stanford
    mem_gb: 96
    cpus_per_task: 40
    gpus_per_node: ${ngpus}
    constraint: null
```
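For reference, the perplexity I report above is the usual exponentiated average per-token negative log-likelihood, which is how I interpret the `eval.perplexity` option. A minimal sketch of the arithmetic (the numbers below are made up purely for illustration):

```python
import math

def perplexity(total_nll: float, total_tokens: int) -> float:
    """Exponentiated average per-token negative log-likelihood."""
    return math.exp(total_nll / total_tokens)

# Illustrative only: an average NLL of ~4.6 nats/token already
# corresponds to a perplexity of ~100, which is roughly what I see.
print(perplexity(total_nll=4.6e6, total_tokens=1.0e6))  # exp(4.6) ≈ 99.5
```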
Additionally, I tried fine-tuning the pre-trained small model you provided and got a perplexity of over ten thousand. I am not sure where the problem lies; any help you can provide would be greatly appreciated.
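For the fine-tuning run, I loaded the released checkpoint roughly as follows. This is a sketch based on my reading of the repository's README; `load_model` and the `louaaron/sedd-small` checkpoint name are what I believe the repo provides, so please correct me if I am using the interface wrong:

```python
import torch
from load_model import load_model  # helper shipped with this repository, as I understand it

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "louaaron/sedd-small" is the Hugging Face checkpoint I fine-tuned from;
# load_model returns the score network together with the graph and noise schedule.
model, graph, noise = load_model("louaaron/sedd-small", device)
model.train()  # switch to training mode before fine-tuning
```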