Model metrics #15

@AidtuoliAjou

Thank you for your contribution to NLP. I have some questions about reproducing your results. I trained a model using the OpenWebText dataset as the training set and WikiText103 as the validation set. The lowest perplexity I obtained was still above 100, which falls short of the results reported in your paper. Here are the parameters in my configuration file:

defaults:
  - _self_
  - model: small
  - override hydra/launcher: submitit_slurm

ngpus: 1
tokens: 50257

training:
  batch_size: 16
  accum: 1
  n_iters: 400000
  snapshot_freq: 50000
  log_freq: 50
  eval_freq: 100
  snapshot_freq_for_preemption: 10000
  weight: standard
  snapshot_sampling: True
  ema: 0.9999

data:
  train: /home/m125656330/input/Score-Entropy-Discrete-Diffusion-main/date/openwebtext
  valid: /home/m125656330/input/Score-Entropy-Discrete-Diffusion-main/date/wikitext-103
  cache_dir: /home/m125656330/input/catch

graph:
  type: absorb
  file: data
  report_all: False

noise:
  type: loglinear
  sigma_min: 1e-4
  sigma_max: 20

sampling:
  predictor: euler 
  steps: 128
  noise_removal: True

eval:
  batch_size: 2
  perplexity: True
  perplexity_batch_size: 1

optim:
  weight_decay: 0
  optimizer: AdamW
  lr: 3e-4
  beta1: 0.9
  beta2: 0.999
  eps: 1e-8
  warmup: 2500
  grad_clip: 1.


hydra:
  run:
    dir: exp_local/${data.train}/${now:%Y.%m.%d}/${now:%H%M%S}
  sweep:
    dir: exp/${data.train}/${now:%Y.%m.%d}/${now:%H%M%S}
    subdir: ${hydra.job.num}
  launcher:
    max_num_timeout: 100000
    # timeout_min: 10079
    partition: g40x
    account: stanford
    mem_gb: 96
    cpus_per_task: 40
    gpus_per_node: ${ngpus}
    constraint: null
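
To put the training settings above in perspective, here is a minimal sketch of the training budget they imply. It assumes that `batch_size` is the global number of sequences per optimizer step and that each sequence is 1024 tokens long; the sequence length lives in the model config, which is not shown here, so treat that value as a placeholder. `training_budget` is a hypothetical helper, not part of the repository.

```python
def training_budget(batch_size, accum, ngpus, n_iters, seq_len):
    """Return the per-device micro-batch and total data seen by a run.

    Assumption: `batch_size` is the global batch per optimizer step, split
    evenly across `ngpus` devices and `accum` accumulation steps.
    """
    if batch_size % (ngpus * accum) != 0:
        raise ValueError("batch_size must be divisible by ngpus * accum")
    sequences = batch_size * n_iters
    return {
        "micro_batch_per_gpu": batch_size // (ngpus * accum),
        "sequences": sequences,
        "tokens": sequences * seq_len,
    }


# batch_size, accum, ngpus and n_iters are copied from the config above.
# seq_len=1024 is an assumed value, not taken from the config shown here.
budget = training_budget(batch_size=16, accum=1, ngpus=1,
                         n_iters=400_000, seq_len=1024)
for name, value in budget.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the run processes 6.4 million sequences in total. If the runs in the paper used a larger batch size for the same number of iterations, the two training budgets would not be directly comparable.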

I also tried fine-tuning the pre-trained small model you provided, and that run gave a perplexity of more than ten thousand. I am not sure where the problem lies, and any help would be greatly appreciated.
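
For scale, perplexity is the exponential of the mean per-token negative log-likelihood, so both numbers above can be compared with a model that guesses uniformly over the 50257-token vocabulary from the config. The sketch below only does that arithmetic. The helper names are illustrative, and it is not how the repository's evaluation is implemented.

```python
import math

VOCAB_SIZE = 50257  # `tokens` in the config above


def perplexity(mean_nll_nats):
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)


def nats_per_token(ppl):
    """Inverse of `perplexity`: mean per-token loss implied by a perplexity."""
    return math.log(ppl)


# A uniform guess over the vocabulary has perplexity equal to VOCAB_SIZE.
uniform_nats = math.log(VOCAB_SIZE)

for label, ppl in [("from-scratch run", 100.0),
                   ("fine-tuned run", 10_000.0),
                   ("uniform guess", float(VOCAB_SIZE))]:
    print(f"{label}: perplexity {ppl:,.0f} = {nats_per_token(ppl):.2f} nats/token")
```

A perplexity of ten thousand corresponds to roughly 9.2 nats per token, against roughly 10.8 for uniform guessing. The fine-tuned run is therefore close to an untrained model, which may suggest a problem in the data or evaluation pipeline rather than slow convergence.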