Skip to content

Problem with Input Data #28

@adityamishra-ml

Description

@adityamishra-ml

Hello there!

I am trying to reproduce the results of your publication for my course project. However, I think there is some issue with the "Pascal: JPEGImages | SegmentationClass" data set. It keeps on giving the error "File not found". The complete error has been provided below:

FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

 warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
[2024-04-13 19:40:45,183][INFO] - {'criterion': {'kwargs': {'use_weight': False}, 'type': 'CELoss'},
 'dataset': {'ignore_label': 255,
             'mean': [0.485, 0.456, 0.406],
             'n_sup': 662,
             'std': [0.229, 0.224, 0.225],
             'train': {'batch_size': 8,
                       'crop': {'size': [513, 513], 'type': 'rand'},
                       'data_list': './data/splitsall/pascal_u2pl/662/labeled.txt',
                       'data_root': './data/VOC2012',
                       'flip': True,
                       'rand_resize': [0.5, 2.0],
                       'resize_base_size': 500,
                       'strong_aug': {'flag_use_random_num_sampling': True,
                                      'num_augs': 3}},
             'type': 'pascal_semi',
             'val': {'batch_size': 1,
                     'data_list': './data/splitsall/pascal_u2pl/val.txt',
                     'data_root': './data/VOC2012'},
             'workers': 4},
 'exp_path': './exps/zrun_vocs_u2pl/voc_semi662',
 'log_path': './exps/zrun_vocs_u2pl/voc_semi662/log',
 'net': {'decoder': {'kwargs': {'dilations': [6, 12, 18],
                                'inner_planes': 256,
                                'low_conv_planes': 48},
                     'type': 'augseg.models.decoder.dec_deeplabv3_plus'},
         'ema_decay': 0.999,
         'encoder': {'kwargs': {'multi_grid': True,
                                'replace_stride_with_dilation': [False,
                                                                 False,
                                                                 True],
                                'zero_init_residual': True},
                     'pretrain': './pretrained/resnet101.pth',
                     'type': 'augseg.models.resnet.resnet101'},
         'num_classes': 21,
         'sync_bn': True},
 'save_path': './exps/zrun_vocs_u2pl/voc_semi662/checkpoints',
 'saver': {'pretrain': '', 'snapshot_dir': 'checkpoints', 'use_tb': False},
 'trainer': {'epochs': 80,
             'evaluate_student': True,
             'lr_scheduler': {'kwargs': {'power': 0.9}, 'mode': 'poly'},
             'optimizer': {'kwargs': {'lr': 0.001,
                                      'momentum': 0.9,
                                      'weight_decay': 0.0001},
                           'type': 'SGD'},
             'sup_only_epoch': 0,
             'unsupervised': {'flag_extra_weak': False,
                              'loss_weight': 1.0,
                              'threshold': 0.95,
                              'use_cutmix': True,
                              'use_cutmix_adaptive': True,
                              'use_cutmix_trigger_prob': 1.0}}}
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[2024-04-13 19:40:55,377][INFO] - # samples: 662
[2024-04-13 19:40:55,390][INFO] - # samples: 9920
[2024-04-13 19:40:55,396][INFO] - # samples: 1449
[2024-04-13 19:40:55,396][INFO] - Get loader Done...
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[Info] Load ImageNet pretrain from './pretrained/resnet101.pth' 
missing_keys:  [] 
unexpected_keys:  ['fc.weight', 'fc.bias']
[2024-04-13 19:40:58,584][INFO] - -------------------------- start training --------------------------
Traceback (most recent call last):
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
    main(args)
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
    res_loss_sup, res_loss_unsup = train(
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
    _, image_u_weak, image_u_aug, _ = loader_u_iter.next()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
    label = self.img_loader(label_path, "L")
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
    with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_006330.png'

Traceback (most recent call last):
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 591, in <module>
    main(args)
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 172, in main
    res_loss_sup, res_loss_unsup = train(
  File "/DATA2/dse316/grp_007/augseg/./train_semi.py", line 301, in train
    _, image_u_weak, image_u_aug, _ = loader_u_iter.next()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/pascal_voc.py", line 63, in __getitem__
    label = self.img_loader(label_path, "L")
  File "/DATA2/dse316/grp_007/augseg/augseg/dataset/base.py", line 44, in img_loader
    with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/VOC2012/SegmentationClassAug/2008_000085.png'

Exception in thread Thread-1 (_pin_memory_loop):
Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 686256) of binary: /home/dse316/miniconda3/envs/grp_007/bin/python
Traceback (most recent call last):
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/dse316/miniconda3/envs/grp_007/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./train_semi.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-04-13_19:41:03
  host      : pragyan
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 686257)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-13_19:41:03
  host      : pragyan
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 686256)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Anticipating a positive response. Please cross check the data source file contains all the files and the link on the GitHub repository is correct.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions