Training hangs: enumerate(train_data_loader) gets stuck #564

Open
jcxian opened this issue Oct 12, 2023 · 10 comments

Comments

@jcxian

jcxian commented Oct 12, 2023

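As the title says.

Training hangs at the line for step, (x, mel, y) in prog_bar: in the train method. I would appreciate any pointers.
num_workers has already been changed to 0.
PReLU has already been changed to ReLU.

Using pstack, I found that it is stuck at the following location:
'linux-vdso.so.1': opening object file: No such file or directory
From what I have read, linux-vdso.so.1 is a virtual library that ships with the system, so why can it not be found? Has anyone else run into this?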
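
pstack reports native frames, so it may not reveal which Python line is spinning. One possible way to see the Python-level stack of a hung run is the standard-library faulthandler module. The following is only a minimal sketch: slow_batches is a hypothetical stand-in for the real data loader, iterate_with_watchdog is an illustrative helper name, and the timeout is arbitrary.

```python
import faulthandler
import time


def slow_batches(n, delay):
    """Hypothetical stand-in for a data loader that is slow to yield."""
    for i in range(n):
        time.sleep(delay)
        yield i


def iterate_with_watchdog(batches, timeout, log_file):
    """Consume `batches`; if that takes longer than `timeout` seconds,
    faulthandler writes the Python traceback of every thread to `log_file`.

    `log_file` must be a real file object (it needs a file descriptor).
    """
    faulthandler.dump_traceback_later(timeout, exit=False, file=log_file)
    try:
        return [batch for batch in batches]
    finally:
        # Disarm the watchdog once the loop has finished.
        faulthandler.cancel_dump_traceback_later()
```

In a real training script the same two faulthandler calls could wrap the for step, (x, mel, y) in prog_bar: loop, with sys.stderr as the log file. The dumped traceback should then name the Python function the process is sitting in.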

def train(device, model, train_data_loader, test_data_loader, optimizer,
          checkpoint_dir=None, checkpoint_interval=None, nepochs=None):

    global global_step, global_epoch
    resumed_step = global_step
    while global_epoch < nepochs:
        running_loss = 0.
        prog_bar = tqdm(enumerate(train_data_loader))
        for step, (x, mel, y) in prog_bar:
            model.train()
            optimizer.zero_grad()

            # Transform data to CUDA device
            x = x.to(device)
@Echo-jyt

Hey, have you solved this? Mine gets stuck during inference.

@jcxian
Author

jcxian commented Oct 17, 2023

Not yet.

@Crestina2001

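prog_bar = tqdm(enumerate(train_data_loader))
If this is the line where it hangs, the cause is that filelists/train.txt, filelists/val.txt, and filelists/test.txt are not set up correctly, which sends the Dataset into an infinite loop.

Check the Dataset.__getitem__ method: it gets stuck inside while(1), because while(1) simply hits continue whenever it receives a bad input...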
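
To illustrate the failure mode described here, below is a toy sketch and not the project's actual code. The class name RetryingDataset, the "fewer than 3 frames is invalid" rule, and the max_tries parameter are all assumptions made for the example. It shows how bounding the retries turns a silent hang into an error that points at the data.

```python
import random


class RetryingDataset:
    """Toy dataset using the 'pick a random sample, skip it if invalid' pattern."""

    def __init__(self, samples, max_tries=100):
        self.samples = samples      # each sample is a list of frames (hypothetical)
        self.max_tries = max_tries

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # An unbounded `while 1: ... continue` never returns when every
        # sample is invalid. A bounded loop fails loudly instead.
        for _ in range(self.max_tries):
            frames = random.choice(self.samples)
            if len(frames) < 3:     # assumed validity rule: too few frames
                continue
            return frames
        raise RuntimeError(
            f"no valid sample found after {self.max_tries} tries; "
            "check the filelist entries and the data they point to"
        )
```

With only invalid samples, RetryingDataset([[], [1]], max_tries=5)[0] raises RuntimeError rather than looping forever, which is roughly the signal one would want from a misconfigured filelist.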

@jcxian
Author

jcxian commented Nov 8, 2023 via email

@Echo-jyt

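prog_bar = tqdm(enumerate(train_data_loader)) If this is the line where it hangs, the cause is that filelists/train.txt, filelists/val.txt, and filelists/test.txt are not set up correctly, which sends the Dataset into an infinite loop.

Check the Dataset.__getitem__ method: it gets stuck inside while(1), because while(1) simply hits continue whenever it receives a bad input...

Thank you!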

@llliiiu

llliiiu commented Nov 28, 2023

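prog_bar = tqdm(enumerate(train_data_loader)) If this is the line where it hangs, the cause is that filelists/train.txt, filelists/val.txt, and filelists/test.txt are not set up correctly, which sends the Dataset into an infinite loop.

Check the Dataset.__getitem__ method: it gets stuck inside while(1), because while(1) simply hits continue whenever it receives a bad input...

Could you explain what this means?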

@llliiiu

llliiiu commented Nov 28, 2023

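prog_bar = tqdm(enumerate(train_data_loader)) If this is the line where it hangs, the cause is that filelists/train.txt, filelists/val.txt, and filelists/test.txt are not set up correctly, which sends the Dataset into an infinite loop.

Check the Dataset.__getitem__ method: it gets stuck inside while(1), because while(1) simply hits continue whenever it receives a bad input...

How should the txt files be configured? Aren't they just the paths where the dataset is located?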
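
The thread does not spell out the expected filelist format, so the following is only a rough sketch under a stated assumption: each non-blank line names one sample directory relative to a data root. That assumption should be checked against the project's own documentation. The names find_bad_entries, filelist_path, and data_root are hypothetical. The function lists entries that point to a missing or empty directory, which are the kind of inputs that could never produce a valid sample.

```python
import os


def find_bad_entries(filelist_path, data_root):
    """Return filelist lines that do not resolve to a non-empty directory.

    Assumption (not confirmed in this thread): each non-blank line names
    one sample directory relative to `data_root`.
    """
    bad = []
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            if not entry:
                continue  # ignore blank lines
            sample_dir = os.path.join(data_root, entry)
            if not os.path.isdir(sample_dir) or not os.listdir(sample_dir):
                bad.append(entry)
    return bad
```

Running it on filelists/train.txt, filelists/val.txt, and filelists/test.txt before training, and treating any non-empty result as something to fix first, may save a long wait on a hung progress bar.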

@prometheus-alien

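prog_bar = tqdm(enumerate(train_data_loader)) If this is the line where it hangs, the cause is that filelists/train.txt, filelists/val.txt, and filelists/test.txt are not set up correctly, which sends the Dataset into an infinite loop.

Check the Dataset.__getitem__ method: it gets stuck inside while(1), because while(1) simply hits continue whenever it receives a bad input...

Thank you, and best wishes to you~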

@jcxian
Author

jcxian commented Feb 18, 2024 via email
