DT_UNKNOWN cannot be ignored and is being mishandled #10

@mabrowning

I have been using locar to synchronize and manipulate data on a large VAST cluster, which only reaches full I/O speed with many parallel transactions. By parallelizing over my deep and wide directory structure, I was able to get speedups just as you advertise...

BUT: I just discovered that locar sometimes misses 30%-90% of the files.

It turns out that the VAST NFS implementation appears to fill in d_type in the dirent structure only for the first ~10,000 entries in a directory and reports DT_UNKNOWN for the rest. locar does not handle this case correctly: for such entries it needs to fall back to stat(), unfortunately at the cost of an extra syscall. See https://stackoverflow.com/a/39430337/381313 for a description of the caveats of relying on d_type in dirent.

$ ls directory| wc -l
13142

$ ./locar_linux_amd64 directory | wc -l
...
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9997.h5 iNode<10168721505490461340>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9998.h5 iNode<11256784254947026052>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9999.h5 iNode<14804106816623306678>[type:unknown(0)]
10692

$ ./locar_linux_amd64 directory -all | wc -l
13142

This means locar may also be failing to descend into some directories. In my case, at least, all of the DT_UNKNOWN entries appear to be regular files, so -all works as a workaround for me.
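
For illustration, here is a minimal sketch of the fallback I have in mind. It is written in Python and is not locar's actual code: `resolve_type` and the string labels are hypothetical names, and the DT_* values are the Linux `<dirent.h>` constants. The idea is to trust d_type when it is conclusive and to pay for an lstat() only when the entry comes back as DT_UNKNOWN. I use lstat() rather than stat() here so that symlinks are not followed, which is an assumption about the desired behavior.

```python
import os
import stat

# d_type values from <dirent.h> on Linux (only the subset this sketch maps).
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10

# Types that can be trusted without touching the filesystem again.
_KNOWN = {DT_DIR: "dir", DT_REG: "file", DT_LNK: "symlink"}


def resolve_type(path, d_type):
    """Classify a directory entry, calling lstat() only when d_type is inconclusive.

    Hypothetical helper: `path` is the full path of the entry and `d_type`
    is the raw value read from the dirent structure.
    """
    kind = _KNOWN.get(d_type)
    if kind is not None:
        return kind  # fast path: no extra syscall

    # DT_UNKNOWN (or a type this sketch does not map): ask the filesystem.
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISLNK(mode):
        return "symlink"
    return "other"
```

With something like this, an entry reported as `type:unknown(0)` would be classified instead of skipped, and unknown entries that are really directories would still be descended into. The extra syscall is only paid for the entries the filesystem did not type.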
