I have been using locar to synchronize and manipulate data on a huge VAST cluster, which only reaches full I/O speed with lots of parallel transactions. By parallelizing over my deep and wide directory structure, I was able to get speedups just as you advertise...
BUT. I just discovered that locar is sometimes missing 30%-90% of the files.
It turns out that the VAST NFS implementation appears to fill in d_type in the dirent structure only for the first ~10,000 entries in a directory, and locar does not handle this case correctly: when d_type is DT_UNKNOWN, it needs to fall back to stat(), unfortunately at the cost of an extra syscall. See https://stackoverflow.com/a/39430337/381313 for a description of the caveats of using d_type in dirents.
$ ls directory | wc -l
13142
$ ./locar_linux_amd64 directory | wc -l
...
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9997.h5 iNode<10168721505490461340>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9998.h5 iNode<11256784254947026052>[type:unknown(0)]
2024/03/01 04:02:56 Skipped record: directory/output_chunk_9999.h5 iNode<14804106816623306678>[type:unknown(0)]
10692
$ ./locar_linux_amd64 directory -all | wc -l
13142
This means locar may also be failing to descend into some directories, though at least in my case all of these DT_UNKNOWN entries are indeed regular files, so --all works as a workaround for me.
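To make the suggested fallback concrete, here is a minimal Go sketch. It is not locar's actual code: the function name `resolveMode` and the `dt*` constants are hypothetical, and the constants simply mirror the Linux `<dirent.h>` values. The idea is to trust d_type when it names a type the walker handles, and to call lstat() for everything else, including DT_UNKNOWN.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// d_type values as defined in <dirent.h> on Linux (assumed here; a real
// implementation would take them from its dirent-parsing code).
const (
	dtUnknown = 0
	dtDir     = 4
	dtReg     = 8
	dtLnk     = 10
)

// resolveMode (hypothetical helper) returns the file type for a directory
// entry. It trusts d_type for regular files, directories and symlinks, and
// falls back to lstat for anything else, including DT_UNKNOWN.
func resolveMode(path string, dtype uint8) (fs.FileMode, error) {
	switch dtype {
	case dtReg:
		return 0, nil // a zero type means "regular file" in io/fs
	case dtDir:
		return fs.ModeDir, nil
	case dtLnk:
		return fs.ModeSymlink, nil
	default:
		// The filesystem did not tell us the type: pay for the extra syscall.
		info, err := os.Lstat(path)
		if err != nil {
			return 0, err
		}
		return info.Mode().Type(), nil
	}
}

func main() {
	dir, err := os.MkdirTemp("", "dtype-demo")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "output_chunk_9997.h5")
	if err := os.WriteFile(file, nil, 0o644); err != nil {
		fmt.Println("write:", err)
		return
	}

	// Simulate an entry whose d_type came back as DT_UNKNOWN.
	mode, err := resolveMode(file, dtUnknown)
	if err != nil {
		fmt.Println("lstat:", err)
		return
	}
	fmt.Printf("regular=%v dir=%v\n", mode.IsRegular(), mode.IsDir())
}
```

With something along these lines, the extra syscall is only paid for entries the filesystem left untyped, so directories that report d_type correctly would keep the fast path.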