Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--diff options shows files "Only in dir1" twice, once correctly and again with no dir path and last character removed. #42

Open
davidi17 opened this issue Jul 20, 2021 · 2 comments

Comments

@davidi17
Copy link

If you use the --diff option to compare directories, it outputs the filenames "Only in dir1" twice, once correctly and then again but without the directory path and the filename has its last character missing.

Example
On Windows 10 using Python 3.9.1.

dir1
|   file101.txt
|   file102.txt
|   file103.txt
|
+---sub1
|       file004.txt
|       file005.txt
|
\---sub2
        file-noextension
        file006.txt
        file007.txt
dir2
|   file010.txt
|   file011.txt
|   file103.txt
|
+---sub1
|       file004.txt
|       file005.txt
|
\---sub3
        file020.txt
dirsync dir1 dir2 --diff
Difference of directory dir2 from dir1
Only in dir1
**WRONG>> file-noextensio
**WRONG>> file004.tx
**WRONG>> file005.tx
**WRONG>> file006.tx
**WRONG>> file007.tx
>> file101.txt
>> file102.txt
>> sub2
>> sub2\file-noextension
>> sub2\file006.txt
>> sub2\file007.txt

Only in dir2
<< file010.txt
<< file011.txt
<< sub3
<< sub3\file020.txt

Common to dir1 and dir2
-- file103.txt
-- sub1
-- sub1\file004.txt
-- sub1\file005.txt
dirsync finished in 0.00 seconds.
4 directories parsed, 0 files copied

Where the problem is:

In dirsync 2.2.5, syncer.py, lines 159-165.

159 if add_path:
160     left.add(path)
161     anc_dirs = re_path[:-1].split('/')
162     anc_dirs_path = ''
163     for ad in anc_dirs[1:]:
164         anc_dirs_path = os.path.join(anc_dirs_path, ad)
165         left.add(anc_dirs_path)

Line 161: anc_dirs = re_path[:-1].split('/')
strips off the last character of the filename file004.txt -> file004.tx
Perhaps it should be just re_path.split('/')

And the loop lines 163-165 with left.add(anc_dirs_path) adds the unnecessary files to the list left (all with their last character missing).

I really can't see what this loop with anc_dirs_path does. If lines 161-165 are simply removed, the program would seem to do the right thing. Maybe I've missed some important use case, but it's certainly wrong for the example above.

@j-archit
Copy link

j-archit commented Nov 13, 2021

Maybe try making the changes and running it through the tests?
Assuming the tests are written with edge cases in mind this should catch the issue?
I'm seeing the same behaviour btw and it's very annoying when I try it on directories with lots of files and frequent changes.

Here's the permalink to the lines mentioned above: L161-165

Update:
On Win10 Python 3.8.10
So I was trying to pass the source and destination paths via the command line python invocation as follows (sync.py is my script):

python .\sync.py H:\ "E:\Data All\"

Doing this gives me this error:

[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'E:\\Data All"'

Apparently this was due to the spaces in the directory name, so maybe the [:-1] slicing is to remove trailing spaces or trailing slashes? @tkhyn, can you please explain bc if that's the case then it can easily be done via either re.replace() or str.strip() / lstrip() / rstrip()

@j-archit
Copy link

j-archit commented Nov 13, 2021

Update 2:
@tkhyn Correct me if I'm wrong but after going through the compare function I understand that you're building two sets of files+directories in the source and target and then taking the intersection for commons, left is then for the files/dirs only in source and right for target.

However other than the [:-1] slicing, I do no understand the use of removing the root directory in the for loop here by using anc_dirs[1:]?

for ad in anc_dirs[1:]:
    anc_dirs_path = os.path.join(anc_dirs_path, ad)
    left.add(anc_dirs_path)

@davidi17 This loop however makes absolutely perfect sense as you might want to create the directory tree in case it doesn't exist on the target side.
The loop basically stores each of ancestor directories to the set as it might be non existent on any level.

For example:
source side H:\foo\bar\baz\target.xyz and target side only H:\foo\ exists, in this case both the H:\foo\bar\ and H:\foo\bar\baz\ directory will need to be created as you cannot directly make the directory H:\foo\bar\baz\ without making H:\foo\bar\ first.

However, I do believe this can be done in a better way than this. Not exaclty sure how though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants