Commit 08f0a7c

added support for multiple reference patterns/directories
1 parent b264f42 commit 08f0a7c

9 files changed, +998 -2 lines changed

.gitignore

+1
@@ -30,6 +30,7 @@ bld/
 [Oo]bj/
 [Ll]og/
 [Ll]ogs/
+_make/
 
 # Visual Studio 2015/2017 cache/options directory
 .vs/

LICENSE

+674
Large diffs are not rendered by default.

LICENSE-3RD-PARTY

+8
@@ -0,0 +1,8 @@
+-----------------------------------------------------------------------------
+The License
+applies to:
+- original version, Copyright (c) 2006 Matthias Wandel
+-----------------------------------------------------------------------------
+
+Finddupe is totally free. Do whatever you like with it. You can integrate
+it into GPL or BSD style licensed programs if you would like to.

README.md

+57 -1
@@ -1,2 +1,58 @@
 # finddupe
-Enhanced version of finddupe, a duplicate file detector for Windows
+
+[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
+
+Enhanced version of [finddupe](https://www.sentex.ca/~mwandel/finddupe/), a duplicate file detector and eliminator for Windows, originally by [Matthias Wandel](https://github.com/Matthias-Wandel).
+
+## Reasons
+I really like finddupe when I look for duplicate files. It is fast and clever. The match candidates are clustered according to the signature of the first 32k, then checked byte for byte. It can also create and find NTFS hard links. Creating hard links saves you disk space. Listing all existing hard links is very difficult otherwise.
+
+Please refer to [Matthias' site](https://www.sentex.ca/~mwandel/finddupe/) for full description. My favourites are
+`finddupe -bat d:\ImageLibray\Hardlinks_to_be_created.bat -ref d:\ImageLibray\originals1\** -ref d:\ImageLibray\originals2\** d:\ImageLibray\**\*.jpg` to remove duplicates in an image collection and `finddupe -listlink d:\ImageLibray` to list them.
+
+However, Matthias' current version 1.23 is not supporting my requirements.
+
+## Enhancements
+I enhanced finddupe to support my requirements of having multiple reference directories that shall not be touched. It works for me, but some more testing is desirable.
+
+I used Visual Studio 2010 for building.
+
+## Usage
+```
+finddupe v1.24 compiled May 28 2017
+Usage: finddupe [options] [-ref] <filepat> [filepat]...
+Options:
+ -bat <file.bat> Create batch file with commands to do the hard
+                 linking. run batch file afterwards to do it
+ -hardlink       Create hardlinks. Works on NTFS file systems only.
+                 Use with caution!
+ -del            Delete duplicate files
+ -v              Verbose
+ -sigs           Show signatures calculated based on first 32k for each file
+ -rdonly         Apply to readonly files also (as opposed to skipping them)
+ -ref <filepat>  Following file pattern are files that are for reference, NOT
+                 to be eliminated, only used to check duplicates against
+ -z              Do not skip zero length files (zero length files are ignored
+                 by default)
+ -u              Do not print a warning for files that cannot be read
+ -p              Hide progress indicator (useful when redirecting to a file)
+ -j              Follow NTFS junctions and reparse points (off by default)
+ -listlink       hardlink list mode. Not valid with -del, -bat, -hardlink,
+                 or -rdonly, options
+ filepat         Pattern for files. Examples:
+                  c:\**        Match everything on drive C
+                  c:\**\*.jpg  Match only .jpg files on drive C
+                  **\foo\**    Match any path with component foo
+                               from current directory down
+```
+
+## Authors
+
+- originator: [Matthias Wandel](https://www.sentex.ca/~mwandel/finddupe/)
+- additional features: [thomas694](https://github.com/thomas694/finddupe)
+
+## License <a rel="license" href="https://www.gnu.org/licenses/gpl-3.0"><img alt="GNU GPLv3 license" style="border-width:0" src="https://img.shields.io/badge/License-GPLv3-blue.svg" /></a>
+
+<span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">finddupe</span> by thomas694
+is licensed under a <a rel="license" href="https://www.gnu.org/licenses/gpl-3.0">GNU GPLv3 license</a>.
+Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://www.sentex.ca/~mwandel/finddupe/" rel="dct:source">https://www.sentex.ca/~mwandel/finddupe/</a>.
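
The README describes the matching strategy as a cheap signature over the first 32k followed by a byte-for-byte check. A minimal sketch of that two-stage idea on in-memory buffers is given here for illustration only. The names `prefix_signature` and `is_duplicate` are hypothetical, the FNV-1a hash is a stand-in rather than finddupe's actual checksum, and the up-front length comparison is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG_BYTES (32u * 1024u)   /* only the first 32 KiB feed the signature */

/* Placeholder signature: 32-bit FNV-1a over at most the first 32 KiB.
   This is NOT the checksum finddupe uses; it only illustrates the idea. */
static unsigned long prefix_signature(const unsigned char *data, size_t len)
{
    unsigned long h = 2166136261UL;
    size_t n = len < SIG_BYTES ? len : SIG_BYTES;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < n; i++) {
        h ^= data[i];
        h = (h * 16777619UL) & 0xFFFFFFFFUL;
    }
    return h;
}

/* Stage 1: reject on length or signature mismatch (cheap).
   Stage 2: confirm with a full byte-for-byte comparison. */
static int is_duplicate(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    if (alen != blen) return 0;
    if (prefix_signature(a, alen) != prefix_signature(b, blen)) return 0;
    return memcmp(a, b, alen) == 0;
}
```

The signature only narrows the candidate set; equality is decided by the final `memcmp`, so a hash collision cannot produce a false duplicate.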

finddupe.c

+73 -1
@@ -1,13 +1,34 @@
 //--------------------------------------------------------------------------
+// finddupe - duplicate file detector and eliminator
+//
 // Find duplicate files and hard link, delete, or write batch files to do the same.
 // Also includes a separate option to scan for and enumerate hardlinks in the search space.
 //
 // Version 1.23
 //
 // Matthias Wandel Oct 2006 - Aug 2010
+//
+// Version 1.24
+// Copyright (C) May 2017 thomas694 (@GH 0CFD61744DA1A21C)
+// added support for multiple ref patterns
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation, either version 3 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program. If not, see <http://www.gnu.org/licenses/>.
 //--------------------------------------------------------------------------
 
-#define VERSION "1.23"
+#define VERSION "1.24"
+
+#define REF_CODE
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -54,6 +75,12 @@ static FileData_t * FileData;
 static int NumAllocated;
 static int NumUnique;
 
+#ifdef REF_CODE
+char* * PathData;
+int PathAllocated;
+int PathUnique;
+#endif
+
 // Duplicate statistics summary
 struct {
     int TotalFiles;
@@ -300,6 +327,37 @@ static int EliminateDuplicate(FileData_t ThisFile, FileData_t DupeOf)
     return 2;
 }
 
+#ifdef REF_CODE
+static int IsNonRefPath(char * filename)
+{
+    int i;
+    char * cmpPath;
+
+    i = strlen(filename)-1;
+    for (i; i >= 0; i--)
+    {
+        if ((int)filename[i] == (int)'\\') break;
+    }
+
+    if (i == 0)
+    {
+        fprintf(stderr, "IsNonRefPath, path without any slash!?");
+        exit(EXIT_FAILURE);
+    }
+
+    cmpPath = malloc(sizeof(char) * (i+2));
+    strncpy(cmpPath, filename, i+1);
+    cmpPath[i+1] = '\0';
+
+    for (i = 0; i < PathUnique; i++)
+    {
+        if (strcmp(cmpPath, PathData[i]) == 0) return 0;
+    }
+
+    return 1;
+}
+#endif
+
 //--------------------------------------------------------------------------
 // Check for duplicates.
 //--------------------------------------------------------------------------
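
A note on the `IsNonRefPath` hunk above: the backward scan ends with `i` equal to -1 when the name contains no backslash, so the `i == 0` test does not fire in that case (it fires instead for a name whose only backslash is its first character), and `cmpPath` is allocated on every call without being freed. One possible allocation-free variant is sketched below. The function name, its parameters, and the -1 return value are hypothetical, and the sketch assumes the stored reference directories end in a backslash, as the `cmpPath` construction in the hunk suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of IsNonRefPath.
   Returns 1 if the directory part of filename matches none of the reference
   directories, 0 if it matches one, and -1 if filename has no backslash.
   Assumes each ref_dirs[i] ends with a backslash. */
static int is_non_ref_path(const char *filename, char *const *ref_dirs, int ref_count)
{
    const char *last_sep;
    size_t dir_len;
    int i;

    assert(filename != NULL);

    last_sep = strrchr(filename, '\\');
    if (last_sep == NULL) return -1;              /* no directory component */

    /* Length of the directory prefix, including the trailing backslash. */
    dir_len = (size_t)(last_sep - filename) + 1;

    for (i = 0; i < ref_count; i++) {
        /* Compare in place instead of copying the prefix to a new buffer. */
        if (strlen(ref_dirs[i]) == dir_len &&
            strncmp(filename, ref_dirs[i], dir_len) == 0) {
            return 0;
        }
    }
    return 1;
}
```

Like the committed code, this matches only the file's immediate directory, so a file in a subdirectory of a reference directory is still treated as non-reference unless that subdirectory is itself in the list.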
@@ -320,7 +378,11 @@ static void CheckDuplicate(FileData_t ThisFile)
         comp = memcmp(&ThisFile.Checksum, &FileData[Ptr].Checksum, sizeof(Checksum_t));
         if (comp == 0){
             // Check for true duplicate.
+#ifdef REF_CODE
+            if (!ReferenceFiles && !HardlinkSearchMode && IsNonRefPath(ThisFile.FileName)){
+#else
             if (!ReferenceFiles && !HardlinkSearchMode){
+#endif
                 int r = EliminateDuplicate(ThisFile, FileData[Ptr]);
                 if (r){
                     if (r == 2) FileData[Ptr].NumLinks += 1; // Update link count.
@@ -665,6 +727,16 @@ int main (int argc, char **argv)
         exit(EXIT_FAILURE);
     }
 
+#ifdef REF_CODE
+    PathUnique = 0;
+    PathAllocated = 64;
+    PathData = malloc(sizeof(char*)*PathAllocated);
+    if (PathData == NULL){
+        fprintf(stderr, "Malloc failure");
+        exit(EXIT_FAILURE);
+    }
+#endif
+
     if (BatchFileName){
         BatchFile = fopen(BatchFileName, "w");
         if (BatchFile == NULL){
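
The hunk above reserves 64 slots for `PathData`, but the code that appends reference directories is not part of the rendered diff. As a rough illustration of how such a list could grow past its initial capacity, here is a sketch with a hypothetical `PathList` type and `path_list_add` helper; none of these names come from the commit, and the duplicate check and doubling policy are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container mirroring PathData / PathUnique / PathAllocated. */
typedef struct {
    char **items;
    int count;
    int capacity;
} PathList;

/* Adds a copy of dir unless it is already stored.
   Returns 1 if added, 0 if already present, -1 on allocation failure. */
static int path_list_add(PathList *list, const char *dir)
{
    size_t len;
    char *copy;
    int i;

    assert(list != NULL && dir != NULL);

    for (i = 0; i < list->count; i++) {
        if (strcmp(list->items[i], dir) == 0) return 0;
    }

    if (list->count == list->capacity) {
        /* Start at 64 slots (as in the diff), then double when full. */
        int new_cap = list->capacity ? list->capacity * 2 : 64;
        char **grown = realloc(list->items, sizeof(char *) * (size_t)new_cap);
        if (grown == NULL) return -1;
        list->items = grown;
        list->capacity = new_cap;
    }

    len = strlen(dir) + 1;
    copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, dir, len);
    list->items[list->count++] = copy;
    return 1;
}
```

Storing each directory once keeps the linear scan in a lookup such as `IsNonRefPath` proportional to the number of distinct reference directories rather than the number of reference files.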

finddupe.sln

+20
@@ -0,0 +1,20 @@
+
+Microsoft Visual Studio Solution File, Format Version 11.00
+# Visual Studio 2010
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "finddupe", "finddupe.vcxproj", "{5ACEBB24-9088-4D2B-9B89-0AABE4250D99}"
+EndProject
+Global
+    GlobalSection(SolutionConfigurationPlatforms) = preSolution
+        Debug|Win32 = Debug|Win32
+        Release|Win32 = Release|Win32
+    EndGlobalSection
+    GlobalSection(ProjectConfigurationPlatforms) = postSolution
+        {5ACEBB24-9088-4D2B-9B89-0AABE4250D99}.Debug|Win32.ActiveCfg = Debug|Win32
+        {5ACEBB24-9088-4D2B-9B89-0AABE4250D99}.Debug|Win32.Build.0 = Debug|Win32
+        {5ACEBB24-9088-4D2B-9B89-0AABE4250D99}.Release|Win32.ActiveCfg = Release|Win32
+        {5ACEBB24-9088-4D2B-9B89-0AABE4250D99}.Release|Win32.Build.0 = Release|Win32
+    EndGlobalSection
+    GlobalSection(SolutionProperties) = preSolution
+        HideSolutionNode = FALSE
+    EndGlobalSection
+EndGlobal

finddupe.vcxproj

+83
@@ -0,0 +1,83 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup Label="ProjectConfigurations">
+    <ProjectConfiguration Include="Debug|Win32">
+      <Configuration>Debug</Configuration>
+      <Platform>Win32</Platform>
+    </ProjectConfiguration>
+    <ProjectConfiguration Include="Release|Win32">
+      <Configuration>Release</Configuration>
+      <Platform>Win32</Platform>
+    </ProjectConfiguration>
+  </ItemGroup>
+  <PropertyGroup Label="Globals">
+    <ProjectGuid>{5ACEBB24-9088-4D2B-9B89-0AABE4250D99}</ProjectGuid>
+    <Keyword>Win32Proj</Keyword>
+    <RootNamespace>finddupe</RootNamespace>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
+    <ConfigurationType>Application</ConfigurationType>
+    <UseDebugLibraries>true</UseDebugLibraries>
+    <CharacterSet>NotSet</CharacterSet>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
+    <ConfigurationType>Application</ConfigurationType>
+    <UseDebugLibraries>false</UseDebugLibraries>
+    <WholeProgramOptimization>true</WholeProgramOptimization>
+    <CharacterSet>Unicode</CharacterSet>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
+  <ImportGroup Label="ExtensionSettings">
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <PropertyGroup Label="UserMacros" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <LinkIncremental>true</LinkIncremental>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
+    <LinkIncremental>false</LinkIncremental>
+  </PropertyGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
+    <ClCompile>
+      <PrecompiledHeader>
+      </PrecompiledHeader>
+      <WarningLevel>Level3</WarningLevel>
+      <Optimization>Disabled</Optimization>
+      <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+    </ClCompile>
+    <Link>
+      <SubSystem>Console</SubSystem>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+    </Link>
+  </ItemDefinitionGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
+    <ClCompile>
+      <WarningLevel>Level3</WarningLevel>
+      <PrecompiledHeader>
+      </PrecompiledHeader>
+      <Optimization>MaxSpeed</Optimization>
+      <FunctionLevelLinking>true</FunctionLevelLinking>
+      <IntrinsicFunctions>true</IntrinsicFunctions>
+      <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+    </ClCompile>
+    <Link>
+      <SubSystem>Console</SubSystem>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+      <EnableCOMDATFolding>true</EnableCOMDATFolding>
+      <OptimizeReferences>true</OptimizeReferences>
+    </Link>
+  </ItemDefinitionGroup>
+  <ItemGroup>
+    <ClCompile Include="finddupe.c" />
+    <ClCompile Include="myglob.c" />
+  </ItemGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
+  <ImportGroup Label="ExtensionTargets">
+  </ImportGroup>
+</Project>

finddupe.vcxproj.filters

+25
@@ -0,0 +1,25 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup>
+    <Filter Include="Source Files">
+      <UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
+      <Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
+    </Filter>
+    <Filter Include="Header Files">
+      <UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
+      <Extensions>h;hpp;hxx;hm;inl;inc;xsd</Extensions>
+    </Filter>
+    <Filter Include="Resource Files">
+      <UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
+      <Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms</Extensions>
+    </Filter>
+  </ItemGroup>
+  <ItemGroup>
+    <ClCompile Include="myglob.c">
+      <Filter>Source Files</Filter>
+    </ClCompile>
+    <ClCompile Include="finddupe.c">
+      <Filter>Source Files</Filter>
+    </ClCompile>
+  </ItemGroup>
+</Project>
