fdupdirs - Search and report duplicate files and directories
Searches and reports duplicate files and directories:
- two files are identical if their MD5 hashes are the same (in id_files.txt)
- reports two dirs as identical if they contain the same set of file hashes (filenames, empty files and empty dirs are ignored) (in id_dirs.txt)
- if the two directories are really equal (same filenames, empty files and dirs), it reports them with the "EXACT" flag
- tries to find "similar" dirs: dir A is a subset of dir B, or the intersection of A and B is large
- reports are sorted by the amount of space that could be saved (in decreasing order)
- does not consider or follow symlinks, and ignores block/char device files
- does not report hard-linked files as duplicates
- does not make a byte-by-byte comparison (useless...)
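The directory rules above can be illustrated with a minimal Python sketch. It is not the fdupdirs code: the function names, the choice to count files in sub-directories as part of a directory's set, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
import hashlib
import os
import stat
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Map each directory under root to the set of MD5 hashes of the
    non-empty regular files below it (sub-directories included, which
    is an assumption of this sketch)."""
    dir_hashes = defaultdict(set)
    # topdown=False visits children first, so each child's set is
    # complete before it is merged into its parent.
    for current, subdirs, files in os.walk(root, topdown=False):
        for name in files:
            path = os.path.join(current, name)
            info = os.lstat(path)
            # Skip symlinks, device files and empty files.
            if not stat.S_ISREG(info.st_mode) or info.st_size == 0:
                continue
            dir_hashes[current].add(md5_of_file(path))
        for name in subdirs:
            child = os.path.join(current, name)
            if not os.path.islink(child):
                dir_hashes[current] |= dir_hashes.get(child, set())
    return dir_hashes


def identical_dirs(dir_hashes):
    """Group directories that hold exactly the same set of hashes."""
    groups = defaultdict(list)
    for path, hashes in dir_hashes.items():
        if hashes:  # directories without hashed files are ignored
            groups[frozenset(hashes)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def is_similar(a, b, threshold=0.8):
    """True if one hash set contains the other, or if the shared part
    is large relative to the union (threshold is hypothetical)."""
    if not a or not b:
        return False
    if a <= b or b <= a:
        return True
    return len(a & b) / len(a | b) >= threshold
```

With this recursive definition, a directory whose only content is one sub-directory gets the same set as that sub-directory, so a real tool would probably need to filter out such nested pairs.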
It tries to optimise the hash computation:
- on a first pass, it computes only the hash of the first and last 4k of data of large files
- sort files by inode number before hashing
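The two optimisations above could look roughly like the following Python sketch. It is an illustration under assumptions, not the actual implementation: the helper names, the rule that files of at most 8 KiB are hashed whole, and the extra grouping by file size are hypothetical. The source does not say why files are sorted by inode; the comment in the code gives the usual rationale as a guess.

```python
import hashlib
import os
from collections import defaultdict

EDGE = 4096  # bytes hashed at each end of a large file in the first pass


def quick_hash(path, size):
    """MD5 of the first and last 4 KiB, or of the whole file if small."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        if size <= 2 * EDGE:
            digest.update(handle.read())
        else:
            digest.update(handle.read(EDGE))
            handle.seek(-EDGE, os.SEEK_END)
            digest.update(handle.read(EDGE))
    return digest.hexdigest()


def full_hash(path, chunk_size=1 << 20):
    """MD5 of the entire file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose full MD5 hashes match."""
    # Sort by inode number; the assumed rationale is that reads then
    # roughly follow the on-disk order and need fewer seeks.
    entries = sorted(((os.lstat(p), p) for p in paths),
                     key=lambda entry: entry[0].st_ino)
    # First pass: cheap hash, grouped together with the file size.
    candidates = defaultdict(list)
    for info, path in entries:
        key = (info.st_size, quick_hash(path, info.st_size))
        candidates[key].append(path)
    # Second pass: full hash only where the cheap hash collides.
    result = []
    for group in candidates.values():
        if len(group) < 2:
            continue
        by_full = defaultdict(list)
        for path in group:
            by_full[full_hash(path)].append(path)
        result.extend(sorted(g) for g in by_full.values() if len(g) > 1)
    return result
```

Files that differ only in the middle pass the first filter but are separated by the full hash, so the first pass only saves work and never decides the result on its own.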
Also:
- report hard links in hl.txt
- report directories with a lot of files in lof_dirs.txt
- reports different files with the same beginning in sim_files.txt (useful for mp3 or other containers with tags)
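As an illustration of how hard links can be told apart from true duplicates, here is a small Python sketch that groups paths by device and inode number. The function name and the return format are assumptions; the way fdupdirs itself builds hl.txt is not described here.

```python
import os
import stat
from collections import defaultdict


def hard_link_groups(root):
    """Group paths under root that point to the same (device, inode)."""
    by_inode = defaultdict(list)
    for current, _subdirs, files in os.walk(root, followlinks=False):
        for name in files:
            path = os.path.join(current, name)
            info = os.lstat(path)
            # Only regular files with more than one link are of interest.
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                by_inode[(info.st_dev, info.st_ino)].append(path)
    # A link whose other names lie outside root yields a single path
    # and is dropped here.
    return [sorted(paths) for paths in by_inode.values() if len(paths) > 1]
```

Paths in the same group share their data blocks, so removing one of them frees no space, which is a good reason to keep them out of the duplicate report.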
Known bugs:
- may use a lot of memory...
- EXACT dirs may be badly reported if there are more than two identical dirs
Download (ver 0.3).
(Older versions: V0.2, V0.1.)
Changelog:
- V0.3: 21/07/11
  - interactive mode
  - use less memory
- V0.2: 15/07/11
  - new algo for sim_dir (quicker)
  - use less memory
- V0.1: 10/07/11
If you have any questions, comments, ideas or criticism, please email me at firstname dot lastname at laposte dot net, where (firstname,lastname)=(michael,rao).