Compacts directories by replacing duplicate files with symbolic links
clink is a simple Python script that replaces duplicate files in Unix filesystems with symbolic links.
    usage: clink [options] [files or directories]

    Compacts folders by replacing identical files by symbolic links

    options:
      --version      show program's version number and exit
      -h, --help     show this help message and exit
      -d, --dry-run  just reports identical files, doesn't make any change.
- Stable version:
(Jun. 14, 2006)
- Older releases:
(Sep. 6, 2005)
(Aug. 25, 2005)
Here is the OpenPGP key used to generate the signatures.
How it works
clink reads every file in turn and computes its SHA (20 bytes) and MD5 (16 bytes) checksums. The trick to finding identical files easily is a dictionary of file lists indexed by SHA checksum.
Files with the same SHA checksum are not immediately considered identical: their MD5 checksums and sizes are compared as well. The probability that files meeting all three criteria are actually different is extremely low. You are much more likely to face file corruption caused by a hardware failure on your computer!
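The lookup described above can be sketched roughly as follows. This is an illustration, not clink's actual source: the function names `checksums` and `find_duplicates` and the chunked reading are assumptions, and the 20-byte SHA checksum is assumed to be SHA-1 from Python's standard `hashlib` module.

```python
import hashlib
from collections import defaultdict


def checksums(path, chunk_size=65536):
    """Return (sha1_digest, md5_digest, size) for one file, read in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
            size += len(chunk)
    return sha1.digest(), md5.digest(), size


def find_duplicates(paths):
    """Group paths whose SHA-1 checksum, MD5 checksum and size all match."""
    # Dictionary of file lists indexed by SHA-1 digest (hypothetical layout).
    by_sha = defaultdict(list)
    for path in paths:
        sha1, md5, size = checksums(path)
        by_sha[sha1].append((path, md5, size))

    groups = []
    for candidates in by_sha.values():
        # Inside one SHA-1 bucket, also require the same MD5 and size.
        confirmed = defaultdict(list)
        for path, md5, size in candidates:
            confirmed[(md5, size)].append(path)
        groups.extend(g for g in confirmed.values() if len(g) > 1)
    return groups
```

Each returned group is a list of paths that satisfy all three criteria; files with no duplicate do not appear in the result.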
Hard links to the same contents are treated as regular files: keeping one instance and replacing the others with symbolic links is harmless. Symbolic links also have the advantage that their contents are not duplicated in tar archives.
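A minimal sketch of the replacement step, under stated assumptions: the function name `replace_with_symlinks`, the choice of the first path as the kept copy, the use of relative link targets, and the temporary-name-then-rename approach are all illustrative and not taken from clink itself. The `dry_run` parameter mirrors the `--dry-run` option by reporting what would change without touching anything.

```python
import os


def replace_with_symlinks(group, dry_run=False):
    """Keep the first path in `group`; replace the others with symlinks to it.

    Returns a list of (replaced_path, link_target) pairs.
    """
    keeper = os.path.abspath(group[0])
    replaced = []
    for duplicate in group[1:]:
        duplicate = os.path.abspath(duplicate)
        # Assumption: a relative target, so the tree can be moved as a whole.
        target = os.path.relpath(keeper, os.path.dirname(duplicate))
        if not dry_run:
            # Create the link under a temporary name, then rename it over
            # the duplicate, so the path never disappears in between.
            tmp = duplicate + ".clink-tmp"  # hypothetical suffix
            os.symlink(target, tmp)
            os.replace(tmp, duplicate)
        replaced.append((duplicate, target))
    return replaced
```

Because a hard-linked duplicate is just another directory entry, renaming a symbolic link over it only removes that one name; the kept copy still refers to the same data.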
Limitations and possible improvements
- File permissions: clink keeps only one copy of each set of duplicate files. The permissions of that copy may be less strict than those of the other duplicates. If permissions matter, enforce them yourself after running clink.
- Directory structure: even when entire directories are identical, clink only creates links between files. This is not optimal in that case, but it keeps clink simple.
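For the permissions limitation, one possible precaution is to compare permission bits within a group of duplicates before the files are replaced. The helper below is a hypothetical sketch, not part of clink: `permission_mismatches` is an invented name, and it assumes the first path in the group is the copy that would be kept.

```python
import os
import stat


def permission_mismatches(group):
    """Return (path, mode) pairs whose permission bits differ from the first file's."""
    # Assumption: group[0] is the copy that would be kept.
    reference = stat.S_IMODE(os.lstat(group[0]).st_mode)
    mismatches = []
    for path in group[1:]:
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        if mode != reference:
            mismatches.append((path, mode))
    return mismatches
```

An empty result means every duplicate has the same permission bits as the kept copy. Otherwise, the reported modes show which permissions may need to be enforced by hand.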