Deduplication of my NAS

Chris Noxz

July 22, 2022

After developing acst, I realized that the checksums it generates and stores could also be used to detect duplicates on my NAS, and that the detection could be done fairly quickly. The result does, of course, depend on the checksums having been created or corrected fairly recently, since no checksums are computed during the duplicate check.

When testing this (keep in mind that my NAS is a Raspberry Pi with disks connected over USB), I was able to detect duplicates among 100,000 files in 1.5 seconds. That’s fast enough for me.

I implemented the feature in acst as an added argument, -d, which uses the merge sort algorithm to detect duplicates.
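
To illustrate the idea of sorting as a way to find duplicates, here is a minimal sketch in Python. It is not the code from acst; the function names (merge_sort, find_duplicates) and the (checksum, path) record layout are assumptions made for this example. Once the records are sorted by checksum, files with equal checksums sit next to each other, so a single pass over the sorted list is enough to group them.

```python
def merge_sort(records):
    """Return a new list of (checksum, path) records sorted by checksum."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    left = merge_sort(records[:mid])
    right = merge_sort(records[mid:])

    # Merge the two sorted halves; "<=" keeps the sort stable.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def find_duplicates(records):
    """Group paths whose stored checksums are equal.

    Assumes the checksums were already computed and stored; nothing is
    hashed here, which is why the check itself can be quick.
    """
    ordered = merge_sort(records)
    groups = []
    start = 0
    for end in range(1, len(ordered) + 1):
        # A run of equal checksums ends at the list end or at a new checksum.
        if end == len(ordered) or ordered[end][0] != ordered[start][0]:
            if end - start > 1:
                groups.append([path for _, path in ordered[start:end]])
            start = end
    return groups


if __name__ == "__main__":
    # Hypothetical sample data: short strings stand in for real checksums.
    sample = [
        ("aa", "/nas/a.jpg"),
        ("bb", "/nas/b.txt"),
        ("aa", "/nas/copy-of-a.jpg"),
        ("cc", "/nas/c.iso"),
    ]
    for group in find_duplicates(sample):
        print(group)
```

The sort costs O(n log n) comparisons and the grouping pass is linear, so the whole check scales well as long as the stored checksums can be read cheaply.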

I also realized that a feature for recursively traversing a file tree can be considered an anti-feature, since find could be used instead with faster results. So, I’m considering removing this feature in favor of a smaller code base.