commit: f3f5e2fa94c4d0268cb9bdabfb4d19f01b1cf9dc
parent: 357f97bb73356276ffadba9f88eebb45c81584bf
author: Chris Noxz <chris@noxz.tech>
date: Fri, 22 Jul 2022 14:28:54 +0200
add new article about acst and deduplication
3 files changed, 37 insertions(+)
diff --git a/noxz.tech/articles/deduplication_of_my_nas/.assemble b/noxz.tech/articles/deduplication_of_my_nas/.assemble
@@ -0,0 +1 @@
+index.html
diff --git a/noxz.tech/articles/deduplication_of_my_nas/.buildignore b/noxz.tech/articles/deduplication_of_my_nas/.buildignore
diff --git a/noxz.tech/articles/deduplication_of_my_nas/index.www b/noxz.tech/articles/deduplication_of_my_nas/index.www
@@ -0,0 +1,36 @@
+.ds YEAR 2022
+.ds MONTH July
+.ds DAY 22
+.ds DATE \*[MONTH] \*[DAY], \*[YEAR]
+.ds TITLE Deduplication of my NAS
+.
+.HnS 0
+\*[TITLE]
+.HnE
+
+.B "\*[DATE]"
+
+After developing
+.URL //noxz.tech/software/acst acst ,
+I realized that the generated and stored checksums could be used for duplicate
+detection on my NAS, and that the detection could be done fairly quickly. The
+result does, of course, depend on the checksums having been created or
+corrected fairly recently, since no checksums are computed during the
+duplicate check itself.
+
+When testing this (keep in mind that my NAS is a Raspberry Pi with disks
+connected over USB), I was able to detect duplicates among 100,000 files in
+1.5 seconds. That's fast enough for me.
+
+I implemented the feature into
+.ICD acst
+with the added argument
+.ICD -d ,
+using the
+.URL https://en.wikipedia.org/wiki/Merge_sort "merge sort"
+algorithm as a means of detecting duplicates: once the checksums are sorted,
+any duplicates end up adjacent and can be found in a single pass.
+
+I also realized that a built-in feature for recursively traversing a file
+tree can be considered an anti-feature when
+.ICD find
+could be used instead, with faster results. So, I'm considering removing this
+feature in favor of a smaller code base.