
Topic: Pattern-Configurable Duplicate Music Finder/Remover

  • Marc27
Pattern-Configurable Duplicate Music Finder/Remover
I'm aware of applications such as Similarity that scan your media library and detect duplicate music based on different criteria, such as checksums or the information stored in ID3 tags. But they are limited in scope, as they don't provide the means to fine-tune the criteria or define a custom one. You can expect better results if the criteria are fine-tuned to the categorization, naming and tagging scheme (to name a few) used in your media library.
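As a rough sketch of what a user-defined criterion could look like (Python; it assumes the tags have already been read into plain dicts by some tag library, and `normalize`, `duplicate_groups` and the `{artist}|{title}` pattern syntax are made up for illustration), the user supplies a key pattern, and files that share the same normalized key are reported as duplicates:

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase, drop bracketed suffixes like '(Remastered 2009)', collapse punctuation."""
    value = value.lower()
    value = re.sub(r"[\(\[].*?[\)\]]", "", value)  # strip (...) and [...] parts
    value = re.sub(r"[^a-z0-9]+", " ", value)      # punctuation/whitespace runs -> one space
    return value.strip()


def duplicate_groups(tracks, pattern):
    """Group tracks by a user-defined key pattern such as "{artist}|{title}".

    Each track is a dict of tag values (hypothetical format); only groups
    with more than one member are returned.
    """
    groups = defaultdict(list)
    for track in tracks:
        fields = {k: normalize(v) for k, v in track.items() if isinstance(v, str)}
        groups[pattern.format(**fields)].append(track)
    return {key: items for key, items in groups.items() if len(items) > 1}
```

Switching to, say, `"{album}|{tracknumber}"` or adding more `re.sub` rules to `normalize` is where the "fine-tuning to your own tagging scheme" would happen.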

From another, informational perspective, it would be interesting to know as accurately as possible how much redundancy there is in your music collection, for example if you have large VA compilations, different releases of the same album (original masters, remasters, vinyl), CD singles and maxi CDs, radio shows and live sessions. In a broader sense, and for purely informational purposes, it would also be interesting to know what the level of redundancy would be if you counted remixes or different versions of the same track as redundant copies (aren't they, to some degree?). For example, if you are a Beatles fan and you have every possible version of a given album, or even of a single track, including bootlegs, that track would have a higher "redundancy level" in your music collection. Which tracks, albums or artists have the highest "redundancy level" in your media library?
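One simple way to put a number on that "redundancy level" (a hypothetical sketch, not how any existing tool computes it): count how many tracks share each normalized (artist, title) key, and treat every copy beyond the first as redundant. Passing a looser `normalize` function, e.g. one that also strips "(Remix)" or "(Live)" suffixes, would count different versions as copies too.

```python
from collections import Counter


def redundancy_report(tracks, normalize=str.casefold):
    """Return (share of tracks that are redundant copies, per-artist redundant counts).

    tracks: list of dicts with "artist" and "title" tags (hypothetical format).
    normalize: function applied to both tags to build the duplicate key.
    """
    keys = Counter((normalize(t["artist"]), normalize(t["title"])) for t in tracks)
    redundant = sum(n - 1 for n in keys.values())  # every copy beyond the first
    per_artist = Counter()
    for (artist, _title), n in keys.items():
        if n > 1:
            per_artist[artist] += n - 1
    ratio = redundant / len(tracks) if tracks else 0.0
    return ratio, per_artist.most_common()  # most redundant artists first
```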

  • Porcus
Pattern-Configurable Duplicate Music Finder/Remover
Reply #1
Duplicate finders have been a recurring topic, with lots of recommendations that I hope are outdated as of 2015, now that a lot of utilities use acoustic fingerprints.

Of the existing threads, this "Pattern-Configurable" one has the title closest to what I am looking for:

-> Check bit-by-bit by audio content
... and warn if one is corrupted:
If file1.MP3 and file2.MP3 contain the same audio but one has wrong length information, then I also want to know; I have made the mistake of rewriting noncompliant MP3 headers and losing gaplessness.
If the reference flac decoder decodes file7.FLAC and file8.FLAC to the same audio but reports that file7.FLAC is invalid, then file8.FLAC is most likely a transcode, and I might not want to destroy the evidence.
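A minimal sketch of those two checks in Python, assuming each file has already been decoded to raw PCM (e.g. by the reference flac decoder) and that the decoder's error status and the header's declared length are available; the `DecodedStream` record and its fields are hypothetical. Hashing the PCM rather than comparing it directly is a design choice that would let you store one digest per file and match many files without keeping the audio in memory:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DecodedStream:
    """Hypothetical result of decoding one file."""
    path: str
    pcm: bytes            # interleaved PCM samples
    declared_frames: int  # length the header/container claims
    frame_size: int       # bytes per frame (channels * bytes per sample)
    decoder_ok: bool      # False if the decoder reported the file as invalid

    @property
    def actual_frames(self) -> int:
        return len(self.pcm) // self.frame_size


def compare_audio(a: DecodedStream, b: DecodedStream) -> list:
    """Bit-exact audio comparison plus warnings about broken metadata or files."""
    if hashlib.sha256(a.pcm).digest() != hashlib.sha256(b.pcm).digest():
        return ["audio differs"]
    findings = ["identical audio"]
    for s in (a, b):
        # Same audio, but the header lies about the length (e.g. lost gapless info).
        if s.declared_frames != s.actual_frames:
            findings.append(f"{s.path}: header claims {s.declared_frames} frames, "
                            f"decodes to {s.actual_frames}")
    if a.decoder_ok != b.decoder_ok:
        bad, good = (a, b) if not a.decoder_ok else (b, a)
        findings.append(f"{bad.path} is invalid; {good.path} is probably a transcode, "
                        "keep both as evidence")
    return findings
```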

-> Check bit-by-bit for extract ("substring") or overlap?
Is audio stream A an extract (time T1 to T2) of stream B? Are streams A and B equal except that A has some extra samples at the beginning and B at the end? Are these zeroes? (Offset correction, y'know ...)
This should not be too hard given that one can reduce the number of possible matches by fingerprinting.
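The extract and overlap questions reduce to sequence matching on decoded samples. A brute-force sketch (hypothetical helper names; samples assumed to be 16-bit integers, mono or already interleaved), meant to run only on the few candidate pairs that a fingerprint match leaves over:

```python
from array import array


def find_extract(a, b, typecode="h"):
    """Return the sample offset at which stream a occurs inside stream b, or -1."""
    needle = array(typecode, a).tobytes()
    hay = array(typecode, b).tobytes()
    size = array(typecode).itemsize
    pos = hay.find(needle)
    while pos != -1 and pos % size != 0:  # reject matches not on a sample boundary
        pos = hay.find(needle, pos + 1)
    return -1 if pos == -1 else pos // size


def overlap(a, b, min_len=1):
    """Longest k such that the last k samples of a equal the first k samples of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[len(a) - k:] == b[:k]:
            return k
    return 0


def describe_offset(a, b):
    """A = extra head + shared part; B = shared part + extra tail. Are the extras silence?"""
    k = overlap(a, b)
    if k == 0:
        return None
    head, tail = list(a[:len(a) - k]), list(b[k:])
    return {"overlap": k, "a_extra_head": len(head), "b_extra_tail": len(tail),
            "extras_are_silence": all(s == 0 for s in head + tail)}
```

For example, `describe_offset([0, 0, 7, 8, 9], [7, 8, 9, 0])` finds a 3-sample overlap, with two extra samples at the start of A and one at the end of B, all of them zero.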

-> Relax bit-exactness only-so-slightly
(Is file.WAV really a decoded file.MP3? One may want to tolerate roundoff error, but nothing more than that ... and possibly a few missing or extra samples at the beginning or end due to the gaplessness issue, right?)
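A sketch of that "only-so-slightly" relaxed comparison, assuming both files are decoded to integer samples: each sample may differ by at most `tol` (roundoff), and the streams may be shifted or trimmed by at most `max_trim` samples at either end (the gapless padding issue). The function name and parameters are illustrative only. MP3 encoder delay plus padding typically amounts to hundreds of samples, so `max_trim` would need to be set accordingly, and this brute-force shift search gets slow for large values.

```python
def roughly_same(a, b, tol=1, max_trim=0):
    """True if a and b match within +/-tol per sample, after shifting one against
    the other by at most max_trim samples and ignoring at most max_trim unmatched
    samples at either end of either stream."""
    for shift in range(-max_trim, max_trim + 1):
        # a[i] is compared with b[i + shift] for i in [start, end)
        start = max(0, -shift)
        end = min(len(a), len(b) - shift)
        if end <= start:
            continue
        # unmatched samples: a head, a tail, b head, b tail
        unmatched = (start, len(a) - end, start + shift, len(b) - (end + shift))
        if max(unmatched) > max_trim:
            continue
        if all(abs(a[i] - b[i + shift]) <= tol for i in range(start, end)):
            return True
    return False
```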
  • Last Edit: 05 January, 2015, 06:23:12 PM by Porcus