
Topic: Audio (fingerprinting) duplicate finders - do they really all suck big time? (Read 1688 times)

  • Porcus
  • [*][*][*][*][*]
Audio (fingerprinting) duplicate finders - do they really all suck big time?
I have tried a few free tools and some trialware. Some of them I need to re-review because I do not remember why I uninstalled each. Currently I have the following five installed. None of them can identify redundant folders/directories, but Duplicate Cleaner promises that the paid version has such a feature.

* DupeGuru Music Edition from hardcoded.net:
I use this because it quite efficiently scans for audio-only parts, and appears fairly user friendly. It works with my Windows 7, although Windows is not supported anymore - YMMV.
This one does not do "fingerprinting"; it only aims to check for identical audio.
Sucks because:
Fails to identify different files with precisely the same audio, because it does not decode. The developer knows e.g. the FLAC format well enough to isolate the audio section from the tags section, but then evidently just hashes the audio part of the file, not the decoded audio.  So two FLAC files with different encoding parameters - as well as FLAC and WAV - of the same audio are not matched.
(Since it knows tags, then why not match by MD5 sum? I can use a separate utility to verify that the files are not broken ...)
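
A minimal sketch of the MD5 idea in the parenthesis above. It relies on what the FLAC format documents: the first metadata block is STREAMINFO, 34 bytes long, and its last 16 bytes hold an MD5 of the unencoded audio. The function names are made up for illustration and have nothing to do with how DupeGuru works internally. It also assumes the file starts directly with the 'fLaC' marker (no ID3v2 tag in front). For 16-bit PCM, hashing the raw frames of a WAV file should give the same digest as the one stored in a FLAC of the same audio, provided the encoder actually wrote the MD5.

```python
import hashlib
import wave


def flac_streaminfo_md5(path):
    """Return the MD5 hex digest stored in a FLAC file's STREAMINFO block.

    Assumes the file begins with the 'fLaC' marker (no ID3v2 tag prepended).
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC stream")
        header = f.read(4)  # 1 byte: last-block flag + block type, 3 bytes: length
        if len(header) < 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length != 34:
            raise ValueError("first metadata block is not STREAMINFO")
        streaminfo = f.read(34)
        if len(streaminfo) < 34:
            raise ValueError("truncated STREAMINFO block")
    return streaminfo[-16:].hex()  # last 16 bytes: MD5 of the decoded audio


def wav_pcm_md5(path):
    """Return the MD5 hex digest of the raw PCM frames in a WAV file."""
    digest = hashlib.md5()
    with wave.open(path, "rb") as w:
        while True:
            chunk = w.readframes(65536)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Two files whose digests agree carry the same decoded audio regardless of encoder settings. A digest of all zeros in the FLAC header means the encoder did not store one, so it should not be used for matching.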

* PerfectTunes from Illustrate (Spoon, the creator of AccurateRip and dBpoweramp, co-developer of fb2k Mobile etc):
Does fingerprinting. But:
There seems to be no way to score similarity, nor to highlight the ones that are bit-identical.

* Similarity, http://www.similarityapp.com/
Does fingerprinting. Scores in terms of similarity. Has a "quality" meter to help you choose which ones to keep, if you are too lazy for listening; at the least, the "max frequency" figure can help identify upconverted transcodes.
Sucks because:
Does not distinguish out bit-identical files. Gives a full 100.0 % similarity score even to files which are not identical.
Even instrumental vs. vocal versions get nearly full similarity score.
Its quality meter has a clipping indicator which is doubtful to say the least. (You just apply gain to the file, and it is instantly fooled.)
Does not know WavPack.
Does not know ID3v2.4
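
To illustrate the point about the clipping indicator: a rough sketch, not Similarity's actual method (which I do not know). It assumes samples are floats with full scale at 1.0; all function names are hypothetical. A naive check that counts samples at full scale is fooled by a small gain reduction, while the flattened tops are still there and can be found as runs of samples sitting at the file's own peak level.

```python
import math


def apply_gain(samples, gain_db):
    """Scale samples by a gain in dB (amplitude factor 10 ** (dB / 20))."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def clipped_sine(amplitude, period, length):
    """Sine wave of the given amplitude, hard-clipped to the range [-1.0, 1.0]."""
    return [
        max(-1.0, min(1.0, amplitude * math.sin(2 * math.pi * n / period)))
        for n in range(length)
    ]


def count_full_scale(samples, full_scale=1.0):
    """Naive clipping check: count samples at (or beyond) full scale."""
    return sum(1 for s in samples if abs(s) >= full_scale)


def longest_run_at_peak(samples, rel_tol=1e-6):
    """Longest run of consecutive samples sitting at the file's own peak level.

    Flattened tops survive a gain change, so this does not depend on the
    absolute level the way count_full_scale does.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 0
    best = run = 0
    for s in samples:
        if abs(s) >= peak * (1 - rel_tol):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best
```

With `loud = clipped_sine(1.5, 50, 200)` and `quiet = apply_gain(loud, -1.0)`, the naive count drops to zero for `quiet`, while `longest_run_at_peak` gives the same run length for both.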

* Mediamonkey with some script for the purpose (where do I find its name?)
Only picks a selected portion of the file and makes a hash.

* Duplicate Cleaner Free. Too few features in the free version (cannot even identify by audio content), so I can just use Ccleaner instead - but the paid version allegedly scans for duplicate folders and audio content.


I guess others have experience to expand the list?

  • spoon
  • [*][*][*][*][*]
  • Administrator
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #1
Re PerfectTUNES, matches are shown as either:

Certain Matches  where the audio is 100% identical between the two files,

Possible Matches   the audio is similar between the files,

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #2
My apologies for the inaccuracy, and thanks for the correction.

  • Moni
  • [*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #3
I have acquired enough music over the years, and fragmented my collection over numerous drives, that this is a concern for me. I use Roon and it has certainly grabbed a good number of them but a purpose-built tool is very worthwhile.

  • sanskrit44
  • [*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #4
i wish i could contribute, but i just stumbled over this thread while searching for a free linux alternative as well. unfortunately it seems there is no tool available that does fingerprinting at all?

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #5
Why I do not use PerfectTunes for deduplication (although I have used it for AccurateRip verification).  These four are supposed matches:
* https://www.youtube.com/watch?v=LcLc-8Ay_Ys
* Track 1 from https://yithmetal.bandcamp.com/album/demo-3
* Track 5 from the same
* https://www.youtube.com/watch?v=xVANJYZlbv4
And I cannot tweak the selectivity (have not checked the paid version).

Its "Certain" functionality (thanks again, Spoon) remedies some of DupeGuruME's stupidness, but I will still use DupeGuruME for speed, as it does not do fingerprinting.
  • Last Edit: 16 May, 2017, 04:05:32 PM by Porcus

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #6
i wish i could contribute, but i just stumbled over this thread while searching for a free linux alternative as well. unfortunately it seems there is no tool available that does fingerprinting at all?

Sure there are tools that do fingerprinting (like Picard), but are there deduplication utilities?
You could try Similarity with Wine or the OSX version with https://www.darlinghq.org/ ?

  • spoon
  • [*][*][*][*][*]
  • Administrator
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #7
Why I do not use PerfectTunes for deduplication (although I have used it for AccurateRip verification).  These four are supposed matches:
* https://www.youtube.com/watch?v=LcLc-8Ay_Ys
* Track 1 from https://yithmetal.bandcamp.com/album/demo-3
* Track 5 from the same
* https://www.youtube.com/watch?v=xVANJYZlbv4
And I cannot tweak the selectivity (have not checked the paid version).

Its "Certain" functionality (thanks again, Spoon) remedies some of DupeGuruME's stupidness, but I will still use DupeGuruME for speed, as it does not do fingerprinting.

The issue IMHO of allowing a slider of match accuracy is that for anyone with more than a screen full of matches, moving the slider would have unknown effects on other matches. PerfectTUNES allows the quick 'hiding' of matches, which should be used in your instance for those tracks.

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #8
The issue IMHO of allowing a slider of match accuracy is that for anyone with more than a screen full of matches, moving the slider would have unknown effects on other matches. PerfectTUNES allows the quick 'hiding' of matches, which should be used in your instance for those tracks.
Is there a way to quickly 'hide' the most obvious 1200 out of 1500 matches? That 'slider' could easily have done so, couldn't it?

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #9
From https://www.lifewire.com/eliminate-duplicate-songs-with-these-free-tools-2438770 , I have already mentioned Duplicate Cleaner Free (#02 on their list) and Similarity (#03, which I actually use).
#01, AllDup, can seemingly compare audio-only, but not in the free version.  The two others cannot, it seems.


One that fails to work altogether on my computer (gives error message): https://software.amato.com.br/content/mp3-duplicate-finder . Is supposed to use fingerprints, but only .mp3 and .ogg.
  • Last Edit: 17 May, 2017, 02:00:27 PM by Porcus

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #10
AudioDedupe from Mindgems shows some promise. Unlike Similarity, which offers a freemium model, this is trialware-nagware where you have to press OK buttons all the time.

Notes:

- Bad: does not distinguish out bit-identical files. Gives full "100%" score to files with differing audio content (tried before and after a CUETools repair).
- Good: Seems to be able to distinguish better than Similarity on mixes with and without vocals (same instruments, one instrumental version), as well as different-language versions (same singer).
- Good: does not give "100%" similarity to FLAC and an MP3 generated from it, it seems - "only" somewhat above 99%
- Which brings me to an unnecessary nuisance: I can set a minimum similarity threshold in tenths of a percent, e.g. 99.7 - but I cannot display the score at that precision. If it is > 99%, I must bump up the threshold until the match disappears.

And then something I forgot about Similarity: no good at checking for different offsets. Need to test PerfectTunes on that too. Stay untuned.

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #11
Audio Comparer (http://audiocomparer.com/) certainly should be able to get the dust up from your carpet (and spit it in your face, I guess).
30-day trialware.  Offers a choice between exact and similar, and lets me set a similarity threshold in the preferences.

Sucks because of major user-unfriendliness, but let me first consider how it does its thing (when I can get it to do its thing):
-> "Exact" identifies a lossless file with an MP3 I made of the same song.  That's not "exact".  On the other hand, it does distinguish different masterings of the same recordings, so it isn't completely stupid.
-> Ditches some tracks without notifying me.  (Could be length ... though somewhat more than a minute should be enough?)

And then a long list:
-> The results window can be sorted by a few columns, but not by similarity score - and I cannot customize the columns (and there is none for codec or filetype)
-> Sucks at reporting metadata and the like.  Only reports metadata for a few of the files in the list, leaving the rest blank ("Reload tags" does not help), and routinely reports bitrates at around 1380 for both FLAC and MP3 files. 
-> No drag + drop, asks me to navigate to each folder (PerfectTunes should have a whipping for this too) - but fortunately, I can type the folder names.
-> I have to select a file (possibly type in a file name) for the group and results.  Couldn't it instead offer me an option to save to file the few times it tells me something useful?  (It isn't that stupid when I enter the wizard at startup though.)
-> I can add a single folder to a group, and then it starts scanning.  I must wait for it to finish before I can add another.  (Trying a wildcard hack ... it takes the entire drive?)
-> ... and if I then by mistake add a folder twice, they are reported as duplicate identical files.  Dare not even try asking to delete one of them!
-> I can remove folders from groups, and I can mark several by shift + arrows, but not shift + pagedown.  If I have many to remove ... yawn.
-> Exits fullscreen when I hit the Preferences shortcut.
-> Bugware!  I cannot add a new group after one round of comparisons - then it throws an error message of a missing file (in its own folder - one they forgot to include in the install then?).  Not only that, it requests me to submit the error via email, and even if I uncheck that, it asks me to type in a description on how to reproduce it.  Some times it is kind enough to display a Restart button - and hitting that, it still pops up the description dialogue.
This error sometimes occurs on the first round too (freshly started program).

It appears as being reasonably fast once you have typed all the BS and until it is done comparing (and possibly displays the results, possibly crashes).
  • Last Edit: 25 May, 2017, 01:12:31 PM by Porcus

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #12
Music Duplicate Remover http://www.maniactools.com/soft/music-duplicate-remover/ from ManiacTools - the creators of that other "MP3 tag" application (the curse of generic naming). It can find duplicates by tags or by audio fingerprinting or by averaging those two (no customizable weighting though).  The following is based on testing the audio fingerprinting only.

It could be an alternative for some users.  It is extremely selective, and would maybe be useful for those who want to detect only near-identical audio where e.g. Similarity would throw lots of ninety-nines and above at you.  I did not encounter a 100 % that was not bit-exact. No false positives (in the sense of identifying two different pieces of music) even at the lowest possible selectivity ("1" of "100"). Though, I did not try too many folders. 
But it is probably too selective.  I have Fates Warning: "Still Life" in American and European editions, with somewhat different CDTOC, and even with the threshold down at the minimum "1", only two of nineteen track-pairs were identified as possible duplicates - and those were not even the ones which according to fb2k are identical after applying offset and truncating the longer.  (The offset is over half a second, I should add.)

So it is really too picky about quite a lot: offset, dynamic range in different masterings - and possibly also volume, though I doubt this is much of an issue: I tried to adjust an mp3 by -9 dB, and it came up with a nice  score still in the high nineties, but at -10.5 dB or -12 dB: no identification even at lowest threshold.
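
For reference, the arithmetic behind those volume steps, plus a sketch of a score that would not care about them. This is only an illustration, with made-up function names; I do not know how Music Duplicate Remover computes its score. A gain of g dB multiplies the amplitude by 10^(g/20), so -9 dB is a factor of about 0.355 and -12 dB about 0.251. A normalized correlation (cosine similarity) of the sample sequences is unchanged by such a pure gain change.

```python
import math


def db_to_factor(gain_db):
    """Amplitude factor corresponding to a gain in dB."""
    return 10 ** (gain_db / 20)


def normalized_correlation(a, b):
    """Cosine similarity of two equal-length sample sequences.

    Scaling either sequence by a positive constant leaves the result unchanged.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

This only covers the pure gain case on raw samples. A lossy re-encode after the gain change would add its own differences, so a real-world score would land slightly below 1.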


Other issues:

Fairly slow.  Windows labels it as non-responsive when scanning a single long track, meaning that even navigating in Windows Explorer (minimize windows etc.) would prompt the question whether to kill it.  Best left overnight on a medium-size collection, best left untouched with a large.  Seems to be even worse on lossless files (lots of data to process I guess).  And rescanning is no quicker, apparently it does not cache.
And, its window often pops up to foreground even when it is in the middle of scanning (which it would be for most of the time eh?).  I have no need for it until it is done, have I?

Results window unsortable.

And I must navigate directory trees by mouse.  Arguably less of an issue if you have a smaller collection.

Finally, it does not know WavPack.
  • Last Edit: 26 May, 2017, 05:07:53 PM by Porcus

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #13
Phelix from the now-defunct Phonome Labs.  An ancient Java app (files dated 2007) - I was able to track it down again from http://www.brothersoft.com/phelix-57168.html
Boasts of managing both offset and volume differences.  Reports matches by Data, Audio and Tags.  Seems that "Data" means bit-exact audio part.

Sucks because:
Handles only MP3 and OGG. Not even WAVE.
Ten years old, unsupported, vendor gone, no way of buying the full version
Not the most user-friendly UI.
Crashes quite a bit.




And a few more to tick off: I have not tried the following, as they appear not to have any way to compare audio:
iTunes, certainly.
http://www.duplicatedetective.com/
http://noclone.net/duplicate_mp3.aspx
http://www.joerg-rosenthal.com/en/antitwin/ (claims to be able to find similar pictures, does not claim to do the same for audio)
https://www.auslogics.com/en/software/duplicate-file-finder/
http://www.ashisoft.com/
http://www.hothotsoftware.com/remove_and_delete_duplicate_mp3_files_software/
http://www.efsoftware.com/d3/e.htm
http://www.essentialdatatools.com/products/duplicatefileremover/
http://manyprog.com/duplicate-music-remover.php
http://orderprog.com/remove-music-duplicates.php
http://www.duplicatechecker.com/
http://urbantwilight.com/index.php?id=products&fto=duperazor
http://duplicatefilefinder4pc.com/duplicate-mp3-finder-plus.htm and the seemingly identical http://xixisoftware.com/p/duplicate-mp3-finder.htm
Abee, allegedly hard-to-uninstall junkware
These defunct vendors: http://download.cnet.com/windows/remove-duplicate-mp3/3260-20_4-10046855-1.html http://download.cnet.com/Remove-Duplicate-Songs/3000-2141_4-10913768.html

And these MacOS based ones.  I do not use appleware, but from the description it is clear that they do not use audio fingerprinting: https://www.cisdem.com/resource/best-duplicate-mp3-finder-or-cleaner-for-mac.html

Two more for other reasons: TuneUp, http://www.tuneupmedia.com:8080/download/windows , requires Windows Media Player (I have managed to click cancel on every occasion for years now) or iTunes (no way!).  And, no use testing http://www.find-duplicates.com/Duplicate_MP3_Finder.html as it seems to just be AudioDedupe.

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #14
Music Duplicate Remover http://www.maniactools.com/soft/music-duplicate-remover/ from ManiacTools
[...]
It could be an alternative for some users.  It is extremely selective

... and ignores the user-defined threshold for "Compare by sound".  Displays the same hits and checks the same hits for removal no matter what. (At least the free version.)

  • Porcus
  • [*][*][*][*][*]
Re: Audio (fingerprinting) duplicate finders - do they really all suck big time?
Reply #15
Some testing to follow.  I cannot test too many files at a time, due to limitations in the free versions (and Audio Comparer crashing on me).
Tests: Different masterings, different pressings with different differences.  Created an "artificial" set of different pressings by introducing a one-sample offset too.

In very rough order of what I think would be the more significant differences:

(I) A selection of one-sample-offset tracks, offset shift by CUETools - and one track volume-shifted
The files: my shortest 1-track single (Peligro: Purple Haze, 1:59); my longest one-track CD (Devil Doll: Sacrilege of Fatal Arms 'fan club' re-release, 79:01); ars moriendi: in memoriam (electronic music from Bandcamp was my first idea for getting something where the affected single sample is silent, and yes it is); Altar of Plagues: Tides, a 2-track EP from Bandcamp (the "borderline" samples are not silent: one sample off yields one sample mismatching, and it is the last of track 1 and the first of track 2); and one single track (Mansion: Sorrowless) off Bandcamp, chosen because the length is not divisible by 588 and CUETools would pad to the CD frame boundary.  Sorrowless was also shifted back again (the result being at the same offset as the original, but with the processed file padded), and I also made a 320 kb/s MP3 of it and shifted its volume a few times: -1.5 dB to -12 dB.

(II) Soundgarden: Badmotorfinger ("fairly close" duplicates)
The 1CD version vs the 2CD version: all 12 tracks identical when fb2k has shifted offset and crops, but 6 of them have to be cropped both before and after offset, and only the other 6 are then reported as identical after doing so. 

(III) Soundgarden: Let me do Superunknown as well (this is obviously no random draw of albums).  American (15-track) vs European (16-track) version, same "mastering" in the sense that it is not any subsequent "remaster" with EQ and all applied, but all tracks are different according to fb2k, probably they have been run through dithering both here and there ... ?

(IV) Fates Warning: Still Life, 2CD.  Track lengths 1:10 to 20:54.  Different CDTOC: one of the editions cut 48 seconds wrong between Pleasant Shade ... pts 7 and 8 (tracks 1-07 and 1-08).  On the other hand, two track-pairs (1-04 and 1-10) out of 19 are reported as identical by fb2k after applying offset (of more than half a second) and truncating.

(V) "Different concurrent masters", one of full dynamic range and the others squashed at the record company's order.
Witherscape: "The Inheritance", starring Dan Swanö.  More here: https://hydrogenaud.io/index.php/topic,98199.msg866222.html#msg866222
Dan Swanö included as MP3 bonus material the masterings for iTunes download sales (@192, lowest dynamic range), and - more interesting - the un-DR-compressed mix for vinyl release (@320).  The label, Century Media, didn't care for what went on vinyl, and so they did not request a dynamically ruined loudness war-victim for that purpose.  So these are not "remasters" etc. beefed up by some later engineer's taste.
Disregarding the title track, a short piano piece that is fairly dynamic in all mixes, foo_dynamic_range measures the following for the ten other tracks:
The most compressed version is the one intended for download.  A DR of 5 dB, though the cover of Judas Priest's "Last Rose of Summer" has a couple more.  All tracks peak-normalized.
Add one dB for the CD version, where the tracks peak between -0.3 and -0.59 dB.
The mix for vinyl: all tracks have a DR of 11 dB.  Peaks nearly as the CD version.
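
What "identical after applying offset and truncating" means in tests (I), (II) and (IV) can be sketched as below. This is not fb2k's implementation, just a brute-force illustration on plain lists of sample values, with a hypothetical function name. It tries every shift up to a maximum, truncates the longer sequence to the overlap, and checks for exact equality.

```python
def find_matching_offset(a, b, max_offset):
    """Return the shift at which a and b agree on their whole overlap, or None.

    A non-negative shift s means a[s + i] == b[i], i.e. `a` has s extra
    leading samples; a negative shift means `b` has the extra samples.
    Shifts are tried in order of increasing absolute value.
    """
    for shift in sorted(range(-max_offset, max_offset + 1), key=abs):
        if shift >= 0:
            x, y = a[shift:], b
        else:
            x, y = a, b[-shift:]
        n = min(len(x), len(y))  # truncate the longer one to the overlap
        if n > 0 and x[:n] == y[:n]:
            return shift
    return None
```

For example, `find_matching_offset([0, 0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 9], 2)` returns 2: the first sequence has two extra leading samples, and the trailing 9 falls outside the overlap. On real tracks the overlap should be required to cover most of the audio, since a very short or silent overlap matches trivially.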


Applications were set to allow for duration differences, if there was such an option.  The Similarity app reports two scores, "% content" and "% precise" (the latter they boast of, though they say little more than that they recommend having a look at both, and that certainly seems a good idea).  Recall that PerfectTunes only has certain vs possible matches, no finer accuracy reporting and no tolerance tweaking.


Test (I): one-sample offsets and volume shift
  • Audio Comparer: Seems to identify all as "Exact" matches I think.
  • Audio Dedupe: At most selective setting, it still catches some volume-changed mp3s, and the padded Sorrowless.  Shifting offset: some 96 percent match and up.
  • Music Duplicate Remover: single-sample shifts score > 99, volume change 96 and up though it does not catch the -10.5 and -12.
  • Similarity: at most selective threshold, only identifies the volume shifts, and not all of them: the -12 dB goes into a different group from the 0 dB.
    One sample off: "% content" score 99.9 and above; "% precise" score 87.7 to 99.4.  That's not very precise ...  one sample off scored as much worse than a lossy encoding.
  • PerfectTunes: Surprise - it misses two of the three Ars Moriendi tracks from the certain matches, but picks the others that are bit-identical up to offset.


Test (II): Badmotorfinger (fairly close matches)
  • Audio Comparer: Reports one pair (1, Rusty Cage) as "Exact" matches.  Looking for "Similar" at most selective threshold (100%) it identifies all pairs.
  • Audio Dedupe: scores 81, 86, 90, and up to 99.
  • Music Duplicate Remover: 87.something to 97
  • Similarity: At most selective threshold, identifies five pairs.  Bump threshold down a little bit, and all pairs are identified; and, at 100%/100%.  Similarity should do something about its scale.
  • PerfectTunes: Identifies one pair (7) as certain, the others as possible.


Test (III): Superunknown (not so close bit-wise)
  • Audio Comparer: Reports one pair (12) as "Exact" matches.  Looking for "Similar", it identifies all pairs but one (11) - and the threshold does not matter at all (minimum or maximum yield the same).
  • Audio Dedupe: 75 (track 10), 76 (track 15), then 80 to 98.  No issues with #11.
  • Music Duplicate Remover: 88 to 98 except does not identify track 6.
  • Similarity: Nothing reported at most selective threshold.  "% content" scores from 98.4 to 100.0, "% precise" from 95.7 to 98.7
  • PerfectTunes: All caught as possible matches


Test (IV): Still Life (widely varying track lengths but two pairs identified by fb2k)
  • Audio Comparer: None reported as "Exact" matches.  Looking for "Similar" at most selective threshold (100%) it identifies five pairs (and not the two fb2k does) - and again, the threshold does not matter at all (minimum or maximum yield the same).
  • Audio Dedupe: Wow (not!).  At 49 percent threshold it generates false positives on both discs.  At 50 percent threshold, it catches 8 of 12 pairs on disc 1 and 6 of 7 on disc 2 (several scores in the fifties, and up to 95; the ones identified by fb2k are 64 and 70).
  • Music Duplicate Remover:  Matches only three tracks, 1-01 at score 55, 2-01 and 2-06 at score 88.  Not the ones identified by fb2k.
  • Similarity: At most selective threshold, identifies five pairs (2-01, 2-02), and not the ones that fb2k does.  Those which are bit-identical after shifting and cropping, are scored at 93.8/98.9 and 83.4/94.6.  Only at lower thresholds will it identify half the tracks.  Has all sorts of issues with track 1-08 (length/boundary).
  • PerfectTunes: 8 of 19 as possible matches.


Test (V): The Inheritance (full-DR vs squashed-at-label's-orders concurrent masters)
  • Audio Comparer: drops the piano piece.  When I try to add it as a single file, it just does not show up after being "listened to".
    Yields similarity scores in the mid nineties, max 97.  One down to 87 when comparing the most compressed, and down to 66 or 68 (track 3) when comparing a compressed mix to a dynamic. Minimum sensitivity is 50, and at that threshold it does not identify different songs.
  • Audio Dedupe:
    The squashed against each other: 92 to 98 except two tracks (4 and 8) at 87 and 88.
    Anything against full-DR: 91 to 97 except two tracks: 4 at 87 and, whoops: track 3 at 65.  (At 55 it can no longer tell tracks apart.)
  • Similarity:
    iTunes vs CD (the two squashed ones): "% content" from 99.7 to 100.0, "precise" from 95.8 (track 1) to 99.8.  Setting threshold down to 70 percent, it starts messing up different tracks.  Then, discovering a nuisance: when grouping together different tracks, it reports for each track the maximum match within the group (as if all tracks but one were near-identical).
    anything against the full-DR: similar results, although now some other track (11) is the "worst".  The "worst" shows worse scores, but the others in fact at times better.
  • Music Duplicate Remover:
    The squashed ones: 84 for the piano piece, the others 51 to 71
    The mp3s: Identifies the piano piece at 54 percent.  None of the others are identified even at 1 percent threshold.
    The CD vs the full DR: Track 1 not matched at all.  The others 50 to 64 (the piano piece not standing out in any way)
  • PerfectTunes: One track fails to be recognized as possible duplicate (in any of the three possible pairs): track 1.

  • Last Edit: 27 May, 2017, 09:10:59 AM by Porcus