Topic: Duplicate removal (Read 7746 times)

Duplicate removal

Does anyone know of an automated (or semi-automated) tool that removes duplicates from large mp3 libraries? I have searched here and in the foobar forum and I can't seem to find any direct answers. Foobar will remove duplicates from playlists but not files. The db search features are cumbersome and about as efficient as going through my entire file library. I tried seeing if anything in mp3tagger worked, but no joy there.

Basically I recently acquired a large number of anthology CDs. There is some crossover between them and my existing collection. It seems to me that this is probably a common enough problem that one of the smart people (i.e. NOT ME) would have created a tool for this. I just can't seem to find it...

Oh, and freeware is a bonus, but I would be willing to pay if I had to, as a last resort of course.

Reply #1
Directory Opus will find duplicate files of any kind:

http://www.gpsoft.com.au/

It also does hundreds of other things too! It's not free, but I find it indispensable.

--
Baxter

Reply #2
If you can run Linux, my solution is simple:

Code:
find . -type f -print0 | xargs -0 md5sum -b | sort | uniq --all-repeated=prepend -w32 | colrm 1 34 | sed 's/^$/===== Probably identical =====/'

It's *extremely* fast (approaches the top speed of the HDD), works recursively, and supports filenames with special characters or spaces.

Also, it operates in a single pass.
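
For anyone who finds the one-liner above hard to read, here is roughly the same idea spread over several lines with a comment per stage. This is only a sketch: the function name `find_dupes` is made up for illustration, it assumes GNU findutils and coreutils (`xargs -r`, `uniq -w` and `uniq --all-repeated` are GNU options), and it hashes whole files, tags included.

```shell
#!/usr/bin/env bash
# find_dupes DIR: print groups of files under DIR whose whole-file MD5 matches.
# Hypothetical helper name; assumes GNU find, xargs, md5sum, sort and uniq.
find_dupes() {
  find "${1:-.}" -type f -print0 |      # every regular file, NUL-separated
    xargs -0 -r md5sum |                # one "hash  path" line per file
    sort |                              # identical hashes become adjacent lines
    uniq -w32 --all-repeated=separate   # keep repeated hashes, blank line between groups
}
```

Calling `find_dupes ~/Music` prints each set of byte-identical files as a block of `hash  path` lines. Nothing is deleted; the output is only a list to review.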

Reply #3
But it doesn't search the tags, and MD5 is useless unless you run it on the audio data only.

Reply #4
Quote
But it doesn't search the tags, and MD5 is useless unless you run it on the audio data only.

A "please" would have been nice, but here you go...

(untested code)

Code:
find . -type f -print0 | xargs -0 -n1 bash -c 'printf "%s *%s\n" "$(mpg123 -q -d100 -s "$0" | md5sum -b | cut -c1-32)" "$0"' | sort | uniq --all-repeated=prepend -w32 | colrm 1 34 | sed 's/^$/===== Probably identical =====/'

Reply #5
I don't run Linux; I just wanted to note this.

Reply #6
here: http://users.otenet.gr/~jtcliper/tgf/
Dimitris

Reply #7
Quote
Directory Opus will find duplicate files of any kind:

http://www.gpsoft.com.au/

It also does hundreds of other things too! It's not free, but I find it indispensable.

--
Baxter

Quote
here: http://users.otenet.gr/~jtcliper/tgf/

I'm having the same problem, but do any of the programs mentioned actually compare audio hashes only? If the program creates hashes of the entire file, it won't get me any further, since I might have changed the tags of the file...
--alt-presets are there for a reason! These other switches DO NOT work better than it, trust me on this.
LAME + Joint Stereo doesn't destroy 'Stereo'

Reply #8
Quote
here

I'm having the same problem, but do any of the programs mentioned actually compare audio hashes only? If the program creates hashes of the entire file, it won't get me any further, since I might have changed the tags of the file...

It will only compare audio data, not tags. There are also options to tell the program what to compare and what to ignore.
Dimitris
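
To illustrate what "audio data only" can mean for an mp3, here is a rough sketch that hashes a file after skipping a leading ID3v2 tag and a trailing ID3v1 tag. It is not a description of how the program above works internally. The names `hex_at` and `audio_md5` are made up for this example, and the sketch ignores ID3v2 footers, APE and Lyrics3 tags, and malformed headers.

```shell
#!/usr/bin/env bash
# hex_at FILE OFFSET COUNT: COUNT bytes starting at OFFSET, as lowercase hex.
hex_at() {
  od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# audio_md5 FILE: MD5 of FILE without a leading ID3v2 tag or trailing ID3v1 tag.
# Hypothetical helper; ID3v2 footers and APE/Lyrics3 tags are NOT handled.
audio_md5() {
  local file=$1 size start=0 end
  size=$(( $(wc -c < "$file") ))
  end=$size
  # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 size bytes of 7 bits each.
  if [ "$(hex_at "$file" 0 3)" = "494433" ]; then
    set -- $(od -An -tu1 -j 6 -N 4 "$file")
    start=$(( 10 + ($1 << 21) + ($2 << 14) + ($3 << 7) + $4 ))
  fi
  # ID3v1 tag: the final 128 bytes, starting with "TAG".
  if [ "$size" -ge 128 ] && [ "$(hex_at "$file" $(( size - 128 )) 3)" = "544147" ]; then
    end=$(( size - 128 ))
  fi
  [ "$end" -ge "$start" ] || return 1   # give up on tags that do not add up
  tail -c +"$(( start + 1 ))" "$file" | head -c "$(( end - start ))" | md5sum | cut -c1-32
}
```

With this, two copies of the same encode print the same hash from `audio_md5` even if one of them has been retagged. Different encodes of the same song will still differ, since the compressed bytes themselves are not the same.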

Reply #9
I have achieved VERY good results using the following procedure:
===============================================
1. Used Godfather to scan my whole MP3 collection (over 20 GB), set to compute MD5 hashes as well. This is good because TGF computes the MD5 from the audio data only, so two differently named and tagged files with the same audio data will have the same MD5 hash.

2. Used my little MS Access frontend to TGF's DB to search that DB for duplicates. This app has two operation modes:
a) using the MD5 hashes ONLY - fully automatic search/delete engine
b) comparing files based on their tags, with the opportunity to see all the ID3 tag details of both files and the possibility of an instant preview - ensuring really fast manual operations (delete the first, delete the second, preview the first, preview the second, etc.).

This app is not in shape for general publishing, but if you would like to have it, I would be glad to send it to you with the necessary instructions.

3. Used Godfather to find duplicates on its own (based on ID3 tags, etc.)
===============================================
With this, it took me only 8-10 hours to fully check my 20 GB of music data.

@jtclipeer: maybe it won't be too difficult to integrate this little Access app into your application.
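
The hash-only mode in 2a) boils down to grouping files by hash and keeping one file per group. Here is a minimal sketch of that idea, which is not the Access app itself: it assumes a plain text list of `hash  path` lines in md5sum's output format, and the function name `extra_copies` is made up for illustration.

```shell
#!/usr/bin/env bash
# extra_copies: read "hash  path" lines (md5sum format) on stdin and print
# every path except the first one in each group of identical hashes.
# Hypothetical helper; it only lists candidates and deletes nothing.
extra_copies() {
  sort | awk '{
    hash = substr($0, 1, 32)                  # the 32 hex digits of the MD5
    if (hash == prev) print substr($0, 35)    # path follows the hash and 2 separator characters
    prev = hash
  }'
}
```

Running `md5sum *.mp3 | extra_copies` lists the files that could be removed while still leaving one copy of each. Reviewing that list before deleting anything is the safer route, since which copy counts as the "first" is decided only by sort order.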

 

Reply #10
Wouldn't it be cool if you could use one of those music fingerprinting systems?

It would require some manual work, because these systems aren't 100% precise, and some extra CPU cycles. But this method would also find improperly tagged/named files in different encodings!

MusicBrainz has a Perl interface; it should be rather simple for a programmer to whip something together.

Reply #11
Quote
here

Thanks man... that is exactly the type of app I was looking for. The graphical interface works even for a person prone to "chair to keyboard interface errors", such as myself. A lot of power is packed into that app; it looks like I will have to put in some time playing with it.

Reply #12
Quote
I have achieved VERY good results using the following procedure:
===============================================
1. Used Godfather to scan my whole MP3 collection (over 20 GB), set to compute MD5 hashes as well. This is good because TGF computes the MD5 from the audio data only, so two differently named and tagged files with the same audio data will have the same MD5 hash.

2. Used my little MS Access frontend to TGF's DB to search that DB for duplicates. This app has two operation modes:
a) using the MD5 hashes ONLY - fully automatic search/delete engine
b) comparing files based on their tags, with the opportunity to see all the ID3 tag details of both files and the possibility of an instant preview - ensuring really fast manual operations (delete the first, delete the second, preview the first, preview the second, etc.).

This app is not in shape for general publishing, but if you would like to have it, I would be glad to send it to you with the necessary instructions.

3. Used Godfather to find duplicates on its own (based on ID3 tags, etc.)
===============================================
With this, it took me only 8-10 hours to fully check my 20 GB of music data.

@jtclipeer: maybe it won't be too difficult to integrate this little Access app into your application.


Sure, send me what you've got and I'll see what can be done (I already have dup removal using the library on my TODO list).
Dimitris

Reply #13
For deleting dupes from my system I use Dupe Checker PRO 6.0. I have a large mp3 collection (~250 GB), so of course I always have to be sure that I'm adding a "new" file that doesn't already exist on the HD. I use Dupe Checker for this and am satisfied with the results. It can find dupe files whose names are slightly different, for example names that share a single word. It's quite a convenient program with a user-friendly interface.

Good luck!
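
As a rough illustration of name-based matching, here is a sketch that groups files whose names become identical once case, punctuation and the extension are ignored. It is much simpler than a shared-word match and is not a description of how Dupe Checker works. The names `normalize_name` and `similar_names` are made up for this example, and paths containing tabs or newlines are not handled.

```shell
#!/usr/bin/env bash
# normalize_name PATH: file name without extension, lowercased, with every
# run of non-alphanumeric characters collapsed to a single space.
normalize_name() {
  basename "$1" | sed -E 's/\.[^.]*$//' | tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/ /g; s/^ +| +$//g'
}

# similar_names DIR: print groups of files under DIR that share a normalized name.
# Hypothetical helper; paths with tabs or newlines will confuse it.
similar_names() {
  find "${1:-.}" -type f | while IFS= read -r path; do
    printf '%s\t%s\n' "$(normalize_name "$path")" "$path"
  done | awk -F '\t' '
    { count[$1]++; paths[$1] = paths[$1] $2 "\n" }
    END { for (key in count) if (count[key] > 1) printf "%s\n", paths[key] }'
}
```

For example, `Hey Jude.mp3` and `hey_jude.MP3` in different folders both normalize to `hey jude` and are printed as one group. Matching names only suggest duplicates, so the audio still needs to be compared before anything is deleted.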