Finding duplicates and only delete duplicates in folders which contain

Is there a program or a convenient way to find duplicates (of music albums) and to automatically delete only those duplicates that sit in folders containing nothing but duplicates (that is, no non-duplicates)?

Suppose you have 2 versions (partial duplicates) of the same album. This one, #1 (folder 1):

Quote
Covers
01. Blood Train.flac
02. Leon and Maya.flac
03. Leon's Scary Dream.flac
04. Mahogany's Iron Hammer.flac
05. Leon Follows Mahogany.flac
06. Guardian Angel.flac
07. Engaged To Be Engaged.flac
08. Leon's Obsession.flac
09. I Love You - Taking Photos.flac
10. Leaving The Diner.flac
11. Mahogany and Leon.flac
12. Leon Wakes Up.flac
13. I've Been Caught.flac
14. Retrieving The Camera.flac
15. I Have A Train To Catch....flac
16. Leon Jumps On Train.flac
17. Train Fight.flac
18. Im Untergrund.flac
19. Final Kampf.flac
20. Maya Dies.flac
21. Leon The Butcher.flac
folder.jpg
Johannes Kobilke & Robert Williamson - The Midnight Meat Train [Original Motion Picture Score].m3u
The Midnight Meat Train [Original Motion Picture Score].log


And this one, #2 (folder 2):

Quote
01. Blood Train.flac
02. Leon and Maya.flac
03. Leon's Scary Dream.flac
04. Mahogany's Iron Hammer.flac
05. Leon Follows Mahogany.flac
06. Guardian Angel.flac
07. Engaged To Be Engaged.flac
08. Leon's Obsession.flac
09. I Love You - Taking Photos.flac
10. Leaving The Diner.flac
11. Mahogany and Leon.flac
12. Leon Wakes Up.flac
13. I've Been Caught.flac
14. Retrieving The Camera.flac
15. I Have A Train To Catch....flac
16. Leon Jumps On Train.flac
17. Train Fight.flac
18. Im Untergrund.flac
19. Final Kampf.flac
20. Maya Dies.flac
21. Leon The Butcher.flac
Johannes Kobilke & Robert Williamson - The Midnight Meat Train [Original Motion Picture Score].m3u


If one removed the duplicates from folder 1, the folder "Covers" and some files would be kept in folder 1, and all of the flac files would be kept in folder 2: the same album in two folders at the end. So, how could one easily (automatically) remove the duplicate files / folders only from the folders which contain nothing but duplicates (here folder 2)?
Newest stable foobar, portable | Win 7
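
One possible way to pin down the "folders which contain duplicates only" condition is sketched below in Python. It is a minimal sketch rather than a finished tool: the function names `file_digest` and `redundant_folders` are made up for illustration, files count as duplicates only when their bytes are identical (SHA-256 of the whole file), and only the files directly inside a folder are considered, not its subfolders. The script only reports folders; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def redundant_folders(root):
    """List folders in which every file has an identical copy in another folder."""
    locations = defaultdict(set)   # digest -> folders holding that content
    contents = defaultdict(list)   # folder -> digests of the files directly inside
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            digest = file_digest(os.path.join(folder, name))
            locations[digest].add(folder)
            contents[folder].append(digest)

    reported = set()
    result = []
    for folder in sorted(contents):
        # Each file needs a copy in some folder that is not itself reported,
        # so two folders mirroring each other are never both listed.
        if all(locations[d] - {folder} - reported for d in contents[folder]):
            reported.add(folder)
            result.append(folder)
    return result
```

For the two folders above, folder 2 would be reported, while folder 1 would not, assuming folder.jpg and the .log file have no copy elsewhere. Since subfolders are not checked, it would be safer to delete only the files of a reported folder than the whole folder tree.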

Reply #1
http://www.similarityapp.com/ ?

Reply #2
Well, I do not have any idea. Many thanks for the link.
Newest stable foobar, portable | Win 7

Reply #3
It seems to me that Similarity cannot restrict the selection to folders which contain no unique tracks, though you can sort the results so that it is easier to spot that all tracks 1, 2, 3 ... are marked.
Furthermore, Similarity does not seem to compare audio bit-by-bit. I always get false positives.

For bit-by-bit duplicates, try DupeGuru (the Music Edition), which does not have a smart way of identifying redundant folders either, AFAICS.
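
For reference, a bit-by-bit comparison of whole files can be sketched in a few lines of Python. This is not how DupeGuru or Similarity work internally (that is not stated here); it is only an illustration of the idea, and the function name `exact_duplicate_groups` is made up. Files are first grouped by size, since files of different sizes cannot be identical, and then compared byte for byte with the standard library's `filecmp.cmp(..., shallow=False)`. Note that this compares the entire file including tags, so two files with identical audio but different tags would not match.

```python
import filecmp
import os
from collections import defaultdict


def exact_duplicate_groups(root):
    """Return lists of paths whose file contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        remaining = sorted(paths)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # shallow=False forces a comparison of the actual file contents
            same = [first] + [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in rest if p not in same]
    return groups
```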

Reply #4
Quote
Seems to me that Similarity cannot restrict selection down to folders which contain no unique tracks

Yes, I would say so, too.

Quote
- though you can sort so that it is easier to spot that all tracks 1,2,3 ... are marked.

Yes, but it would be quite a big effort, I guess.

That looks very nice (strange, I hadn't found it before): http://i.imgur.com/LE2v25h.png. Hmmm, but suppose you have two versions of an album, e.g. the two versions of 1st Avenue in this example, and you delete all of the 192 kbps tracks shown (and keep the 320 kbps tracks). If that 192 kbps album happened to have two bonus tracks which are not in the 320 kbps album, these 2 tracks would remain in that folder, so you would end up with one folder holding a complete album and one holding just the 2 bonus tracks (I would rather keep both albums completely). To check whether anything of that version will remain after deleting, you would have to open that album in e.g. a file manager. That appears to be very inconvenient if you had to do it for thousands of albums... or is there anything I am missing?
Newest stable foobar, portable | Win 7
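
The "what would remain after deleting" check described above does not have to be done by hand in a file manager. As a rough Python sketch (the function name `leftovers_after_delete` and the idea of exporting the marked files as a list of paths are assumptions, not features of any program mentioned here), one could list every folder that would lose some but not all of its files:

```python
import os
from collections import defaultdict


def leftovers_after_delete(all_files, marked_for_deletion):
    """Map each folder that loses files to the files that would remain in it.

    Folders that would end up completely empty are left out, since nothing
    would be stranded there.
    """
    marked = set(marked_for_deletion)
    kept = defaultdict(list)
    touched = set()
    for path in all_files:
        folder = os.path.dirname(path)
        if path in marked:
            touched.add(folder)
        else:
            kept[folder].append(path)
    return {folder: sorted(kept[folder]) for folder in sorted(touched) if kept[folder]}
```

In the bonus-track case, the 192 kbps folder would show up in the result together with its two bonus tracks, which is the signal to keep that album complete instead of deleting from it.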


Reply #6
Thank you for the link. There are quite a lot of such duplicate finders (but without the option to use those algorithms, or whatever they are called, to compare mp3s); I assume this one will do it the same or a similar way as most of the others.

At the moment, the program coming closest to what I want (but still far away from it) is WinMerge (as long as the duplicates' file names are the same), but to check each single album is...ugly...

Many thanks anyway, Porcus
Newest stable foobar, portable | Win 7

Reply #7
Quote
but to check each single album is...ugly...

My solution is not to. I have lots of duplicates I live with happily. But I run those scanners every now and then on demos, bootlegs, and that sort of stuff which sometimes isn't what they pretend to be.

And I ran a few on the entire collection to identify mistagged music, as everything was auto-ripped. A few surprises yes :-)

Reply #8
Quote
My solution is not to.

That's mine, too, since yesterday. I should have left the duplicates as and where they are and shouldn't have started caring about them (before that I had removed about 450 GB of duplicates, but those were very easily removable ones).

And all of those programs like Similarity do not appear to work for me at all (maybe I use the wrong settings, or...); at the very least you would have to check each single music file before removing it.

Alright, done. Many thanks for your help
Newest stable foobar, portable | Win 7

Reply #9
Are these files tagged properly? Are the duplicates always proper duplicates (you can delete whichever without information loss) or more like transcodes/partial matches?

If I weren't worried about losing data, I would just use a media player like foobar to reorganize these files into one strict pattern based on the tags. In that case the original would overwrite the duplicate (or the other way around) and you'd end up with only one copy.

If they are partial matches (but still tagged), the only other thing I can think of is assigning a somewhat unique identifier to the albums (again, in a library viewer with a program such as foobar) and eyeballing your way through your library. Something like ALBUM - CURRENT DIRECTORY will produce two different hits for the original and the copied album while sorting them next to each other, so you can notice them.

Or are we talking about completely mistagged files that flat-out lie about what they are? I'd be more concerned about how that even happened in the first place.

I personally haven't had much luck with stuff like Similarity. Such tools require a lot of resources, especially time, to produce any results. I'd rather make sure duplicates don't happen in the first place, for example by not letting anything into the library without going through the tagging process. If duplicates do make it through, you at least have a way to quickly get rid of them (elimination via merging).

Reply #10
Quote
Are these files tagged properly?

Some will be tagged well, some not, I assume; difficult to say.

Quote
Are the duplicates always proper duplicates (you can delete whichever without information loss) or more like transcodes/partial matches?

They are (were) all exact duplicates, compared by file content and sometimes by checksum; none of them were matched only by name, file size, date, partial content, or tag content.

Quote
If I weren't worried about losing data...

I am, extremely.

Quote
In that case the original would overwrite the duplicate (or the other way around) and you'd end up with only one copy.

So what is left may be a version of worse or better quality, with more or fewer tracks (bonus tracks), and so on.

Quote
If they are partial matches (but still tagged) the only other thing I can think of is assigning a somewhat unique identifier to the albums (again, in a library viewer with a program such as foobar) and eyeballing through your library.

Some time ago I tried something like that (as far as my abilities let me), but there are too many possibilities, so it would end up in something like chaos.

Quote
Something like ALBUM - CURRENT DIRECTORY will produce two different hits for the original and copied album while sorting them next to each other so you can notice them.

And there often is more than a single duplicate of an album, maybe 3 or 4... soundtracks often come in very many different versions for the same movie. So I would have to eyeball very much, for a very long time, I guess.

Quote
Or are we talking about completely mistagged files that flat-out lie about what they are?

No, no, most of them will be tagged quite well, I would say.

Quote
I personally haven't had much luck with stuff like Similarity. Such tools require a lot of resources, especially time, to produce any results.

I didn't get such stuff to work at all. Those programs displayed files as duplicates with a high rating (or whatever it is called) although these "duplicates" differed by 5 or 6 MB in size and also in runtime. So you would have to check each single duplicate separately anyway, I assume. But maybe I used the wrong settings, I do not have any idea.

Quote
If they make it through, you at least have a way to quickly get rid of them (elimination via merging).

Hmmm, what does that mean? To just mix them? Copy the duplicate files of version 1 to version 2?

But it appears there is absolutely no way to properly find / delete duplicates without extreme effort.
Newest stable foobar, portable | Win 7

Reply #11
Quote
Some time ago I tried something like that (as far as my abilities let me), but there are too many possibilities, so it would end up in something like chaos.

Quote
And there often is more than a single duplicate of an album, maybe 3 or 4... soundtracks often come in very many different versions for the same movie. So I would have to eyeball very much, for a very long time, I guess.

Well if they are at least tagged with album titles, it should be fairly easy to eliminate them. If there are 4 duplicates you should end up with 4 hits in a library viewer, like so:
Code:
Random album -- drive "E:\" folder "random album"
Random album -- drive "D:\" folder "random album"
Random album -- drive "D:\" folder "asdasdasd"
Random album -- drive "D:\" folder "best of"

The fact that the album name is shown first means alphabetically these should end up close together (even if there's any variance in the album names, generally it occurs at the end). Since they are duplicates, the second part is most likely different and easy to notice like this. Hopefully you didn't shove duplicates next to the originals with differing filenames.
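
Outside of a library viewer, the same "album name first, folder second" listing could be approximated with a small Python sketch. It assumes the album tags have already been read into `(album, path)` pairs by some other means; how the tags are read is left open here, and the function name `albums_in_multiple_folders` is made up for illustration.

```python
import os
from collections import defaultdict


def albums_in_multiple_folders(tracks):
    """Group (album, path) pairs by album and keep albums found in 2+ folders.

    Album names are compared case-insensitively with surrounding spaces
    removed; any other variance in the names is not handled.
    """
    folders = defaultdict(set)
    for album, path in tracks:
        folders[album.strip().casefold()].add(os.path.dirname(path))
    return {album: sorted(dirs) for album, dirs in sorted(folders.items()) if len(dirs) > 1}
```

Each entry in the result corresponds to one block of adjacent hits in the listing above: one album name followed by every folder it lives in.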

Obviously this isn't some magical solve all with one click method, because that doesn't exist. But it still eliminates a potentially large percentage of the problematic files fairly efficiently.

Quote
Hmmm, what does that mean? To just mix them? Copy the duplicate files of version 1 to version 2?

What I described at the beginning of my previous post. Assuming they are properly tagged, you can reorganize and move them into a strict filename/folder structure on the same drive, overwriting duplicates that way. This is only safe if your tags are trustworthy.
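
For anyone uneasy about overwriting, the merge can be rehearsed as a dry run first. The Python sketch below is only an illustration under stated assumptions: the tags are already available as dictionaries with the keys `artist`, `album`, `track`, `title` and `path`, the target pattern `artist/album/NN. title` is just one example of a strict pattern, and characters that are illegal in file names are not handled. The function name `plan_moves` is hypothetical. Nothing is moved or deleted; the function only reports which files would end up at the same target.

```python
import os
from collections import defaultdict


def plan_moves(tracks, library_root):
    """Compute target paths from tags and separate safe moves from collisions."""
    targets = defaultdict(list)
    for t in tracks:
        ext = os.path.splitext(t["path"])[1]
        name = f'{int(t["track"]):02d}. {t["title"]}{ext}'
        target = "/".join([library_root, t["artist"], t["album"], name])
        targets[target].append(t["path"])

    # A target claimed by exactly one source is a plain move.
    moves = {srcs[0]: dst for dst, srcs in targets.items() if len(srcs) == 1}
    # A target claimed by several sources is where overwriting would happen.
    collisions = {dst: sorted(srcs) for dst, srcs in targets.items() if len(srcs) > 1}
    return moves, collisions
```

The `collisions` part is the list worth reviewing by hand, since those are the files of which only one copy would survive a real merge.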

Quote
But it appears there is absolutely no way to properly find / delete duplicates without extreme effort.

Well, not really. Duplicates are really easy to find and eliminate if the files are properly tagged. If they are a mishmash of randomly tagged/named files, good luck. That's why it's so important to start tagging your library and keep it that way (new additions should take minimal effort). How many tracks do you have in your library? It will only get much worse if you neglect it now, as your library grows and you still haven't gotten into the habit of proper organization.

Fixing your library is painful, yes, but you only have to do it once, not in one sitting, and if you use the tools that something like (the audio player) foobar provides, you can automate a lot of the busywork. You want to look for patterns in your library that are constant across a large number of tracks, run a batch tagging process on them, and repeat this until you are left only with tracks that are completely messed up. Track titles can usually be filled in from filenames, artist/album names from folder names, etc.
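
As an illustration of filling tags from names, here is a small Python sketch. It assumes one particular pattern, `Artist - Album/NN. Title.ext`, which matches the file names in the first post but will not fit every library; the function name `guess_tags` is made up, and the sketch only returns guesses without writing anything to the files.

```python
import os
import re

# Leading track number, then separators such as ". " or " - ", then the title.
TRACK_RE = re.compile(r"^(?P<track>\d+)[.\s_-]+(?P<title>.+)$")


def guess_tags(path):
    """Guess track, title, artist and album from a path like
    'Artist - Album/01. Title.flac'. Missing parts are simply left out."""
    folder, filename = os.path.split(path)
    stem = os.path.splitext(filename)[0]
    tags = {}
    match = TRACK_RE.match(stem)
    if match:
        tags["track"] = int(match.group("track"))
        tags["title"] = match.group("title").strip()
    else:
        tags["title"] = stem
    artist, sep, album = os.path.basename(folder).partition(" - ")
    if sep:
        tags["artist"] = artist.strip()
        tags["album"] = album.strip()
    return tags
```

Files that do not fit the pattern come back with fewer keys, which gives a simple way to collect the leftovers that need a different pattern or manual attention.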

Then at the end you can decide whether you want to fix the completely messed up ones by hand or whether you just never cared enough about them to keep them organized at all. Perhaps you'll find that getting rid of the junk lets you focus on and appreciate the artists you actually care about more. You might also want a way to clearly tell apart files that have been fixed already (move them to another folder, use a dummy tag on files that are not fixed and delete it once they are, etc).

As someone who went through ~40k tracks at the time, I don't think it was nearly as bad as I thought it would be (I didn't have to type in every single tag for every single track by hand). Granted, most of the files were at least decently organized into folder structures/filenames already. And I do understand that you probably have better things to do than this, but given that you are already looking into tools to fix that mess, you can probably see for yourself that it will only get worse over time without organization. And the duplicate finder tools won't fix it, proper tagging will.

Reply #12
Quote
Well if they are at least tagged with album titles, it should be fairly easy to eliminate them. If there are 4 duplicates you should end up with 4 hits in a library viewer, like so:
Code:
Random album -- drive "E:\" folder "random album"
Random album -- drive "D:\" folder "random album"
Random album -- drive "D:\" folder "asdasdasd"
Random album -- drive "D:\" folder "best of"

Yes, that appears to be a good method. But finding the duplicate albums may not be the difficult part. Checking the albums found and deleting the right files would be an extremely big effort (for me, I assume). You would have to look at each single album.

Actually there are two kinds of duplicates I had to handle: duplicates with the same content (same checksum) and ones that are just the same piece of music / the same track (different content). The first kind you can find easily with any duplicate search program, independently of the duplicates' names. But deleting even these is a big effort; one would have to check each single album. And the duplicates that are the same track (different content) might be different versions (12 inch, mixes, different runtimes, different sound, better or worse quality and so on). To check this would...

Quote
What I described at the beginning of my previous post. Assuming they are properly tagged, you can reorganize and move them into a strict filename/folder structure on the same drive, overwriting duplicates that way. This is only safe if your tags are trustworthy.

I couldn't rely on that. And overwriting duplicates this way...whew...I would have a really bad feeling.

Quote
How many tracks do you have in your library?

At the moment there are about 7000 tracks here. But I am at the beginning of organizing; the other tracks / albums are stored on external drives, and I still have to add them to the library (I hope scanning a 4 TB drive does not take that long with foobar, and I hope foobar still shows these drives, or rather the music on them, even when they are unplugged).

Quote
Well, not really. Duplicates are really easy to find and eliminate if the files are properly tagged. If they are a mishmash of randomly tagged/named files, good luck. That's why it's so important to start tagging your library and keep it that way (new additions should take minimal effort). How many tracks do you have in your library? It will only get much worse if you neglect it now, as your library grows and you still haven't gotten into the habit of proper organization.

Fixing your library is painful, yes, but you only have to do it once, not in one sitting, and if you use the tools that something like (the audio player) foobar provides, you can automate a lot of the busywork. You want to look for patterns in your library that are constant across a large number of tracks, run a batch tagging process on them, and repeat this until you are left only with tracks that are completely messed up. Track titles can usually be filled in from filenames, artist/album names from folder names, etc.

Then at the end you can decide whether you want to fix the completely messed up ones by hand or whether you just never cared enough about them to keep them organized at all. Perhaps you'll find that getting rid of the junk lets you focus on and appreciate the artists you actually care about more. You might also want a way to clearly tell apart files that have been fixed already (move them to another folder, use a dummy tag on files that are not fixed and delete it once they are, etc).

As someone who went through ~40k tracks at the time, I don't think it was nearly as bad as I thought it would be (I didn't have to type in every single tag for every single track by hand). Granted, most of the files were at least decently organized into folder structures/filenames already. And I do understand that you probably have better things to do than this, but given that you are already looking into tools to fix that mess, you can probably see for yourself that it will only get worse over time without organization. And the duplicate finder tools won't fix it, proper tagging will.


Yes, that's very plausible...of course...40,000 files, that really is a lot. But...when I think of handling 40,000 files or more I...I can't imagine surviving that pain. Even moving albums to other drives to get some kind of basic structure takes extremely long with USB 2.0: 2 or 3 days non-stop for 3 or 4 TB, with a notebook doing only this work. So just making some backups before trying to rename / tag with foobar, or trying some more or less automatic batches or whatever, always takes very much time.

Alright, once I have some kind of basic structure I will add all the albums (with all of the duplicates) to foobar's library and see what it looks like.

Many thanks.
Newest stable foobar, portable | Win 7