Topic: Anyway to find duplicate songs in Foobar? (Read 14283 times)

Anyway to find duplicate songs in Foobar?

I know you can remove duplicates, but is there a way to find them without having to look through the whole playlist? I know I have some from when I converted some .mp4s into .mp3s.

Reply #1
Menu Edit -> Remove duplicates

Reply #2
Oops, I conveyed the wrong message in my original post. Sorry about that. I meant: is there any way to list all duplicate files? I know you can remove them from a playlist, but I want to delete them.

Reply #3
I'd like to know this too.

I don't want to remove duplicates, maybe there's a way of creating a new playlist of all the duplicates, so I can sort through them?

I'd like this because I have many single files which I can now delete because I have the tracks as part of an album.

Reply #4
I've been using a Windows Explorer replacement called Directory Opus.  It does some pretty powerful things, and for those who like foobar2000 because of the level of possible customization, it might be great.

You can create 'collections' which are often the results of a Find, and there is a special function for making a collection based on duplicates (determined by chosen parameters).  You see the duplicate files, and delete the ones you wish to delete.

It is a bit expensive, but it comes with a 60-day free trial.  It has good support, as well.


Reply #5
Could you sort by Song Title or is that not what you're trying to do either?

Reply #6
I've wanted to perform a similar action, as my library has become particularly agglomerative, with lots of overlap between collections.  I've been playing around with the Playlist Tree plugin and its nested query function as a solution to this problem.  I'm a newbie to all this, having downloaded Playlist Tree last week and having no programming background, so I'm still in the "playing" stages right now and don't have a foolproof response.  However, here's what I have so far.

1. Under Criteria: Type in an autoplaylist query that will narrow the scope of your duplicate search.  Alternatively, you can leave this field blank and search your entire database.  So if, for example, you have a collection with an album tag labeled "Top 500 Hits of Last Year", you would type: album IS "Top 500 Hits of Last Year" (or use "album HAS" for inexact searches).  Only duplicates emanating from that collection will be found.  *See explanation below

2. Under Format:
Code:
@query<%artist% - %title%;@database;title HAS %title% AND artist HAS %artist%;'%artist% - %title%'>

3. Check "Sort by display name after populating".

This operation, depending on the size of your library, is very CPU-intense, and you may need to go get a sandwich or two while it runs.  Basically, Playlist Tree loads the files you originally specified under Criteria and "grabs" the artist and title tags (among all the other tags) of each file.  Then, for each of those files, it performs a subsequent automated search of the entire database for files that match those tags, and lists those files.
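The loop described above can be sketched outside foobar2000. This is a minimal Python illustration, not Playlist Tree's actual implementation: the track dictionaries, the `nested_query` name, and the case-insensitive substring test standing in for HAS are all assumptions made for the example.

```python
def nested_query(collection, database):
    """For each track in the collection, list every database track whose
    artist and title tags contain that track's artist and title."""
    results = {}
    for track in collection:
        label = f"{track['artist']} - {track['title']}"
        artist = track["artist"].lower()
        title = track["title"].lower()
        # One full scan of the database per collection track: this is why
        # the run time grows so quickly with library size.
        results[label] = [
            candidate["path"]
            for candidate in database
            if title in candidate["title"].lower()
            and artist in candidate["artist"].lower()
        ]
    # Rough stand-in for sorting by display name after populating.
    return dict(sorted(results.items()))


# Hypothetical sample data; paths and tags are made up for illustration.
database = [
    {"path": "singles/born_to_run.mp3", "artist": "Bruce Springsteen", "title": "Born to Run"},
    {"path": "singles/thunder_road.mp3", "artist": "Bruce Springsteen", "title": "Thunder Road"},
    {"path": "albums/born_to_run/05.mp3", "artist": "Bruce Springsteen", "title": "Born to Run"},
]
collection = [t for t in database if t["path"].startswith("singles/")]

results = nested_query(collection, database)
for label, paths in results.items():
    print(label, paths)
```

Every collection track appears in the output with at least one hit, because the database search always finds the track itself.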

Drawbacks, Next Steps
The obvious problem with my current setup is that, while it is useful for identifying duplicates (provided a human combs through the resulting data), it is extremely overinclusive.  It lists every file specified under "Criteria", including the initial file itself, even if that file is the only copy in the database.  Thus every file in your specified collection is listed, plus any other files matching the artist and title tags.

So, if you don't narrow the scope of your initial search and want to find duplicates in your entire database, my method makes zero sense: you could just create a playlist of all your files, sort by title, and save yourself some CPU and time (but lose a sandwich).  Nonetheless, if you are adding a collection and want to comb through that collection's duplicates, my method is one possible, albeit imperfect, solution.

What I am hoping to do is limit Playlist Tree to return only searches that produce at least 2 results.  Anyway, I thought I would try to get the ball rolling, as I think Playlist Tree and its nested query abilities could be a very useful tool for finding duplicates.

Reply #7
How long does the code take to run on your library (could you also provide the size)?


First, we make a playlist of files and export it as m3u (or use text tools to export the paths of the files).

Then, we use the "Remove duplicates" tool and export the new playlist as m3u (or use text tools to export the paths of the files).

Finally, we use either Excel or a database program to perform a differential operation: create a new list of the songs that were present in the first list but not in the second list (the duplicates).

Once we have the list, we could export it as m3u, import it into foobar and see for ourselves.
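The differential step does not strictly need Excel or a database. Below is a rough Python sketch of the same idea; the function names and sample paths are hypothetical, and it assumes each m3u export is plain text with one path per line and `#`-prefixed comment lines. It counts entries rather than using a plain set difference, because if a removed entry has the same path as one that was kept, a set difference would come back empty.

```python
from collections import Counter


def playlist_entries(m3u_text):
    """Return the file paths in an m3u export, skipping blanks and # lines."""
    entries = []
    for line in m3u_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            entries.append(stripped)
    return entries


def removed_entries(before, after):
    """Return entries of `before` that have no remaining counterpart in `after`."""
    remaining = Counter(after)
    removed = []
    for path in before:
        if remaining[path] > 0:
            remaining[path] -= 1  # this entry survived the de-duplication
        else:
            removed.append(path)  # this entry was dropped as a duplicate
    return removed


# Hypothetical exports: the playlist before and after "Remove duplicates".
before_text = "#EXTM3U\nMusic/a.mp3\nMusic/b.mp3\nMusic/a.mp3\n"
after_text = "#EXTM3U\nMusic/a.mp3\nMusic/b.mp3\n"

duplicates = removed_entries(playlist_entries(before_text), playlist_entries(after_text))
print("\n".join(duplicates))
```

Writing the resulting paths to a text file with an .m3u extension, one per line, gives a playlist that can be loaded back into foobar2000 for review.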

Reply #8
I would estimate that it takes approximately 3-4 minutes per 1,000 files in a given "collection" to run my Playlist Tree code.  My library is ~50,000 songs.  I have not tried running this code on my entire library because of its volume.

Reply #9
Both good ideas, I'll try these.

It often happens that I would download a number of single MP3s (from which I remove the album and tracknumber tags). Then I might decide to buy the full album after that and rip it to its own folder, so hopefully one of these methods will help me see which single MP3s I can now delete.

Reply #10
I thought I would just provide two additional thoughts to my previous post:

Regarding Playlist Tree Nested Query Function:
I believe I've solved the overinclusiveness problem I mentioned in my original post.  The following code could be used under Playlist Tree > Format (while keeping the other fields the same as in my original post):
Code:
@query<'@format<$ifgreater(%_itemcount%,1,,@hidden)>' %artist% - %title%;@database;title HAS %title% AND artist HAS %artist%;'%artist% - %title%'>

This code is a slight tweak of my earlier code: it simply hides items that contain only a single match (i.e. non-duplicate files).  Within Playlist Tree, you can go track-by-track or load the entire list (root) into the playlist.  From the playlist, I usually just do a Ctrl+F (or sort by some distinguishing factor) and select the duplicate files I wish to delete.

An alternative, less CPU-intense, non-nested query function:
Upon figuring out how to use the %_itemcount% tag, I came up with an alternative way of finding "duplicate" files.  In Playlist Tree, search your entire database and use the following format:
Code:
%artist%|'@format<$ifgreater(%_itemcount%,1,,@hidden)>' %title%|%album%

This is a standard track-listing under Playlist Tree that simply sorts by artist, then title, then album, but only reports tracks with more than one match for a track (potentially duplicate tracks).  The "'@format<..." portion of the code "hides" all tracks with a single item (i.e. non-duplicates).  You could go a step further and move the @format portion to the %album% portion, thus finding only songs that have the exact same artist, title, and album.
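As a rough analogy for what this format does, the Python sketch below groups tracks by exact tag values in a single pass and keeps only groups with more than one member. It illustrates the idea only and is not how Playlist Tree evaluates the format string; the `duplicate_groups` name and the sample library are made up for the example.

```python
from collections import defaultdict


def duplicate_groups(tracks, keys=("artist", "title")):
    """Group tracks by exact tag values and keep groups with more than one
    member, loosely mirroring $ifgreater(%_itemcount%,1,,@hidden)."""
    groups = defaultdict(list)
    for track in tracks:
        groups[tuple(track[k] for k in keys)].append(track["path"])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical library; paths and tags are invented for illustration.
library = [
    {"path": "singles/born_to_run.mp3", "artist": "Bruce Springsteen",
     "title": "Born to Run", "album": ""},
    {"path": "albums/born_to_run/05.mp3", "artist": "Bruce Springsteen",
     "title": "Born to Run", "album": "Born to Run"},
    {"path": "albums/nebraska/01.mp3", "artist": "Bruce Springsteen",
     "title": "Nebraska", "album": "Nebraska"},
]

by_title = duplicate_groups(library)
by_album = duplicate_groups(library, keys=("artist", "title", "album"))
print(by_title)
print(by_album)
```

Grouping is one pass over the library rather than one database search per track, which fits with this approach being much cheaper than the nested query. Adding "album" to the keys corresponds to moving the @format portion to the %album% level: the single with an empty album tag no longer groups with the album track.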

The downside to this method over a nested query is that it will only find duplicate files that have the exact same %artist% and %title% fields, which happens to be an unfortunate problem since, in many cases, the duplicate files we are seeking to eliminate are poorly tagged.  Moreover, even well-tagged variations will not match.  For instance, "Bruce Springsteen - Born to Run" will not match with "Bruce Springsteen - Born to Run (Live)."  I'm sure there are ways around this problem, but I am still in the early stages of using this method.  Until then, realize that this method is underinclusive.
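The difference between the two matching styles can be shown with the example above. In this Python sketch, `exact_match` stands in for grouping on identical tags and `has_match` stands in for the HAS-style search used in the nested query. Both function names are hypothetical, and treating HAS as a case-insensitive substring test is an assumption.

```python
def exact_match(a, b):
    """True only when artist and title tags are identical."""
    return a["artist"] == b["artist"] and a["title"] == b["title"]


def has_match(needle, candidate):
    """True when the candidate's tags contain the needle's tags
    (assumed here to be a case-insensitive substring test)."""
    return (
        needle["artist"].lower() in candidate["artist"].lower()
        and needle["title"].lower() in candidate["title"].lower()
    )


studio = {"artist": "Bruce Springsteen", "title": "Born to Run"}
live = {"artist": "Bruce Springsteen", "title": "Born to Run (Live)"}

print(exact_match(studio, live))  # False: the titles differ
print(has_match(studio, live))    # True: "Born to Run" is inside "Born to Run (Live)"
print(has_match(live, studio))    # False: the longer title is not inside the shorter one
```

Note that the substring test is one-directional: searching from the shorter title finds the longer one, but not the reverse.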

Quick edit: To roughly quantify "less CPU-intense," it took about a minute to execute this code on my entire library of ~50,000 files.
