Best solution for seeing / managing duplicates ?

Hi everybody,

I have quite a large music library (18.000+ files). Among my files, I have lots of duplicates (files with the same %artist% and %title%). These duplicates can be of two kinds :

1/ Same version of the same song : e.g. for a given artist, the original version AND the same song included on a "greatest hits" album : both files are 99% to 100% the same, bit for bit. For fairly well-known artists, I often have the same song 4-5 times (original album, "greatest hits" album, 3rd party compilation, etc.).

2/ Different versions of the same song : e.g. for a given artist, the original version AND a remastered version. Here both songs can be quite different because of the remastering.

There is a third possibility (Live versions, Unplugged/Acoustic versions...), but I've already handled it by inserting a " (Live)" / " (Unplugged)" / " (Acoustic)" mention directly at the end of the song title. So the titles are different, simply because for me these are different songs : e.g. for me "Jagged Little Pill" is one song, and "Jagged Little Pill (Acoustic)" is a different song (even if it's an acoustic version of the first song, it has nothing to do with a remastered version of "Jagged Little Pill" that would sound quite the same - to my ears it's a different song, so it has a different name).

Anyway, here's my point : I'd like a utility that could help me scan my music library and immediately see ALL the files with the same %artist% and %title% grouped together. I want to be able to see them first, and then decide what to do with them : either keep them (if they are of kind 2) or delete them (kind 1).

iTunes does that quite well with its "Show duplicates" function. Foobar 0.9.5 only has a "Remove duplicates" / "Remove dead items" function. And I haven't even been able to get it to work ! For the record, I've put two test songs in the Playlist view (same artist, same title), selected them, then selected "Remove duplicates"... and nothing happens. But anyway, that's not the kind of solution I'm looking for. I want to SEE duplicates, not remove them.

Can anybody help me out with this ?

Thanks in advance. 

Reply #1
Quote
I'd like a utility that could help me to immediately see ALL the files with the same %artist% and %title% among my music library. I want to be able to see them first, and then decide what I will do to them : (...)

Facets can help you with that.

Create one Facet with three columns: %artist%, %title%, and the Tracks statistic.

Then sort by the Tracks column (descending) to move all duplicates to the top.
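
For readers who want to see the idea outside foobar2000, here is a rough Python sketch of what that facet computes: one row per artist/title pair, a track count per row, and a descending sort on the count. The sample library is invented for illustration, and this is not how Facets is implemented internally.

```python
from collections import Counter

# Hypothetical sample library: (artist, title) pairs as read from the tags.
tracks = [
    ("Alanis Morissette", "Ironic"),
    ("Alanis Morissette", "Ironic"),
    ("Alanis Morissette", "Ironic (Acoustic)"),
    ("Pink Floyd", "Time"),
]

# Count how many tracks share each (artist, title) pair,
# which plays the role of the Tracks statistic column.
counts = Counter(tracks)

# Sort by count, descending, so potential duplicates come first.
rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

for (artist, title), n in rows:
    print(artist, "|", title, "|", n)
```

Any row with a count above 1 is a candidate duplicate that still has to be checked by ear or by hand.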

Quote
Foobar only has a "Remove duplicates" / "Remove dead items" function. And I haven't even been able to get it to work ! I've put two test songs in the Playlist view (same artist, same title), selected them, then "Remove duplicates"... and nothing happens.

This function only removes duplicate playlist entries which point to the same file.

Reply #2
That's brilliant ! I'm going to try that out right away. Thanks Frank, you've been very helpful, I appreciate it.

EDIT : it works beautifully ! This is great !! 

Reply #3
1/ Is it possible to use Facets to display all albums with same artist, same title, but more than one date ?

e.g. 1 : same artist, same album "album title" but number of dates > 1 (the album "album title" probably has a duplicate, and only one of the dates can be correct)

e.g. 2 : same artist, same album "album title" --> Tracks 1-5 have Date 1, but Tracks 6-10 have Date 2 (the album "album title" has 2 dates depending on the tracks, which is incorrect)


2/ Is it possible to use Facets to display duplicate albums, regardless of the date ?

e.g. 3 : same artist, same album "album title", but number of albums > 1



What would be the best way to achieve these two objectives, with or without Facets ?

Reply #4
I use playlist tree for these types of queries.


Moderation: removed fullquote, moved the last two replies into this thread.

Since Jose is asking about solutions in any component, I think this general thread will do better.

Reply #5
Quote
Is it possible to use Facets to display all albums with same artist, same title, but more than one date ?

Create two facets: album and date.

Add a Statistics/Dates column to the first facet, and sort by it in descending order.

Quote
Is it possible to use Facets to display duplicate albums, regardless of the date ?

If you can come up with some titleformat expression to differentiate between duplicate albums in the second facet, sure.

For example, $replace(%path%,%filename_ext%,) should do it.
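
To make the effect of that expression concrete, here is a small Python sketch, assuming made-up paths and album names. The folder_of helper is hypothetical and only imitates what $replace(%path%,%filename_ext%,) does: it removes the file name from the full path so that tracks in the same folder share one value.

```python
from collections import defaultdict

def folder_of(path, filename_ext):
    # Imitates $replace(%path%,%filename_ext%,): drop the file name, keep the folder.
    return path.replace(filename_ext, "")

# Hypothetical library entries: (album, full path, file name).
tracks = [
    ("The Wall", "C:\\music\\a\\The Wall\\01.flac", "01.flac"),
    ("The Wall", "C:\\music\\a\\The Wall\\02.flac", "02.flac"),
    ("The Wall", "C:\\music\\b\\The Wall (copy)\\01.mp3", "01.mp3"),
    ("Animals", "C:\\music\\a\\Animals\\01.flac", "01.flac"),
]

# Collect the distinct folders each album name appears in.
folders = defaultdict(set)
for album, path, name in tracks:
    folders[album].add(folder_of(path, name))

# An album name found in more than one folder is a candidate duplicate.
duplicated = {album: sorted(f) for album, f in folders.items() if len(f) > 1}
print(duplicated)
```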

Reply #6
If you use MusicBrainz for tagging, you might find you have songs that are (nearly) "bit for bit" equal but tagged differently, since some albums, especially soundtracks, tend to have a different album artist than performer.

Music Magic Tagger can perform quite a good duplicate search, which doesn't work from the tags but from the audio data itself.

Reply #7
Quote
Create two facets: album and date. Add a Statistics/Dates column to the first facet, and sort by it in descending order.

Thanks, it works perfectly. I should have thought of creating two facets instead of one single facet.

Quote
Is it possible to use Facets to display duplicate albums, regardless of the date ?

If you can come up with some titleformat expression to differentiate between duplicate albums in the second facet, sure. For example, $replace(%path%,%filename_ext%,) should do it.

What do you mean exactly ? Sorry, but I can't figure out the two facets, or where I should enter the titleformatting expression.

For example, I have a first facet with "Artist" and "Statistics / Albums" which I can sort in descending order.
This way I see that the artist "Pink Floyd" has 26 albums for instance. And so on for every other artist.
But how can I differentiate between duplicate albums without having to go manually through all of them ?

@Nemphael : thanks, I'll try that out one day, but for the moment I'm talking about tagging.

Reply #8
The first facet is supposed to display albums, as mentioned in the first part of my reply.

Use the titleformat expression as column pattern of the second facet.

It's basically the same as the "album and date" approach, just with a different second facet.

Reply #9
OK, making some progress. Almost there, but there's still something that bugs me.

e.g. I'm displaying my "Pop" genre :
It has 99 artists, 225 albums, and... 230 paths (Facets column =  $replace(%path%,%filename_ext%,) )
That means there should be 5 duplicates, right ? Let's try to find them.

So I take a look artist by artist :
Artist 1 : 24 albums, 24 paths. Nope, it's not this one.
Artist 2 : 14 albums, 14 paths. Neither.
...
Last artist : 1 album, 1 path.

The problem is that I've been through all my 99 artists, and I've always found exactly the same number of albums and paths. That makes 225 albums total, and of course 225 paths total.

So where have the 5 extra paths gone, which show up only when all albums are selected, but seem to disappear when we take albums one by one ?

EDIT : I've found a way to make it work !!!
- First facet : album
- Second facet : path
- Then in the first facet, add a column "Statistics / Paths"
- Optionally, in the second facet you can add a column "artist" to make it even clearer

Reply #10
Concerning disappearing paths in an artist, album, paths setup:

I assume number of paths refers to what is displayed in the summary item of the path facet: "All (x paths)".

1. Start with no selection at all. Note the number of paths.
2. Select "All" in the artist facet. If the number of paths decreases, it means that you have some tracks without an artist tag.
3. Select "All" in the album facet. If the number of paths decreases, it means that you have some tracks with an artist tag but without an album tag.

Explanation: The "All" item of facets using a [%fieldname%] pattern (i.e. with brackets) only contains tracks which have the specified field.
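
A minimal sketch of that rule, written in Python rather than in Facets itself. The track dictionaries and the all_item helper are invented for illustration; the only thing taken from the explanation above is that "All" in a [%fieldname%] facet keeps just the tracks that have the field.

```python
# Hypothetical tracks; a missing key stands for a missing tag.
tracks = [
    {"path": "a/1.mp3", "artist": "A", "album": "X"},
    {"path": "a/2.mp3", "artist": "A"},   # no album tag
    {"path": "b/1.mp3", "album": "Y"},    # no artist tag
]

def all_item(items, field):
    # "All" in a [%field%] facet: only tracks that actually carry the field.
    return [t for t in items if field in t]

no_selection = tracks                            # step 1: nothing selected
after_artist = all_item(no_selection, "artist")  # step 2: "All" in the artist facet
after_album = all_item(after_artist, "album")    # step 3: "All" in the album facet

print(len(no_selection), len(after_artist), len(after_album))  # 3 2 1
```

Each drop in the count points at tracks missing the tag of the facet that was just selected.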

Reply #11
Your assumption is right.

For the rest, I think I've understood. And I'm pleased to say that all my 20.000+ tracks seem to be correctly tagged (I mean without any missing "artist" or "album" tags). Although there are easier ways to see this of course (Playlist View, sorting by Artist or Album columns for instance).

What to say now ? (besides thanks of course). Well, Facets really is a powerful tool, no doubt about it. You can do almost everything (and anything) with it. But you really have to understand how it works to make it work exactly the way you want. So my guess is that basic users (like me) will probably be puzzled by things like the variation of values in the summary items of each facet, depending on what you have selected or not. I still am sometimes, and I will probably have new questions in the future.

Now that I'm beginning to understand how this thing works, I also understand that, like every other powerful tool, Facets can serve you well, or it can lead you into mistakes if you don't understand exactly what it's trying to tell you. In other words, Facets shows exact results, but these results may not correspond to what you think they correspond to !

So maybe that's the only negative aspect of the whole thing IMHO : it looks very easy to use, it is very easy to use for basic tasks, but it is NOT so easy to use / understand when you try to unleash its full power.

But besides that, I don't see how I was able to live without Facets before...


EDIT : here's a question about the summary items for example.

Let's assume I have 4 facets : Genre / Artist / Album / Path
I see 2271 paths in the summary of the Path facet.

Now I add a "statistics paths" column to the third facet, so I have : Genre / Artist / Album + Statistics paths / Path
I still see 2271 paths, in both the Statistic paths column and the Path facet

Now I add a "Artist" column to the fourth facet, so I have : Genre / Artist / Album + Statistics paths / Path + Artist
Now I see 2289 paths in the Statistic paths column, but still 2271 paths in the Path + Artist facet.
My Artists and Albums figures remain unchanged through all the process (610 artists, 1755 albums).

All this without selecting anything in any facet (except "All genres" in the first facet of course).

What does this +18 paths increase mean ?

Reply #12
I actually stumbled upon a very similar setup myself, glad to see I am not losing my mind then

I really think there would be a niche for a component that would look up album names and display duplicates with the path / bitrate / location all displayed for us to go ahead and tick which ones to delete...

I have no programming experience but surely some of you clever guys could whip something up... it doesn't have to be pretty...

please....

Reply #13
Quote
What does this +18 paths increase mean ?

It means that some of your albums contain multiple artists. Suppose you have a track with Artist Name = A; B residing in some folder C:\music\artist\album, then you would get two entries in that path + artist facet:

C:\music\artist\album | A
C:\music\artist\album | B

And that is what the statistics column in the album facet shows: the number of items in the next facet. Unfortunately, it is named after one of the columns found in the next facet, which can indeed be misleading. Maybe a more general "items" label would be more appropriate in such cases.

More specifically, if that artist column uses the [%<album artist>%] pattern, which it does by default, it means that you have some albums to which you should add an album artist field.
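
The counting difference can be reproduced with a few lines of Python. This is only a sketch with invented data: each track carries a folder and a list of artist values, and the "items" of a path + artist facet are the distinct (folder, artist) combinations, as described above.

```python
# Hypothetical tracks; "artist" is a list because a tag can hold several values.
tracks = [
    {"folder": "C:\\music\\artist\\album\\", "artist": ["A", "B"]},
    {"folder": "C:\\music\\artist\\album\\", "artist": ["A"]},
    {"folder": "C:\\music\\other\\", "artist": ["C"]},
]

# What the Path facet's summary counts: distinct folders.
paths = {t["folder"] for t in tracks}

# What the statistics column counts once an artist column is added:
# distinct (folder, artist) items, one per artist value.
items = {(t["folder"], a) for t in tracks for a in t["artist"]}

print(len(paths), len(items))  # 2 3
```

The surplus (here 1, in the post above 18) is the number of extra artist values found within single folders.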

Reply #14
Quote
Suppose you have a track with Artist Name = A; B residing in some folder C:\music\artist\album, then you would get two entries in that path + artist facet:
C:\music\artist\album | A
C:\music\artist\album | B

And what is considered by Facets as multiple artists ? What are the separators understood by Facets ?
I mean : is "Bruce Springsteen & The E Street Band" considered as two separate artists "Bruce Springsteen" and "E Street Band" ? That would be wrong IMHO. But if "&", "feat." and "and" are NOT considered as separators by Facets, then it's OK with me. 

But anyway, this can't be my case, because I have chosen to remove ALL [] and <> in Facets columns prefs. Why ?
- I have removed [] because I want Facets to display everything, including albums with missing tag information, so I can properly tag them.
- I have removed <> because if a song is tagged "Pop, Rock" I like it to be displayed "Pop, Rock", not "Pop" and "Rock". And if a song is tagged with several artist names (e.g. Artist = "Michael Jackson & Janet Jackson"), I always define an Album artist field where I enter the main artist for that given album (e.g. Album artist = "Michael Jackson"). So I have replaced the Facets "Artist" column with an "Album artist" column. And in that case I don't need <> anymore. EXCEPT when I want to find particular artists inside VA albums (tagged with Album artist = "VA") of course. But then I can replace the "Album artist" column with an "Artist" column in the relevant facet. Or simply type the name I'm looking for in the search field.

So, well, the fact is that I don't know where this +18 paths difference can come from... 


This being said, I have a suggestion (unless there's some other way to do this) :

Suppose I want to find the number of duplicates of each song.
I create a single facet, with 3 columns : Artist, Title, and Statistics tracks.
That way, I can see that I have 4 times the following song :
Artist = Mylène Farmer, Title = Rêver (Live)
I also see that I have 2 times the following song :
Artist = Mylène Farmer, Title = Rêver
Of course it's the same song (two studio files with probably one duplicate, four live files where I'll have to look manually to determine if they are four different versions or not).

So I'd prefer it if Facets could tell me that I have this song 6 times, regardless of the dimmed part in brackets ( (Live), (Unplugged), (Acoustic), etc.). That could be very useful if I want to make a quick playlist with several versions of the same song.

Of course we can sort the facet by the "Artist" column, so the "Rêver" and "Rêver (Live)" rows become closer to each other, making it quite easy to select them both. But I'd still like it if Facets could understand that it's the same song, and tell me that there are 6 versions of it (whether they are duplicates or not).

What do you think ?

Reply #15
Quote
And what is considered by Facets as multiple artists ? What are the separators understood by Facets ?

None. Facets doesn't separate any values on its own, the tags have to be written this way in order for this to happen. If you enter "Pop; Rock" (note the semicolon) into the genre field of foobar2000's properties dialog, the result is not a single genre field with "Pop; Rock" written to the audio file, but two independent ones. One genre="Pop", and one genre="Rock".

Now if you use %genre%, for example in the playlist, those two values are listed comma-separated as "Pop, Rock". That's just a representation, not how they are stored in the file. But since the whole point of tagging your files this way is to get two separate genres, both album list and facets offer a special syntax to properly display them as separate items.
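
As an illustration only, the distinction between how the values are stored and how they are displayed can be modelled like this in Python. The function names are hypothetical, and the sketch assumes nothing beyond what is described above: a semicolon in the properties dialog separates values, and %genre% shows them joined with commas.

```python
def parse_field_input(text):
    # Split the typed text on semicolons into independent values.
    return [v.strip() for v in text.split(";") if v.strip()]

def display(values):
    # Comma-separated representation, as %genre% shows it in the playlist.
    return ", ".join(values)

genres = parse_field_input("Pop; Rock")
print(genres)           # ['Pop', 'Rock']
print(display(genres))  # Pop, Rock

# No semicolon, so this stays a single value.
artist = parse_field_input("Bruce Springsteen & The E Street Band")
print(artist)
```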

Values with "&", "and", "feat.", and so on are kept as they are.

Quote
So, well, I don't know where this +18 paths difference can come from...

Well, again, either from a single file with multiple artists and without an album artist tag, or from multiple files in the same folder which have different artists but no common album artist tag. There is no other explanation.

Quote
So I'd prefer if Facets could tell me that I have 6 times this song, regardless of the brackets ( (Live), (Unplugged), (Acoustic), etc.)

You could remove the text in brackets from display by using a column pattern like this:

Code: [Select]
$if2($cut(%title%,$sub($strchr(%title%,'('),2)),%title%)

But we are getting to a point where fuzzy comparison mechanisms come into play, and that's something I wouldn't want to add to facets, because it is, as you have already pointed out, about showing exact results. I guess anything that goes beyond this should rather be handled by a solution that specializes in finding duplicates.
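
For anyone who finds the column pattern above hard to read, here is a Python sketch of the same idea: keep everything before the first "(" (minus the space in front of it), and fall back to the full title when there is no bracket. The base_title name is made up, and this mirrors the intent of the pattern rather than foobar2000's exact evaluation rules.

```python
from collections import Counter

def base_title(title):
    pos = title.find("(")   # 0-based index, -1 when there is no bracket
    keep = pos - 1          # same length as $sub($strchr(...),2) with a 1-based index
    # Fall back to the full title when nothing usable is left.
    return title[:keep] if keep > 0 else title

# The example from this thread: 2 studio files and 4 live files.
titles = ["Rêver"] * 2 + ["Rêver (Live)"] * 4
counts = Counter(base_title(t) for t in titles)
print(counts["Rêver"])  # 6
```

Note that this merges any title containing a bracket with its base title, including cases where the bracketed text is part of the real song name.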

Reply #16
Well, everything is clear now. I understand the way you are seeing things. 

Quote
So, well, I don't know where this +18 paths difference can come from...

Well, again, either from a single file with multiple artists and without an album artist tag, or from multiple files in the same folder which have different artists but no common album artist tag. There is no other explanation.

All right then. I'll post again once I have found the 18 "faulty" files.
Too bad Facets can't help me more with this : it shows me that there are 18 "faulty" files, it knows where these files are (since it has been able to find them !), but it just can't "isolate" those files and show them to me ? So now I have to look for 18 files among 20.000 + files ! Just imagine the amount of work... 

But like you said, Facets is not a specialized tool for finding duplicates, so I understand.

Reply #17
You can find those files with the same approach used to display duplicate albums earlier. The two problems are similar in that both are about asking "Which X have more than one Y?" Whether you call X=albums and Y=date, or X=paths and Y=artist, the way to find your answer is the same.

I'll leave you with that.
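
The "Which X have more than one Y?" question can be written as one small generic function. The Python below is a sketch with invented records and a hypothetical more_than_one helper; in Facets the same thing is done with two facets and a statistics column, as described earlier in the thread.

```python
from collections import defaultdict

def more_than_one(records, x, y):
    # Group the distinct Y values seen for each X, keep X with 2 or more.
    seen = defaultdict(set)
    for r in records:
        seen[r[x]].add(r[y])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

# Hypothetical records, one per track.
records = [
    {"album": "Album 1", "date": "1979", "folder": "C:\\m\\one\\", "artist": "A"},
    {"album": "Album 1", "date": "1994", "folder": "C:\\m\\one\\", "artist": "B"},
    {"album": "Album 2", "date": "1977", "folder": "C:\\m\\two\\", "artist": "A"},
]

print(more_than_one(records, "album", "date"))     # X=albums, Y=date
print(more_than_one(records, "folder", "artist"))  # X=paths, Y=artist
```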

Reply #18
Once I have the duplicate list, is there any way to then delete/select the songs with a lower %play_counter% than their duplicate counterparts?

Reply #19
After many attempts I have managed to use this technique successfully. Whether I want something different to the previous posters, or something in Foobar or Facets has changed, or I'm just very hard of understanding, I'm not sure, so I'm posting what worked for me, in case it helps someone else.

To find duplicate albums:
In one facet, right-click on the heading, & with Multiple columns enabled, turn on columns Artist & Album. Then turn on the Statistic Subitems. Drag the three columns into that order, if they aren't already.
Click on the Subitems column heading to sort descending, bringing potential duplicates to the top.

Optionally, to cope with multiple discs of an album in separate directories, where the album tags do not include a specific disc identifier (e.g. %album% = "Albumname disc A"), add a Disc column before Subitems. Define Disc under columns preferences as %discnumber%.

I expect this will fail if an album is duplicated within the same directory (e.g. as both Flac & MP3, or as both Flac & CUE without one being excluded under Foobar's library config). A %bitrate% or %codec% (or %__codec_profile%) column would often overcome this, but I have come across albums where it wouldn't (e.g. a freebie download I have, released one track per week, which switched from VBR to CBR after a few weeks)*.

To find duplicate tracks:
Create a facet with columns Artist & Title, & statistic Items.

As above I expect this to fail if the duplicates are in the same directory.

In both cases, especially if you don't have a Location panel displayed in Foobar, create a second facet to the right of the first, with one column, Path. Define Path under columns preferences as $replace(%path%,%filename_ext%,).

*I can quite see why duplicate finding isn't built in out of the box - there are just so many variables, even your degree & type of obsession with tagging style & accuracy.

Reply #20
Is it possible to filter the results only to those items that have subitems >= 2?

Reply #21
Quote
Is it possible to filter the results only to those items that have subitems >= 2?

You could perhaps try foo_uie_sql_tree. I haven't actually tested it, but what you ask should be doable.

HTH.

Alessandro
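
To give an idea of what such a query might look like, here is an untested-in-foobar2000 sketch using Python's built-in sqlite3 module and an in-memory database. The tracks table and its columns are hypothetical and are not the schema foo_uie_sql_tree exposes; the point is only the GROUP BY ... HAVING COUNT(*) >= 2 filter.

```python
import sqlite3

# Hypothetical stand-in for a media library table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (artist TEXT, title TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [
        ("Mylène Farmer", "Rêver", "C:\\m\\album\\05.flac"),
        ("Mylène Farmer", "Rêver", "C:\\m\\best of\\03.flac"),
        ("Pink Floyd", "Time", "C:\\m\\dsotm\\04.flac"),
    ],
)

# Keep only artist/title groups that contain 2 or more tracks.
rows = conn.execute(
    "SELECT artist, title, COUNT(*) AS n FROM tracks "
    "GROUP BY artist, title HAVING COUNT(*) >= 2 ORDER BY n DESC"
).fetchall()

print(rows)
```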