Skip to main content

Notice

Please note that most of the software linked on this forum is likely to be safe to use. If you are unsure, feel free to ask in the relevant topics, or send a private message to an administrator or moderator. To help curb the problems of false positives, or in the event that you do find actual malware, you can contribute through the article linked here.
Topic: Filtering Duplicate FLACs Using MD5 Hash (Read 1315 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

Filtering Duplicate FLACs Using MD5 Hash

I am trying to identify duplicate FLAC files/albums from my collection using the MD5 hash in the header.

I would like to do this not through the library (I haven’t set one up) but using a playlist (.fpl) I have created of all tracks with the tracknumber tag value “1”.

What I want to achieve is a filter that contains only files and every file where two or more contain identical MD5 hash values [$info(md5)].

I’m not good technically so would really appreciate if someone could talk me through exactly what I would need to do to filter my playlist accordingly. I’m happy to explain why I’d like to do it in the way I’ve described if it is not obvious.

Thanks.

Filtering Duplicate FLACs Using MD5 Hash

Reply #1
Is it even possible  to do what I'm asking?

Filtering Duplicate FLACs Using MD5 Hash

Reply #2
It's not possible with a playlist. You need to enable your media library and then install foo_facets. Configure with it with a custom column containing [$info(md5)]. Then right click the column header to enable statistics>Items.. You'll want to reverse sort this so anything with more than one match appears first. Then in the next pane to the right, you can configure it with a custom %path% column so you can see the location of your duplicate files.

I don't have any duplicate files but I made a copy just to show it as an example:


Filtering Duplicate FLACs Using MD5 Hash

Reply #3
It's not possible with a playlist. You need to enable your media library and then install foo_facets. Configure with it with a custom column containing [$info(md5)]. Then right click the column header to enable statistics>Items.. You'll want to reverse sort this so anything with more than one match appears first. Then in the next pane to the right, you can configure it with a custom %path% column so you can see the location of your duplicate files.

Thanks very much - that is very helpful. I'll give this a go but I don't think that my computer can handle my entire media library as it is extremely large, hence why I was trying the playlist route with only the first tracks included. If that fails, I'll have to go through the playlist manually.

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #4
It's not possible with a playlist. You need to enable your media library and then install foo_facets.
I just wanted to thank you again for your help. That solution worked perfectly and highlighted loads of bit for bit identical files that needed flushing out.

Also, I thought I would correct you on one small point which is that you can use foo_facets on a playlist although in the event, I was surprised to find my server coped with the entire collection, so happily I now have a workable and dynamic music library.

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #5
Yes you are right. I had been aware of that in the past but I completely forgot.

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #6
I just create column in playlist, in pattern I set $info(md5)
Spoiler (click to show/hide)
I have another question. Is it possible to automatically remove lines from the playlist that do not have duplicates with the same values ​ ​ in a specific column? In this case, I would like only groups of duplicates to remain in the playlist. Like this:
Spoiler (click to show/hide)

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #7
That's what facets as described above will give you.

If it were me, I'd be using foo_sqlite. Using this you can run sql against your media library/playlist to identify duplicates (based on md5) and then push the resulting tracks out to a playlist.

Code: [Select]
drop table if exists Playlist_Updatable;
create virtual table Playlist_Updatable USING MetaDB_Module(no_multivalue_split,playlist);

insert into Playlist_Updatable (path, playlist_name)
  select path,
         'Duplicate Tracks'
  from Playlist
  where playlist_index = active_playlist()
            and md5 in (select md5
                  from Playlist
                  where playlist_index = active_playlist()
                  group by md5
                  having count(*)>1
               );

drop table Playlist_Updatable;

If you want to run it against your entire media library, just use
Code: [Select]
drop table if exists Playlist_Updatable;
create virtual table Playlist_Updatable USING MetaDB_Module(no_multivalue_split,playlist);

insert into Playlist_Updatable (path, playlist_name)
  select path,
         'Duplicate Tracks'
  from mediaLibrary
  where md5 in (select md5
                  from mediaLibrary
                  group by md5
                  having count(*)>1
               );

drop table Playlist_Updatable;

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #8
With the latest version of foo_sqlite you should omit
Code: [Select]
drop table if exists Playlist_Updatable;
create virtual table Playlist_Updatable USING MetaDB_Module(no_multivalue_split,playlist);
at the beginning and
Code: [Select]
drop table Playlist_Updatable;
at the end of your scripts and use PlaylistUpdatable instead of Playlist_Updatable. PlaylistUpdatable is now automatically created.

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #9
Thank you. I tried execute both commands, but SQLite console says "SQLite Error:(1) no such column: md5"

 

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #10
Thank you. I tried execute both commands, but SQLite console says "SQLite Error:(1) no such column: md5"
foo_sqlite comes with a bunch of default tags/columns set up by default; to add additional tags that exist in your media library go to 'File > Preferences > Media Library > SQLite viewer' and select 'MediaLibrary table', then click 'Add defaults' and 'Apply'. Does this add md5 to the list of columns available? Repeat for 'Playlist table'.

If that doesn't add it, can you let me know which version of foobar you're using, and if you have foobar configured with a media library of if you're just adding files to foobar via playlists?

I'm out for a few hours now, but will take a look when I'm back later if that doesn't work.

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #11
foo_sqlite comes with a bunch of default tags/columns set up by default; to add additional tags that exist in your media library go to 'File > Preferences > Media Library > SQLite viewer' and select 'MediaLibrary table', then click 'Add defaults' and 'Apply'. Does this add md5 to the list of columns available?
Yes, now there are tags in the MediaLibrary table, including md5. I executed this script:
Quote
insert into Playlist_Updatable (path, playlist_name)
  select path,
         'Duplicate Tracks'
  from mediaLibrary
  where md5 in (select md5
                  from mediaLibrary
                  group by md5
                  having count(*)>1
               );
Now I understand, that SQLite takes the list of tracks from medialibrary, I configured it too, it has status monitoring folders wich have duplicates.
Spoiler (click to show/hide)
Result can see:
Spoiler (click to show/hide)
Thank you very much for help, plugin and scripts  :)

P.S. I also tried execute this script:
Quote
insert into Playlist_Updatable (path, playlist_name)
  select path,
         'Duplicate Tracks'
  from Playlist
  where playlist_index = active_playlist()
            and md5 in (select md5
                  from Playlist
                  where playlist_index = active_playlist()
                  group by md5
                  having count(*)>1
               );
But it still says "no such column: md5":
Spoiler (click to show/hide)

Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #12
foo_sqlite comes with a bunch of default tags/columns set up by default; to add additional tags that exist in your media library go to 'File > Preferences > Media Library > SQLite viewer' and select 'MediaLibrary table', then click 'Add defaults' and 'Apply'. Does this add md5 to the list of columns available? Repeat for 'Playlist table'.
Did you also 'Add defaults' to the 'Playlist table' too?


Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #14
Glad you got it sorted.

It's a great plugin, and is the only tool I know of that enables you to perform queries across groups of tracks e.g. at the album level.

I use it a lot for QA purposes, things like checking that there's only one artistsort per artist, but have also used it for updating tags. There was a guy on the MinimServer forum that wanted to create a tag that contained the album duration so he could show it in his control point, and this plugin made that trivial. I also added DISCTOTAL to my library as I hadn't included it originally but found a use for it when I was performing conversions.


Re: Filtering Duplicate FLACs Using MD5 Hash

Reply #15
With the latest version of foo_sqlite you should omit
Code: [Select]
drop table if exists Playlist_Updatable;
create virtual table Playlist_Updatable USING MetaDB_Module(no_multivalue_split,playlist);
at the beginning and
Code: [Select]
drop table Playlist_Updatable;
at the end of your scripts and use PlaylistUpdatable instead of Playlist_Updatable. PlaylistUpdatable is now automatically created.
In my recent scripts I had moved over to using PlaylistUpdatable, but I hadn't realised it was automatically available so that simplifies the scripts a bit.

Thanks again for the plugin.