Extracting the MD5 for all WavPack files and including the filepath

Running
Code: [Select]
wvunpack -f *.wv 
lists files with their attributes including MD5, but omits the full path to the file.

Is there a way to have wvunpack return the data points including the full file path rather than just the filename?

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #1
Presuming you're talking Windows, you could pass each file into wvunpack individually, and report both the file path and the wvunpack results.

Something like (untested):

Code: [Select]
for %G in (*.wv) do ( <nul set /P "_dummy=%~pG" & wvunpack -f "%G" )

The "set /P..." is a means to echo a string without the CRLF at the end, so it provides a prefix to anything else getting echoed – in this case %~pG expands to the path of the file in %G without the drive letter (use %~dpG if you want the drive included), so that when wvunpack reports the filename, it has been prefixed by the path.

If you're going to put this into a .BAT file rather than just use it on the command line, you will need to write %%G instead of %G.

This might need some tweaking according to exactly what wvunpack outputs and exactly what you want to see.  On the other hand, as you seem to be running wvunpack in a particular directory, I would have thought it was pretty easy to keep track of the path.
It's your privilege to disagree, but that doesn't make you right and me wrong.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #2
Hmm. The problem is that WavPack doesn't get the full filepath in that case. I suppose there is a non-portable way to get that, but I never looked into it.

However, what you can do is specify the full filepath that you want included. For example this worked for me (on Linux):

Code: [Select]
wvunpack -f /home/david/Music/Miloš\ Karadaglić/Baroque/*.wv


Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #3
Thanks.  I’m also using Linux.  Hoping to trawl my music drives and grab the full path and md5 for *.wv in the tree.  It’s a great way for finding duplicated tracks and albums (sort by md5 and concatenate into a long string), regardless of metadata. Looks like I might have to cobble something together with a bash script.
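
The "sort by md5" idea for individual tracks can be sketched roughly as follows. This is only an illustration: dup_md5 is a made-up helper name, and it assumes each input line has the form path|md5 with the MD5 as the last |-separated field and no spaces around the separator.

```shell
# dup_md5: read "path|md5" lines on stdin and print "md5<TAB>path" for every
# file whose MD5 occurs more than once, sorted so identical MD5s are adjacent.
# Hypothetical helper; the input format is an assumption, not wvunpack output.
dup_md5() {
    awk -F'|' '
        {
            md5 = $NF                      # last field is the MD5
            path = $0
            sub(/[|][^|]*$/, "", path)     # strip the trailing "|md5"
            count[md5]++
            m[NR] = md5
            p[NR] = path
        }
        END {
            for (i = 1; i <= NR; i++)
                if (count[m[i]] > 1)
                    printf "%s\t%s\n", m[i], p[i]
        }
    ' | LC_ALL=C sort
}
```

Feeding it a file of path|md5 lines (for example dup_md5 < tracks.txt, where tracks.txt is a placeholder name) lists only the tracks that share an MD5 with at least one other track.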

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #4
Yes, a bash script is the way to go, along similar lines as what I suggested for Windows.  I would have offered a recursive "for" had you been more specific.

However, I don't think there's an easy way to recurse directories in a for loop (unlike Windows).  I think you'll have to use 'find' to locate the directories, then run a 'for' loop over *.wv in each one and pass every file to 'wvunpack'.  I believe (e.g.) $1 as a parameter (representing a file found by the loop) is the whole file path, so feeding that into 'wvunpack' should (according to the above posts) produce your prefixed output.
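
One way the find-plus-loop idea might look is sketched below. It assumes that 'find' prints each match prefixed by the start directory it was given, so resolving that directory to an absolute path first yields absolute file paths. wv_paths is an illustrative name, and the printf is a stand-in for the real wvunpack call.

```shell
# wv_paths DIR: list every .wv file under DIR with an absolute path.
# Hypothetical helper name, not part of WavPack.
wv_paths() {
    # Resolve the start directory first; find then reports absolute paths.
    root=$(cd "${1:-.}" && pwd) || return 1
    # find hands the matches to a small inline script as "$@",
    # so the for loop sees one complete path per iteration.
    find "$root" -type f -name '*.wv' -exec sh -c '
        for f in "$@"; do
            printf "%s\n" "$f"   # replace with: wvunpack -f "$f"
        done
    ' sh {} +
}
```

Because the paths never pass through a pipe or word splitting, filenames with spaces should survive intact.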

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #5
Presuming you're talking Windows

Thanks for the detailed response - I’ll need to do something similar in Linux.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #6
Check reply above.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #7
Depending on the size of the music collection you're talking about, something as boneheaded as this might work (and this was on a Raspberry Pi):
Code: [Select]
pi@reprise-center:/mnt/usb $ wvunpack -f */*/*.wv */*/*/*.wv | wc -l
9009

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #8
Thanks, I tried that in a current tree and it threw errors.  Here's something I conjured:

Code: [Select]
#!/bin/bash
# bash script to find all .wv files in a given tree

# Define the search directory (default to current directory if not provided)
SEARCH_DIR="${1:-.}"

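# Find all .wv files and process them.
# -print0 with read -d '' keeps names with leading spaces, backslashes or newlines intact.
find "$SEARCH_DIR" -type f -name "*.wv" -print0 | while IFS= read -r -d '' file; do
    full_path="$(realpath "$file")"
    # Print the path without a trailing newline; wvunpack then appends the MD5
    printf '%s | ' "$full_path"
    wvunpack -f7 "$file"
done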

The output is easily redirected to a CSV file and imported into a database table.

 

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #9
I've enhanced it a little further to make it plug and play for my SQL script that finds duplicate albums:

Code: [Select]
#!/bin/bash
# bash script to find all .wv files in a given tree and return the full path including filename, the path excluding filename, the filename and the wavpack md5 of the audio stream

# Define the search directory (default to current directory if not provided)
SEARCH_DIR="${1:-.}"

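# Find all .wv files and process them.
# -print0 with read -d '' keeps names with leading spaces, backslashes or newlines intact.
find "$SEARCH_DIR" -type f -name "*.wv" -print0 | while IFS= read -r -d '' file; do
    full_path="$(realpath "$file")"
    dir_path="$(dirname "$full_path")"
    file_name="$(basename "$full_path")"
    # Print the three path fields without a trailing newline; wvunpack then appends the MD5
    printf '%s|%s|%s|' "$full_path" "$dir_path" "$file_name"
    wvunpack -f7 "$file"
done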

If you redirect the output of that script to a CSV file, using >> to append to the file for every subtree you add, you can find duplicate albums by running the following code in SQLite (you need to create a database first):
Code: [Select]
CREATE TABLE alib (
    __path     TEXT,
    __dirpath  TEXT,
    __filename TEXT,
    __md5sig   TEXT
);

Import the CSV file into the table (the script's output uses | as the field separator).
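
The import step might be scripted roughly like this. It is a sketch that assumes the sqlite3 command-line shell is installed and that the alib table has already been created in the database; import_wv_csv is a made-up helper name. Paths that contain a | or a quote character would need extra care and are not handled here.

```shell
# import_wv_csv DB CSV: append the rows of a '|'-separated file to table alib.
# Hypothetical helper; assumes the sqlite3 CLI is available and alib exists in DB.
import_wv_csv() {
    db=$1
    csv=$2
    # '|' is already the shell's default column separator;
    # it is set explicitly here so the intent is visible.
    sqlite3 -separator '|' "$db" ".import '$csv' alib"
}
```

Called as import_wv_csv music.db wavpack.csv (both names are placeholders), it adds one row per line of the file.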

Code: [Select]
DROP TABLE IF EXISTS __dirpath_content_concat__md5sig;

CREATE TABLE __dirpath_content_concat__md5sig (__dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpath_content_concat__md5sig (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  group_concat (__md5sig, ' | ')
FROM
  (
    SELECT
      __dirpath,
      __md5sig
    FROM
      alib
    ORDER BY
      __dirpath,
      __md5sig
  )
GROUP BY
  __dirpath;

DROP TABLE IF EXISTS __dirpaths_with_same_content;

CREATE TABLE __dirpaths_with_same_content (killdir TEXT, __dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpaths_with_same_content (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  concat__md5sig
FROM
  __dirpath_content_concat__md5sig
WHERE
  concat__md5sig IN (
    SELECT
      concat__md5sig
    FROM
      __dirpath_content_concat__md5sig
    GROUP BY
      concat__md5sig
    HAVING
      count(*) > 1
  )
ORDER BY
  concat__md5sig,
  __dirpath;

DROP TABLE IF EXISTS __dirpaths_with_albums_to_kill;

CREATE TABLE __dirpaths_with_albums_to_kill (__dirpath TEXT, concat__md5sig TEXT);

INSERT INTO
  __dirpaths_with_albums_to_kill (__dirpath, concat__md5sig)
SELECT
  __dirpath,
  concat__md5sig
FROM
  __dirpaths_with_same_content
WHERE
  rowid NOT IN (
    SELECT
      max(rowid)
    FROM
      __dirpaths_with_same_content
    GROUP BY
      concat__md5sig
  );

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #10
If you don't feel like scripting it - after all, you will have to process the file it generates too:
* Wine up foobar2000.
* Set up a ReFacets panel with a number-of-items column and an MD5 column. It will count the number of entries for each MD5 sum it encounters.
* Sort the number column. You'll get the duplicates/multiplicates first.

But: if you have CDs as images with cuesheets, then fb2k reads them as one entry per track. You should then probably first filter to get those with track number not greater than 1. (That catches the missing ones.) Other caveats apply, like HTOA tracks. And say your ripping/tagging application starts enumeration on track 2 for audio on "Playstation" type CDs (they have the data session as track number 1).
Also, WavPack calculates the MD5 using the source file's endianness, so the same audio encoded from AIFF and from WAVE will not give matching MD5s.

Re: Extracting the MD5 for all WavPack files and including the filepath

Reply #11
Actually, if you use tracks: I have quite a few duplicates (that contain truly identical audio). CDs that have these silent tracks until a bonus track or two at the end.
Biggest number of hits have an MD5 of 1234DD57F3AF7775D57493B54D59BCEB . That is precisely 4 seconds - 176 400 samples - of silence. Hundreds of "tracks", actually. Also more than one CD has a silent track of 178 164 samples.
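
The sample counts above can be related to CD timing with a little arithmetic. This sketch assumes standard CD audio (44100 samples per second, 2 channels, 16-bit samples, 588 samples per CD frame of 1/75 s); the function names are illustrative only. The last helper hashes raw zero bytes, and whether that reproduces the MD5 quoted above depends on WavPack hashing exactly those raw PCM bytes, which is assumed here rather than verified.

```shell
# Assumes CD audio: 44100 samples/s, 2 channels, 16-bit, 588 samples per frame.
# All function names are hypothetical.

# Number of whole CD frames (1/75 s each) in a given sample count.
samples_to_frames() {
    echo $(( $1 / 588 ))
}

# Size in bytes of that many stereo 16-bit samples.
silence_bytes() {
    echo $(( $1 * 2 * 2 ))   # samples * 2 channels * 2 bytes per sample
}

# MD5 of that many zero bytes, i.e. digital silence as raw PCM.
# Requires md5sum (GNU coreutils).
silence_md5() {
    head -c "$(silence_bytes "$1")" /dev/zero | md5sum | cut -d' ' -f1
}
```

samples_to_frames 176400 gives 300 frames, which is exactly 4 seconds, and samples_to_frames 178164 gives 303 frames, i.e. 4 seconds plus 3 frames. silence_bytes 176400 gives 705600 bytes.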

Think twice over the need to dedupe them - they don't take up much space, and deleting them will make it much harder to do AccurateRip verification and CUETools repairs.