
Topic: (possible feature request) field for audio hash

  • Nisto
(possible feature request) field for audio hash
Hi. Is there a way to access a hash (of any kind), or really just any way to identify the actual audio of a file? I'm using the customdb component, and it needs one or more keys in order to know which tags go with which tracks. I have only used tags as keys so far, but every so often I come across tag-less files, and even files that cannot be tagged at all. Any customdb data for these types of files ends up merged with the data of other tag-less files. And that's really just one of the ways tags or general file information can clash (for me, anyway).

Anyway... as far as I know, foobar2000 doesn't really provide a simple "%hash%" field, right? (Sure, there's %__md5%, but not all containers define MD5 hashes...) If that's really the case, then I'm guessing the reason it hasn't been implemented yet is either that you didn't see any use for it, or that it would slow down the software a bit. But how would it be if you at least allowed users to "enable" a hash field through a standard component (e.g. optional when installing fb2k)?

I did read this topic by the way, but I don't like the idea of storing these IDs in the tags.

Thanks!

  • mudlord
  • Developer (Donating)
Reply #1
So you basically want a component that adds an audio hash for the raw PCM/data from the decoders, right?

  • Nisto
Reply #2
So you basically want a component that adds an audio hash for the raw PCM/data from the decoders, right?


Yeah! Is it doable?

  • kode54
  • Administrator
Reply #3
It would need to scan every file at least once, and then it would need to store the hash somewhere. And then I'm not really sure how it would display it, if it were not stored as a normal metadata field.

  • Nisto
Reply #4
It would need to scan every file at least once, and then it would need to store the hash somewhere. And then I'm not really sure how it would display it, if it were not stored as a normal metadata field.


Well, as I said, I don't want hashes stored in the tags anyway, so that wouldn't help me very much... If I were to do it that way, then I actually wouldn't even be using customdb at the moment: the reason I AM using the component is that FLAC, for example, doesn't (yet) officially specify some of the fields I make use of in customdb.

Isn't there a way you can fingerprint the audio "on-the-fly" somehow? Perhaps if you just hashed the first/last few x bytes of the audio, would that make it more lightweight?
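
A minimal sketch of the "hash only the first/last few bytes" idea, assuming the decoded audio is already available as a byte string. The function name `partial_fingerprint` and the 4096-byte window are illustrative choices, not anything foobar2000 or customdb provides:

```python
import hashlib


def partial_fingerprint(pcm: bytes, window: int = 4096) -> str:
    """Cheap identifier built from the length plus the head and tail of the audio.

    Hypothetical helper: it only reads 2 * window bytes, so it is fast,
    but it cannot see differences in the middle of the stream.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    h = hashlib.sha1()
    # Mixing in the total length separates streams that share head and tail.
    h.update(len(pcm).to_bytes(8, "little"))
    h.update(pcm[:window])
    # Only add a tail when the head did not already cover the whole stream.
    if len(pcm) > window:
        h.update(pcm[-window:])
    return h.hexdigest()
```

The trade-off is visible in the code: two streams of equal length that differ only somewhere between the head and tail windows get the same identifier, so this is a lightweight key rather than a true hash of the audio.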

  • Kohlrabi
  • Global Moderator
Reply #5
Well, as I said, I don't want hashes stored in the tags anyway, so that wouldn't help me very much... If I were to do it that way, then I actually wouldn't even be using customdb at the moment: the reason I AM using the component is that FLAC, for example, doesn't (yet) officially specify some of the fields I make use of in customdb.
Is there a (compelling) reason why you don't want to store the hash metainfo in a metadata field, but rather insist on some other "official" solution? Sounds like you make life harder for yourself than it needs to be.

That said, this problem might be interesting and easy enough to hone my f2k-component coding skills. Or Peter could expand the verifier component.
  • Last Edit: 28 October, 2012, 07:23:22 PM by Kohlrabi
It's only audiophile if it's inconvenient.

  • romor
Reply #6
Hi. Is there a way to access a hash (of any kind), or actually just any way to identify the actual audio of a file?

foo_biometric can write audio fingerprint to tag

I'm using the customdb component, and it needs one or more keys in order to know which tags go with which tracks.

can you use path?
$crc32(%path%)
maybe append subsong index in case you use cue sheets or chapters and similar?
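
To illustrate the path-plus-subsong key outside of title formatting, here is a small sketch. The function `path_key`, the `|` separator and the hex formatting are assumptions made for the example; the output is not claimed to match what foobar2000's `$crc32()` returns:

```python
import zlib


def path_key(path: str, subsong: int = 0) -> str:
    """Build a short key from the file path and the subsong index.

    Hypothetical helper: the separator and the 8-digit hex output are
    arbitrary choices for this sketch.
    """
    raw = f"{path}|{subsong}".encode("utf-8")
    return f"{zlib.crc32(raw):08x}"


if __name__ == "__main__":
    # Two tracks inside the same cue sheet get distinct keys.
    print(path_key("C:/music/album.cue", 1))
    print(path_key("C:/music/album.cue", 2))
```

Because the key depends on the path, moving or renaming the file changes the key.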

  • Nisto
Reply #7
Is there a (compelling) reason why you don't want to store the hash metainfo in a metadata field, but rather insist on some other "official" solution?

One other reason why I don't want to store things in the tags is because that means I'll have to do that for anything new I ever rip / download. What if I forget to tag the track(s) before playing or rating anything? If I rate a track before the hash tag has been applied, then it can still easily clash with other tag/key-less tracks. I just don't see it becoming a habit on my end... It would be much easier if some sort of identification was available directly. Actually, I think even the value of the first or last non-null byte of the audio would be enough, because I can couple it with the sample count. (The sample count, %length_samples%, does clash with a few other files in my collection when it is the only key used with customdb, but not many at all.) Is that still too much to ask?

That said, this problem might be interesting and easy enough to hone my f2k-component coding skills.

I would really appreciate the help!

foo_biometric can write audio fingerprint to tag

Please read my previous posts fully (I know of foo_biometric already).

can you use path?
$crc32(%path%)
maybe append subsong index in case you use cue sheets or chapters and similar?

I'm afraid not :/ Usually when I download or rip stuff, I put the files in a temporary folder, then I re-tag the files (which I rarely do right away) and put them in my proper music folder. By that time I'm sure to have played the tracks at least a few times, and maybe even rated them, so...
  • Last Edit: 29 October, 2012, 05:31:09 AM by Nisto

  • Kohlrabi
  • Global Moderator
Reply #8
One other reason why I don't want to store things in the tags is because that means I'll have to do that for anything new I ever rip / download.
How is that affected by where this information is stored?

It would be much easier if some sort of identification was available directly.
What does "directly" mean? Hash it on-the-fly? That seems excessive and highly impractical.

Actually, I think even the value of the first or last non-null byte of the audio would be enough, because I can couple it with the sample count. (The sample count, %length_samples%, does clash with a few other files in my collection when it is the only key used with customdb, but not many at all.)
The method of hashing is completely unrelated to the means of storing the hash.

That said, this problem might be interesting and easy enough to hone my f2k-component coding skills.

I would really appreciate the help!
I can't promise anything, since I don't have much free time this week, and my skills are rather undeveloped and rusty.

  • maruseru
Reply #9
I'd prefer CRC

  • Nisto
Reply #10
How is that affected by where this information is stored?

Because actually storing a hash in the file means it'll have to be done manually for everything I open? Even if that could be automatically done, I just don't like it...

What does "directly" mean? Hash it on-the-fly? That seems excessive and highly impractical.

Yes. If it's impractical, can you tell me something that IS practical? As I've said like three times already, it doesn't actually need to be a hash, and not even of the whole audio chunk. Anything to further identify something of the actual audio. Like the peak dB (though I don't use ReplayGain or anything, so not sure that's possible...) or something.

  • naturfreak
Reply #11
I'd prefer CRC

Hmm. Problem: Many possible collisions -> Different audio files could have the same hash values.
A hash should be at least 64 bits long to keep the probability of such collisions low.
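
The 32-bit versus 64-bit point can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where N is the number of possible hash values. The sketch below assumes the hash values are uniformly distributed; the library size of 100,000 tracks is an arbitrary figure picked for illustration:

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses the birthday approximation and assumes a uniform hash of the
    given width in bits.
    """
    space = 2.0 ** bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (32, 64):
        p = collision_probability(100_000, bits)
        print(f"{bits}-bit hash, 100000 tracks: p = {p:.3g}")
```

Under these assumptions a 32-bit value such as a CRC32 makes a collision somewhere in a 100,000-track library more likely than not (roughly two in three), while a 64-bit value brings it down to well under one in a billion.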

  • Kohlrabi
  • Global Moderator
Reply #12
How is that affected by where this information is stored?

Because actually storing a hash in the file means it'll have to be done manually for everything I open? Even if that could be automatically done, I just don't like it...
I see only two methods of doing it: analysing the file upon playback, or analysing a selected group of songs by manually invoking a scan of the data. This is, for example, how Zao's seekbar does it, as far as I know. But this information is only useful if it can be stored somewhere, so that it is accessible outside of playback as well. I think Zao stores the waveform information in his own serialization container/database, since attaching that info to the metadata would be quite excessive. The audio stream itself is essentially the same information, which makes it quite redundant, too.

So, one could come up with a component which stores all the hashed information in its own database, so files don't get altered. I just don't know how the hash can then be made accessible to title formatting functions, so I'd still prefer a tag, since it is essentially only some bytes of data and can be transparently accessed by any function or component which can use tags/title formatting. But I guess there is a way to "create" title formatting field references, since playback statistics do that.
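
The "own database" option could look roughly like the sketch below. It is a standalone illustration using SQLite, not foobar2000 SDK code; the class `HashStore`, the table layout and the (path, subsong) key are all assumptions made for the example:

```python
import sqlite3
from typing import Optional


class HashStore:
    """Side database mapping (path, subsong) to an audio hash.

    Hypothetical sketch: files are never modified, the hash lives only here.
    """

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS audio_hash ("
            " path TEXT NOT NULL,"
            " subsong INTEGER NOT NULL,"
            " hash TEXT NOT NULL,"
            " PRIMARY KEY (path, subsong))"
        )

    def put(self, path: str, subsong: int, digest: str) -> None:
        # A rescan of the same track simply overwrites the old entry.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO audio_hash VALUES (?, ?, ?)",
                (path, subsong, digest),
            )

    def get(self, path: str, subsong: int = 0) -> Optional[str]:
        row = self.conn.execute(
            "SELECT hash FROM audio_hash WHERE path = ? AND subsong = ?",
            (path, subsong),
        ).fetchone()
        return row[0] if row else None
```

A lookup for a track that has not been scanned yet returns `None`, which matches the point that a scan has to happen at least once. Since the store is keyed by location, a moved file would need to be scanned again.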

I should just start doing it I guess.
  • Last Edit: 29 October, 2012, 06:36:19 PM by Kohlrabi

  • mudlord
  • Developer (Donating)
Reply #13
I'd prefer CRC

Hmm. Problem: Many possible collisions -> Different audio files could have the same hash values.
A hash should be at least 64 bits long to keep the probability of such collisions low.


Same logic applies to tons of things like MD5. So something like the SHA-1 standard would be needed. Or whatever people would want in such a component if one makes it.
  • Last Edit: 30 October, 2012, 07:18:49 PM by mudlord

  • mudlord
  • Developer (Donating)
Reply #14

  • Nisto
Reply #15
What type of hash is this? Also, can I access the hash somehow, without scanning a file first?

  • mudlord
  • Developer (Donating)
Reply #16
Hash is SHA-1, and no, you must scan a file first.
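
For readers who want to see what "SHA-1 over the decoded audio" amounts to, here is a rough standalone sketch. It is not the component's code: it assumes the track has already been decoded to a PCM WAV file, and the function name `pcm_sha1` is made up for the example. Only the sample data is hashed, so tags and other header fields do not affect the result:

```python
import hashlib
import wave


def pcm_sha1(wav_path: str, chunk_frames: int = 65536) -> str:
    """Return the SHA-1 hex digest of the raw PCM frames in a WAV file.

    Hypothetical helper: reads the audio in chunks, so the whole file
    has to be scanned once before a hash is available.
    """
    h = hashlib.sha1()
    with wave.open(wav_path, "rb") as wav:
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            h.update(frames)
    return h.hexdigest()
```

The loop makes the cost explicit: every sample is read exactly once, which is why the hash cannot be produced without a scan.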