Topic: Collecting ideas for a free, perfect hashing tool (Read 13538 times)

  • sn0wman
Collecting ideas for a free, perfect hashing tool
please submit any ideas you can't find implemented in any hashing tool around, or any feature so important that you would not use a program without it.

many thanks, sn0wman.

  • RedFox
Reply #1
There are already nice hashing & checksum tools, eg: fsum or par2 (includes recovery), but what I miss most is the ability to calculate a hash of audio files that applies only to the audio part.
Ie: I store & update tag values in the files, so any hash calculated for the file would be incorrect after I change the value of a tag in that file.
Some lossless formats include verification (eg: flac), but iirc, mp3 doesn't.
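
To illustrate the "hash only the audio part" idea for mp3, here is a minimal Python sketch. It assumes the only tags present are an ID3v2 block at the start and/or an ID3v1 block at the end; APEv2 and other tag formats are not handled. The names `strip_id3` and `audio_md5` are made up for this example and do not come from any existing tool.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag.

    Assumption: no other tag types (APEv2, Lyrics3, ...) are present.
    """
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:  # "footer present" flag adds another 10 bytes
            start += 10

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(data: bytes) -> str:
    """MD5 of the file contents with ID3v1/ID3v2 tags removed."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

With something like this, editing a tag value changes the hash of the whole file but not the value returned by `audio_md5`, as long as the tag editor leaves the audio frames untouched.
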
Best audio player for the power user: foobar2000

  • sn0wman
Reply #2
that's the main feature of my oss application, be patient.
i want to do as much as possible before releasing it, which is why i posted this topic. now i am asking myself: is it only me who wants so much from a hashing utility? many algorithms implemented, unicode, regular expressions for searching the files, shell integration including its own context hashing 'profiles', cumulative folder content logging, etc.? i hope other people will find it useful, to say nothing of the extraordinary audio features mentioned above.

  • zima
Reply #3
I wonder...how far will you take shell integration? Few steps I imagine:
1) "check integrity of file" option in right click menu
2) field in right click menu that automatically shows whether file is correct when right-clicking (possible?)
3) something in tray that automatically checks files (can be limited/predetermined to, for example, only those from removable media) and marks their icons "yeah, this one's ok" (possible?)

  • legg
Reply #4
An online hash database. Even though this is risky, it might help those who rip extremely damaged CDs.

  • kwanbis
  • Developer (Donating)
Reply #5
isn't that accuraterip?

  • legg
Reply #6
Quote
isn't that accuraterip?


Dunno, the idea just crossed my mind. But nevertheless it might be a good feature for his tool.

  • sn0wman
Reply #7
Quote
isn't that accuraterip?


isn't accuraterip based on a wav's checksum?
so using it (?) for ordinary hashing is useless.
however, the application is also able to calculate fingerprints of lossless files, so maybe this would be the place for it, nice.

thanks for all suggestions, others are welcome.

  • emtee
Reply #8
1) Integrity check of md5, sfv, par, par2 files.
2) Directory recursive.
3) Multiplatform.
4) GUI-based.

These would be awesome

  • krmathis
Reply #9
What emtee mentioned, in addition to the 'hash of audio files that applies only to the audio part' which RedFox suggested.
A command line application is fine for me, since I always have Terminal open anyways..

Reply #10
How about having the resulting hash be simple enough that it can be included as a field of the metadata of the file itself?

I'm not sure what I mean by "simple"... I guess I'm leaving it up to the reader to decide.

It'd be nice if this is something that would eventually find native support in all major music players.

The hash should also ideally be immune to changes to the audio stream that don't affect the decoded output, like audio that's been padded with digital silence should have the same hash as audio without silence.
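
One way to read the "immune to padding with digital silence" wish is to hash decoded PCM after dropping all-zero frames from both ends. The sketch below assumes raw 16-bit stereo PCM (4 bytes per frame) is already available; `trim_silence` and `pcm_md5` are hypothetical names. It only treats exact zeros as silence, so it would not help where a lossy decoder produces near-zero noise instead.

```python
import hashlib


def trim_silence(pcm: bytes, frame_size: int = 4) -> bytes:
    """Drop all-zero frames from the start and end of raw PCM data.

    frame_size=4 assumes 16-bit stereo. A trailing partial frame, if any,
    is discarded. Silence in the middle of the stream is kept.
    """
    zero_frame = bytes(frame_size)
    start = 0
    end = len(pcm) - len(pcm) % frame_size
    while start < end and pcm[start:start + frame_size] == zero_frame:
        start += frame_size
    while end > start and pcm[end - frame_size:end] == zero_frame:
        end -= frame_size
    return pcm[start:end]


def pcm_md5(pcm: bytes, frame_size: int = 4) -> str:
    """MD5 of the PCM data with leading/trailing digital silence removed."""
    return hashlib.md5(trim_silence(pcm, frame_size)).hexdigest()
```

The trade-off is that two tracks differing only in the length of their lead-in silence become indistinguishable, which may or may not be what you want.
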

  • Triza
Reply #11
Quote
1) Integrity check of md5, sfv, par, par2 files.
2) Directory recursive.
3) Multiplatform.
4) GUI-based.

These would be awesome


Actually No.

New 4) First we need a COMMANDLINE-based tool. Then someone can create a wrapper on top of that.

5) Open source
6) cross-platform

Otherwise I won't be able to use it.

Triza

  • p0l1m0rph1c
Reply #12
Quote
How about having the resulting hash be simple enough that it can be included as a field of the metadata of the file itself?

I'm not sure what I mean by "simple"... I guess I'm leaving it up to the reader to decide.

It'd be nice if this is something that would eventually find native support in all major music players.

The hash should also ideally be immune to changes to the audio stream that don't affect the decoded output, like audio that's been padded with digital silence should have the same hash as audio without silence.


Heh, what can be simpler than 52E2B834 (CRC32) or D75909AF25EF3788957459263AD0D74D (MD5)?
Easily fits into any type of tag.
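
For a sense of how small these values are as text, here is a short Python sketch that formats both digests the way they appear above (uppercase hex). The function name `short_hashes` is made up for the example; `zlib.crc32` and `hashlib.md5` are standard library calls.

```python
import hashlib
import zlib


def short_hashes(data: bytes) -> dict:
    """Return CRC32 (8 hex digits) and MD5 (32 hex digits) as uppercase strings."""
    return {
        "crc32": f"{zlib.crc32(data):08X}",
        "md5": hashlib.md5(data).hexdigest().upper(),
    }
```

Either string is plain ASCII, so it can go into any text tag field without escaping.
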

  • norz
Reply #13
Quote
that's the main feature of my oss application, be patient.

Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)
Or maybe -given that it's oss- it's better to include existing libraries?
Just a thought, I'm not a developer

  • pepoluan
Reply #14
While we're on the topic of hashes...

Why must audio files be hashed using MD5? It's too complicated I think. I mean, MD5's main function is not for error-checking, rather to prevent willful tampering.

For the normal damages that happen to audio files, CRC32 is enough. Perhaps 2 CRC32 values with different polynomials. Should be quite robust. And it's easier to implement. Not to mention a wholelottafaster.
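
A rough Python sketch of the "two CRC32 values with different polynomials" idea. The choice of polynomials is an assumption for illustration: the common IEEE 802.3 polynomial and the Castagnoli polynomial (CRC-32C), both in reflected form. This bit-by-bit loop is written for clarity, not speed; real implementations are table-driven or use hardware instructions.

```python
IEEE_POLY = 0xEDB88320        # reflected form of the usual CRC-32 polynomial
CASTAGNOLI_POLY = 0x82F63B78  # reflected form of the CRC-32C polynomial


def crc32_reflected(data: bytes, poly: int) -> int:
    """Bitwise reflected CRC-32 with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def double_crc(data: bytes) -> tuple:
    """Return the pair (CRC-32, CRC-32C) for the same data."""
    return (crc32_reflected(data, IEEE_POLY),
            crc32_reflected(data, CASTAGNOLI_POLY))
```

Storing both values gives 64 check bits in total while keeping each individual computation as simple as a single CRC32.
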
Nobody is Perfect.
I am Nobody.

http://pandu.poluan.info

  • sn0wman
Reply #15
first of all, i am not collecting ideas for something i am going to start, but for something i started about 1 year ago, so the work is at a very advanced stage. that implies:
- the application won't be (too far) cross-platform, however i will try to make its engine (there is one) cross-platform;
- the application is GUI based, but passing commandline parameters is on the TODO list, as is a standalone commandline version, and it may (?) be cross-platform;
- the application already features MD5, CRC16&32 and many others;
and now:
- i like legg's idea of making use of the accuraterip database, online and offline; par/par2 file checking/creating is also a new idea for me.
- i like zima's idea about the tray icon. i just like it, i'm not saying i will do that!
- i don't like zima's idea of showing the result in the context menu - it sounds very interesting to me too, but we can't forget that showing the menu is supposed to be an instant action; we can't make the system context menu wait (for hashing!);
- the application will store the audio hash in a tag (already on TODO);
Quote
Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)

what do you mean by that? audio hash doesn't need encoding, fingerprint does.
  • Last Edit: 23 April, 2006, 10:59:34 AM by sn0wman

Reply #16
Quote
- application will store audio hash in a tag (already on TODO);

Just an elaboration of my vague comment a couple of posts above. It'd be nice if there were a standard hash (fingerprint?) tag for the audio, just like there is a replaygain value for loudness. I may be thinking along different lines from your original intention. I'm thinking on the level of making something like a new RFC, while I think you're talking about just an application.

  • p0l1m0rph1c
Reply #17
Quote
While we're on the topic of hashes...

Why must audio files be hashed using MD5? It's too complicated I think. I mean, MD5's main function is not for error-checking, rather to prevent willful tampering.

For the normal damages that happen to audio files, CRC32 is enough. Perhaps 2 CRC32 values with different polynomials. Should be quite robust. And it's easier to implement. Not to mention a wholelottafaster.


Well, no one forced whoever to use MD5 for hashing. And well, MD5 is still the hash algorithm which attains the best speed/security ratio. You could use MD4, which is faster but is known to be flawed. Yeah, you could use CRC32, but you have the probability of 1 in 4 billion that the error will not be detected.

Long shot, but why risk it when you can use MD5 (or whatever, like SHA-1 or <insert hash algo here>)? The speed is not that much better. You can probably go 2x faster with CRC32 than with MD5, maybe a little more. Either way, your speed is bounded by hard drive speed, not by the algorithm.
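
The two numbers in this post can be checked on your own machine. The Python sketch below computes the 1-in-2^32 figure and times `zlib.crc32` against `hashlib.md5` on an in-memory buffer. The helper name `throughput_mb_s` and the buffer size are arbitrary choices for the example, and the measured ratio depends heavily on CPU and library build, so no expected result is given.

```python
import hashlib
import time
import zlib

# Chance that a random corruption leaves a 32-bit check value unchanged.
UNDETECTED_32BIT = 1 / 2**32  # 2**32 = 4,294,967,296, i.e. "1 in 4 billion"


def throughput_mb_s(func, data: bytes, repeats: int = 5) -> float:
    """Best-of-N throughput of func(data) in MiB per second."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - t0)
    return len(data) / (1024 * 1024) / max(best, 1e-9)


if __name__ == "__main__":
    buf = bytes(range(256)) * 32768  # 8 MiB of test data held in memory
    print("crc32:", round(throughput_mb_s(zlib.crc32, buf)), "MiB/s")
    print("md5:  ", round(throughput_mb_s(lambda d: hashlib.md5(d).digest(), buf)), "MiB/s")
```

Comparing either figure with the sequential read speed of your drive shows whether the algorithm or the disk is the bottleneck.
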

  • SebastianG
  • Developer
Reply #18
Quote
Well, no one forced whoever to use MD5 for hashing. And well, MD5 is still the hash algorithm which attains the best speed/security ratio. You could use MD4, which is faster but is known to be flawed.

So is MD5 IIRC (flawed in terms of security against an intelligent attacker who intentionally wants to create collisions). But if you just want to protect files against "random corruption", CRC32 is fine, too.

However, if you also plan to use the "hash" as some kind of key in a database it had better be large (160 bits or more). Note that the probability of a collision among 2^X randomly generated codes of 2X bits length is already around 40% (and reaches 50% at about 1.18 × 2^X codes).
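
The birthday estimate behind that note can be checked with a few lines of Python. This uses the standard approximation p ≈ 1 - exp(-n(n-1) / (2·2^bits)), which assumes uniformly random codes; the function names are made up for the example. For n = 2^X codes of 2X bits the approximation comes out near 39%, the same ballpark as the rule of thumb.

```python
import math


def collision_probability(num_codes: int, bits: int) -> float:
    """Approximate chance that at least two of num_codes random codes collide."""
    exponent = num_codes * (num_codes - 1) / (2 * 2.0 ** bits)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for tiny values


def codes_for_probability(p: float, bits: int) -> float:
    """Approximate number of random codes needed to reach collision chance p."""
    return math.sqrt(2 * 2.0 ** bits * math.log(1 / (1 - p)))


if __name__ == "__main__":
    print(collision_probability(2**16, 32))      # 2^16 codes of 32 bits
    print(collision_probability(10**9, 160))     # a billion files, 160-bit hash
    print(codes_for_probability(0.5, 32))        # codes needed for a 50% chance
```

This is why a 32-bit value is fine as a per-file checksum but a poor database key: collisions become likely after only tens of thousands of entries.
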

Sebi

  • p0l1m0rph1c
Reply #19
Well, yeah. So is SHA-1 (conceptually, not everyone will bother to do 2^63 iterations, heh). My point there was speed. The advantages of MD5 for uses other than checksumming (you mentioned databases as an example) outweigh the not-too-large speed penalty over, say, CRC32.

  • rjamorim
Reply #20
Quote
Yeah, you could use CRC32, but you have the probability of 1 in 4 billion that the error will not be detected.


Since nobody is trying to detect intentional tampering here (why would someone bother to tamper with the signatures of your music collection? Insert subliminal messages?), I don't see the point of going with full-blown MD5, SHA or Whirlpool. If the case against CRC32 is avoiding a collision once every 4 billion times, let's go with CRC64, or CRC256, or CRC65536 if you're really insane.

Also, CRC gives you an opportunity to implement some error correction if your stream has few errors ("Correction can also be done if information lost is lower than information held by the checksum"). Cryptographic hashes throw that opportunity out of the window.
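
As a rough illustration of that point, here is a Python sketch that repairs a single flipped bit using a known-good CRC32. It is a brute-force approach chosen for readability (flip each bit in turn and re-check), not the syndrome-based method a real implementation would use, and the name `correct_single_bit` is made up for the example.

```python
import zlib


def correct_single_bit(data: bytes, expected_crc: int):
    """Try to repair one flipped bit so that the CRC32 matches expected_crc.

    Returns the repaired bytes, the original bytes if already intact,
    or None if no single-bit flip produces the expected CRC.
    """
    if zlib.crc32(data) == expected_crc:
        return data
    buf = bytearray(data)
    for bit in range(len(buf) * 8):
        buf[bit // 8] ^= 1 << (bit % 8)      # flip one candidate bit
        if zlib.crc32(buf) == expected_crc:
            return bytes(buf)
        buf[bit // 8] ^= 1 << (bit % 8)      # undo and try the next one
    return None
```

The cost grows quadratically with the data size, so this only makes sense for small blocks; it is meant to show that the CRC carries enough information to locate the error, which a cryptographic hash deliberately does not make practical.
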
  • Last Edit: 24 April, 2006, 09:12:15 PM by rjamorim
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org

  • norz
Reply #21
Quote

Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)

what you mean by that ? audio hash doesnt need encoding, fingerprint does.

My mistake: I thought you'd have to decode the audio to hash it (hence the idea of using existing plugins), but I guess you'll just take the audio bits and hash them without any prior processing.

edit: spelling
  • Last Edit: 25 July, 2006, 03:23:40 PM by norz

  • norz
Reply #22
@sn0wman: Any news on your project?

  • norz
Reply #23
Quote
@sn0wman: Any news on your project?

A workaround solution until sn0wman's program is released:
Use a decoder and a hashing program that supports pipes.

Example (on windows):
madplay.exe --output=wave:- "mysong.mp3" | md5sum
This will send a 16bit pcm wave stream to md5sum.
md5sum is a port of gnu utils, from here I think.

I have tested this by replacing some characters in the tags with foobar.
Original and modified files:
- have same size
- have different md5 checksums
- produce decoded wave streams that have the same checksum
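
The same decoder-plus-pipe workaround can be sketched in Python for use from a script. This is only an illustration: `hash_decoder_output` is a made-up helper that runs any command and hashes whatever it writes to standard output. The exit code is returned rather than checked, since it is not clear from the above how the decoder reports problems.

```python
import hashlib
import subprocess
import sys


def hash_decoder_output(cmd, algo: str = "md5", chunk_size: int = 65536):
    """Run cmd and hash the bytes it writes to stdout.

    Returns (hex digest, process exit code). The caller decides what a
    non-zero exit code means for the decoder in use.
    """
    digest = hashlib.new(algo)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Read in chunks so large decoded streams never sit fully in memory.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python thisscript.py madplay --output=wave:- mysong.mp3
    print(hash_decoder_output(sys.argv[1:]))
```

Calling it as `hash_decoder_output(["madplay", "--output=wave:-", "mysong.mp3"])` mirrors the command line above, with Python doing the job of md5sum.
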

---edit begin:
I'm using madplay 0.15.2 (beta).

Regarding tags: my foobar2000 writes id3v1 and ape2 tags to the mp3, and madplay doesn't like this: on those files it will display an error message saying: "error: frame 999: lost synchronization", where 999 is the last decoded frame. However, the md5 checksum will stay the same for an mp3 file without ape2 tags, and after foobar2000 has applied ape2 tag to it.

I've changed my command line a bit:
madplay.exe --output=wave:- --verbose --display-time=remaining %1 | md5sum > %1.md5
This will display remaining time (on terminal) as it processes the file,
and write a .md5 file automatically, which makes it better suited to be called by a batch script to produce .md5 checksums (eg: with sweep)
---edit end
  • Last Edit: 25 July, 2006, 04:44:47 PM by norz

  • sn0wman
Reply #24
i am not dead, just on holidays now
see you soon with some news, ok. (little basic cmd alpha testing version ? [testing the tag-independent engine])

best regards, sn0wman