Hi all!
I'm developing a website that recommends CDs to users, and right now I'm hacking away at the EAC database format, specifically the file 'cddb.dat'.
I currently have the following regexes, in slightly "Pedantically Eclectic Rubbish-Lister" format:
Hex := \x(00-ff){2}
Delim := \x00
Padding := Delim{3,}
CDDB_ID := Hex{4}
Header := Hex{2} Delim{2} CDDB_ID Hex{3}
Artist := Delim Text+
Album := Delim Text+
Title := Artist Album
Flag := Delim{1,2} \x96 Delim{3}
More := Delim Hex{3} Delim Hex{7} Delim (Unknown | Flag)
Track := Text+ ((Hex{2} Delim{2}) | (Hex{3} Delim))
LastTrack := Text+
Genre := Delim{2} Hex Delim Text+
CD_info := Header Title More Track+ LastTrack Genre
What I'm hoping someone can tell me is the format of the "Unknown" expression. This section appears to hold quite a lot of data, so as a shortcut I'd like to hear from anyone who has ideas about its format.
Also, I suspect the three hex bytes following the CDDB_ID are some kind of composite value, in which case I expect to have trouble figuring them out.
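As a side note, the CDDB_ID itself is a composite in the standard CDDB scheme: the top byte is a checksum over the track start times, the middle two bytes are the disc length in seconds, and the low byte is the track count. Below is a minimal sketch of splitting it. It assumes the four bytes are stored little endian in cddb.dat, the helper name split_cddb_id is made up for illustration, and the sample bytes are taken from the hexdump later in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: split a 4-byte CDDB ID into its standard fields.
# ASSUMPTION: the ID is stored little endian on disk.
sub split_cddb_id {
    my ($bytes) = @_;
    my $id = unpack 'V', $bytes;          # 'V' = unsigned 32-bit, little endian
    return {
        id       => sprintf('%08x', $id),
        checksum => ($id >> 24) & 0xff,   # top byte
        seconds  => ($id >> 8) & 0xffff,  # middle two bytes: disc length
        tracks   => $id & 0xff,           # low byte: number of tracks
    };
}

# The four ID bytes as they appear in the hexdump further down.
my $info = split_cddb_id("\x0a\x9c\x0a\x82");
printf "%s: %d tracks, %d seconds\n", @{$info}{qw(id tracks seconds)};
# prints "820a9c0a: 10 tracks, 2716 seconds"
```

If the track count from the ID matches the number of Track entries found in a record, that is a cheap sanity check on both the byte order and the record boundaries.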
Any takers?
EDIT:
Oh well, I don't really need any more data than the CDDB ID, the artist, and the album, so this suffices:
# basic grammar
my $delim     = '\x00';
my $special_1 = '[\x00-\x02]';   # byte seen in front of the artist
my $hex       = '[\x00-\xff]';   # any single byte
my $text      = '\p{IsPrint}';   # one printable character
# grammar-constructed expressions
# (?:...){n} keeps the quantifier visibly separate from the interpolated variable
my %patterns = (
    'head_of_cd_record'    => qr/(?:$delim){2}((?:$hex){4})(?:$hex){3}/,  # captures cddb_id, stored little endian
    'head_of_artist_album' => qr/$special_1/,
    'artist_album'         => qr/($text+?)$delim($text+?)$delim/,
    'end_of_cd_record'     => qr/(?:$delim){4,}/,
);
This extracts the data correctly from a cddb.dat file holding 307 CDs, with only one error, which may or may not be due to EAC itself, so I'm satisfied with that.
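For anyone who wants to try the same thing, here is a minimal sketch of how patterns like these could be chained over the raw bytes. The way the pieces are combined into one pattern, the helper name extract_cds, and the variable name $any_byte are assumptions for illustration, not necessarily how the original script does it. The sample bytes are the start of the record shown in the hexdump below.

```perl
use strict;
use warnings;

# Same building blocks as in the grammar above.
my $delim     = '\x00';
my $special_1 = '[\x00-\x02]';
my $any_byte  = '[\x00-\xff]';
my $text      = '\p{IsPrint}';

# ASSUMPTION: record head, lead byte, artist and album follow each other directly.
my $record = qr/
    (?:$delim){2} ((?:$any_byte){4}) (?:$any_byte){3}   # head, capturing the four ID bytes
    $special_1                                          # byte in front of the artist
    ($text+?) $delim ($text+?) $delim                   # artist, then album
/x;

# Hypothetical helper: returns one hash per record found in the buffer.
sub extract_cds {
    my ($buf) = @_;
    my @found;
    while ($buf =~ /$record/g) {
        my ($id_bytes, $artist, $album) = ($1, $2, $3);
        push @found, {
            cddb_id => sprintf('%08x', unpack('V', $id_bytes)),  # little endian
            artist  => $artist,
            album   => $album,
        };
    }
    return @found;
}

# First 20 bytes of the record from the hexdump, starting at the two delimiters.
my $sample = pack 'H*', '00000a9c0a824fdac800b45eacc200a57db3be00';
my @cds = extract_cds($sample);
printf "%s: %d-byte artist, %d-byte album\n",
    $_->{cddb_id}, length($_->{artist}), length($_->{album}) for @cds;
# prints "820a9c0a: 4-byte artist, 4-byte album"
```

For a real file, the buffer would need to be read in binary mode (binmode on the filehandle) so that no bytes are translated on the way in.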
The error, in case anyone wonders, is:
0000de0: 0000 0000 0000 0000 0000 0000 0000 f200 ................
0000df0: 0000 0a9c 0a82 4fda c800 b45e acc2 00a5 ......O....^....
0000e00: 7db3 be00 0a0b 0a00 ffff ffff 4d1c 0300 }...........M...
0000e10: 0096 0000 00a5 7db3 be00 0070 7e00 00b3 ......}....p~...
0000e20: 73a6 5ebb f5b3 a3a4 a3b5 b9a7 dab6 dc3f s.^............?
0000e30: 0000 95c7 0000 b367 a4df 0000 4613 0100 .......g....F...
0000e40: b6fa a7aa 0000 3344 0100 a670 b9da aaec ......3D...p....
0000e50: bff4 0000 2f8c 0100 b25c bff4 0000 22aa ..../....\....".
0000e60: 0100 a677 bca2 0000 78fc 0100 a4d1 a8cf ...w....x.......
0000e70: 0000 b548 0200 b750 c1c2 a741 a5ce a4df ...H...P...A....
0000e80: b752 a7da 0000 f0e2 0200 bc5a b8a8 0000 .R.........Z....
0000e90: 0200 0000 0000 0000 0000 0000 0000 0000 ................
This gives the following output (probably illegible):
´^¬Â ¥}³¾
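One possible explanation, and this is only a guess: the output is exactly the Latin-1 rendering of the extracted bytes (b4 5e ac c2 and a5 7d b3 be), and those byte pairs fall in the ranges a double-byte encoding such as Big5 uses. If so, the extraction is fine and only the display is off. A minimal sketch for checking that, assuming Big5 (the choice of encoding is an assumption, not something the file states):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Artist and album bytes copied from the hexdump above.
my %raw = (
    artist => "\xb4\x5e\xac\xc2",
    album  => "\xa5\x7d\xb3\xbe",
);

# ASSUMPTION: the text is Big5; 'big5-eten' is the name the Encode module uses.
my %decoded = map { $_ => decode('big5-eten', $raw{$_}) } keys %raw;

binmode STDOUT, ':encoding(UTF-8)';
for my $field (sort keys %decoded) {
    printf "%s: %s (%d characters)\n",
        $field, $decoded{$field}, length($decoded{$field});
}
```

If the decoded strings look like a plausible artist and album on a UTF-8 terminal, the record is not an error at all. If they still look like noise, another encoding (or real corruption) is the more likely cause.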