HydrogenAudio

Hydrogenaudio Forum => Scientific Discussion => Topic started by: DrDoogie on 2004-02-27 23:50:48

Title: Exact Audio Copy database format?
Post by: DrDoogie on 2004-02-27 23:50:48
Hi all!

I've got a project in development: a website that recommends CDs to users. Right now I'm hacking away at the EAC database format, that is, the file 'cddb.dat'.

I currently have the following regexes, written in a slightly "Pedantically Eclectic Rubbish-Lister" (that is, Perl-like) notation:

Quote
Hex       := [\x00-\xff]
Delim     := \x00
Padding   := Delim{3,}
CDDB_ID   := Hex{4}
Header    := Hex{2} Delim{2} CDDB_ID Hex{3}
Artist    := Delim Text+
Album     := Delim Text+
Title     := Artist Album
Flag      := Delim{1,2} \x96 Delim{3}
More      := Delim Hex{3} Delim Hex{7} Delim (Unknown | Flag)
Track     := Text+ ((Hex{2} Delim{2}) | (Hex{3} Delim))
LastTrack := Text+
Genre     := Delim{2} Hex Delim Text+

CD_info   := Header Title More Track+ LastTrack Genre
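The grammar maps fairly directly onto Perl 5.10+ regexes with named subpatterns. Below is a minimal sketch covering only the Header and Title rules. It assumes that Text means one printable character, as `$text` does in the code further down. More, Track, LastTrack and Genre are left out because Unknown is still undefined. `$header_title` and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# Header and Title rules as a regex with named subpatterns (Perl 5.10+).
# Assumption: Text = one printable character. More, Track, LastTrack and
# Genre are omitted because the Unknown part is not yet understood.
my $header_title = qr/
    \A (?&Header)
    (?&Delim) (?<artist> (?&Text)+? )     # Artist := Delim Text+
    (?&Delim) (?<album>  (?&Text)+? )     # Album  := Delim Text+
    (?&Delim)                             # the Delim that starts More
    (?(DEFINE)
        (?<Hex>     [\x00-\xff] )
        (?<Delim>   \x00 )
        (?<Text>    \p{IsPrint} )
        (?<CDDB_ID> (?&Hex){4} )
        (?<Header>  (?&Hex){2} (?&Delim){2} (?&CDDB_ID) (?&Hex){3} )
    )
/x;

# Made-up record: 2 bytes, 2 NULs, 4 ID bytes, 3 bytes, then artist and album
my $rec = "\x12\x34\x00\x00\x0f\x82\x0a\x9c\x11\x22\x33"
        . "\x00Artist One\x00Album One\x00";
if ($rec =~ $header_title) {
    print "$+{artist} / $+{album}\n";    # prints "Artist One / Album One"
}
```

The `(?(DEFINE)...)` group is never matched on its own. It only names the rules so that `(?&Header)` and friends can call them, which keeps the regex close to the grammar notation.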


What I'm hoping someone can tell me is the format of the "Unknown" expression. This section appears to hold quite a lot of data, so rather than work it all out myself, I'd like to hear from anyone who has ideas about its format.
Also, I suspect the three bytes following the CDDB_ID form some kind of composite field, in which case I expect to have trouble figuring them out.

Any takers?

EDIT:
Oh well, I don't really need any more data than:
* CDDB ID
* artist
* album

So this suffices:
Code:
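use strict;
use warnings;

# basic grammar (single-quoted, so the escapes are passed on to the regex engine)
my $delim     = '\x00';         # field delimiter: a NUL byte
my $special_1 = '[\x00-\x02]';
my $hex       = '[\x00-\xff]';  # any single byte
my $text      = '\p{IsPrint}';  # one printable character

# grammar-constructed expressions; (?:...) makes each quantifier apply
# to the whole interpolated sub-pattern, not just its last atom
my %patterns = (
    'head_of_cd_record'    => qr/(?:$delim){2}((?:$hex){4})(?:$hex){3}/, # cddb_id in little endian
    'head_of_artist_album' => qr/$special_1/,
    'artist_album'         => qr/((?:$text)+?)$delim((?:$text)+?)$delim/,
    'end_of_cd_record'     => qr/(?:$delim){4,}/,
);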
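One way to drive these patterns over a whole file is sketched below. This is a hypothetical driver, not the original code. It assumes the three record-level patterns (head_of_cd_record, head_of_artist_album, artist_album) follow each other directly in that order. It inlines them into one /x regex so the block runs on its own, and it decodes the 4-byte CDDB ID with unpack's little-endian 'V' template. The names `extract_records` and `extract.pl` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical driver: the same pieces as %patterns, concatenated in order
my $record_re = qr/
    \x00{2} ([\x00-\xff]{4}) [\x00-\xff]{3}     # head_of_cd_record: capture the CDDB ID bytes
    [\x00-\x02]                                 # head_of_artist_album
    (\p{IsPrint}+?) \x00 (\p{IsPrint}+?) \x00   # artist_album
/x;

# Returns a list of [cddb_id, artist, album] triples found in the raw bytes.
sub extract_records {
    my ($data) = @_;
    my @records;
    while ($data =~ /$record_re/g) {
        # the ID is stored little endian, so unpack it with 'V'
        push @records, [ sprintf('%08x', unpack('V', $1)), $2, $3 ];
    }
    return @records;
}

# Usage: perl extract.pl cddb.dat
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    print join("\t", @$_), "\n" for extract_records($data);
}
```

Run as `perl extract.pl cddb.dat` to get one tab-separated line per CD. A single combined pattern keeps the three pieces in step. Matching them one by one with /g would need explicit `\G` anchoring to stop each piece from skipping ahead independently.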


It correctly extracts the data from a cddb.dat file containing 307 CDs, with only one error, which may or may not be EAC's own fault, so I'm satisfied with that.

In case anyone is wondering, here is the record that causes the error:
Code:
0000de0: 0000 0000 0000 0000 0000 0000 0000 f200  ................
0000df0: 0000 0a9c 0a82 4fda c800 b45e acc2 00a5  ......O....^....
0000e00: 7db3 be00 0a0b 0a00 ffff ffff 4d1c 0300  }...........M...
0000e10: 0096 0000 00a5 7db3 be00 0070 7e00 00b3  ......}....p~...
0000e20: 73a6 5ebb f5b3 a3a4 a3b5 b9a7 dab6 dc3f  s.^............?
0000e30: 0000 95c7 0000 b367 a4df 0000 4613 0100  .......g....F...
0000e40: b6fa a7aa 0000 3344 0100 a670 b9da aaec  ......3D...p....
0000e50: bff4 0000 2f8c 0100 b25c bff4 0000 22aa  ..../....\....".
0000e60: 0100 a677 bca2 0000 78fc 0100 a4d1 a8cf  ...w....x.......
0000e70: 0000 b548 0200 b750 c1c2 a741 a5ce a4df  ...H...P...A....
0000e80: b752 a7da 0000 f0e2 0200 bc5a b8a8 0000  .R.........Z....
0000e90: 0200 0000 0000 0000 0000 0000 0000 0000  ................


It produces the following (probably illegible) output:
Code:
´^¬Â    ¥}³¾