HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: spoon on 2007-12-13 15:12:14

Title: CD TOC storage in lossless files
Post by: spoon on 2007-12-13 15:12:14
As it stands, very few tagging formats have provisions to store the CD TOC (CD Table of Contents) in the ID tags. Why is it important? The TOC allows for precise identification of an audio file: the freedb ID, the ID for any online metadata provider, even AccurateRip IDs can all be calculated from the TOC without the need for the CD. Time to clear up the mess that exists.

Those with CD TOCs storing abilities:

Any Id3v2 format (mp3)
WMA **
Wave ++
iTunes Tagging (m4a)

Those without:

Vorbis Comments (ogg, flac)
(ape2) Wavpack, musepack

------
2 'Standards' of storing the CD TOC:

CD drives give out 2 tables of contents, either LBA (logical block address) or MSF (minutes, seconds, frames). One can be calculated from the other, for example: 0 LBA = 0 / 2 / 0 MSF (or 2 seconds in). This is given out by the CD drive itself.

Here is where it gets complex: 2 seconds in (MSF) really is 150 LBA (if you take the disc start into account). The reason CD drives return 0 LBA is that a valid audio CD has to have 2 seconds of lead-in, anything less is not valid, so everything is referenced from 0 LBA, not 150 LBA.
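
A minimal sketch of that conversion in Python, assuming 75 frames per second and the 150-frame offset described above. The function names are only for illustration, they do not come from any ripper.

```python
FRAMES_PER_SECOND = 75   # CD sectors ("frames") per second
PREGAP_FRAMES = 150      # the 2 seconds that sit before LBA 0


def lba_to_msf(lba):
    """Convert a drive-reported LBA to (minute, second, frame)."""
    absolute = lba + PREGAP_FRAMES
    minute, rest = divmod(absolute, 60 * FRAMES_PER_SECOND)
    second, frame = divmod(rest, FRAMES_PER_SECOND)
    return minute, second, frame


def msf_to_lba(minute, second, frame):
    """Convert (minute, second, frame) back to a drive-style LBA."""
    return (minute * 60 + second) * FRAMES_PER_SECOND + frame - PREGAP_FRAMES


print(lba_to_msf(0))        # (0, 2, 0)
print(msf_to_lba(0, 2, 0))  # 0
```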

Many metadata providers use 150 as the starting point when calculating the database identifier (such as the freedb ID), and there is considerable confusion about which should be stored. After much research (existing programs) it seems there are 2 standards:

If the raw (as I call it, or untouched) LBA TOC is to be stored, i.e. a binary TOC, it should be stored as is, with no adding of 150. To the best of my knowledge CDex and Audiograbber both do this in ID3v2.

If a string representation is to be used (ie like WMA which is the only one), then that standard is:

TrackCount + (LBA0+150) + (LBA1+150)

..but stored in Hex, so a 2 track cd might be:

2+96+4901+811E
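
A rough Python sketch of that string form. It assumes the last value in the example is the lead-out (the example has three offsets for two tracks), and the raw LBAs below are simply back-calculated from the example by subtracting 150. The function name is made up for illustration.

```python
def toc_to_string(track_lbas, leadout_lba):
    """Build 'TrackCount+offset+...' with every raw LBA raised by 150, in hex."""
    offsets = list(track_lbas) + [leadout_lba]
    fields = [len(track_lbas)] + [lba + 150 for lba in offsets]
    return "+".join(format(value, "X") for value in fields)


# Raw LBAs derived from the example above (assumption: third offset = lead-out).
print(toc_to_string([0, 18539], 32904))  # 2+96+4901+811E
```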

There is a slight issue if a CD-extra TOC is to be stored, this is another matter and I will post about it later.

-----------

Proposed new standards (we will implement these by Jan 2008)

APE2: Nice and easy, let's call the field 'cdtoc' (unless someone is storing it already) and store it as a binary item (raw untouched CD TOC).

Ogg Vorbis: Vorbis comments are 'stuck in a rut' if you ask me, not even able to decide simple standards such as Album Artist, Ratings, Album Art, Disc Count, Track Total.
Anyhow, VC cannot have binary items, so go with the WMA string representation as above, call the field 'cdtoc'.

FLAC: 2 choices here, either go the same as Ogg Vorbis, or make a special 'chunk' area like with Album Art, Cue Sheets, etc, it should not really go in RIFF chunks as they do not need to go back to wav files.

iTunes Tagging: Seems to be done, a little like WMA using the rare field iTunes_CDDB_1, Chungalin says it is:
- CDDB ID
- Leadout sector LBA address
- Number of tracks
- For each track: starting track sector LBA address

ie: 9D07F70B+153089+11+150+12007+27749+43216+55300+66449+81003+96490+109551+125279+142145
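
A small Python sketch of how such a value could be split into the fields listed above. Going by the example, the numbers look like decimal values that already include the 150 offset, but that is an assumption, as are the function and key names.

```python
def parse_itunes_cddb_1(value):
    """Split an iTunes_CDDB_1 style string into its fields."""
    parts = value.split("+")
    track_count = int(parts[2])
    offsets = [int(part) for part in parts[3:]]
    if len(offsets) != track_count:
        raise ValueError("track count does not match the number of offsets")
    return {
        "cddb_id": parts[0].upper(),
        "leadout": int(parts[1]),
        "track_count": track_count,
        "track_offsets": offsets,
    }


toc = parse_itunes_cddb_1(
    "9D07F70B+153089+11+150+12007+27749+43216+55300+66449"
    "+81003+96490+109551+125279+142145"
)
print(toc["track_count"], toc["leadout"], toc["track_offsets"][0])  # 11 153089 150
```

The low byte of the CDDB ID (0B) is the track count, which gives a cheap sanity check on the parsed value.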

------

** Highlighting WMA, as Microsoft's left hand does not know what the right hand is doing:

http://msdn2.microsoft.com/en-us/library/aa391939.aspx

Basically it says that WM/MCDI should be the same as ID3v2, i.e. a binary dump, but WMP11 is storing it as a Unicode string, nothing binary about it at all. It is also worth noting that WMP is adding 150 to each LBA address (see above).

-----

++ There are a number of wave tagging standards: BWF, Cart, INFO List, even 'tag ' (id3v2 chunk). There is provision to store the TOC if 'tag ' is used, so wave is covered.

-----

References:

http://musicbrainz.org/doc/DiscIdsAndTagging
Title: CD TOC storage in lossless files
Post by: benski on 2007-12-13 15:30:29
It is also important to save the original track number somewhere safe.  It's quite conceivable that a user might have purposefully or accidentally changed the track number (such as re-tagging a song from a compilation with metadata from the album release, or renumbering Disc 2 of a 2CD set to follow the last track # of Disc 1).
Title: CD TOC storage in lossless files
Post by: skamp on 2007-12-13 18:09:34
Ahem (http://www.hydrogenaudio.org/forums/index.php?showtopic=57010). Where have you been? I spent many hours on that, asked for your input, to no avail. And now, over 3 months later, you wake up and decide to take matters into your own hands?
Title: CD TOC storage in lossless files
Post by: jcoalson on 2007-12-13 19:29:17
spoon aside from multi-session discs, is there anything in the CD TOC that is not represented in the FLAC CUESHEET metadata block?  I thought I covered everything.
http://flac.sourceforge.net/format.html#metadata_block_cuesheet

because CUESHEET has special support in libFLAC, it allows the encoder to do neat things like create optimal seek points, and the decoder to easily cue by track/index, compute disc IDs, etc without a lot of extra code.
Title: CD TOC storage in lossless files
Post by: skamp on 2007-12-13 19:56:41
aside from multi-session discs

That's a pretty big aside. You can't identify any so-called "Enhanced-CD" (and there are many of those out there) just from cue sheets, internal or external.

is there anything in the CD TOC that is not represented in the FLAC CUESHEET metadata block?  I thought I covered everything.

Cue sheets do not contain information that could be used reliably to determine LBAs, especially when there are exotic discs with negative values.

because CUESHEET has special support in libFLAC, it allows the encoder to do neat things like create optimal seek points, and the decoder to easily cue by track/index, compute disc IDs, etc without a lot of extra code.

Cue sheets and TOCs have different uses. The former are indeed very practical (along with FLAC's neat API and tools) for cutting tracks, specifying pre-emphasis flags, etc... The latter only serve as reliable identification for use with internet databases such as MusicBrainz and FreeDB, or even local ones.
Title: CD TOC storage in lossless files
Post by: jcoalson on 2007-12-13 22:49:45
as I understand it, LBA numbering is done by the drive, it's not an inherent feature of the disc.  the FLAC CUESHEET has the lead-in size from the absolute start of the disc, so it can be used to compute all disc IDs according to their assumption about the program area starting at LBA 0 or 150.

I thought enhanced CDs are just single session with a data track as the last track.  FLAC's CUESHEET handles that case.

anyway for multi-session discs is the plan to store multiple TOCs?  I don't see that addressed in spoon's or skamp's proposal.
Title: CD TOC storage in lossless files
Post by: eevan on 2007-12-13 23:12:30
If we are talking about audio CDs that can be played in an ordinary CD player, then only the first session of a multi-session CD is seen by the player. So I think that there's no point in storing TOCs for other sessions.
Title: CD TOC storage in lossless files
Post by: greynol on 2007-12-13 23:19:35
as I understand it, LBA numbering is done by the drive, it's not an inherent feature of the disc.  the FLAC CUESHEET has the lead-in size from the absolute start of the disc, so it can be used to compute all disc IDs according to their assumption about the program area starting at LBA 0 or 150.
The length of the disc is also provided in the TOC and is necessary in order to calculate a disc ID but is not included in a CUE sheet and apparently has no provision in the organization of your metadata.

I thought enhanced CDs are just single session with a data track as the last track.  FLAC's CUESHEET handles that case.
The enhanced data is stored in a new session.
Title: CD TOC storage in lossless files
Post by: Eli on 2007-12-13 23:23:56
If we are talking about audio CDs that can be played in an ordinary CD player, then only the first session of a multi-session CD is seen by the player. So I think that there's no point in storing TOCs for other sessions.


the point is to be able to uniquely identify the data that came from THAT disc and associate meta data with it.
Title: CD TOC storage in lossless files
Post by: jcoalson on 2007-12-13 23:27:51
as I understand it, LBA numbering is done by the drive, it's not an inherent feature of the disc.  the FLAC CUESHEET has the lead-in size from the absolute start of the disc, so it can be used to compute all disc IDs according to their assumption about the program area starting at LBA 0 or 150.
The length of the disc is also provided in the TOC and is necessary in order to calculate a disc ID but is not included in a CUE sheet and apparently has no provision in the organization of your metadata.
FLAC's CUESHEET also stores the CD TOC's leadout track (offset, track number, etc) from which the length can be computed according to the disc id methodology.
Title: CD TOC storage in lossless files
Post by: greynol on 2007-12-13 23:40:04
Then you have the added complication that gaps may be left out, ripped individually or prepended to the current track; or is this discussion limited to single-file images?
Title: CD TOC storage in lossless files
Post by: jcoalson on 2007-12-14 01:28:39
ah yes, the CUESHEET block is only for single file images.  that's a problem if you want to store the TOC in individual tracks.
Title: CD TOC storage in lossless files
Post by: skamp on 2007-12-14 02:30:54
FLAC's CUESHEET also stores the CD TOC's leadout track (offset, track number, etc) from which the length can be computed according to the disc id methodology.

No, it actually stores the point at which the audio ends, which isn't even the CD's first session real lead-out value. And it doesn't give any useful value about the second session either, even with an external cue sheet.

I've spent many hours browsing through documentation, testing, coding, etc... I wish we wouldn't waste time AGAIN debating the merits of FLAC's CUESHEET metadata block or MCDI's binary TOC.
Title: CD TOC storage in lossless files
Post by: jcoalson on 2007-12-14 04:27:18
FLAC's CUESHEET also stores the CD TOC's leadout track (offset, track number, etc) from which the length can be computed according to the disc id methodology.
No, it actually stores the point at which the audio ends, which isn't even the CD's first session real lead-out value.
for CDDA the FLAC leadout track is track 170 in the CD TOC.  if you 'metaflac --list' a flac image with a CUESHEET block you'll see the track 170.  if that's not where the leadout is specified in the CD TOC, then where is it?
Title: CD TOC storage in lossless files
Post by: Eli on 2007-12-14 11:26:17
ah yes, the CUESHEET block is only for single file images.  that's a problem if you want to store the TOC in individual tracks.


We are talking about single tracks.
Title: CD TOC storage in lossless files
Post by: spoon on 2007-12-14 15:25:17
Depends on what the CD TOC identifier is to be used for. If it is for later lookup in online databases, then a string tag of:

10+150+1023+1204...

Is fine, even for CD extra. To preserve a CD extra disc 100%, you would need an extended TOC (or full session TOC), which I think is beyond the scope of this and complicates it too much.
Title: CD TOC storage in lossless files
Post by: Eli on 2007-12-22 14:21:20
what is the status on this? Is there a consensus?
Title: CD TOC storage in lossless files
Post by: skamp on 2007-12-22 18:15:36
I guess we'll have to wait another 3 months to find out 
Title: CD TOC storage in lossless files
Post by: Eli on 2008-01-20 15:28:57
I would like to bring this up again as it has again been dropped without any final resolution that I am aware of.
Title: CD TOC storage in lossless files
Post by: Eli on 2008-02-17 13:24:25
nearly a month since my last bump and it's still unclear if this has been resolved.
Title: CD TOC storage in lossless files
Post by: spoon on 2008-02-18 13:38:44
When I next update FLAC and Ogg (under 4 weeks) I will side with one of the text methods, as that looks like the most compatible with other taggers. The only choice is between the WMA and the m4a iTunes standard.
Title: CD TOC storage in lossless files
Post by: Eli on 2008-02-29 16:26:53
ok, just wanted to be sure something has been / is being done.
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 14:30:04
Now done, we have a new tag:

CDTOC

see: http://forum.dbpoweramp.com/showthread.php?p=76686#post76686
Title: CD TOC storage in lossless files
Post by: skamp on 2008-04-03 14:54:43
7 months for *that*? Gee spoon, thanks!
As usual, you've gone at it on your own. Now, I'd like to know how we're supposed to compute FreeDB DiscIDs without a lead-out address? Without first and last track numbers, how are we supposed to compute MusicBrainz DiscIDs? How are we supposed to handle discs with a data track as track 01? Discs with negative LBAs?
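
For reference, a minimal Python sketch of the usual freedb disc ID calculation, which shows where the lead-out address enters. It assumes offsets that already include the 150 frames, and it is checked against the iTunes_CDDB_1 example from the first post. The function names are illustrative only.

```python
def digit_sum(number):
    """Sum of the decimal digits, e.g. 160 -> 7."""
    return sum(int(digit) for digit in str(number))


def freedb_disc_id(track_offsets, leadout_offset):
    """freedb ID from frame offsets that already include the 150-frame offset."""
    checksum = sum(digit_sum(offset // 75) for offset in track_offsets)
    # Playing time in whole seconds: this is the part that needs the lead-out.
    total_seconds = leadout_offset // 75 - track_offsets[0] // 75
    value = ((checksum % 255) << 24) | (total_seconds << 8) | len(track_offsets)
    return "%08X" % value


offsets = [150, 12007, 27749, 43216, 55300, 66449,
           81003, 96490, 109551, 125279, 142145]
print(freedb_disc_id(offsets, 153089))  # 9D07F70B
```

Drop the lead-out and the middle two bytes of the ID cannot be filled in.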
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 15:14:02
>As usual, you've gone at it on your own.

The whole point of this post 7 months ago was to allow an official standard. It didn't come, so of course I had to do something on my own... damned if you don't... damned if you do?

>Discs with negative LBAs?

They do not exist: values are stored as raw LBA address + 150, so if there were a disc which started 1 second in, not 2 seconds, it would be stored as 75.

>How are we supposed to handle discs with a data track as track 01?

I have yet to double check how Windows Media Player is storing these. Such discs are basically only PlayStation CDs, so I am not too concerned.

>Without first and last track numbers

The first track is 1, always is.

>how are we supposed to compute MusicBrainz DiscIDs?

Possible with the CDTOC tag, even for cd extra, as MB does not use the data track, see the CD Extra I gave an example for:

http://musicbrainz.org/show/release/details.html?releaseid=43294&show=times#discid

Code:
13 51:08 4:10 3:39 3:12 4:03 3:37 3:59 4:22 3:21 3:56 4:35 4:04 4:10 3:57


LBA FirstIndex: 1 last Index: 14
Type: Audio Track 1 LBA Address: 0 Length: 00:04:10.413
Type: Audio Track 2 LBA Address: 18781 Length: 00:03:39.093
Type: Audio Track 3 LBA Address: 35213 Length: 00:03:11.534
Type: Audio Track 4 LBA Address: 49578 Length: 00:04:03.426
Type: Audio Track 5 LBA Address: 67835 Length: 00:03:36.894
Type: Audio Track 6 LBA Address: 84102 Length: 00:03:59.080
Type: Audio Track 7 LBA Address: 102033 Length: 00:04:21.773
Type: Audio Track 8 LBA Address: 121666 Length: 00:03:21.227
Type: Audio Track 9 LBA Address: 136758 Length: 00:03:56.213
Type: Audio Track 10 LBA Address: 154474 Length: 00:04:35.240
Type: Audio Track 11 LBA Address: 175117 Length: 00:04:04.000
Type: Audio Track 12 LBA Address: 193417 Length: 00:04:09.947
Type: Audio Track 13 LBA Address: 212163 Length: 00:03:55.160
Type: Data Track 14 LBA Address: 229950 Length: 00:11:29.480
Type: Lead Out LBA Address: 281511

CDTOC=D+96+49F3+8A23+C240+10991+1491C+18F27+1DBD8+216CC+25C00+2ACA3+2F41F+33D59+382D4
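
A quick Python sketch that decodes the tag back into raw LBAs so it can be compared with the listing above. It assumes every value after the track count is hex with 150 added, and the function name is just for illustration.

```python
def decode_cdtoc(tag):
    """Return (track_count, raw_lbas) from a 'D+96+...' style CDTOC value."""
    fields = [int(field, 16) for field in tag.split("+")]
    track_count, offsets = fields[0], fields[1:]
    return track_count, [offset - 150 for offset in offsets]


count, lbas = decode_cdtoc(
    "D+96+49F3+8A23+C240+10991+1491C+18F27+1DBD8+216CC"
    "+25C00+2ACA3+2F41F+33D59+382D4"
)
print(count, lbas[0], lbas[1], lbas[-1])  # 13 0 18781 229950
```

The last value decodes to 229950, the start of data track 14, not the lead-out at 281511.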
Title: CD TOC storage in lossless files
Post by: Eli on 2008-04-03 15:15:01
I don't know enough about this myself, so I would like to hear others weigh in on skamp's criticisms. Once implemented I would really like to see this done right from the start.
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 15:37:45
Thinking out loud to preserve the CD Extra lead out, the lead out could be tacked on the end, so for the example I gave:

CDTOC=D+96+49F3+8A23+C240+10991+1491C+18F27+1DBD8+216CC+25C00+2ACA3+2F41F+33D59+382D4+44C3D

The track count would stay 13, not 14, so you would know the last track is data. This would also allow AccurateRip IDs to be calculated for CD Extra from such a tag.
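
A hedged Python sketch of how a reader of the tag could tell the two cases apart under this idea: one value beyond the track count means a plain disc, two values mean data track start plus lead-out. The function and key names are invented for illustration, and the values are left with the 150 included.

```python
def split_cdtoc(tag):
    """Separate audio offsets, data track start and lead-out (all still +150)."""
    fields = [int(field, 16) for field in tag.split("+")]
    count, offsets = fields[0], fields[1:]
    extra = len(offsets) - count
    if extra == 1:
        # Plain disc: the single trailing value is the end of the disc.
        return {"audio": offsets[:count], "data": None, "leadout": offsets[count]}
    if extra == 2:
        # CD Extra: data track start, then the real lead-out.
        return {"audio": offsets[:count], "data": offsets[count],
                "leadout": offsets[count + 1]}
    raise ValueError("unexpected number of offsets for the track count")


toc = split_cdtoc(
    "D+96+49F3+8A23+C240+10991+1491C+18F27+1DBD8+216CC"
    "+25C00+2ACA3+2F41F+33D59+382D4+44C3D"
)
print(len(toc["audio"]), toc["data"] - 150, toc["leadout"] - 150)  # 13 229950 281511
```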
Title: CD TOC storage in lossless files
Post by: greynol on 2008-04-03 15:43:44
Better.

I'd always specify the lead-out addy, even for non-enhanced discs.
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 15:46:11
>I'd always specify the lead-out addy, even for non-enhanced discs.

It already is, for non cd extra, see 2nd example:

http://forum.dbpoweramp.com/showthread.php?p=76690#post76690

That just leaves those CDs where the data track comes first, such as PlayStation discs. I do not see how the data track length can be preserved in such a case.
Title: CD TOC storage in lossless files
Post by: skamp on 2008-04-03 15:59:58
The whole point of this post 7 months ago was to allow an official standard. It didn't come, so of course I had to do something on my own... damned if you don't... damned if you do?
Damned if you ignore other people's work, especially if your own falls short of solving the problems such a standard was supposed to solve in the first place.

>Discs with negative LBAs?

They do not exist: values are stored as raw LBA address + 150, so if there were a disc which started 1 second in, not 2 seconds, it would be stored as 75.
Not to nitpick, but LBA + 150 isn't an LBA. But I see your point: resulting values will always be positive.

>How are we supposed to handle discs with a data track as track 01?

I have yet to double check how Windows Media Player is storing these. Such discs are basically only PlayStation CDs, so I am not too concerned.
Grand Theft Auto (http://musicbrainz.org/show/cdtoc/?cdtocid=51433) and GTA: London 1969 (http://musicbrainz.org/show/cdtoc/?cdtocid=136054) aren't Playstation CDs. And those are just the ones I happen to own; IIRC, such discs weren't uncommon before CD-Extra became a standard (can't remember which of the rainbow books).

>The first track is 1, always is.
Nope, CDs such as the ones I referred to have a data track as track 01; the first audio track is numbered 02. Besides, if I'm not mistaken, the redbook standard states that track numbers must be sequential, between 01 and 99 inclusive, but doesn't require that they start from 01.

>how are we supposed to compute MusicBrainz DiscIDs?

Possible with the CDTOC tag, even for cd extra, as MB does not use the data track, see the CD Extra I gave an example for
Not possible without the first and last track numbers, unless you assume the first track is always numbered 01. With your CDTOC tag, I couldn't identify my GTA discs. (*) I doubt MB requires those values just for fun. And btw, you failed to mention that in order to compute a MB DiscID, you need to subtract 11,400 sectors (152 seconds, i.e. 150 + 2, at 75 sectors per second) from the data track LBA: that gives you the end of the last audio track, which is the "lead-out" value that MB expects.
You haven't covered the FreeDB issue either (lack of lead-out address).
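
To make the dependency concrete, here is a rough Python sketch of the MusicBrainz DiscID scheme as I understand it: a SHA-1 over the first track number, last track number, lead-out and 99 track offsets in uppercase hex, then base64 with '.', '_' and '-' in place of '+', '/' and '='. The function names are made up, and the two-track disc in the usage lines is hypothetical.

```python
import base64
import hashlib

XA_INTERVAL = 152 * 75  # the 11,400 sectors mentioned above


def cd_extra_audio_leadout(data_track_offset):
    """End of the audio session, derived from the data track start."""
    return data_track_offset - XA_INTERVAL


def musicbrainz_disc_id(first_track, last_track, leadout_offset, track_offsets):
    """DiscID from offsets that already include the 150 sectors."""
    if len(track_offsets) != last_track - first_track + 1:
        raise ValueError("one offset per track number is required")
    slots = [0] * 100
    slots[0] = leadout_offset           # slot 0 holds the lead-out
    for number, offset in zip(range(first_track, last_track + 1), track_offsets):
        slots[number] = offset          # slot N holds the offset of track N
    text = "%02X%02X" % (first_track, last_track)
    text += "".join("%08X" % slot for slot in slots)
    digest = hashlib.sha1(text.encode("ascii")).digest()
    encoded = base64.b64encode(digest, altchars=b"._").decode("ascii")
    return encoded.replace("=", "-")


# Hypothetical disc: same offsets, but tracks numbered 1-2 versus 2-3.
id_from_01 = musicbrainz_disc_id(1, 2, 33054, [150, 18689])
id_from_02 = musicbrainz_disc_id(2, 3, 33054, [150, 18689])
print(id_from_01 != id_from_02)  # True
```

Same offsets, different track numbering, different DiscID, so a tag that drops the first track number cannot reproduce it for discs that do not start at 01.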

(*) Edited.
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 17:20:18
I have added [data][audio] cds to the standard (as well as CD Extra as mentioned in my previous post).
Title: CD TOC storage in lossless files
Post by: frozenspeed on 2008-04-03 17:23:49
I have added [data][audio] cds to the standard (as well as CD Extra as mentioned in my previous post).


Do you have a wiki or some other reference to this on your site rather than just the forums? Since it sounds like you wouldn't mind (or better yet encourage) other people to adopt this as well...
Title: CD TOC storage in lossless files
Post by: spoon on 2008-04-03 17:27:40
I have stickied the post and referenced it in our version changes.