
Last call on FLAC specification

Hi all,

A working group of the IETF started working on turning the FLAC specification into an RFC a few years ago. The document as published on the FLAC website was taken as the starting point and improved upon.

The working group now considers this document ready to be submitted for IETF processing to become a standards-track RFC. The document has now entered a two-week period called working group last call.

I would like to invite anyone interested to read the document and comment on it. It can be found here: https://datatracker.ietf.org/doc/draft-ietf-cellar-flac/

Versions in a non-monospaced font are also available:
HTML: https://www.ietf.org/archive/id/draft-ietf-cellar-flac-05.html
PDF: https://www.ietf.org/archive/id/draft-ietf-cellar-flac-05.pdf

If you have any remarks, you can send an email to the CELLAR working group mailing list, open an issue or PR at https://github.com/ietf-wg-cellar/flac-specification/ or reply here.
Music: sounds arranged such that they construct feelings.

Re: Last call on FLAC specification

Reply #1
For people interested in the inner workings of FLAC but not feeling up to the challenge of reading a specification document (which can be very technical), I recommend skimming over appendix D, which contains examples in which FLAC files are decoded by hand.

Re: Last call on FLAC specification

Reply #2
I can surely open an issue, but could I come up with one here? My emphasis:

Quote
The introduction of this was specifically aimed at improving compression of 24-bit PCM audio and compression of 16-bit PCM audio only rarely benefits from using a 5-bit Rice parameters. Therefore, when maximum compatibility with decoders is desired it is RECOMMENDED to only use 4-bit Rice parameters when encoding audio with a bit depth higher than 16 bits.

What I think you want to RECOMMEND: that 5-bit Rice is avoided for 16 bits and below, am I right? In which case it should read
"use 5-bit Rice parameters only when encoding audio with a bit depth higher than 16 bits"

Note also the unnecessary ambiguity in the phrase "only use". The emphasized sentence can be read both as
when encoding > 16 bits: refrain from using anything other than 4-bit Rice [i.e.: > 16 bits => 4-bit Rice]
and as
only when encoding > 16 bits you should use 4-bit Rice [i.e.: at most 16 bits => stay away from 4-bit Rice]
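To make the intended reading concrete (5-bit parameters only above 16 bits), here is a minimal Python sketch of how an encoder might pick the residual coding method for a subframe. It assumes the encoder has already worked out the largest Rice parameter any partition would like to use; the function names are hypothetical and not taken from libFLAC.

```python
def choose_rice_coding_method(bit_depth: int, largest_parameter: int) -> int:
    """Pick the residual coding method for one subframe.

    Returns 0 for 4-bit Rice parameters (0..14 usable, 15 is the escape code)
    or 1 for 5-bit Rice parameters (0..30 usable, 31 is the escape code).
    """
    if bit_depth > 16 and largest_parameter > 14:
        # Only audio deeper than 16 bits pays the compatibility cost of 5-bit parameters.
        return 1
    # Everything else stays with 4-bit parameters.
    return 0


def usable_parameter(method: int, wanted: int) -> int:
    """Clamp a wanted Rice parameter to the largest non-escape value of the method."""
    return min(wanted, 14 if method == 0 else 30)
```

With 16-bit input, `choose_rice_coding_method(16, 20)` still returns 0, so a partition that wanted parameter 20 is coded with 14 (or escaped), at a small cost in compression.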


And finally I wonder if escaped Rice is to be understood as Rice.

Re: Last call on FLAC specification

Reply #3
Quote
I can surely open an issue, but could I come up with one here?
Sure.

Quote
What I think you want to RECOMMEND: that 5-bit Rice is avoided for 16 bits and below, am I right? In which case it should read
"use 5-bit Rice parameters only when encoding audio with a bit depth higher than 16 bits"
Thanks for noticing. That would indeed be an improvement.

Quote
And finally I wonder if escaped Rice is to be understood as Rice.
For each subframe with a predictor, an entropy coding method (i.e. a Rice coding method) is chosen once. An escape code can be used for every partition, so that is one level deeper.

By the way, using escape codes is handled one section later: https://www.ietf.org/archive/id/draft-ietf-cellar-flac-05.html#name-rice-escape-code
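To see what "escaped Rice" is an exception to, here is a minimal sketch of plain Rice coding of a single FLAC residual: the signed value is folded to an unsigned number (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4), the quotient by 2^k is written in unary as zero bits terminated by a one bit, followed by the k low bits. Bits are represented as a Python string of '0'/'1' characters purely for readability; the function names are hypothetical, not libFLAC's API.

```python
def fold(value: int) -> int:
    """Map a signed residual to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return value << 1 if value >= 0 else ((-value) << 1) - 1


def unfold(u: int) -> int:
    """Inverse of fold()."""
    return -(u >> 1) - 1 if u & 1 else u >> 1


def rice_encode(value: int, k: int) -> str:
    """Rice code of one signed residual with parameter k, as a bit string."""
    u = fold(value)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    bits = "0" * quotient + "1"          # unary quotient: zeros ended by a one
    if k:
        bits += format(remainder, f"0{k}b")  # then the k low bits
    return bits


def rice_decode(bits: str, k: int) -> tuple[int, int]:
    """Decode one value from the start of bits; return (value, bits consumed)."""
    quotient = bits.index("1")
    pos = quotient + 1
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return unfold((quotient << k) | remainder), pos + k
```

For example, `rice_encode(-3, 2)` folds -3 to 5, giving quotient 1 and remainder 1, i.e. the bits `0101`.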

Re: Last call on FLAC specification

Reply #4
I read it.


A couple of typos first:
"Hertz" on three occasions, should be "hertz"; when written out, SI units are not capitalized. Not even those named after people (ampere, henry, hertz, kelvin, ...)
"bocksize" is something worth discussing when we are closing in on Oktoberfest, but you need an "l" in that link in C4.


Then an obsolete piece of information:
* 9.2, bottom of page 13 says that the reference implementation only supports up to 24 bits per sample.


And then on to Subset considerations.
* First capitalizing nitpickery: it must be hard to read all those "subset of" that don't refer to format Subset. If the document allows for emphasis, may I suggest an emphasized Subset? And even if not, the Section 8 header (and thus the TOC) should have a capital S, and then in 9.6.2 it says "non-subset".
Also, if some of the "subset of" could be rephrased as "special case of", or in the case of "decoders only implement a subset of FLAC features": what about "only implement part of the full FLAC specification" or ...?

* Also the definition is not clear, since 9.6.2 says that a tag-defined channel mask violates streamability and is "i.e. non-subset" - and that restriction is nowhere to be found in Section 8.
Indeed, consider whether Section 8 should be clearer about the fact that there are signals that can be encoded to FLAC but not to Subset (even standard left+right two-channel audio, at sample rates above 655350 Hz).

* Then this:
Quote
And finally I wonder if escaped Rice is to be understood as Rice.
For each subframe with a predictor, an entropy coding method (i.e. a Rice coding method) is chosen once. An escape code can be used for every partition, so that is one level deeper.
Yes, but (I might have mentioned this before, or at least I have thought about mentioning it):
Does Subset disallow a "Rice partition order" of 0b1111? It can be read as "no, that is a number > 8" but alternatively it can be read as "that is fine - as 0b1111 does not signify a Rice partition order, it can obviously not signify a too big Rice partition order". A simple "MUST NOT be escaped" or "or be escaped" in the last sentence of Section 8 would clear this up.

* Then: I can't find anywhere that it is RECOMMENDED to use Subset when interoperability is a concern (provided the signal allows for Subset, that is).
You have probably thought of it, but it looks strange not to recommend.
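The 655350 figure comes from the frame header's sample rate field. A rate without its own 4-bit code must fit one of the three "uncommon" encodings (8-bit value in kHz, 16-bit value in Hz, 16-bit value in tens of Hz); otherwise the frame header can only say "see streaminfo" (0b0000), which Subset does not allow. A Python sketch with a hypothetical helper name that returns the code a Subset encoder could use, or `None`:

```python
from typing import Optional

# Sample rates with a dedicated 4-bit code in the frame header.
COMMON_RATE_CODES = {
    88200: 0b0001, 176400: 0b0010, 192000: 0b0011, 8000: 0b0100,
    16000: 0b0101, 22050: 0b0110, 24000: 0b0111, 32000: 0b1000,
    44100: 0b1001, 48000: 0b1010, 96000: 0b1011,
}


def frame_header_rate_code(rate: int) -> Optional[int]:
    """Return a frame header sample rate code for `rate`, or None if the frame
    would have to refer to streaminfo (code 0b0000, not allowed in Subset)."""
    if rate <= 0:
        raise ValueError("this sketch only handles positive sample rates")
    if rate in COMMON_RATE_CODES:
        return COMMON_RATE_CODES[rate]
    if rate % 1000 == 0 and rate // 1000 <= 255:
        return 0b1100                      # 8-bit value in kHz follows
    if rate <= 65535:
        return 0b1101                      # 16-bit value in Hz follows
    if rate % 10 == 0 and rate // 10 <= 65535:
        return 0b1110                      # 16-bit value in tens of Hz follows
    return None
```

So `frame_header_rate_code(655350)` gives 0b1110, while 655360 or 700000 give `None`: such streams can be valid FLAC but never Subset.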


And finally a few that you might argue don't need to be clarified at that level of nitpickery (which you might already have thought a few times during this post).
First, on a general note, I don't have the knowledge to tell whether all "invalid" are strongly enough declared invalid. Say, I have no idea whether all/any of the "MUST be a multiple of 8" could be 0.
Or, say: could the last frame be 0 samples?

Some specific ones:

* 5.3: Should there be a first sentence that an encoder MAY use any of these for any subframe at its discretion subject to [rest of the requirements, including all the quirks on 32 bit]?
(Although this section describes common uses, it might otherwise appear prescriptive. Imagine if some ultra-low power CPU only wants to receive an audio stream and store it uncompressed, uses FLAC for reasons other than compression and stores everything Verbatim - oh fine!)

* 9.6.2 again. Should one write that the WAVEFORMATEXTENSIBLE_CHANNEL_MASK field MAY be used even when it agrees with the channel bits? And then that it isn't streamable when defined through a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag that differs from the channel bits ...?
And also: what shall(/SHALL, or RECOMMENDED or ...) a decoder do if it gets a dropout in a multichannel stream where it knew (from the beginning) that it had a channel mask tag? Default to channel bits even at the slightest dropout? Or that MAY be up to the implementation?

* C5, RECOMMENDED to pad to "a whole number of bytes", would it be better to have "the next whole number of bytes" or are devs not dumb enough to think "oh, but let's use 32 as a catch-all!"?
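Assuming C.5 is about decoding to a container that needs a whole number of bytes per sample, "the next whole number of bytes" is simply the smallest multiple of 8 bits that is at least the bit depth. A Python sketch with hypothetical helper names; the left-justification step is one common convention (WAVE uses it for e.g. 20-bit audio in 3-byte containers) and is an assumption about the target format, not something the draft is claimed to prescribe:

```python
def container_bits(bit_depth: int) -> int:
    """Round a bit depth up to the next whole number of bytes, in bits."""
    if bit_depth <= 0:
        raise ValueError("bit depth must be positive")
    return (bit_depth + 7) // 8 * 8          # 20 -> 24, 24 -> 24, 12 -> 16


def left_justify(sample: int, bit_depth: int) -> int:
    """Shift a sample so its most significant bit lines up with the container's."""
    return sample << (container_bits(bit_depth) - bit_depth)
```

With this rule a 20-bit stream ends up in 24-bit containers, never in a 32-bit "catch-all".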

Re: Last call on FLAC specification

Reply #5
Quote
I read it.
Many thanks for your comments, they are much appreciated. I will file a PR with fixes soon.

Quote
Does Subset disallow for a "Rice partition order" of 0b1111?
Yes

Quote
It can be read as "no, that is a number > 8" but alternatively it can be read as "that is fine - as 0b1111 does not signify a Rice partition order, it can obviously not signify a too big Rice partition order".
You are confusing two levels here. It is not the partition order (which is related to the number of partitions) that is escaped, individual partitions are. So, a partition order of 0b1111 means 2^15 partitions.

The Rice parameter can be 'escaped'. So, for a Rice-coded residual with 4-bit parameters, a certain partition has a Rice parameter between 0 and 14 (inclusive) or is escaped (0b1111), in which case a 5-bit field with the number of bits per residual sample follows.

If this explanation still leaves questions, please study the audio frame of example 3. Here you can see a Rice-coded residual split up in 4 partitions, of which only the second is escaped.
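A minimal decoding sketch of the two levels, assuming the residual section is given as a Python string of '0'/'1' characters (a real decoder reads from a bit reader). The 4-bit partition order only says how many partitions there are (2^order); each partition then carries its own Rice parameter, where the all-ones value is the escape code, followed by a 5-bit bits-per-sample field and raw two's complement residuals. The first partition holds `predictor_order` fewer residuals, because the warm-up samples are stored before the residual. `read_residual` is a hypothetical name.

```python
def read_residual(bits: str, block_size: int, predictor_order: int) -> tuple[list[int], int]:
    """Decode a partitioned Rice residual; return (residuals, bits consumed)."""
    pos = 0

    def take(n: int) -> int:
        # Read an n-bit unsigned big-endian field (reading 0 bits yields 0).
        nonlocal pos
        field = bits[pos:pos + n]
        pos += n
        return int(field, 2) if n else 0

    method = take(2)
    if method > 1:
        raise ValueError("reserved residual coding method")
    param_bits = 4 if method == 0 else 5
    escape = (1 << param_bits) - 1          # 0b1111 or 0b11111
    order = take(4)                          # partition order: 2**order partitions
    residuals: list[int] = []
    for partition in range(1 << order):
        count = block_size >> order
        if partition == 0:
            count -= predictor_order         # warm-up samples are not in the residual
        k = take(param_bits)
        if k == escape:
            width = take(5)                  # bits per raw residual; 0 means all zero
            for _ in range(count):
                raw = take(width)
                if width and raw >= 1 << (width - 1):
                    raw -= 1 << width        # two's complement
                residuals.append(raw)
        else:
            for _ in range(count):
                quotient = bits.index("1", pos) - pos   # unary: zeros ended by a one
                pos += quotient + 1
                folded = (quotient << k) | take(k)
                residuals.append(-(folded >> 1) - 1 if folded & 1 else folded >> 1)
    return residuals, pos
```

For example, with block size 4, predictor order 1 and partition order 1, the first partition holds one residual (parameter 2, bits `0101` for -3) and the second is escaped with 4-bit raw values, so `read_residual("00" "0001" "0010" "0101" "1111" "00100" "0101" "1110", 4, 1)` gives `([-3, 5, -2], 31)`.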

Re: Last call on FLAC specification

Reply #6
Quote
* First capitalizing nitpickery: it must be hard to read all those "subset of" that don't refer to format Subset.
I can only find two, both in the same context. I don't think that is too much of a problem?

Quote
First, on a general note, I don't have the knowledge to tell whether all "invalid" are strongly enough declared invalid.
What do you mean, specifically?

Quote
Say, I have no idea whether all/any of the "MUST be a multiple of 8" could be 0.
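Yes. Try the following on a FLAC file:
```shell
# Remove all padding blocks without turning freed space into new padding
metaflac --remove --block-type=PADDING --dont-use-padding test.flac
# Add a padding block holding 0 bytes of padding
metaflac --add-padding=0 test.flac
# List all metadata blocks
metaflac --list test.flac
```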
And you will see a padding metadata block with no padding.
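This works because every metadata block header stores the length of the data that follows as a plain 24-bit number, and nothing stops it from being 0. A small Python sketch of packing and unpacking that 4-byte header (last-block flag, 7-bit block type, 24-bit length); the helper names are hypothetical:

```python
PADDING = 1  # metadata block type 1 is PADDING


def metadata_block_header(block_type: int, length: int, is_last: bool) -> bytes:
    """Pack the 4-byte metadata block header."""
    if not 0 <= block_type <= 126:
        raise ValueError("block type must be 0..126 (127 is forbidden)")
    if not 0 <= length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    return bytes([(0x80 if is_last else 0x00) | block_type]) + length.to_bytes(3, "big")


def parse_metadata_block_header(header: bytes) -> tuple[bool, int, int]:
    """Unpack (is_last, block_type, length) from a 4-byte header."""
    return bool(header[0] & 0x80), header[0] & 0x7F, int.from_bytes(header[1:4], "big")
```

A last PADDING block with no data is then just the four bytes `81 00 00 00`.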

Quote
Or, say: could the last frame be 0 samples?
It is not possible to represent that. Block size is stored minus one, so when value 0 is stored as an uncommon block size, that is actually a frame with 1 sample. See the frame header of example 1.
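In other words, no frame header bit pattern decodes to a block size of 0. A Python sketch of how the 4-bit block size code in the frame header maps to samples, following my reading of the table in the draft (the helper name is hypothetical):

```python
from typing import Optional


def frame_block_size(code: int, uncommon: Optional[int] = None) -> int:
    """Block size in samples from the 4-bit frame header code.

    `uncommon` is the 8-bit (code 0b0110) or 16-bit (code 0b0111) value stored
    at the end of the frame header, which holds the block size minus one.
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("block size code is 4 bits")
    if code == 0b0000:
        raise ValueError("reserved block size code")
    if code == 0b0001:
        return 192
    if code <= 0b0101:
        return 576 << (code - 0b0010)        # 576, 1152, 2304, 4608
    if code in (0b0110, 0b0111):
        if uncommon is None:
            raise ValueError("uncommon block size value missing")
        return uncommon + 1                  # stored minus one, so 0 means 1 sample
    return 256 << (code - 0b1000)            # 256, 512, ..., 32768
```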

Re: Last call on FLAC specification

Reply #7
Quote
* First capitalizing nitpickery: it must be hard to read all those "subset of" that don't refer to format Subset.
I can only find two, both in the same context. I don't think that is too much of a problem?
You are right. I was probably hung up on the old "a subset of itself as the Subset" and thought that ... well, I might have wrapped the search around. Sorry.

Quote
First, on a general note, I don't have the knowledge to tell whether all "invalid" are strongly enough declared invalid.
What do you mean, specifically?
Say, block size is stored minus one in 10.1.1, so that resolves the first two entries of the table in 9.2 Streaminfo? Or not? Because those are "intention" and it is fine as long as one stays within the interval, the extreme values need not be attained?
Would an (uncommon) sample rate value of 0 be valid? I mean, we know it does not make sense in audio to be played, but it being encoded as 8 or 16 bits admits 0. (I am on shaky ground here knowledge-wise, I don't even know if WAVE or AIFF admit such things, and if they do then I guess there is no reason to deliberately decide to choke on "perfectly valid but nonsense in audio" input files.)
And, if applicable to one or both cases: is it even within the scope of the document to say anything about how decoders should treat "technically valid but nonsensical" values? (... I think the format is safeguarded against mid and side signals that give rise to L and R not being integer?)
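On the last point: with mid/side stereo, FLAC stores side = L - R and mid = (L + R) >> 1, and the bit dropped by the shift is recovered from the parity of side (L + R and L - R always have the same parity), so L and R always come out as integers. A Python round-trip sketch (helper names hypothetical; Python's `>>` on negative ints floors, like an arithmetic shift):

```python
def mid_side_encode(left: int, right: int) -> tuple[int, int]:
    """Return (mid, side) as stored in a mid/side stereo frame."""
    return (left + right) >> 1, left - right


def mid_side_decode(mid: int, side: int) -> tuple[int, int]:
    """Recover (left, right) exactly from (mid, side)."""
    full_sum = (mid << 1) | (side & 1)   # restore the bit dropped by the shift
    return (full_sum + side) >> 1, (full_sum - side) >> 1
```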


... and eff, I thought I had freed myself from misunderstanding the Rice paramordermeter terminology when I ruled out "4" or "5" having names. My bad. Again.

Re: Last call on FLAC specification

Reply #8
Quote
Say, block size is stored minus one in 10.1.1, so that resolves the two first of the table in 9.2 Streaminfo? Or not? Because those are "intention" and it is fine as long as one stays within the interval, the extreme values need not be attained?
For block size, streaminfo is never referenced. The streaminfo values are just there so decoders can judge whether they are able to decode a stream.

So, as streaminfo is never referenced for block size, there is no way for a frame to convey a block size of 0. Block size of 1 is the smallest possible value that can be stored.

Quote
Would an (uncommon) sample rate value of 0 be valid? I mean, we know it does not make sense in audio to be played, but it being encoded as 8 or 16 bits admits 0. (I am on shaky ground here knowledge-wise, I don't even know if WAVE or AIFF admit such things, and if they do then I guess there is no reason to deliberately decide to choke on "perfectly valid but nonsense in audio" input files.)
No. A sample rate of 0 in streaminfo is specifically said to be invalid, perhaps that should also be added to the uncommon sample rate section.

Re: Last call on FLAC specification

Reply #9
Quote
No. A sample rate of 0 in streaminfo is specifically said to be invalid, perhaps that should also be added to the uncommon sample rate section.

Perhaps, but perhaps not? It is more of a policy question, and one of the reasons I asked, in more general terms, whether everything that was supposed to be invalid was forbidden in clear enough terms. It is not obvious to me that this should even be forbidden. Again, if a "0" in that field can still make for a valid AIFF or WAVE, then the only good reason for the FLAC format to outright forbid it is that it doesn't make sense for audio. And it could still be useful for encoding a stream of (at the moment) unknown rate.

Or if someone wants to use FLAC as a general purpose compressor. Not necessary to forbid that. Maybe even a rate of 0 is then just what it needs - a "sure, decode it but don't try playing it".

Re: Last call on FLAC specification

Reply #10
In the case of RIFF WAVE, the format tag precedes everything else, so if a format does allow some special values in channels, sample rate, bit depth, block align, data rate etc., the programmer can use the format tag to impose format-specific restrictions. If one is lazy and does not apply appropriate checks, unexpected values can result in a crash or other bad things. Here is a bug report for Google's LC3 codec:

https://github.com/google/liblc3/issues/9

I tried the attached example.zip with SoX and it also crashed.

Division by zero may result in an immediate exception for integer types, but not necessarily for floating point types. For example, one may use Infinity or NaN to decide what to do later on.
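In the same spirit, a tool that derives anything from the sample rate can reject 0 explicitly instead of relying on how division by zero happens to behave. Note that in Python even float division by zero raises `ZeroDivisionError`, so there is no Infinity/NaN fallback there. A minimal sketch with a hypothetical helper:

```python
def stream_duration_seconds(total_samples: int, sample_rate: int) -> float:
    """Duration of a stream, refusing sample rates that make no sense for playback."""
    if sample_rate <= 0:
        raise ValueError(f"invalid sample rate {sample_rate}: cannot compute a duration")
    return total_samples / sample_rate
```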

Re: Last call on FLAC specification

Reply #11
In the case of RIFF wave, the format tag precedes everything else, so if a format does allow some special values in channels, sample rate, bit-depth, block align, data rate etc, the programmer can use the format tag to impose format specific restrictions.
I'm not really sure what you're trying to say. Is this concerning the FLAC spec?

Re: Last call on FLAC specification

Reply #12
My post was a reply to Porcus' question about WAVE and occurrence of 0 in some specific fields.

Re: Last call on FLAC specification

Reply #13
Does FLAC getting its own RFC mean that there will be no changes in the future that would break compatibility with older versions?

Re: Last call on FLAC specification

Reply #14
My question was - or rather, was related to - whether a FLAC encoder should reject "weird but compliant" WAVE files or not. Given that it processes files. Disregarding the ambiguities of the (first) RIFF WAVE specification ...

My attitude is that if there is no danger and the FLAC format spec has - in a reasonable interpretation - allowed for it, I don't see the point in closing a "loophole" just because it doesn't look so useful for audio. A sample rate of zero ... ? No harm if it isn't played back (and isn't just buffering until out of memory). But I'm not deciding policy questions.

Uh well, reminds me of this issue. @ktf, you got the bases covered here for the purpose of the spec? Because that issue relates not to how b-bytes-in-C-container appears in a FLAC stream, only to how a given piece of software interprets certain .wav files?

Re: Last call on FLAC specification

Reply #15
Quote
Does FLAC getting its own RFC mean that there will be no changes in the future that would break compatibility with older versions?
As far as I know no one can look into the future, so there is no saying whether there will ever be a FLAC 2.0. However, even without an RFC, I think it is currently unlikely that the FLAC spec will ever break with the past in such a way. I think FLAC is implemented in too many devices for the ambiguity of a FLAC 2.0 to be worth it.

I just downloaded the FLAC 0.9 windows binary (released March 2001) from the FLAC sourceforge page, and it is able to decode 16-bit stereo audio (CDDA) encoded by FLAC 1.4.1 without any problems. FLAC 0.9 was the first release with the format as it is today. 24-bit audio is another story, compatibility goes only back to FLAC 1.2.0 (released July 2007), but it shows that FLAC is a very stable format indeed.

This RFC simply documents the FLAC format much more precisely than the FLAC specification that has been on the FLAC website for ages.

Quote
My attitude is that if there is no danger and the FLAC format spec has - in a reasonable interpretation - allowed for it
I think it hasn't. There has always been a note that a sample rate of 0 in the streaminfo metadata block is invalid. I only suggested also mentioning this in the uncommon sample rate section.

Quote
Uh well, reminds me of this issue. @ktf, you got the bases covered here for the purpose of the spec? Because that issue relates not to how b-bytes-in-C-container appears in a FLAC stream, only to how a given piece of software interprets certain .wav files?
The WAV spec is rather complicated, with several ways to represent the same thing. FLAC is much more straightforward, I don't think there are similar ways to create ambiguity.

Re: Last call on FLAC specification

Reply #16
New things coming up too here ...

Quote
My attitude is that if there is no danger and the FLAC format spec has - in a reasonable interpretation - allowed for it
I think it hasn't. There has always been a note that a sample rate of 0 in the streaminfo metadata block is invalid. I only suggested to also mention this in the uncommon block size section.

Not necessarily a decisive argument, as frame info does not need to agree with streaminfo. From Section 10, emphasis mine:
"Each frame header stores the audio sample rate, number of bits per sample and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files but it also allows these properties to change mid-stream.  Because not all environments in which FLAC decoders are used are able to cope with changes to these parameters during playback, a decoder MAY choose to stop decoding on such a change."

So, well, 0 could be used to signify: "stop playback". Again, not my decision.

But the Section 10 quote raises a few more questions on Subset. Referring to the bullet items of section 8:
* Sample rate: Can a Subset-compliant file change sample rate during the file? If not: Can a Subset-compliant file have all frames' sample rate the same but distinct from streaminfo value?
Refer to bullet items 4 and 5.
* Bit depth: same question. Which also raises a question about C.5: if bit depth can vary over the stream, should it be RECOMMENDED that the max is used? (Dangerous if the input is 16 bits until the very end, where it increases to 20, to 24, 28 and finally 32.)
* [OK, no question]
* Here it says "the sample rate of the stream", in singular - suggesting that the sample rate of the stream does not change. Which should be clarified then. Can sample rate change mid-stream and still be Subset?
(Pro: decoder should be able to pick up mid-stream, so why not? Contra: decoder MAY stop when that happens, that does not sit well with a purpose of "true" streamability when decoders are given free pass not to accept the stream. Unknown (to me): What has been the interpretation for XX years?)
* Similar question to previous bullet. But here it doesn't refer to the stream's maximum filter order, so this bullet item raises yet another potential question or two provided that Subset allows for changing sample rate: Can one use -l32 if a single frame is > 48000? On every frame if just one frame is > 48000? On the frames in question? Or starting from that frame and until the end of the stream?
* [OK, no question ... I've made a fool out of myself enough times on that one]


And finally, the Section 10 quote raises an issue on variable block size:
* The encoder fires up and anticipates a stream to encode. It writes min blocksize = max blocksize = 4096 to Streaminfo and starts receiving samples.
* At sample #9100, sample rate and/or bit depth changes. [Aside: I don't know if there is any way to communicate that to the encoder without coding it in from the beginning, so issue may be moot at the outset.]
* Format has no provision for changing sample rate nor bit depth mid-frame.
* ==> so a single, shorter block is needed just before the sample rate changes.
* Uh-oh, but min blocksize has already been written to file!

Also it seems that, since minimum block size is 16 save for the last block (am I right that "last block" means last block in the entire stream?), that means
* big trouble if sample rate & bit depth are not constant for as long as 16 samples.
* practical trouble if it happens at, say, sample #8203: you have encoded 2x4096 and then the next block would be only 11 samples, which is too small - but that could be resolved by scrapping the previous block, giving it a new size of 4107 (or splitting it), and encoding again.
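A sketch, with a hypothetical helper, of the fix-up in the last bullet: split the samples preceding a parameter change into blocks of the nominal size and, if the tail would be shorter than the 16-sample minimum, merge it into the previous block. It assumes a variable block size stream; note that the merged block can exceed a maximum block size already written to streaminfo, which is exactly the problem raised above.

```python
def plan_blocks(samples_before_change: int, nominal: int = 4096, minimum: int = 16) -> list[int]:
    """Block sizes covering the samples before a sample rate or bit depth change."""
    blocks: list[int] = []
    remaining = samples_before_change
    while remaining > 0:
        size = min(nominal, remaining)
        blocks.append(size)
        remaining -= size
    if len(blocks) > 1 and blocks[-1] < minimum:
        # Too short to be a block of its own: fold it into the previous one.
        tail = blocks.pop()
        blocks[-1] += tail
    # A lone block shorter than `minimum` (a change within the first 16 samples)
    # cannot be fixed this way; that is the "big trouble" case above.
    return blocks
```

`plan_blocks(8203)` returns `[4096, 4107]`, while `plan_blocks(9100)` returns `[4096, 4096, 908]`, because 908 samples is long enough for a block of its own.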

Re: Last call on FLAC specification

Reply #17
Oh, and a wording in Section 8, second to last bullet item: "linear subframes" should be "linear predictor subframes".

Re: Last call on FLAC specification

Reply #18
Quote
My attitude is that if there is no danger and the FLAC format spec has - in a reasonable interpretation - allowed for it
I think it hasn't. There has always been a note that a sample rate of 0 in the streaminfo metadata block is invalid. I only suggested to also mention this in the uncommon block size section.
Not necessarily a decisive argument, as frame info does not need to agree with streaminfo. From Section 10, emphasis mine:
"Each frame header stores the audio sample rate, number of bits per sample and number of channels independently of the streaminfo metadata block and other frame headers. This was done to permit multicasting of FLAC files but it also allows these properties to change mid-stream.  Because not all environments in which FLAC decoders are used are able to cope with changes to these parameters during playback, a decoder MAY choose to stop decoding on such a change."
A file in which the data of the streaminfo metadata block does not agree with a single frame header would not count as a reasonable interpretation as I see it  :))

Anyway, I've brought up the point to the CELLAR mailing list, suggesting to reserve a sample rate of 0 for non-audio data. Let's see what it turns up.

Quote
But the Section 10 quote raises another few questions on Subset. Referring to the bullet items of section 8:
* Sample rate: Can a Subset-compliant file change sample rate during the file? If not: Can a Subset-compliant file have all frames' sample rate the same but distinct from streaminfo value?
There is nothing that forbids that. Seems strange to have streaminfo not agree with even a tiny bit of the audio itself, but it is possible. This is one of those things where it is possible, but doesn't make sense.

Note that this is specifically forbidden for FLAC files embedded in MP4 and Matroska though, although sample rates in MP4 are even more complicated.

Quote
Refer to bullet items 4 and 5.
* Bit depth: same question. Which also raises a question about C.5: if bit depth can vary over the stream, should it be RECOMMENDED that the max is used? (Dangerous if the input is 16 bits until the very end, where it increases to 20, to 24, 28 and finally 32.)
I would argue that it makes sense to have the streaminfo agree with the first audio frame. It is made clear having such parameters changing mid-stream will decrease the number of environments being able to play back those files, but there are applications that require this, for example recording of broadcast streams (where the number of audio channels can change for commercial breaks for example). I don't know whether streaminfo has a clear use case there anyway.

Quote
* Here it says "the sample rate of the stream", in singular - suggesting that the sample rate of the stream does not change. Which should be clarified then. Can sample rate change mid-stream and still be Subset?
(Pro: decoder should be able to pick up mid-stream, so why not? Contra: decoder MAY stop when that happens, that does not sit well with a purpose of "true" streamability when decoders are given free pass not to accept the stream. Unknown (to me): What has been the interpretation for XX years?)
Pretty much no-one uses FLAC this way, so it is hard to know what the interpretation has been. A file with changing audio parameters is still perfectly streamable; the fact that decoders can choose not to decode it is of no concern. libFLAC handles this fine, but the flac command line tool, for example, cannot, because such a file cannot be decoded to WAV losslessly.

Quote
* Similar question to previous bullet. But here it doesn't refer to the stream's maximum filter order, so this bullet item raises yet another potential question or two provided that Subset allows for changing sample rate: Can one use -l32 if a single frame is > 48000? On every frame if just one frame is > 48000? On the frames in question? Or starting from that frame and until the end of the stream?
A predictor order can be larger than 12 when the samplerate is larger than 48000. The predictor order is something chosen at subframe level, so one should treat this at subframe level.
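A sketch of checking Subset per frame, judged by each frame's own sample rate. It covers only the three rules discussed in this thread (block size, LPC order, Rice partition order), so it is not a complete Subset validator; the function name and arguments are hypothetical.

```python
def subset_violations(sample_rate: int, block_size: int,
                      lpc_orders: list[int], partition_orders: list[int]) -> list[str]:
    """List Subset problems for one frame (only the rules discussed here)."""
    low_rate = sample_rate <= 48000
    max_block = 4608 if low_rate else 16384
    max_lpc_order = 12 if low_rate else 32
    problems = []
    if block_size > max_block:
        problems.append(f"block size {block_size} > {max_block}")
    for channel, order in enumerate(lpc_orders):
        if order > max_lpc_order:
            problems.append(f"channel {channel}: LPC order {order} > {max_lpc_order}")
    for channel, order in enumerate(partition_orders):
        if order > 8:
            problems.append(f"channel {channel}: Rice partition order {order} > 8")
    return problems
```

A stream that switches between 96 kHz and 48 kHz is then judged frame by frame: `subset_violations(96000, 8192, [32], [8])` returns an empty list, while the same parameters at 44100 Hz report both the block size and the LPC order.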

Quote
And finally, the Section 10 quote raises an issue on variable block size:
* The encoder fires up and anticipates a stream to encode. It writes min blocksize = max blocksize = 4096 to Streaminfo and starts receiving samples.
* At sample #9100, sample rate and/or bit depth changes. [Aside: I don't know if there is any way to communicate that to the encoder without coding it in from the beginning, so issue may be moot at the outset.]
* Format has no provision for changing sample rate nor bit depth mid-frame.
* ==> a single block of fewer samples then, before sample rate changes.
* Uh-oh, but min blocksize has already been written to file!
This is not possible. Streams whose audio parameters change mid-stream generally cannot use a fixed blocksize, for this reason alone. However, if the original stream contains AC3, for example, and the FLAC blocksize matches the AC3 frame size, changes to the number of channels in the AC3 stream will always fall on a frame boundary.

Quote
Also it seems that, since minimum block size is 16 save for the last block (am I right that "last block" means last block in the entire stream?), that means
* big trouble if sample rate & bit depth are not constant for as long as 16 samples.
* practical trouble if it happens say at sample #9000: you have encoded 2x4096 and then the next block is too small - but that could be resolved by scrapping the previous block, giving it a new size of 4107 (or splitting it), and encoding again.
An encoder that is able to handle this should have a lookahead of at least 16 samples because of this.

But I think these practical considerations are not something that should be mentioned in the specification. It focusses on the format and how to decode it. It is decoder-centric. There are indeed a few encoding considerations given, but these are not meant to be exhaustive.

Re: Last call on FLAC specification

Reply #19
So one thing is what is OK and what is not.
Another is: is that clear enough in the text.

If I read you correctly, that "sample rate of the stream" should better be read as "sample rate (of the frame in question)" or something, and then the text could use that change. Which will also clarify that yes, changes mid-stream are not forbidden in Subset - it would be obvious that it is written for the cases where the considerations apply to some but not all frames.

But that's the reason for some of these dumb questions: if it means X, then maybe it should be a bit clearer. If there is somewhere something that forbids those strange things from happening anyway, clarifications are not really changing anything.  But here they could.

Re: Last call on FLAC specification

Reply #20
Quote
If I read you correctly, that "sample rate of the stream" should better be read as "sample rate (of the frame in question)" or something, and then the text could use that change. Which will also clarify that yes changes mid-stream are not forbidden in Subset - it would be obvious that it is written for the cases where the considerations apply to some but not all frames.
Yes. The reply wasn't meant as a reason why not to include it, but I'm still pondering on how to include it.

The problem is that these considerations turn out to serve no purpose when seen in context. If a decoder is perfectly able to decode a subframe with a samplerate of 96kHz and predictor order 32, there is no reason to limit the predictor order to 12 for a 48kHz part of the stream. The predictor order is limited so resource-limited decoders can still decode CDDA or other non-high-res material, but if you mix both, there is really no reason to limit the predictor order for one sample rate but not for the other.

edit: Once again, I'm not saying I'm not including it, I just don't know what's the best thing to do here.

Re: Last call on FLAC specification

Reply #21
So, with sample rate changing mid-stream, there is a problem about "i.e. the maximum blocksize must not be larger than 4608."
Because the document also says the following about maximum blocksize (quoting from 9.2):

"This is because the encoder has to write these fields before receiving any input audio data, and cannot know beforehand what block sizes it will use, only between what bounds these will be chosen."

(So here is one case where the document says something about the encoding process. But it says "has to" and not MUST, so this is an assumption about how it will work in practice, not a prescription. "cannot know beforehand" should read "without having to know beforehand". In case that is significant enough for a PR.)


But viewed together, the quotes from 9.2 and 8 have the implication - to ensure Subset! - that the encoder must write a max blocksize of at most 4608 unless it positively knows that the sample rate will never drop to 48k or below. If it boldly starts out with 8192 on a 96 kHz stream, then 8192 is already written to Streaminfo and committed to, and uh-oh: all of a sudden the stream drops to 48k and we are outside Subset with no way to rectify it.
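Under the wording quoted above, where Subset is tied to the maximum blocksize in Streaminfo, the argument boils down to the following decision rule for an encoder that must guarantee Subset before it has seen any audio. A Python sketch; `lowest_possible_rate` is a hypothetical piece of knowledge the encoder may or may not have (e.g. passed in by the user):

```python
from typing import Optional


def streaminfo_max_blocksize(planned_max: int, lowest_possible_rate: Optional[int]) -> int:
    """Max blocksize to write to Streaminfo so Subset holds whatever comes later."""
    if lowest_possible_rate is None or lowest_possible_rate <= 48000:
        # The stream might drop to 48 kHz or below, where Subset caps blocks at 4608.
        return min(planned_max, 4608)
    # Every frame is known to be above 48 kHz, so the 16384 cap applies.
    return min(planned_max, 16384)
```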


So this suggests tweaking both the Subset definition and the requirements on Streaminfo. For example, what should (/SHOULD) the sample rate in Streaminfo be? One suggestion is that if the encoder knows nothing, then - as "0" is invalid - it should at least read the first few samples before writing, so that Streaminfo matches the beginning of the file. (If a decoder cannot read what Streaminfo specifies and gives up before the first frame, it would coincide with the case where it would give up at the first frame - sounds sane.) But possibly, if the encoder knows something about the stream (how can it know? Being passed the information by the user is good enough), it MAY choose otherwise in order to warn a decoder about ... well, then that is a problem: what to warn about? A sample rate of 34567 is worse than 96000, I guess? But to stay consistent: it MAY pick the highest it knows. So the encoder MAY write 192000 if it wants to warn the decoder that it must expect rates as high as 192000, even if the stream starts at 48000.

Once Streaminfo is sorted out, then on to the maximum blocksize part of Subset definition: should that refer to the streaminfo max blocksize information, or to the single frame, or to the worst of those?


Sorry for not shaking up the can of worms earlier.
(Not so sorry for doing it at all - after all my writings can just be discarded ...)

Re: Last call on FLAC specification

Reply #22
Here's a PR: https://github.com/ietf-wg-cellar/flac-specification/pull/183/files

This would solve the problem of enforcing subset with changing audio parameters, it is rephrased to apply at the frame level.

Considering streaminfo: as is stated elsewhere in the document, it should not be relied upon. You could almost say the sample rate, number of channels and bit depth are there just for display if not referenced from a frame header. It seems sane to expect an encoder to write down the properties of the first block it received there.


Re: Last call on FLAC specification

Reply #24
Quote
This RFC simply documents the FLAC format much more precisely than the FLAC specification that has been on the FLAC website for ages.
Thanks for the explanation. I was just curious as to why the RFC was happening now, and whether that meant FLAC was being "finalised" as a format as a result. I guess the RFC only covers FLAC 1.0, so as you said, FLAC 2.0 could still happen someday. :)