
YALAC - File format development

Purpose

In this thread I will ask questions regarding specific features of the file format of Yalac, my upcoming lossless audio compressor: Are specific features really needed, and how should they be implemented in detail?

I need your help, because I am often not sure what possible future users really want.

Please don't add new questions. If you are sure that something really important is missing, send me a mail. And be aware that I already have more questions prepared, but I would like to post and discuss them one after the other.

Questions

A chronological list of my questions. Each item shows the date of my first post and its state:

Closed: Discussion is over, decisions have been taken.
Active: Discussion is going on.


1) What is needed for streaming? (2006-06-30). Active.


Question 1: What is needed for streaming?

The lossless codec comparison page in the wiki contains the feature "Streaming support". Unfortunately I could not find an accurate definition of streaming, either in the comparison page or in the discussion thread.

Let me describe the current implementation of streaming support for Yalac, and please tell me if something is missing:

Yalac partitions each audio file into frames which contain up to 250 ms of audio data. Each frame can be decoded independently; it does not need data from other frames.

But it does need some general information from the file header: sampling rate, bits per sample, channels and so on.

I know that FLAC repeats this info (at least for standard audio formats) in the header of each individual frame. But I really don't know why this should be necessary. What do you think?

Each Yalac frame starts with a 16-bit sync code. A player (software or hardware) can start at an arbitrary position within the file stream and search for the next sync code to find the probable start of the next frame.

Because the audio data itself can contain values equal to the sync code, the decoder cannot be sure that a specific value really marks a frame start. Therefore it has to try to decode the possible frame at the position of the sync code. If this fails, it has to look for the next sync code and try again.

I have written a little tool to find the optimal sync code: the value with the lowest probability of showing up randomly in the compressed audio data. The currently selected sync code will on average be found once every 80,000 bytes of compressed data. That means that a player will on average detect one wrong sync code per seek operation, hence has to decode two frames instead of one before it can start playing. Not too bad, if you take Yalac's high decoding speed into account. Right?
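The core of such a tool can be very small. A rough sketch (the sync value and file name here are just placeholders, and it checks byte-aligned positions only, for simplicity):

Code:
/* Sketch: count how often a candidate 16-bit sync code shows up
   in a sample of compressed data. 0xA3F1 is a placeholder, not the
   real Yalac sync code. */
#include <stdio.h>

int main(void)
{
    const unsigned sync = 0xA3F1;          /* candidate sync code */
    FILE *f = fopen("sample.bin", "rb");   /* any compressed sample data */
    if (!f) return 1;

    long hits = 0, bytes = 0;
    int prev = fgetc(f), cur;
    while ((cur = fgetc(f)) != EOF) {
        bytes++;
        if ((((unsigned)prev << 8) | (unsigned)cur) == sync)
            hits++;
        prev = cur;
    }
    fclose(f);

    if (hits > 0)
        printf("%ld hits in %ld bytes: one per %ld bytes\n",
               hits, bytes, bytes / hits);
    return 0;
}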

Important: players that can use the seek table contained in the file header do not have to deal with sync codes at all!

It's possible to dramatically reduce the probability of wrong sync code detections: if each frame contains the (compressed) frame length after the sync code, the decoder can jump to the position where the next frame should start. If it finds a new sync code there, it's highly probable that the position of the first sync code is valid. But storing the frame length needs some space, and therefore compression will be a bit worse.

BTW: The first approach, without inclusion of the frame length, works similarly: first try to decode the frame; if this is successful, the next two bytes of the stream should contain another sync code.
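Put together, the resync logic after a blind seek might look roughly like this sketch (decode_frame is just a stand-in for the real decoder, and SYNC is a placeholder value):

Code:
/* Sketch: find a sync code, try to decode the frame there, and accept
   the position only if another sync code follows the decoded frame. */
#define SYNC 0xA3F1   /* placeholder sync code */

/* Stand-in for the real decoder: returns the compressed frame size
   if buf starts with a decodable frame, or -1 on failure. */
extern long decode_frame(const unsigned char *buf, long len);

static long find_sync(const unsigned char *buf, long len, long from)
{
    for (long i = from; i + 1 < len; i++)
        if (((buf[i] << 8) | buf[i + 1]) == SYNC)
            return i;
    return -1;
}

long resync(const unsigned char *buf, long len)
{
    long pos = 0;
    while ((pos = find_sync(buf, len, pos)) >= 0) {
        long size = decode_frame(buf + pos, len - pos);
        if (size > 0 && pos + size + 1 < len &&
            ((buf[pos + size] << 8) | buf[pos + size + 1]) == SYNC)
            return pos;   /* frame decodes and is followed by another sync */
        pos++;            /* false positive, keep scanning */
    }
    return -1;
}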

Sorry, I know the explanation isn't too good; my English needs improvement...

  Thomas


Reply #1
1) I think it's important to store the complete stream info (sampling rate, channels) in each frame: if a client connects to a stream while it is being distributed, it needs to know how to decode it. Unless you can detect the frame type from the frame itself without explicitly specifying the data, you'll need to include it.

2) Considering YALAC generally decodes at AT LEAST 50x realtime, and frames are 250 ms (max), decoding two frames will take approximately 10 ms (about 5 ms per frame). I think the average user can accept this kind of delay. However, should you make your codec gapless, you have to make sure that decoding the first frame in a file doesn't have this delay... Of course, you might have different decoder implementations that are more or less efficient, but that's another matter.

Thanks for all your hard work,

Good luck,
Tristan.


Reply #2
Question 1: What is needed for streaming?

In my opinion, a file format that supports streaming must allow the decoder to start decoding at an arbitrary position within the stream. That means that all info about the stream that the decoder needs to know must be repeated in the stream.

A generic solution would be to interleave stream info and audio data at an arbitrary, user-definable ratio, for example:

(SI=stream info frame, A=audio frame)
Code:
ratio SI:A = 1:1   SI A SI A SI A SI A SI ...
- lowest streaming delay
- biggest storage overhead
- purpose: broadcast (streaming)

Code:
ratio SI:A = 1:5   SI A A A A A SI A A A A A SI ...
- higher streaming delay
- smaller storage overhead
- purpose: local storage for playback (quick seeking)

Code:
ratio SI:A = 1:n   SI A A A A A A A A A A ...
- no streaming support
- no storage overhead
- purpose: archiving (slow seeking)
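A writer supporting such a user-definable ratio could be as simple as this sketch (write_si_frame and write_audio_frame are hypothetical stand-ins for the real encoder functions):

Code:
/* Sketch: emit one stream info frame every 'ratio' audio frames. */
#include <stdio.h>

extern void write_si_frame(FILE *out);
extern void write_audio_frame(FILE *out, int index);

void write_stream(FILE *out, int total_frames, int ratio)
{
    for (int i = 0; i < total_frames; i++) {
        if (ratio > 0 && i % ratio == 0)
            write_si_frame(out);       /* ratio SI:A = 1:ratio */
        write_audio_frame(out, i);
    }
    /* ratio <= 0: no SI frames at all (archiving layout) */
}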

The stream info frames could contain info like this (just an incomplete example!):
- sync code
- SI frame CRC
- audio stream info (sample bit width, sampling frequency, etc.)
- current position within stream (timestamp and/or sample number)
- meta data (artist, title, etc.)

The sync code together with the SI frame CRC lowers the chance for a false positive match to practically zero.
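Roughly quantified, assuming the codes behave like random bits: a 16-bit sync code matches a random position with probability 2^-16, and a CRC (say, 32 bits) then confirms a false match with probability 2^-32, so a false positive needs about 2^-48, or roughly 3.6 x 10^-15 per position - practically zero indeed.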

A seek table and other non-streaming info (cue sheets, album cover JPGs, etc.) could be included in an additional info frame at the start of the file only. The seek table would allow players to skip as many SI frames as possible (defined by the precision/number of entries in the seek table) to reach the target position within the stream. This quick-seeking feature would rely on the presence of SI frames, because only those have sync codes that allow the decoder to re-sync with the audio stream. On the other hand, files without any SI frames (purpose: archiving) would require the player to do slow seeking, i.e. to decode the entire audio stream up to the target position.


Reply #3
In my opinion, a file format that supports streaming must allow the decoder to start decoding at an arbitrary position within the stream. That means that all info about the stream that the decoder needs to know must be repeated in the stream.

Well, you are right. I have been a bit too greedy and didn't want to spend extra space on the audio format description... Therefore I only thought about the sync code.

A generic solution would be to interleave stream info and audio data at an arbitrary, user-definable ratio, for example:
...
The stream info frames could contain info like this (just an incomplete example!):
- sync code
- SI frame CRC
- audio stream info (sample bit width, sampling frequency, etc.)
- current position within stream (timestamp and/or sample number)
- meta data (artist, title, etc.)

The sync code together with the SI frame CRC lowers the chance for a false positive match to practically zero.

I like your idea of varying the tradeoff between the space requirements for SI frames and seek efficiency!

Probably I will not use separate SI frames, but instead set a flag within the regular data frames to indicate that a frame contains extended information. I want to give every frame a sync code anyway, to make frames easier to find if a damaged file has to be reconstructed.

If this flag immediately follows the sync code, the decoder can easily search for a frame with extended info by adding this bit to the sync pattern.
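For illustration, if the flag were the first bit after the sync code, the extended search would only need to check one extra bit (values again placeholders):

Code:
/* Sketch: search for a frame whose extended-info flag is set, assuming
   the flag is the first bit after the 16-bit sync code (placeholder). */
#define SYNC 0xA3F1

static long find_extended_frame(const unsigned char *buf, long len, long from)
{
    for (long i = from; i + 2 < len; i++)
        if (((buf[i] << 8) | buf[i + 1]) == SYNC && (buf[i + 2] & 0x80))
            return i;   /* sync code with the extended-info flag set */
    return -1;
}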

A seek table and other non-streaming info (cue sheets, album cover JPGs, etc.) could be included in an additional info frame at the start of the file only. The seek table would allow players to skip as many SI frames as possible (defined by the precision/number of entries in the seek table) to reach the target position within the stream. This quick-seeking feature would rely on the presence of SI frames, because only those have sync codes that allow the decoder to re-sync with the audio stream. On the other hand, files without any SI frames (purpose: archiving) would require the player to do slow seeking, i.e. to decode the entire audio stream up to the target position.

Yes. The definition of a metadata format is already on my list.

Many thanks

  Thomas


Reply #4
Code:
ratio SI:A = 1:n   SI A A A A A A A A A A ...
- no streaming support
- no storage overhead
- purpose: archiving (slow seeking)


Seeking will be "slow" only if the user drops the SI frames and the seek table (a nice idea, if you ask me...). And if you decide to implement SI frames, it would be nice to be able to drop the sync codes in regular frames... And maybe it will be possible to add SI frames "on the fly" when streaming.


Reply #5
Quote
1) I think it's important to store the complete stream info (sampling rate, channels) in each frame: if a client connects to a stream while it is being distributed, it needs to know how to decode it. Unless you can detect the frame type from the frame itself without explicitly specifying the data, you'll need to include it.

You are totally right!

Quote
Thanks for all your hard work,

Better wait until a working release! Currently you are getting nothing useful from my work...

Thanks

  Thomas


Quote
...if you decide to implement SI frames, it would be nice to be able to drop the sync codes in regular frames... And maybe it will be possible to add SI frames "on the fly" when streaming.

Hm, that would be a good reason for separate SI frames (although I do not really like them).

 


Reply #6
Is it impossible to extend/reformat a "usual" frame (i.e. add stream info to it) on the fly? IMHO, separate SI frames have an advantage only if you drop the sync codes in "usual" frames... Hm, or if there will be a common format for "non-usual" frames (currently stream info, but in the future you may need to add more types?)


Reply #7
Have you considered one of the existing container formats for your codec?

Using one of them would mean that you wouldn't have to re-invent the wheel. 


Reply #8
Have you considered one of the existing container formats for your codec?

Using one of them would mean that you wouldn't have to re-invent the wheel. 


I did a quick comparison of native FLAC vs. FLAC in Matroska - the size difference is about 0.04%. Maybe that's acceptable... but I know nothing about the internals of FLAC or Matroska.


Reply #9
Repeating stream information has little to do with seeking. If you can access arbitrary positions in a file, you can just grab the stream information or seek table from wherever it is usually located; the file start would be a logical choice for stream information in that case to allow decoding without seeking in the file, for example through a pipe.

The repeated stream information comes into play when you have an (unseekable) stream that does not allow random access. The frequency at which the format information is embedded in the stream will in that case determine the average latency between the client connecting to the stream and being able to decode audio data.

Sync codes are also more useful for synchronizing the decoder to an unseekable stream or when recovering portions of a corrupted file than they are for seeking to a given audio position in a file with random access. Seeking to a given audio position by estimating the required file offset (for example based on the duration and the filesize) is a rather crude way to implement seeking, and accurate seeking is impossible using this approach, if the file does not contain timestamps. It would be quite inefficient if you had to decode the file from the beginning or some already known position to implement sample accurate seeking. (It would also make certain people quite bitchy.)
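For illustration, that crude estimate would be something like offset = data_start + (target_time / total_duration) * (file_size - data_start), followed by a scan for the next sync code - which only lands near the right sample if the bitrate is roughly constant, and never exactly on it without timestamps.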

I think it would be a good idea to consider using an established container format instead of doing that all yourself.


Reply #10
In the other thread we had a discussion about per-frame CRCs... Maybe they can be moved into another type of "special" frame, and therefore be included or excluded by the user?
By the way, I don't see how I can use these checksums: if the MD5 is broken, I prefer to delete the file instead of keeping it "partially lossless". And such things (stream info, frame sync codes, CRC32) could easily eat most of the compression superiority of YALAC. So, IMHO, it would be best to allow the user to skip all unneeded things.

Personally I use only Monkey's Audio in my archive, mostly because it gives me good compression and a minimum of other stuff. But it isn't developed anymore (and the Foobar2000 developer doesn't like it), so I hope that YALAC will be a good choice - if it offers a comparable compression ratio at comparable compression time, like the HIGH preset (I don't care much about decompression - if the speed is good enough to burn a disc on the fly, that's enough).


Reply #11
Best choice would be to integrate YALAC into FLAC.


Reply #12
Best choice would be to integrate YALAC into FLAC.


I think that integrating YALAC into FLAC is a bad idea because:
a) FLAC files created with YALAC will be incompatible with existing FLAC decoders. Most people will avoid using the new method because they don't want compatibility problems.
An analogy is the ZIP format: BZIP2 is part of the ZIP standard, but nobody uses it because only a few programs can decompress it.

b) All players and hardware that want to support the FLAC format would have to decode both the FLAC algorithm and the YALAC algorithm. That doesn't help FLAC to gain support.


Reply #13
I think that integrating YALAC into FLAC is a bad idea because:
a) FLAC files created with YALAC will be incompatible with existing FLAC decoders. Most people will avoid using the new method because they don't want compatibility problems.
An analogy is the ZIP format: BZIP2 is part of the ZIP standard, but nobody uses it because only a few programs can decompress it.

b) All players and hardware that want to support the FLAC format would have to decode both the FLAC algorithm and the YALAC algorithm. That doesn't help FLAC to gain support.

Setting aside the possibility of integrating Yalac into FLAC:

I am quite sure that the FLAC format will change sooner or later. While the file format is very well thought out and excellently documented, there is an important limitation in the possible parameter set of the Rice coder that definitely hurts compression at higher sample resolutions (24 bit and up). If high-resolution files become more popular, there will be a need for a rework.
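To make that concrete (numbers illustrative, and assuming the parameter is capped at 14, as with a 4-bit parameter field): a Rice code writes a value v as a (v >> k)-bit unary prefix, a stop bit and k binary bits, so the best k is roughly log2 of the mean residual magnitude, which for noisy 24-bit material can exceed the cap:

Code:
/* Sketch: cost of a capped Rice parameter (illustrative numbers). */
#include <stdio.h>

static unsigned rice_len(unsigned v, unsigned k)
{
    return (v >> k) + 1 + k;   /* unary prefix + stop bit + k binary bits */
}

int main(void)
{
    unsigned v = 1u << 18;   /* assumed residual magnitude, noisy 24-bit audio */
    printf("k=18: %u bits\n", rice_len(v, 18));   /* near-optimal: 20 bits */
    printf("k=14: %u bits\n", rice_len(v, 14));   /* capped:       31 bits */
    return 0;
}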

Just my 2 cents; I am not an expert on FLAC.


Reply #14
To me it doesn't seem worth compromising the compression ratio considerably for what is most likely a fringe use for a lossless codec (at present, at least).

As long as it can be streamed by the trial-and-error method, if it has to make two tries to decode the first frame, what's the worst that could happen? A slight delay, or a burst of noise before the audio starts, perhaps?

As for specifying the stream info, could YALAC have a standard where, if it is the most common type (i.e. stereo, 16-bit, 44.1 kHz), the info is optional, but any deviation is mandatory to specify... i.e. a mono 16-bit 44.1 kHz file has to specify that it is mono, because this is the only thing that differs from the default... you only need to add the symbol for mono to each frame header, but not all the other details.

That's an interesting idea that smack has about being able to customize the stream info ratio... if you intend to stream a file, you can optimize it for this purpose. Of course, odds are that a user will eventually want to stream files even though this wasn't a concern when the files were first created.

It'd be interesting for the user to have the option to custom-tailor his/her YALAC encodes for specific applications, i.e. embedding stream info periodically within the file at definable intervals, adding redundancy for error correction, etc. Perhaps these kinds of things could even be changed, added or removed on the fly, without the computational cost of re-encoding the file.

I wonder how difficult lots of extra features or modes of use designed into YALAC would make third-party implementations of this codec. But if these kinds of things are going to be done, best get them right the first time, and/or reserve the ability to add such features or options later without breaking compatibility.


I really wonder about the idea of putting metadata/tag info in each frame, or at least periodically within the file. Interesting idea, though. It'd make tag updating very slow and cumbersome; it'd probably have to be a small tag with only a few fields of limited length (a fixed-length info tag, like ID3v1). I think this would only be important for streaming over the net, so it might have to be an option specified if you're using it for such purposes.


All in all, it would be good to closely analyze existing container formats, but if you're up for it, Thomas... I think it might be possible for you to take the best advantages of each format and create something better than all of them. (Not that I know much about this myself.)


Reply #15
As for specifying the stream info, could YALAC have a standard where, if it is the most common type (i.e. stereo, 16-bit, 44.1 kHz), the info is optional, but any deviation is mandatory to specify... i.e. a mono 16-bit 44.1 kHz file has to specify that it is mono, because this is the only thing that differs from the default... you only need to add the symbol for mono to each frame header, but not all the other details.

Yes, that's a very common way to get a compact representation for the frequently used formats.
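A sketch of what that could look like, with made-up field widths (only fields that differ from the 16-bit/44.1 kHz/stereo default are written):

Code:
/* Sketch: compact stream info where only non-default fields are stored.
   The layout is made up: flags bit 0 = channels differ from 2,
   bit 1 = sample width differs from 16, bit 2 = rate differs from 44100. */
#include <stdio.h>

typedef struct { unsigned rate, bits, channels; } fmt_t;

static void write_fmt(FILE *out, fmt_t f)
{
    unsigned flags = (f.channels != 2)
                   | (f.bits != 16)    << 1
                   | (f.rate != 44100) << 2;
    fputc(flags, out);                     /* a single byte in the common case */
    if (flags & 1) fputc(f.channels, out);
    if (flags & 2) fputc(f.bits, out);
    if (flags & 4) {                       /* 3 bytes cover any realistic rate */
        fputc((f.rate >> 16) & 0xFF, out);
        fputc((f.rate >> 8)  & 0xFF, out);
        fputc(f.rate & 0xFF, out);
    }
}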

That's an interesting idea that smack has about being able to customize the stream info ratio... if you intend to stream a file, you can optimize it for this purpose. Of course, odds are that a user will eventually want to stream files even though this wasn't a concern when the files were first created.

Currently I prefer this approach:

1) By default, stream info is always inserted, but only every 2 seconds. This does not hurt compression very much, but files are always streamable.

2) If you want lower start latencies (when connecting to a running stream), you can manually increase the rate.
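A rough estimate of the cost (sizes assumed): if the extended info adds, say, 32 bytes every 2 seconds, and 2 seconds of CD audio (about 350 KB raw) compress to roughly 200 KB, the overhead is about 32 / 200,000, i.e. around 0.016% - in the same range as the FLAC-in-Matroska difference quoted earlier in the thread.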


Reply #16
Despite running my own small Icecast server, I don't know too much about streaming. But wouldn't it be possible for the client to get all the info about the stream it needs when connecting, and then never again? This information would be delivered by the server itself rather than inside the stream. In general, a streaming server has all the meta information about a stream: tags (even though limited), sample rate, bitrate or quality and some more. What else is needed to decode a frame?


Reply #17
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.

It is the container's concern whether you want to seek anywhere in the stream or not. If you need the ability to start anywhere without reading the header, then metadata should be repeated. If you always read the header before decoding, then it is not necessary.

I think that Yalac should be designed in such a fashion that the container can be replaced with a different one if desired. And if you have a copy of the file header, you can decode the stream starting at any point.

Reading a couple of frames before you can start playing is certainly fine.

In my opinion support for seeking should be kept in the container, but there is probably a case for including it in the stream instead.


Reply #18
I don't know about sync codes, but I have always streamed my audio without CRCs (or so I believe) and can't remember anyone complaining about errors.


Reply #19
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.

I think it's the other way around: framing and synchronization are provided by the container format, while all audio-related info is stored in the embedded audio stream.

The container format is handled by a software component, the "parser/splitter", which extracts the payload stream (here: audio) and sends it to the decoder. The decoder can only handle such a raw audio stream; it doesn't need to know anything about sync codes, checksums or error correction codes.

Of course, this raw audio stream must be made up of independent frames to allow the decoder to start decoding at any frame in the stream. This is the case for most audio codecs, including YALAC.

I think that Yalac should be designed in such a fashion that the container can be replaced with a different one if desired. And if you have a copy of the file header, you can decode the stream starting at any point.

In my opinion support for seeking should be kept in the container, but there is probably a case for including it in the stream instead.

Seeking is a feature of the container format. The raw audio stream should not be overloaded with this unrelated (non-audio) stuff.

For a good example of this concept (separation of container format and audio content) just have a look at Ogg and Vorbis.


Reply #20
Perhaps you can ask the Matroska crew if they want to help you. I think they'll be glad to see such a promising codec natively in their container.
A nice option would be to select manually whether you want to include these error-checking features when encoding.


Reply #21
Matroska really is a nice container, robust and featureful. And, what's important, it is already developed and supported by various software applications, which won't be the case with an all-new format.


Reply #22
Using something open source, flexible, and already designed seems to be a logical choice.  It might be more extensible to have the container and codec quite distinct from each other too... (Not that I really know a lot about this stuff).


Reply #23
...And, what's important, it is already developed and supported by various software applications, which won't be the case with an all-new format.


You lost me there.
The developer may choose any container (even a proprietary one exclusive to YALAC) and users will still depend on someone providing decoding plugins/support in players. Container support without audio format support is useless.
I see no relation between container choice and support of the audio format...


Reply #24
Seems to me that the stream robustness info (sync codes, checksums) should be part of the stream. Metadata (sampling rate, bits per sample, channels) should be provided by the container. Container blocks do not have to match frame boundaries.

I think it's the other way around: framing and synchronization are provided by the container format, while all audio-related info is stored in the embedded audio stream.

The container format is handled by a software component, the "parser/splitter", which extracts the payload stream (here: audio) and sends it to the decoder. The decoder can only handle such a raw audio stream; it doesn't need to know anything about sync codes, checksums or error correction codes.

I confess that I don't know much about audio formats. The idea behind the separation I proposed is that the decoder needs to handle errors, either by fading out or by smoothing the signal. The sync codes are useful for recovering from errors. To me it makes more sense to deal with this at the decoder level rather than at the container level.

Does a splitter know anything about outputting audio? The only thing it can do is send back an error message, and the decoder can do that too.