Topic: Matroska file format definition (Read 5343 times)

Matroska file format definition

I've tried a couple of times to make sense of the Matroska file format as defined on www.matroska.org (which is down as I type this), but I honestly can't follow the scheme used to describe the file structure. I read through the documentation and tried comparing it to sample files I'd created, but I can't relate the two. I need to find basic information such as bitrate, playtime, resolution, sample rate, codecs, etc. Can someone walk me through how to find that information, please?

Reply #2
Those are very basic "What is Matroska? How do I play it" kind of guides. I couldn't find anything linked from there either.

Looking at the Matroska specs, I can see that all my sample files begin with [1A][45][DF][A3] as expected, but beyond that I'm lost. I can't relate the subsequent bytes to what I'm seeing in the specs. After those first 4 bytes, I see [93][42][82][88], then 'matroska', etc.

I guess what I'm looking for is either a guide for how to parse a Matroska file, or at least how to understand the specs as they're presented.

Reply #4
What's to figure out? It's communist!

j/k
Happiness - The agreeable sensation of contemplating the misery of others.

Reply #5
getID3(), you need to understand EBML before you can understand Matroska.

Every EBML element is laid out like this: [ID][Length of Data][Data]

The Data can contain other EBML elements, or just an integer, a float, a string, etc.

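The ID and the Length are coded in a UTF-8-like way: the number of leading zero bits in the first byte tells you how many bytes (octets) the field spans. A first byte starting with 1 means 1 byte, 01 means 2 bytes, 001 means 3 bytes, and so on, up to 8 bytes for the Length (Matroska IDs use at most 4). In your sample, [1A] starts with 0001, so the ID is the 4 bytes [1A][45][DF][A3]; [93] starts with 1, so the Length is that single byte, and with the leading marker bit cleared it is 0x13 = 19 bytes of Data.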
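To make the ID/Length coding concrete, here is a minimal Python sketch of reading one of these variable-length fields. The function name read_vint and the keep_marker parameter are my own choices for illustration, not anything defined by the specs. It assumes the usual convention that an ID is kept with its length-marker bit while a Length has the marker bit cleared.

```python
def read_vint(data, pos, keep_marker):
    """Read one EBML variable-length integer starting at data[pos].

    Returns (value, position of the next byte). Hypothetical helper
    for illustration only.
    """
    first = data[pos]
    if first == 0:
        # A zero first byte would mean a field longer than 8 bytes.
        raise ValueError("field longer than 8 bytes is not supported")
    # Leading zero bits in the first byte, plus one, give the field width.
    length = 9 - first.bit_length()
    value = int.from_bytes(data[pos:pos + length], "big")
    if not keep_marker:
        # For a Length, clear the marker bit: 7 value bits remain per byte.
        value &= (1 << (7 * length)) - 1
    return value, pos + length


# The first five bytes quoted earlier in this thread.
header = bytes.fromhex("1A45DFA393")
elem_id, pos = read_vint(header, 0, keep_marker=True)
size, pos = read_vint(header, pos, keep_marker=False)
print(hex(elem_id), size)
```

Note that this sketch does not treat the special "unknown size" Length (all value bits set to 1) any differently from a normal value; a real parser would need to.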

Once you understand that, it's easy to see how Matroska works (I guess). The first element is a Level 0 element, and its Data contains a list of other EBML elements (this is called a Master element). Master elements contain other elements, which in turn either contain more elements or just hold a value (int, float, string, date, etc.). That gives the data a hierarchical tree structure, which is why it is close to XML in principle.
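As a rough sketch of that tree, the Python below walks a buffer of EBML elements and records each one with its nesting depth. Assumptions on my part: the names read_vint, walk and MASTER_IDS are made up for this example, the table of Master IDs lists only the EBML header (a real parser needs the full list from the Matroska specs), and the sample bytes are the ones quoted earlier in the thread followed by two trailing elements (DocTypeVersion and DocTypeReadVersion, both 1) that I added to fill out the 19-byte header.

```python
# IDs whose Data is a list of child elements. Only the EBML header is
# listed here; this table is deliberately incomplete (assumption).
MASTER_IDS = {0x1A45DFA3}


def read_vint(data, pos, keep_marker):
    """Read one variable-length ID or Length; returns (value, next pos)."""
    first = data[pos]
    if first == 0:
        raise ValueError("field longer than 8 bytes is not supported")
    length = 9 - first.bit_length()  # leading zero bits + 1
    value = int.from_bytes(data[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1  # drop the length-marker bit
    return value, pos + length


def walk(data, start, end, depth=0, out=None):
    """Collect (depth, id, size) for every element in data[start:end]."""
    if out is None:
        out = []
    pos = start
    while pos < end:
        elem_id, pos = read_vint(data, pos, keep_marker=True)
        size, pos = read_vint(data, pos, keep_marker=False)
        out.append((depth, elem_id, size))
        if elem_id in MASTER_IDS:
            # A Master element's Data is itself a run of elements.
            walk(data, pos, pos + size, depth + 1, out)
        pos += size  # skip over this element's Data
    return out


# Hypothetical sample: the thread's bytes plus two assumed trailing elements.
sample = (bytes.fromhex("1A45DFA3 93 4282 88") + b"matroska"
          + bytes.fromhex("42878101 42858101"))
for depth, elem_id, size in walk(sample, 0, len(sample)):
    print("  " * depth + f"ID {elem_id:#x}, {size} bytes of data")
```

The same loop should carry on into the Segment element that follows the header once its Master IDs are added to the table; values such as duration, codec and resolution are then ordinary child elements to look up by ID in the specs.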

I hope it's a bit clearer now.