Topic: Ogg Page Header

Ogg Page Header

Why are there always two page headers in Ogg Vorbis files? The files start with "OggS" twice, followed by the comments. Does this have something to do with physical and logical bitstreams? Please explain.

Reply #1
Quote
Why are there always two page headers in Ogg Vorbis files? The files start with "OggS" twice, followed by the comments. Does this have something to do with physical and logical bitstreams? Please explain.


By convention, the first page of a logical stream contains only one small codec "identification packet". This is supposed to make demultiplexing easier (consider a multiplexed Theora/Vorbis stream).

The demultiplexer reads those pages, and as soon as it reads a "non-BOS" page (a page that is not marked as the first page of a logical stream; BOS = beginning of stream) it knows how many logical streams have been multiplexed. Large initial pages with huge setup data would just suck here. So the comment header packet and the codec setup packet are placed starting on the 2nd page (and may span multiple pages).
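
To illustrate the BOS idea, here is a minimal Java sketch that counts the leading BOS pages of a physical stream. The class and method names (`BosCounter`, `countLogicalStreams`) are made up for this example. It assumes well-formed input and does not check the page CRC.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class BosCounter {

    /** Counts the pages at the start of the stream that have the BOS flag (0x02) set. */
    static int countLogicalStreams(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[27];              // fixed part of an Ogg page header
        int streams = 0;
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException endOfFile) {
                return streams;                    // file ended while still in the BOS section
            }
            if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S') {
                throw new IOException("capture pattern \"OggS\" not found");
            }
            boolean bos = (header[5] & 0x02) != 0; // header_type flag: beginning of stream
            if (!bos) {
                return streams;                    // first non-BOS page: all streams are known
            }
            streams++;
            // Byte 26 gives the number of lacing values; their sum is the body length.
            byte[] lacing = new byte[header[26] & 0xFF];
            data.readFully(lacing);
            int bodyLength = 0;
            for (byte value : lacing) {
                bodyLength += value & 0xFF;
            }
            data.readFully(new byte[bodyLength]);  // skip the page body
        }
    }
}
```

For a plain Vorbis file this should return 1; a multiplexed Theora/Vorbis file would be expected to give 2.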

If you plan to extract information from Ogg files, you could familiarize yourself with the available libraries. Reinventing the wheel is a bit harder here than with ID3v1 / ID3v2.


SebastianG

Reply #2
Thanks for the answer. I'm trying to get the raw Vorbis data and compute an MD5 of it. That way my application will recognize my files even if the metadata changes.

I see there is a lot more data in the second page than comments (at least in my test files). What is this?

Reply #3
Quote
Thanks for the answer. I'm trying to get the raw Vorbis data and compute an MD5 of it. That way my application will recognize my files even if the metadata changes.

I see there is a lot more data in the second page than comments (at least in my test files). What is this?


This is probably the "codec setup packet". It is essential for the decoder, because the standard defines no static codebooks; each stream has to carry its own.

You have two choices:
- read the f***ing documentation and reinvent the wheel
- read the f***ing documentation and familiarize yourself with libogg

SCNR! 

SebastianG

Reply #4
I have read the documentation. It's not too easy. I'm doing the coding in Java. If anyone could explain how to get the raw data, please write.

Reply #5
Quote
I have read the documentation. It's not too easy.


I can't help you if you don't tell me where the problem is. Have you already written something that can parse the Ogg pages? Do you understand the packet/page concept (how packets are stored in pages)?

Since you only want to hash the raw audio packets you can do it like this:
- write something that can parse ogg pages
- read the first ogg page and check if it contains a Vorbis codec ID packet
- read all following pages
- start hashing the raw packet data of the pages once you find the first page which meets all of the following requirements: BOS flag unset, continued flag unset, and the least significant bit of the first packet data byte of this page is zero.

BTW: watch out for multiplexed streams and check the pages' serial number
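
The steps above could be sketched in Java roughly like this. Treat it as a sketch under assumptions, not a finished tool: the names `VorbisAudioHasher`, `hashAudio`, `startsWithVorbisId` and `buildPage` are invented for the example, SHA-256 is just one possible hash, the page CRC is not verified, and only the logical stream that owns the first page is followed. `buildPage` exists only to make small test inputs.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class VorbisAudioHasher {

    /** Hashes the page bodies of one Vorbis stream, starting at its first audio page. */
    static byte[] hashAudio(InputStream in) throws IOException, NoSuchAlgorithmException {
        DataInputStream data = new DataInputStream(in);
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] header = new byte[27];              // fixed part of an Ogg page header
        boolean firstPage = true;
        boolean hashing = false;
        int serial = 0;
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException endOfFile) {
                break;                             // no further complete page header
            }
            if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S') {
                throw new IOException("capture pattern \"OggS\" not found");
            }
            int flags = header[5] & 0xFF;          // 0x01 = continued, 0x02 = BOS, 0x04 = EOS
            int pageSerial = (header[14] & 0xFF) | ((header[15] & 0xFF) << 8)
                    | ((header[16] & 0xFF) << 16) | ((header[17] & 0xFF) << 24);
            byte[] lacing = new byte[header[26] & 0xFF];
            data.readFully(lacing);
            int bodyLength = 0;
            for (byte value : lacing) {
                bodyLength += value & 0xFF;
            }
            byte[] body = new byte[bodyLength];
            data.readFully(body);

            if (firstPage) {                       // step 2: check for the Vorbis ID packet
                if (!startsWithVorbisId(body)) {
                    throw new IOException("first page holds no Vorbis identification packet");
                }
                serial = pageSerial;
                firstPage = false;
                continue;
            }
            if (pageSerial != serial) {
                continue;                          // page of another multiplexed stream
            }
            if (!hashing) {                        // step 4: BOS unset, continued unset, LSB zero
                hashing = (flags & 0x03) == 0 && bodyLength > 0 && (body[0] & 0x01) == 0;
            }
            if (hashing) {
                sha.update(body);
            }
        }
        if (firstPage) {
            throw new IOException("no Ogg page found");
        }
        return sha.digest();
    }

    /** True if the data starts with packet type 1 followed by the ASCII string "vorbis". */
    static boolean startsWithVorbisId(byte[] body) {
        byte[] expected = {1, 'v', 'o', 'r', 'b', 'i', 's'};
        if (body.length < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (body[i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    /** Demo helper only: single-segment page, body shorter than 255 bytes, CRC left at zero. */
    static byte[] buildPage(int flags, int serial, byte[] body) {
        if (body.length >= 255) {
            throw new IllegalArgumentException("demo helper: body too long");
        }
        byte[] page = new byte[28 + body.length];
        page[0] = 'O'; page[1] = 'g'; page[2] = 'g'; page[3] = 'S';
        page[5] = (byte) flags;
        for (int i = 0; i < 4; i++) {
            page[14 + i] = (byte) (serial >>> (8 * i));   // serial number, little-endian
        }
        page[26] = 1;                              // one lacing value
        page[27] = (byte) body.length;
        System.arraycopy(body, 0, page, 28, body.length);
        return page;
    }
}
```

Because only the concatenated page bodies are hashed, the result should not depend on how the audio packets happen to be split into pages, and rewriting the comment header should leave it unchanged.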


SebastianG

Reply #6
Quote
Thanks for the answer. I'm trying to get the raw Vorbis data and compute an MD5 of it. That way my application will recognize my files even if the metadata changes.


Why don't you just decode your files into WAV and run md5sum on that?

Just a thought.

Triza

Reply #7
Quote
Quote
Thanks for the answer. I'm trying to get the raw Vorbis data and compute an MD5 of it. That way my application will recognize my files even if the metadata changes.


Why don't you just decode your files into WAV and run md5sum on that?

Just a thought.

Triza


I guess stigc wants a simple and fast way to identify Vorbis files even when tags are modified and files are moved to other directories. For this purpose I'd calculate something like a SHA-256 (or at least SHA-1) checksum of the raw audio packet data to
a) minimize the probability of a hash collision
b) make it fast.

Decoding, especially the modified discrete cosine transform, is quite CPU-intensive. So if he doesn't need the decoded data, he should avoid decoding. Hashing can just as well be done on the compressed audio data...

Without knowledge about what stigc actually wants to do ... this discussion is a bit pointless, though...


SebastianG

Reply #8
Yes, I'm very concerned about speed. The hash has to be fast to calculate. What about Adler-32? It seems fast. Should I use it rather than the others (MD5, SHA-256)?

http://www.eskimo.com/~weidai/benchmarks.html

Your second step is:

"read the first ogg page and check if it contains a Vorbis codec ID packet"

How do I confirm that it is a Vorbis packet? I know about Ogg pages, their headers and segments, and I know about Ogg comments, but I know nothing about Vorbis data.

 

Reply #9
Quote
Yes, I'm very concerned about speed. The hash has to be fast to calculate. What about Adler-32? It seems fast. Should I use it rather than the others (MD5, SHA-256)?


It depends on what you actually want to do with this hash code. If you only want to detect transmission errors, you can use Adler-32. If you want to detect changes made by some evil person who tries to fool you, you'd better use a cryptographic hash function like MD5 (128 bits) or SHA-1 (160 bits). If you plan to use the hash code as a key in a database, you should take a hash of at least 160 bits to minimize the probability of hash collisions.
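
Since the question is about Java: both kinds of hash are available in the standard library, so the difference in output size is easy to see. The following is a small sketch; the class name `HashChoices` and its two helper methods are made up for the example, while `java.util.zip.Adler32` and `java.security.MessageDigest` are the standard classes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;

final class HashChoices {

    /** 32-bit Adler-32 checksum: cheap, fine for error detection, far too short for a database key. */
    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** 256-bit SHA-256 digest as lowercase hex: slower, but collisions are not a practical concern. */
    static String sha256Hex(byte[] data) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest(data)) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }
}
```

With only 32 bits, accidental collisions between different files become likely once a collection holds some tens of thousands of entries (the birthday effect), which is the reason for preferring a longer hash as a database key.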

BTW: Each Ogg page contains a CRC-32 checksum for error detection.
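
A note for Java users: as far as I know, the Ogg page checksum is not the same CRC-32 variant that `java.util.zip.CRC32` computes. Ogg uses the polynomial 0x04C11DB7 with the most significant bit first, an initial value of zero and no final XOR, calculated over the whole page with the four checksum bytes (offsets 22 to 25) taken as zero. A sketch under those assumptions, with invented names `OggCrc`, `checksum` and `verify`:

```java
final class OggCrc {

    private static final int[] TABLE = new int[256];

    static {
        // Table for polynomial 0x04C11DB7, processing the most significant bit first.
        for (int i = 0; i < 256; i++) {
            int r = i << 24;
            for (int bit = 0; bit < 8; bit++) {
                r = (r & 0x80000000) != 0 ? (r << 1) ^ 0x04C11DB7 : r << 1;
            }
            TABLE[i] = r;
        }
    }

    /** Checksum over a complete page (header, lacing values, body); bytes 22..25 count as zero. */
    static int checksum(byte[] page) {
        int crc = 0;                               // initial value zero, no final XOR
        for (int i = 0; i < page.length; i++) {
            int b = (i >= 22 && i < 26) ? 0 : page[i] & 0xFF;
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ b) & 0xFF];
        }
        return crc;
    }

    /** Compares the little-endian checksum stored at offset 22 with the computed one. */
    static boolean verify(byte[] page) {
        if (page.length < 27) {
            return false;                          // too short to be a page header
        }
        int stored = (page[22] & 0xFF) | ((page[23] & 0xFF) << 8)
                | ((page[24] & 0xFF) << 16) | ((page[25] & 0xFF) << 24);
        return stored == checksum(page);
    }
}
```

Checking this per page is a cheap way to notice a damaged file before hashing it.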

Quote
Your second step is:

"read the first ogg page and check if it contains a Vorbis codec ID packet"

How do I confirm that it is a Vorbis packet? I know about Ogg pages, their headers and segments, and I know about Ogg comments, but I know nothing about Vorbis data.


The following links are relevant. The common header of the Vorbis header packets is explained in the first pages of the Vorbis bitstream spec.

Ogg logical/physical bitstream overview: http://www.xiph.org/ogg/vorbis/doc/oggstream.html
Ogg logical bitstream/framing specification
Vorbis bitstream spec Overview
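
In short, and as a hedged summary of that spec: every Vorbis header packet starts with one packet type byte followed by the six ASCII characters "vorbis". The identification packet has type 1 (comment is 3, setup is 5), and after the common header it carries a 32-bit version, an 8-bit channel count and a 32-bit sample rate, all little-endian. A minimal Java sketch of the check, where the class name `VorbisIdHeader` and the method `parse` are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

final class VorbisIdHeader {

    final long version;
    final int channels;
    final long sampleRate;

    private VorbisIdHeader(long version, int channels, long sampleRate) {
        this.version = version;
        this.channels = channels;
        this.sampleRate = sampleRate;
    }

    /** Returns the parsed fields, or null if the packet is not a Vorbis identification packet. */
    static VorbisIdHeader parse(byte[] packet) {
        byte[] magic = "vorbis".getBytes(StandardCharsets.US_ASCII);
        if (packet.length < 16 || packet[0] != 0x01) {
            return null;                           // too short, or not packet type 1
        }
        for (int i = 0; i < magic.length; i++) {
            if (packet[1 + i] != magic[i]) {
                return null;                       // common header string missing
            }
        }
        long version = u32(packet, 7);             // vorbis_version
        int channels = packet[11] & 0xFF;          // audio_channels
        long sampleRate = u32(packet, 12);         // audio_sample_rate
        return new VorbisIdHeader(version, channels, sampleRate);
    }

    /** Reads an unsigned 32-bit little-endian integer. */
    private static long u32(byte[] b, int offset) {
        return (b[offset] & 0xFFL) | ((b[offset + 1] & 0xFFL) << 8)
                | ((b[offset + 2] & 0xFFL) << 16) | ((b[offset + 3] & 0xFFL) << 24);
    }
}
```

The packet to pass in is the body of the first page of the logical stream, since that page holds nothing but the identification packet.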


SebastianG