Topic: May help reading vorbis comments header [Split from: OGG Vorbis header bits] (Topic derived from OGG Vorbis header bits)

May help reading vorbis comments header [Split from: OGG Vorbis header bits]

This may be of some help to anyone wanting to read vorbis comments.

Each ogg page starts with 'OggS', and the comments normally start in the second page.

1. Open the file and search for the second 'OggS'.
2. From this point, search for 'vorbis'.
3. The next four bytes are a little-endian number, so read these four bytes as a 32-bit unsigned integer.

In a hex editor you will see, as an example:
76 6F 72 62 69 73 34 00 00 00
which is 'vorbis' followed by a four-byte number. Written most significant byte first it is 00 00 00 34 (52 in decimal; remember it is stored little-endian).

4. Read in the next 52 bytes as a UTF-8 encoded string. This is the vendor string.

5. Read the next four bytes as a number; this gives you the number of comments in the file.

6. The four bytes after this are the length of the first comment.

7. Read in this number of bytes as a UTF-8 encoded string.

Repeat steps 6 and 7 for the number of comments retrieved in step 5.

All comments are in the form 'TITLE=Tell Me Why'.

The part before the = is the name of the comment field (any field name, not just TITLE); the part after it is the comment content.
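The steps above can be sketched in Python roughly as follows. This is only a minimal sketch of the search-based approach: the function name `read_comments_by_search` is made up for illustration, and it assumes the whole comment header sits inside the second page, which will not hold for files with large comments.

```python
import struct


def read_comments_by_search(data: bytes):
    """Return (vendor, comments) by following the search-based steps.

    Assumption: the comment header fits entirely inside the second page.
    """
    second_page = data.index(b"OggS", data.index(b"OggS") + 4)   # step 1
    pos = data.index(b"vorbis", second_page) + len(b"vorbis")    # step 2

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)  # little-endian, 32-bit
        pos += 4
        return value

    def read_text(length: int) -> str:
        nonlocal pos
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_text(read_u32())               # steps 3 and 4
    comments = []
    for _ in range(read_u32()):                  # step 5
        entry = read_text(read_u32())            # steps 6 and 7
        name, _sep, content = entry.partition("=")
        comments.append((name, content))
    return vendor, comments


# The little-endian length from the hex example:
print(struct.unpack("<I", bytes.fromhex("34000000"))[0])  # 52
```

Calling `read_comments_by_search(open(path, "rb").read())` on a file (with `path` being whatever file you are inspecting) would return the vendor string and a list of (name, content) pairs.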

Some files have picture metadata in them. I have not sorted this out yet, but it is still read the same way.

I have not looked into writing new comments yet, but I think the procedure is: copy the file up to the start of the comments into a temporary file; add the new comments, including the total count and the length of each; copy all bytes from the end of the comments in the original to the temporary file; then delete the original file and rename the temporary file to the original name.
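As a rough illustration of the "add new comments including total and length for each" part, here is a minimal sketch that builds the bytes of a comment section. The function name `build_comment_packet` is hypothetical. The leading 03 and trailing 01 bytes are the ones discussed later in this thread. The sketch only produces the comment data itself: if its length changes, the segment table and checksum of the enclosing page would also have to be rewritten, which is not shown here.

```python
import struct


def build_comment_packet(vendor: str, comments: list[tuple[str, str]]) -> bytes:
    """Serialise the comment data only (no Ogg page header around it)."""

    def length_prefixed(text: str) -> bytes:
        raw = text.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw   # 4-byte little-endian length

    packet = b"\x03vorbis" + length_prefixed(vendor)
    packet += struct.pack("<I", len(comments))     # total number of comments
    for name, content in comments:
        packet += length_prefixed(f"{name}={content}")
    return packet + b"\x01"                        # final byte, framing bit set
```

For example, `build_comment_packet("my tool", [("TITLE", "Tell Me Why")])` returns the bytes that would replace the old comment section.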

Hope this helps





Re: May help reading vorbis comments header [Split from: OGG Vorbis header bits]

Reply #1
I now understand a little more about the page that carries the comments.

4F 67 67 53 00 00 00 00 00 00 00 00 00 00 DD 5B 00 00 01 00 00 00 88 4D C7 08 10 DB FF FF FF FF FF FF FF FF FF FF FF FF FF FF

The first four bytes are always 4F 67 67 53 = "OggS"

After a version byte, a header-type byte and an 8-byte granule position, the next part to look for is the stream serial number; here it is DD 5B 00 00.

After that is the page sequence number, 01 00 00 00. Sequence numbers start at 0, so this shows that this page is the second page in the stream identified by the serial number.

Next is the page CRC32 checksum, 88 4D C7 08. It is obtained by setting these four bytes to 00 and then working the CRC32 over the whole page, i.e. from OggS up to, but not including, the OggS that starts the next page.
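A caveat when coding this: the CRC used by Ogg is not the same variant as the common zlib/PNG CRC32. It uses the polynomial 0x04C11DB7 with an initial value of 0, no bit reflection and no final XOR. Below is a minimal (slow, bit-by-bit) sketch; the function names are made up for illustration, and `page` is assumed to hold exactly one complete page.

```python
def ogg_crc(data: bytes) -> int:
    """CRC-32 variant used by Ogg pages (poly 0x04C11DB7, init 0, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_crc_ok(page: bytes) -> bool:
    """Check one complete page: from 'OggS' up to, not including, the next 'OggS'."""
    stored = int.from_bytes(page[22:26], "little")         # CRC field, bytes 22..25
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]   # CRC bytes set to 00
    return ogg_crc(zeroed) == stored
```

A table-driven version would be much faster for real files; the bitwise loop is used here only because it is short.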

The next byte tells you the number of segments in this page: 10 (16 decimal).
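The fixed part of the page header is 27 bytes long, so the fields described so far can be pulled out in one go. The sketch below unpacks the first 27 bytes of the example header; the variable names are my own, and the version, header type and granule position fields are part of the Ogg page layout even though they are not discussed above.

```python
import struct

# First 27 bytes of the example page header.
header = bytes.fromhex(
    "4F 67 67 53 00 00 00 00 00 00 00 00 00 00 "
    "DD 5B 00 00 01 00 00 00 88 4D C7 08 10"
)

# "<" = little-endian with no padding: 4s capture pattern, B version,
# B header type, q granule position, I serial, I sequence, I CRC, B segment count.
(capture, version, header_type, granule,
 serial, sequence, crc, segment_count) = struct.unpack("<4sBBqIIIB", header)

print(capture)                   # b'OggS'
print(hex(serial), sequence)     # 0x5bdd 1
print(hex(crc), segment_count)   # 0x8c74d88 16
```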

Each segment can be anything from 0 to 255 bytes in length.

To calculate the size of, let's call it, a section, read the next byte; if it is FF (255), read the following byte and add it to the previous one.

Keep adding these together until you reach a byte which is less than FF (255). That byte is included in the total and ends the section.

So in the example above we have 10 followed by DB, and we know that the comments section is the first section in this page.

As DB is less than FF, the comments section is DB (219) bytes long.
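The adding-up rule can be sketched as a small function that turns a segment table into section lengths. The name `packet_lengths` is hypothetical. The example table assumes that all 15 entries after DB are FF; the header dump above is cut short, so that is a guess.

```python
def packet_lengths(segment_table: bytes):
    """Group segment sizes into (length, complete) pairs, one per section."""
    packets = []
    length = 0
    pending = False
    for lacing in segment_table:
        length += lacing
        pending = True
        if lacing < 255:            # a value below FF ends the section
            packets.append((length, True))
            length = 0
            pending = False
    if pending:                     # table ended on FF: continues on the next page
        packets.append((length, False))
    return packets


table = bytes([0xDB] + [0xFF] * 15)   # 16 entries; the FF run is an assumption
print(packet_lengths(table))          # [(219, True), (3825, False)]
```

The second entry is marked incomplete because the table ends on FF, which means that section carries on into the following page.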

The first section starts at the byte after the last entry in the segment table. That byte always seems to be 03; in the vorbis specification this is the packet type, and 03 marks the comment header.

The file I have taken this example from shows the comments section as:-

03 76 6F 72 62 69 73 34 00 00 00 41 4F 3B 20 61
6F 54 75 56 20 5B 32 30 31 31 30 34 32 34 5D 20
28 62 61 73 65 64 20 6F 6E 20 58 69 70 68 2E 4F
72 67 27 73 20 6C 69 62 56 6F 72 62 69 73 29 05
00 00 00 12 00 00 00 54 49 54 4C 45 3D 42 69 63
79 63 6C 65 20 52 61 63 65 0C 00 00 00 41 52 54
49 53 54 3D 51 75 65 65 6E 09 00 00 00 44 41 54
45 3D 31 39 39 31 1E 00 00 00 4F 52 47 41 4E 49
5A 41 54 49 4F 4E 3D 48 6F 6C 6C 79 77 6F 6F 64
20 52 65 63 6F 72 64 73 3E 00 00 00 43 4F 4D 4D
45 4E 54 3D 66 72 65 3A 61 63 20 2D 20 66 72 65
65 20 61 75 64 69 6F 20 63 6F 6E 76 65 72 74 65
72 20 3C 68 74 74 70 73 3A 2F 2F 77 77 77 2E 66
72 65 61 63 2E 6F 72 67 2F 3E 01

It looks a mess, but the first few bytes are 03 76 6F 72 62 69 73, which is 03 then 'vorbis'.

The next four bytes 34 00 00 00 (52) are the vendor string length

This means that the next 52 bytes are the vendor string.

This takes you to 62 69 73 29 on the fourth line down.

The next four bytes, 05 00 00 00, tell you how many comments are included, in this case 5.

The next part follows a sequence of length then comment repeated 5 times.

So the next four bytes, 12 00 00 00 (18), are the length of the first comment, in this case "TITLE=Bicycle Race",

and the four bytes after that comment are the length of the next comment, etc.

After extracting all the comments you find the last byte is 01. I have no idea why this is here, but it is important that the comments end with this.

The file will not play if it is missed out.
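To check the walkthrough, here is a minimal sketch that parses the hex dump above. The function name `parse_comment_packet` is made up for illustration, and treating the lowest bit of the final byte as the required end marker is my reading of the format rather than something established in this post. `bytes.fromhex` skipping line breaks needs Python 3.7 or later.

```python
import struct

packet = bytes.fromhex("""
03 76 6F 72 62 69 73 34 00 00 00 41 4F 3B 20 61
6F 54 75 56 20 5B 32 30 31 31 30 34 32 34 5D 20
28 62 61 73 65 64 20 6F 6E 20 58 69 70 68 2E 4F
72 67 27 73 20 6C 69 62 56 6F 72 62 69 73 29 05
00 00 00 12 00 00 00 54 49 54 4C 45 3D 42 69 63
79 63 6C 65 20 52 61 63 65 0C 00 00 00 41 52 54
49 53 54 3D 51 75 65 65 6E 09 00 00 00 44 41 54
45 3D 31 39 39 31 1E 00 00 00 4F 52 47 41 4E 49
5A 41 54 49 4F 4E 3D 48 6F 6C 6C 79 77 6F 6F 64
20 52 65 63 6F 72 64 73 3E 00 00 00 43 4F 4D 4D
45 4E 54 3D 66 72 65 3A 61 63 20 2D 20 66 72 65
65 20 61 75 64 69 6F 20 63 6F 6E 76 65 72 74 65
72 20 3C 68 74 74 70 73 3A 2F 2F 77 77 77 2E 66
72 65 61 63 2E 6F 72 67 2F 3E 01
""")


def parse_comment_packet(data: bytes):
    if data[:7] != b"\x03vorbis":
        raise ValueError("not a vorbis comment header")
    pos = 7

    def take_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", data, pos)   # 4-byte length
        pos += 4
        text = data[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = take_string()
    (count,) = struct.unpack_from("<I", data, pos)        # number of comments
    pos += 4
    comments = [take_string() for _ in range(count)]
    if not data[pos] & 1:                                 # the trailing 01 byte
        raise ValueError("end marker missing")
    return vendor, comments


vendor, comments = parse_comment_packet(packet)
print(len(packet))   # 219, i.e. DB
print(vendor)        # AO; aoTuV [20110424] (based on Xiph.Org's libVorbis)
print(comments[0])   # TITLE=Bicycle Race
```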

I am developing an ogg vorbis comment reader and writer without using libvorbis, so it is all native code. I use PureBasic and you can find me on their forum at:

https://www.purebasic.fr/english/

Anyone interested in helping would be welcome.

I hope this helps someone.

Re: May help reading vorbis comments header [Split from: OGG Vorbis header bits]

Reply #2
Holy thread resurrection!

Ogg Vorbis uses a "framing bit" to end the comment header.  It must be non-zero.  This is required and the stream is technically unreadable if it is missing.  Most other Ogg-encapsulated codecs do not use this.

Re: May help reading vorbis comments header [Split from: OGG Vorbis header bits]

Reply #3
There are numerous libraries that allow you to create/edit Vorbis tags. No need to re-invent the wheel, but if you insist, just analyze the source code. Most are free (vorbiscomment, taglib, etc.). Your empirical approach only makes sense if there is no documentation or source code. Reverse engineering will not result in a trusted system unless you can generate all valid permutations and reverse engineer all of them reliably. That is very hard even for experienced people.

Just my 2 cents.

Re: May help reading vorbis comments header [Split from: OGG Vorbis header bits]

Reply #4
The main problem when anyone approaches ogg vorbis files is understanding that ogg is not vorbis and vorbis is not ogg.

The ogg file is split into pages and each page is split into segments. In the case of a vorbis bit stream, these segments store the vorbis bit stream as defined by the vorbis specification.

The first thing is to learn how to decode an ogg file. Then look at the vorbis stream away from the ogg file.
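To make the split between the two layers concrete, here is a minimal sketch that walks the pages of an Ogg file and hands back the reassembled packets, without knowing anything about Vorbis. The function names are hypothetical, the CRC is not checked, and the sketch assumes the file contains a single logical stream.

```python
import struct


def iter_pages(data: bytes):
    """Yield (header_type, serial, sequence, segment_table, payload) per page."""
    pos = 0
    while pos + 27 <= len(data):
        (capture, _version, header_type, _granule,
         serial, sequence, _crc, nsegs) = struct.unpack_from("<4sBBqIIIB", data, pos)
        if capture != b"OggS":
            raise ValueError(f"no page at offset {pos}")
        table = data[pos + 27:pos + 27 + nsegs]
        body_start = pos + 27 + nsegs
        body_end = body_start + sum(table)      # payload size = sum of segment sizes
        yield header_type, serial, sequence, table, data[body_start:body_end]
        pos = body_end


def iter_packets(data: bytes):
    """Reassemble packets across pages; assumes one logical stream."""
    partial = b""
    for _type, _serial, _seq, table, payload in iter_pages(data):
        offset = 0
        for lacing in table:
            partial += payload[offset:offset + lacing]
            offset += lacing
            if lacing < 255:                    # segment shorter than 255 ends a packet
                yield partial
                partial = b""
```

In a Vorbis stream the second packet produced by `iter_packets` would be the comment header, which can then be decoded on its own, away from the Ogg layer.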

If it helps, you can look at this thread on a programmers' forum (all code in BASIC), where you can see how my journey in understanding this played out. It includes details of reading ogg files and some detail of the vorbis stream.

vorbis comment reading

It does include source code for a comment reader which works very well. Remember there is no standard set of vorbis comments; each read/write program seems to do it differently.

Writing comments is a whole different ball game however and a lot more complicated.

Hope this helps anyone attempting to understand these files.

collectordave

 