
Topic: Experimental release of Ghost/CELT 0.0.1

  • jmvalin
  • Developer
Experimental release of Ghost/CELT 0.0.1
I've just made the first public release of some new *experimental* codec work I've been doing (part of the vague Ghost project) with help from Monty and Timothy. This is mainly intended for developers with DSP knowledge, not for doing anything useful with it (but it does encode and decode already). Also, the main idea is *not* to replace either Speex or Vorbis, but to code audio with really low latency -- currently 8 ms.

This is still very experimental and everything is still likely to change, including the exact goals. The algorithm is called (temporary name) Code-Excited Lapped Transform (CELT) and the main ideas are:
- Using an MDCT on very short frames
- Dividing into 15 bands and transmitting the energy for each band
- Using a pitch predictor (good for speech, but helps for music as well).
- The rest is coded using a unit-pulse codebook.
At this point, I'm still trying to figure out how to fit psychoacoustics into this.
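The gain/shape split behind the second and fourth items above (transmit each band's energy, code what is left as a unit-norm vector) can be sketched as follows. This is an illustrative Python sketch, not code from celt-0.0.1: the band edges in `BAND_EDGES` are hypothetical (CELT 0.0.1 uses 15 bands over its own frame size), and `split_gain_shape`/`merge_gain_shape` are made-up names. Each band's gain here is its L2 norm, i.e. the square root of its energy.

```python
import math

# Hypothetical band edges (MDCT bin indices) for a toy 32-coefficient frame.
# Illustrative only; not CELT's actual 15-band layout.
BAND_EDGES = [0, 2, 4, 6, 8, 12, 16, 24, 32]

def split_gain_shape(coeffs, edges=BAND_EDGES):
    """Encoder side: return (norms, shapes), i.e. the per-band L2 norm and
    the unit-norm vector that the pulse codebook would then quantize."""
    norms, shapes = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = coeffs[lo:hi]
        g = math.sqrt(sum(c * c for c in band))
        norms.append(g)
        # A silent band has no defined direction, so keep it as zeros.
        shapes.append([c / g for c in band] if g > 0 else [0.0] * len(band))
    return norms, shapes

def merge_gain_shape(norms, shapes):
    """Decoder side: rescale each unit-norm shape by its transmitted norm."""
    out = []
    for g, shape in zip(norms, shapes):
        out.extend(g * s for s in shape)
    return out
```

Because the norm is sent separately, the shape quantizer only has to find a direction on the unit sphere, which is what makes a fixed (algebraic) codebook usable for every band regardless of its loudness.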

CELT is based on a paper I submitted to ICASSP, which I hope will be accepted so I can make it available to everyone. The only difference is that the ICASSP paper was based on the FFT (not critically sampled), whereas this version is based on the MDCT. One part that is already published, though, is Tim's explanation of the pulse codebook encoding.
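To make "critically sampled" concrete: the MDCT turns each 2N-sample windowed block into only N coefficients (one per new input sample), and the time-domain aliasing this introduces cancels when consecutive blocks are overlap-added. The sketch below is a direct O(N²) Python illustration with a sine window; the window choice, the 2/N inverse scaling and the helper names are assumptions for the example, not taken from the CELT source. An overlapped FFT, by contrast, produces more values per frame than new input samples.

```python
import math

def sine_window(n):
    """Sine window of length 2n; satisfies w[m]^2 + w[m+n]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (m + 0.5) / (2 * n)) for m in range(2 * n)]

def mdct(block, n):
    """2n samples -> n coefficients (windowed, direct form for clarity)."""
    w = sine_window(n)
    return [sum(w[m] * block[m] * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                for m in range(2 * n))
            for k in range(n)]

def imdct(coeffs, n):
    """n coefficients -> 2n windowed samples (time-aliased on their own)."""
    w = sine_window(n)
    return [w[m] * (2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                                   for k in range(n))
            for m in range(2 * n)]

def analysis_synthesis(x, n):
    """Hop by n, MDCT each 2n-sample block, IMDCT, overlap-add.
    Every input sample is covered by two blocks, so the aliasing cancels."""
    padded = [0.0] * n + list(x) + [0.0] * (2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        y = imdct(mdct(padded[start:start + 2 * n], n), n)
        for m in range(2 * n):
            out[start + m] += y[m]
    return out[n:n + len(x)]
```

For example, `analysis_synthesis(x, 8)` returns `x` up to rounding error even though each hop of 8 new samples is represented by just 8 coefficients.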

The full source for CELT is available at: http://downloads.us.xiph.org/releases/celt/celt-0.0.1.tar.gz or through git at http://git.xiph.org/celt.git

I've put some music samples at 56 kbps CBR at http://people.xiph.org/~jm/comp_celt58cbr.wav with the original at http://people.xiph.org/~jm/comp44.wav . As you can hear, it definitely doesn't suck as much as Speex on music, but there's still room for improvement.

I'm open to interesting ideas, but don't bother complaining if it doesn't work or if it explodes in your face  :-)  Oh, and don't expect a final codec any time soon.

Have fun!

  • radorn
Reply #1
So this codec is not intended to eventually replace vorbis?

From what monty said here, I thought that was the case.
  • Last Edit: 20 December, 2007, 08:42:12 AM by radorn

Reply #2
So this codec is not intended to eventually replace vorbis?

From what monty said here, I thought that was the case.

He said the codec is part of the Ghost project, so maybe we'll see another codec soon?
  • Last Edit: 20 December, 2007, 10:05:10 AM by Benjamin Lebsanft

  • jmvalin
  • Developer
Reply #3
So this codec is not intended to eventually replace vorbis?

From what monty said here, I thought that was the case.


What I mean is that in its current form, there's no way it'll beat Vorbis. However, if you add sinusoidal coding and use CELT just to encode the noise, that's another matter. That's what Monty is currently hoping to do with Ghost. Whether we'll end up with one codec or two is still an open question. Even then, a codec that will replace Vorbis is still a long way off, so don't hold your breath.

For those interested, I've posted this overview of CELT along with some technical details on energy coding.

  • radorn
Reply #4
Sorry for the misunderstanding, and thank you both for clarifying.

I'm not holding my breath, though; I'm pretty happy with Vorbis. It's just that I remembered that from Monty, and it sounded a bit contradictory to me.

  • SebastianG
  • Developer
Reply #5
For those interested, I've posted this overview of CELT along with some technical details on energy coding.


Nice!

What's the rationale behind "pulse code books"? I read Timothy's pulse coding web site and I'm not sure what he's trying to do. Is it an attempt to uniformly tessellate the unit sphere, or only a codebook specialized for "pulsy" signals? Judging by the way it is used in CELT I'd say it's the former, which is weird because at first glance the formulas don't look like the resulting codebooks tessellate the unit sphere well. It might be me, as I'm not familiar with that kind of specialized vector codebook.

Regarding pitch prediction: Am I correct when I say what you do is pretty much the same thing as LTP (MPEG4)?

Cheers!
SG

  • jmvalin
  • Developer
Reply #6
What's the rationale behind "pulse code books"? I read Timothy's pulse coding web site and I'm not sure what he's trying to do. Is it an attempt to uniformly tessellate the unit sphere, or only a codebook specialized for "pulsy" signals? Judging by the way it is used in CELT I'd say it's the former, which is weird because at first glance the formulas don't look like the resulting codebooks tessellate the unit sphere well. It might be me, as I'm not familiar with that kind of specialized vector codebook.


In the CELT context, the pulse codebook has the following properties:
  • Decent (though suboptimal) tessellation of the sphere
  • algebraic representation (no storage)
  • slight "bias" towards impulses (i.e. tonal signals)
  • Fast recursive search even for huge codebooks (one pulse at a time)
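The "algebraic representation" and "one pulse at a time" properties can be illustrated with a small sketch. It assumes the codebook is the set of integer vectors whose absolute values sum to K (the pyramid-VQ construction), normalized onto the sphere; `pulse_search` is a plain greedy search and `codebook_size` the standard counting recursion. Both names are hypothetical, and neither is claimed to be exactly what CELT 0.0.1 does.

```python
from functools import lru_cache

def pulse_search(x, k):
    """Greedy search: add one unit pulse at a time, each time choosing the
    position that maximizes (x.y)^2 / (y.y), i.e. the normalized correlation."""
    y = [0] * len(x)
    xy, yy = 0.0, 0.0
    for _ in range(k):
        best_i, best_score = None, -1.0
        for i, xi in enumerate(x):
            s = 1 if xi >= 0 else -1        # the pulse sign follows the input
            new_xy = xy + s * xi            # adds |x_i|
            new_yy = yy + 2 * s * y[i] + 1  # (y_i + s)^2 - y_i^2, always >= 1
            score = new_xy * new_xy / new_yy
            if score > best_score:
                best_i, best_score = i, score
        s = 1 if x[best_i] >= 0 else -1
        xy += s * x[best_i]
        yy += 2 * s * y[best_i] + 1
        y[best_i] += s
    return y

@lru_cache(maxsize=None)
def codebook_size(n, k):
    """Number of integer vectors of length n whose absolute values sum to k.
    Computable on the fly, so no codebook needs to be stored."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return codebook_size(n - 1, k) + codebook_size(n, k - 1) + codebook_size(n - 1, k - 1)
```

For example, `pulse_search([0.9, -0.3, 0.1, 0.0], 4)` places pulses to give `[3, -1, 0, 0]`, and `codebook_size(2, 2)` is 8: (±2,0), (0,±2) and the four (±1,±1).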

Of course, none of that is frozen, so I'm always open to other ideas. Did you have something in particular in mind?

Regarding pitch prediction: Am I correct when I say what you do is pretty much the same thing as LTP (MPEG4)?


I'm not aware of any other codec doing things like I'm doing, but then again, I've no idea what this mpeg4 LTP is or does. Can you give some details about that codec? BTW, the next part on CELT will be on the pitch prediction.

  • SebastianG
  • Developer
Reply #7
In the CELT context, the pulse codebook has the following properties:
(...)

Oh, okay. I almost forgot the "algebraic representation" thing. You're right, this is a big advantage.

Of course, none of that is frozen, so I'm always open to other ideas. Did you have something in particular in mind?

No, sorry. None that fits the strict gain/shape separation you are striving for. But there is something that I always wanted to see tested: a combination of trellis-coded quantization with subtractive dithering. "Subtractive dithering" is achieved by a randomized trellis graph using the same pseudo-random number generator in both the encoder and the decoder. I believe it's worth a try because
  • En- and decoding is fast and doesn't require big tables either
  • It's possible to fade seamlessly between PNS (perceptual noise substitution) and high rate coding.
  • No odd (i.e. metallic-sounding) artefacts besides white noise, even on large MDCT blocks (dithering is "the right thing to do"^TM, which unfortunately got lost on the way to the lossy coding world -- until now -- *hint hint*)  ;-)
  • it should be possible to keep the quantized signal's energy close to the original's even at very low rates (thanks to subtractive dithering and adaptive rate-distortion optimization)
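The subtractive-dither part of this idea can be shown in isolation. The sketch below swaps in a plain uniform scalar quantizer for the trellis-coded quantizer, so it is not TCQ; it only demonstrates the shared-PRNG mechanism: the encoder adds a pseudo-random offset before rounding, and the decoder regenerates the same offset from the same seed and subtracts it. `dither_encode`, `dither_decode` and the seed value are hypothetical.

```python
import random

def dither_encode(x, step, seed):
    """Quantize x + d to integer indices, with d ~ U(-step/2, step/2)."""
    rng = random.Random(seed)  # must match the decoder's seed
    return [round((v + rng.uniform(-step / 2, step / 2)) / step) for v in x]

def dither_decode(q, step, seed):
    """Rebuild q*step, then subtract the identical dither sequence."""
    rng = random.Random(seed)  # same seed, so the same sequence of offsets
    return [qi * step - rng.uniform(-step / 2, step / 2) for qi in q]
```

The reconstruction error stays within half a step but, thanks to the dither, behaves like signal-independent noise rather than a deterministic function of the input. When the step is much larger than the signal, almost every index is zero and the decoder outputs only the (negated) dither, which is a rough analogue of fading towards noise substitution.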
I'm not aware of any other codec doing things like I'm doing, but then again, I've no idea what this mpeg4 LTP is or does. Can you give some details about that codec? BTW, the next part on CELT will be on the pitch prediction.

LTP is an MPEG-4 AAC tool; the name is short for long-term prediction. The decoder just remembers the past decoded samples. The encoder may signal that the following block looks similar to the one that was xyz samples ago (aka the "pitch period"). Both encoder and decoder then do a (forward) MDCT on the block referenced by the "pitch period" and apply frequency-adaptive prediction gains; the encoder subtracts this prediction from the current block's MDCT coefficients, and the decoder adds it back to the decoded residual.
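A minimal sketch of MDCT-domain long-term prediction as described above, assuming the referenced block lies entirely in already-decoded history (lag of at least 2n samples) and using one gain for all bins for brevity, whereas the post mentions frequency-adaptive gains. The function names are hypothetical and this is not the MPEG-4 reference code.

```python
import math

def mdct(block, n):
    """Sine-windowed MDCT, 2n samples -> n coefficients (direct form)."""
    w = [math.sin(math.pi * (m + 0.5) / (2 * n)) for m in range(2 * n)]
    return [sum(w[m] * block[m] * math.cos(math.pi / n * (m + 0.5 + n / 2) * (k + 0.5))
                for m in range(2 * n)) for k in range(n)]

def ltp_predict(history, lag, n, gain):
    """Gain-scaled MDCT of the 2n past samples starting `lag` samples before the
    current block (which begins right after history[-1]). Assumes lag >= 2n."""
    start = len(history) - lag
    return [gain * c for c in mdct(history[start:start + 2 * n], n)]

def ltp_encode(cur_coeffs, history, lag, n, gain):
    """Encoder: only the residual after prediction needs to be coded."""
    pred = ltp_predict(history, lag, n, gain)
    return [c - p for c, p in zip(cur_coeffs, pred)]

def ltp_decode(residual, history, lag, n, gain):
    """Decoder: recompute the same prediction from its own history and add it."""
    pred = ltp_predict(history, lag, n, gain)
    return [r + p for r, p in zip(residual, pred)]
```

On a strictly periodic input with the lag set to a multiple of the period, the residual is essentially zero, which is the best case this tool is built for.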

I'm not sure right now but I guess LTP can be coupled with LD-AAC (low delay, 480 samples/frame) which would then be very similar to your CELT except for the special pulse codebook.

Cheers!
SG
  • Last Edit: 23 December, 2007, 06:24:48 AM by SebastianG

  • jmvalin
  • Developer
Reply #8
No, sorry. None that fits the strict gain/shape separation you are striving for. But there is something that I always wanted to see tested: a combination of trellis-coded quantization with subtractive dithering. "Subtractive dithering" is achieved by a randomized trellis graph using the same pseudo-random number generator in both the encoder and the decoder. I believe it's worth a try because

  • En- and decoding is fast and doesn't require big tables either
  • It's possible to fade seamlessly between PNS (perceptual noise substitution) and high rate coding.
  • No odd (i.e. metallic-sounding) artefacts besides white noise, even on large MDCT blocks (dithering is "the right thing to do"^TM, which unfortunately got lost on the way to the lossy coding world -- until now -- *hint hint*)  ;-)
  • it should be possible to keep the quantized signal's energy close to the original's even at very low rates (thanks to subtractive dithering and adaptive rate-distortion optimization)


I'm not sure I fully understand the dithering part, but I was thinking of something (possibly) similar to prevent tonal noise. Basically, the idea was to perform a pseudo-random rotation on the input before encoding with the pulse codebook. On the decoder side (and in the encoder's synthesis), the inverse rotation is applied. I described it in an ICASSP paper which, unfortunately, didn't get accepted (email me for a copy). Of course, I'd still like to hear more about your idea.
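One way to build such a pseudo-random rotation, sketched under assumptions: a cascade of Givens rotations on adjacent coefficient pairs, with angles drawn from a PRNG seeded identically in encoder and decoder. The paper's actual construction is not described here, so the cascade, the `passes` parameter and the `rotate` name are all hypothetical. Because every step is orthogonal, the band's norm (the separately coded gain) is unchanged, and the inverse simply replays the steps in reverse order with negated angles.

```python
import math
import random

def rotate(x, seed, inverse=False, passes=2):
    """Apply (or undo) a seeded cascade of Givens rotations to vector x."""
    v = list(x)
    n = len(v)
    rng = random.Random(seed)  # encoder and decoder must use the same seed
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(passes * (n - 1))]
    # Step j rotates the pair (i, i+1) by angles[j]; each pass sweeps all pairs.
    steps = [(p * (n - 1) + i, i) for p in range(passes) for i in range(n - 1)]
    if inverse:
        steps.reverse()  # undo the last rotation first
    for j, i in steps:
        theta = -angles[j] if inverse else angles[j]
        c, s = math.cos(theta), math.sin(theta)
        v[i], v[i + 1] = c * v[i] - s * v[i + 1], s * v[i] + c * v[i + 1]
    return v
```

Usage: the encoder would quantize `rotate(shape, seed)` with the pulse codebook, and the decoder would apply `rotate(decoded, seed, inverse=True)`, so a single pulse no longer maps to a single pure tone.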

Cheers,

  Jean-Marc

  • jmvalin
  • Developer
Reply #9
Here's some information on the pitch predictor.

  • SebastianG
  • Developer
Reply #10
Here's some information on the pitch predictor.


Thanks! Yup, it looks "LTPish".

Another thought that just struck me is: Did you check whether there's any gain in subsample delay accuracy? There's a slim chance that the probability distributions of the prediction gains for higher bands look different with subsample delay accuracy because of the higher phase error at higher frequencies.

Subsample delay accuracy could be achieved by computing both the MDCT (X[]) and an MDST (Y[]) of the referenced block (as a kind of real and imaginary components) and predicting the current k-th MDCT coefficient by
Code:
pred[k] = gain * real( (X[k] + Y[k]*i) * e^(ssd*(k+0.5)*2*pi*i) )

with
i = sqrt(-1)
0 <= k < n, k=current frequency index, n=number of transform coeffs
ssd = subsample_delay / (2*n)

The MDST code could reuse most of the MDCT code. Or one could write a dedicated MCLT (modulated complex lapped transform) which is really nothing more than both MDCT and MDST combined (but easier to compute together).
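A sketch of this MDCT+MDST prediction, under stated assumptions: the phase angle is taken as pi*(k+0.5)*d/n, i.e. the centre frequency of MDCT bin k (in radians per sample) times the delay d in samples, and the MDST uses the same kernel as the MDCT with sin in place of cos. Whether the exponent should be positive or negative depends on the MDST sign convention and on which direction the delay is measured, so treat the sign as an assumption. `mdct_mdst` and `fractional_delay_prediction` are hypothetical names.

```python
import cmath
import math

def mdct_mdst(block, n):
    """MDCT (X) and MDST (Y) of one sine-windowed 2n-sample block; together
    they act as the real and imaginary parts of an MCLT-style transform."""
    w = [math.sin(math.pi * (m + 0.5) / (2 * n)) for m in range(2 * n)]
    X, Y = [], []
    for k in range(n):
        ph = [math.pi / n * (m + 0.5 + n / 2) * (k + 0.5) for m in range(2 * n)]
        X.append(sum(w[m] * block[m] * math.cos(ph[m]) for m in range(2 * n)))
        Y.append(sum(w[m] * block[m] * math.sin(ph[m]) for m in range(2 * n)))
    return X, Y

def fractional_delay_prediction(block, n, delay, gain):
    """pred[k] = gain * Re((X[k] + iY[k]) * e^(i*theta_k)),
    with theta_k = pi*(k+0.5)*delay/n (assumed convention)."""
    X, Y = mdct_mdst(block, n)
    pred = []
    for k in range(n):
        theta = math.pi * (k + 0.5) * delay / n
        pred.append(gain * (complex(X[k], Y[k]) * cmath.exp(1j * theta)).real)
    return pred
```

With `delay=0` this reduces to ordinary MDCT-domain prediction (`gain * X[k]`); a non-integer `delay` rotates each bin's phase in proportion to its frequency, which is where higher bands would benefit most.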

Some more stuff to digest is waiting in your email inbox.

Cheers!
SG

  • Kef
Reply #11

Cheers! It's nice to see there is activity from the xiph organization.  I haven't tried your new codec yet but it is definitely the next thing on my to-do list.

Now, I know this is not your "thing" or your responsibility, but what about Vorbis and Theora? There's been very little activity on both fronts, and as a user and proponent of open-source applications, I am deeply concerned, to say the least. Vorbis is a great codec and my music collection is > 50% Vorbis right now (the rest is MP3), but the lack of activity, apart from Aoyumi's tunings, makes it very difficult to keep using it.
It's as if Vorbis has been completely abandoned by Xiph, and I believe this is one of the reasons why people are starting to look for alternatives. I think it's also at least one of the reasons why Vorbis usage here at HA has dropped significantly over the last two years.

http://www.hydrogenaudio.org/forums/index....showtopic=60145
http://img406.imageshack.us/my.php?image=lossydi7.png

Please at least merge Aoyumi's tunings into the official branch at Xiph. There are many companies that only use the official Xiph releases of Vorbis (like http://www.hbr1.com/ for example), and the latest tunings are from 2004... Not a good sign.

Not to mention Theora, which was donated by On2 in 2002. It's now 2008, it's still in beta, and the quality is closer to MPEG-1 than MPEG-4...

Please don't take this criticism personally. It's not anyone's fault. But some development from Xiph's side would be great. Instead of focusing on future codecs, why not maintain those you already have? Speex and FLAC are doing great. Vorbis and Theora, not so great. If I were a better programmer, I would not hesitate to join Xiph and at least try to help out, but the sad fact is I'm simply not up to the task, at least not now, and the only thing I can do is write crap on internet forums, as I have proved today.

Anyway, thank you for your time and please forward my concerns to the xiph foundation. Thank you for your great work with Speex!

/Kef

  • Nicos
Reply #12
I absolutely agree with what you stated above, Kef. I'm using Ogg myself and 2/3 of my collection is in Ogg. But what's happening with this format? We don't see any activity on it. I'm really curious about its future too.

  • jmvalin
  • Developer
Reply #13
Now, I know this is not your "thing" or your responsibility, but what about Vorbis and Theora? There's been very little activity on both fronts, and as a user and proponent of open-source applications, I am deeply concerned, to say the least. Vorbis is a great codec and my music collection is > 50% Vorbis right now (the rest is MP3), but the lack of activity, apart from Aoyumi's tunings, makes it very difficult to keep using it.
It's as if Vorbis has been completely abandoned by Xiph, and I believe this is one of the reasons why people are starting to look for alternatives. I think it's also at least one of the reasons why Vorbis usage here at HA has dropped significantly over the last two years.

http://www.hydrogenaudio.org/forums/index....showtopic=60145
http://img406.imageshack.us/my.php?image=lossydi7.png


Actually, Theora has been under *very* active development in the past few months by Monty (see http://svn.xiph.org/branches/theora-thusnelda/). As for Vorbis, there are still some things to improve, but it's already quite mature. Keep in mind that just because you don't see anything doesn't mean we're not doing anything. That being said, we do have a "resource" problem. All of us have day jobs, little of which involves Xiph codecs. If more people helped instead of complaining that nothing's done, we'd be moving much faster. BTW, not everything we do requires advanced signal processing knowledge. All the time we spend maintaining websites, fixing documentation, makefiles, ... is time not spent on the core codecs.

  • johnsonlam
Reply #14
Keep in mind that it's not because you don't see anything that we're not doing anything. That being said, we do have a "resource" problem. All of us have day jobs, little of which involves Xiph codecs.


Most of us here know you developers work hard day and night; thank you very much.

All we can do is wait patiently. Hopefully a few lines of text once or twice a month can ease the pain of the people waiting endlessly.

Thanks again.
Hong Kong - International Joke Center (after 1997-06-30)

  • jmvalin
  • Developer
Reply #15
All we can do is wait patiently. Hopefully a few lines of text once or twice a month can ease the pain of the people waiting endlessly.


Waiting endlessly for what exactly?

  • Brent
Reply #16
A sense of "progress", I suppose. Every now and then there's a new Nero AAC release or a new LAME, but from the Xiph codecs there's rarely official news. Although it's a fully open-source codec, development apparently isn't that open to us mere fanboys/users.

  • jmvalin
  • Developer
Reply #17
A sense of "progress", I suppose. Every now and then there's a new Nero AAC release or a new LAME, but from the Xiph codecs there's rarely official news. Although it's a fully open-source codec, development apparently isn't that open to us mere fanboys/users.


Then maybe what you want is to subscribe to the Commits mailing list or follow the Xiph.Org Trac. You'll see announcements for lots of "minor releases". So far, there have been 200 of them this month, by at least a dozen different people.

  • Saoshyant
Reply #18
Xiph also made an announcement of everything that happened last year, and it's quite a lot.  Add to that the release of a new version of vorbis-tools, and I'd say things are going just fine for an organization with so few volunteers.
Join //spreadopenmedia.org to promote Vorbis, FLAC, Speex, etc

  • Bourne
  • Banned
Reply #19
-
  • Last Edit: 01 April, 2008, 10:50:15 PM by Bourne

  • Saoshyant
Reply #20
The recent aoTuV tunings have not yet been merged into the official tree because it's no easy matter. It requires a lot of testing and study of both the code and the results. In a recent talk, Monty mentioned it would take two full-time weeks of work, which surprised me, as I had no idea he would be so thorough. However, he also mentioned it would likely be done this year, which I suppose is good news for all those waiting for the merge.

The Xiph people are doing some good work.  Put some faith in them and keep spreading those files.  And if you want to take a more direct approach to help out, join in and contribute somehow.  There will always be something you (and anyone else) can do to speed up progress.