Speech compressed as MP3 for Stream and Download

2003-07-15 17:54:49

Hi,

I have a lot of speeches/lectures as *.wav files and I want to publish them on a website. First I want to offer a stream, so I need one version with CBR. As the stream does not sound too good, I also want to offer a better download version.

Currently my two versions are:
- 24 kbps / 16 kHz CBR (would you recommend 22.05 kHz?)
- 64 kbps / 32 kHz ABR (which is still MPEG-1 Layer 3)

The following restrictions are given:

- The 'clients' are not MP3 experts; they simply want to download the speech and hear it (as MP3 on portable players, or they burn it as an audio CD). So the files have to be compatible (that's why I cannot use Speex or Ogg).
- The download should also be attractive for modem/ISDN users (roughly: the download should not take longer than the playtime of the file!)
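
The second restriction comes down to simple arithmetic: a file downloads in no more time than it takes to play as long as its average bitrate does not exceed the listener's effective line throughput. A rough sketch in Python; the throughput figures below (about 40 kbit/s for a 56k modem, 64 kbit/s for one ISDN channel) are assumptions for illustration, not measurements from this thread.

```python
# Assumed effective throughput in kbit/s (illustrative values, not measured).
LINK_KBPS = {"56k modem": 40.0, "ISDN (1 channel)": 64.0}

def download_to_playtime_ratio(audio_kbps, link_kbps):
    """Download time divided by playing time; <= 1.0 satisfies the rule."""
    return audio_kbps / link_kbps

for name, link in LINK_KBPS.items():
    for audio in (24, 64):
        ratio = download_to_playtime_ratio(audio, link)
        verdict = "ok" if ratio <= 1.0 else "slower than playback"
        print(f"{audio} kbps over {name}: ratio {ratio:.2f} ({verdict})")
```

Under these assumptions the 24 kbps version fits both connection types, while the 64 kbps version only just fits a single ISDN channel.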

After a very long testing period, I decided to try an FhG encoder (especially for the low-quality stream version), since LAME is not that good at low bitrates (for a good discussion about optimizing LAME for low bitrates, I can recommend 'Lame settings for Speech').

Finally there are these questions left:
Stream
1) Which encoder is best (e.g. Fastencc)? Non-free programs are welcome, too.
2) How do I configure this program for the stream version?
3) Where do I get this encoder? And perhaps a good frontend? (So far I use RazorLame.)

Mid-quality download version
4) Do you think LAME does better than the encoder from the stream version at about 64-80 kbps / 32 kHz? If yes: what command line would you suggest? (Currently: -b 32 -B 160 -a --abr 64 -F --resample 32 --lowpass 16 -h -c)
5) Will LAME also be optimized for speech/low bitrates in future versions? (I heard about that...)

Thanx a lot!
.lu

Reply #1 – 2003-07-15 18:09:06

I recommend you use WMA with its ACELP.net encoder. It's a voice-targeted encoder that offers very good sound quality at very low bitrates.

Usage is as no-brainer as it is with MP3. All you need is any program that supports WMA.

Burning to CD is also easy. Most CD writers support WMA out of the box.

And, of course, it would be very interesting for modem users: you can use bitrates around 10 kbps.

Just my 10 cents...

Reply #2 – 2003-07-16 12:14:46

Quote

I recommend you use WMA with its ACELP.net encoder.

Ok, thx. I am no pro in MP3 and I don't know anything about WMA yet, so I have some further questions. (Yes, I know we are talking about Windows Media Audio (MS website).)
The description by acelp.net sounds pretty good, as yours does.

Quote

Usage is as no-brainer as it is with MP3.

Never heard about 'no brainer' ?!

Quote

All you need is any program that supports WMA.

Huh, that sounds so easy. Is there a similar method to LAME & RazorLame to compress WAV to WMA with the ACELP codec?
Do you know how much I have to pay for using acelp.net / wma?

Thanks again, .lu

Reply #3 – 2003-07-16 14:55:32

Quote

Never heard about 'no brainer' ?!

"easy". You need no brain to use it.

Quote

Huh, that sounds so easy. Is there a similar method to LAME & RazorLame to compress WAV to WMA with the ACELP codec?

Yes, use the Windows Media Encoder. (It is easy to find on Microsoft's Windows Media site.)

Quote

Do you know how much I have to pay for using acelp.net / wma?

I think you don't need to pay streaming royalties to Microsoft. You'll probably need to pay them if you want to use their streaming server (that is commercial software).

Regards;

Roberto.

Reply #4 – 2003-07-16 21:23:01

Quote

"easy". You need no brain to use it.

ok.. this seems quite perfect for me

Do I need a special streaming-server - or can I just link the file, similar to mp3 streaming (by linking a *.m3u)?

What do you think: is WMA still better for the higher-quality version (about 64-80 kbps and 32 kHz)? Or is MP3 similar? (I would prefer MP3 here; it is still more compatible & the 'M' does not stand for M$.) And: is the Fraunhofer codec better than LAME for creating a 32 kHz / 80 kbps ABR file?

Thanx again, .lu

Reply #5 – 2003-08-02 18:31:53

Hi,

WMA was a good idea, thanx again. I tested a lot, and I think it is really great for speech compression. OGG is also great, but less compatible for inexperienced users. I think WMA sounds smoother, OGG more direct...

Is there a possibility to use the ACELP.net WMA Encoder without the Windows Media Encoder (I'd like something like RazorLame)?

Besides: Streaming WMA is quite simple:
msdn.microsoft.com about streaming wma

Some questions left:

As a mid-quality download version I want to create an MP3 at either 80 kbps or 64 kbps / 32 kHz.
LAME is not optimized for that; which encoder would you recommend?
-Fraunhofer Mp3ENC (demo: ftp.iis.fhg.de/pub/layer3/mp3encdemo_3_1_win32.zip - full ?)
-Fraunhofer AlternateHQ (?)
-Fraunhofer FastENC (full: http://www.geocities.com/fastenc/)
-...

I read somewhere that the MusicMatch Jukebox includes an unlimited FhG codec?! Which one? Can I use it without the Jukebox?

Do you think ABR or VBR is much better @ 80kbps ? - otherwise I would prefer CBR, because it is much more compatible with portable players.
thx.. .lu

Reply #6 – 2003-08-02 18:37:44

Try --preset medium (VBR) in LAME - for speech files, it might produce bitrates you want
or --alt-preset <bitrate> (ABR), --alt-preset cbr <bitrate>
Or use --preset help and pick one that suits you - beware, they're mostly old.

MMJB employs a modified FhG codec, which had many bugs that affected its quality.

Reply #7 – 2003-08-02 18:41:19

Quote

Is there a possibility to use the ACELP.net WMA Encoder without the Windows Media Encoder (I'd like something like RazorLame)?

You can use dBpowerAMP, it's a very good encoder.

Reply #8 – 2003-08-02 19:03:32

Just wanted to throw in my opinion.

I have used acelpt.net for a long time for audio books (for blind people),
and I'm very happy with the sound quality of aceltpå net.
It's not perfectly identical to the original (who cares), but the voice is very clear (this is important).

I used it in its highest mode, which is around 16 kbit/s. You can go down to 8 kbit/s and it still sounds listenable for storytelling.

I haven't tried Speex yet, because it's darn difficult to get blind people to install Winamp and an extra plugin. It's much easier with just WMA and WMP.

Reply #9 – 2003-08-02 20:42:24

Thanx, all! I thought I was near a final solution, but you are confusing me again.

Quote

Try --preset medium (VBR) in LAME - for speech files, it might produce bitrates you want
or --alt-preset <bitrate> (ABR), --alt-preset cbr <bitrate>
Or use --preset help and pick one that suits you - beware, they're mostly old.

I followed some discussions about LAME command lines for speech compression (a good one: Lame Settings For Speech). But I finally concluded that LAME is NOT the right encoder for 80 kbps / 32 kHz... (?!)

Quote

MMJB employs a modified FhG codec, which had many bugs that affected its quality.

I guess MMJB stands for MusicMatch JukeBox. Would you recommend MMJB today? (Ok, I think it is terrible, but I just mean the Codec).

What about FastENC? In the end, I would spend some bucks on an encoder if it really provides better quality than the others... but the differences must be audible!

Quote

You can use dBpowerAMP, it's a very good encoder.

First you mentioned the ACELP.net / WMA Encoder.. Do you think dBpowerAMP is equivalently good or even better? So I'd need the dMC + WMA PlugIn?

Quote

I'm very happy with the sound quality of aceltpå net.

If I understood you right, you used a modified ACELP.net encoder? Is quality similar or better? I'm happy with ACELP.net, too...

Quote

you can go down to 8 kbit/s and it still sounds listenable for storytelling

Well, I compressed a 30 min WAV (300 MB) down to less than 4 MB with WMA @ 19 kbps - this can be downloaded in less than 10 minutes, even with a slow connection (56k modem). That's enough - and for my requirements the quality should not suffer any more than that.
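
These figures can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch: it ignores container overhead and assumes the modem actually reaches its nominal 56 kbit/s, which real connections usually don't.

```python
def encoded_size_bytes(bitrate_kbps, duration_s):
    """Approximate payload size of a constant/average-bitrate stream."""
    return bitrate_kbps * 1000 * duration_s / 8

def download_seconds(size_bytes, link_kbps):
    """Transfer time at a given link speed (kbit/s), ignoring protocol overhead."""
    return size_bytes * 8 / (link_kbps * 1000)

size = encoded_size_bytes(19, 30 * 60)   # 19 kbps, 30 minutes
secs = download_seconds(size, 56)        # nominal 56k modem (assumption)
print(f"{size / 1e6:.2f} MB, {secs / 60:.1f} min")
```

The estimate comes out at a little over 4 MB and roughly ten minutes, which is close to the numbers quoted above; a slightly shorter recording or a slightly lower average bitrate would account for the difference.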

What about the difference between MP3 @ 80 kbps CBR // VBR?

.. .lu

Reply #10 – 2003-08-02 21:38:15

Quote

First you mentioned the ACELP.net / WMA Encoder.. Do you think dBpowerAMP is equivalently good or even better? So I'd need the dMC + WMA PlugIn?

dBpowerAMP is just a frontend for the standard WMA/ACELP encoding libraries provided by Microsoft. Any program that encodes to WMA is actually using these libraries.

Quote

What about the difference between MP3 @ 80 kbps CBR // VBR?

Go with CBR. It will probably be overkill anyway given you are encoding 32kHz/Mono...

And I would recommend using either MP3enc 3.1 or MusicMatch's Fastenc

Reply #11 – 2003-08-02 23:09:01

I experimented a lot with dBpowerAMP and I guess you found what I was looking for - great! Some questions still remain; I will report back in a few days if I can't find an answer on my own.

Quote

And I would recommend using either MP3enc 3.1 or MusicMatch's Fastenc

MP3Enc (info @ dBpowerAMP website) is about $200. Uff. Does anybody know if the differences between MP3Enc and FastEnc or LAME are large?
You wrote 'MusicMatch's Fastenc' - are there different versions of FastEnc? Does MusicMatch's FastEnc differ from FhG's Fastencc?

Quote

Go with CBR. It will probably be overkill anyway given you are encoding 32kHz/Mono...

I don't really understand (just a language problem?!). Perhaps you can say it again in different words?

Thanx alot
.. .lu

Reply #12 – 2003-08-02 23:20:59

Quote

MP3Enc (info @ dBpowerAMP website) is about $200. Uff. Does anybody know if the differences between MP3Enc and FastEnc or LAME are large?
You wrote 'MusicMatch's Fastenc' - are there different versions of FastEnc? Does MusicMatch's FastEnc differ from FhG's Fastencc?

Fastencc uses very old encoding routines. Quality is bad because of some serious bugs.

Musicmatch's encoding routines are up-to-date and they are of high quality.

MP3enc is based on FhG's professional encoder, and I would guess that Musicmatch's slow setting uses that encoder. I can't say for sure since I don't have Musicmatch here.

Reply #13 – 2003-08-15 11:35:33

Quote

MP3enc is based on FhG's professional encoder, and I would guess that Musicmatch's slow setting uses that encoder.

I asked MusicMatch about the encoder; they only replied:

Quote

MusicMatch Jukebox uses multiple Fraunhofer encoders based on the mode
and settings used to record your audio tracks. Which encoder is used
for which modes is proprietary information that we can not disclose.

But, as I already wrote @ 'Lame Settings For Speech?', I compared LAME and FhG's MP3Enc. There are probably differences (which may be interesting for certain people). For me, differences I cannot hear are not worth $200 (MP3Enc).

edit 1&2: (But I'm excited about the (speech) results of the 64 kbps test...)

Besides, dBpowerAMP with LAME for MP3 and ACELP.net for WMA9 works great. Only one question is left:
- Which light & easy-to-use software players would you recommend for playing WMA9 and, of course, MP3? (Again, for ordinary users. So e.g. foobar2000 would not be the best solution, I think.)
Something besides Winamp and Windows Media Player?

Reply #14 – 2003-08-15 14:33:03

Just a side note: if you like Ogg Vorbis, like I do, you can still provide it for no-brainers using JOrbis. It's a Java web applet that will load and play back Vorbis files without any software besides the browser needed on the client side.

Example:
http://www.jcraft.com/jorbis/player/JOrbis...er/songname.ogg

Working example:
http://www.jcraft.com/jorbis/player/JOrbis...er-airwaves.ogg

Reply #15 – 2003-08-22 09:32:22

Another thing I don't understand:
As I mentioned at 'Lame settings for Speech', I tried and could not find significant differences between 64 and 80 kbps MP3 with great test files on great speakers.
Now, in practice, the 64 kbps file clearly sounds worse if my source file is not of good quality (background noise...). It sounds duller and there is more chirping. Why is that?
edit: (Perhaps the noise also counts as (unwanted) information which has to be saved and this way there is less space left for the wanted signal?! Does a lowpass filter help?)

Besides: Still looking for easy and light players for WMA and MP3...
Valefor: nice hint, thanx. This time it is too late, but for my next project...

Reply #16 – 2003-08-22 09:59:27

Quote

edit: (Perhaps the noise also counts as (unwanted) information which has to be saved and this way there is less space left for the wanted signal?!
...
)

That's correct.

Quote

Does a lowpass filter help?

[Edit]
I thought you already used a lowpass (as it's integrated in --alt-preset <bitrate> settings). Using a lowpass for speech is a very good idea, although it probably won't help much against the negative effects of noise specifically.
[/Edit]

Another option would be using a sound editor's noise reduction (e.g. Cool Edit /Adobe Audition) before encoding. Again it's a trade-off as the more noise you take away the more likely you get artifacts that probably get worse by encoding.

Edit: I just noticed in the other thread that you use 32kHz sampling rate and don't mention any lowpass. In case you want to compress speech only you should use a lowpass, in my experience something between 5 and 8 kHz gives best results depending on the bitrate you want.

Reply #17 – 2003-08-22 12:10:05

thx tigre!

Quote

Again it's a trade-off as the more noise you take away the more likely you get artifacts that probably get worse by encoding.

that fits what I observed!

Quote

I thought you already used a lowpass

First I did, with RazorLAME. Now I switched to dbPowerAMP (because it supports WMA and is VERY easy to use - exactly what I need, because I have to explain that to some ordinary-user). dbPowerAMP uses the lame.dll - so I'm not sure if there is a lowpass filter, yet?!

At least, I can use the filter of my editing software.

Would you suggest to use --alt-preset <bitrate> to create a 64kbps / 32Khz / mono file ? (is it possible?)

By the way, although it is a bit off-topic, how would you prepare the material (should not take toooo much time) ?
1) lowpassfilter
2) highpass?
3) noise reduction filter
4) expander/compressor?

And which programs would you choose? Currently I use GoldWave, because it is easy to use and has a great option to split a long file into several smaller ones.

I heard that Adobe's Audition now also has a feature to set 'split marks'...
Do you think the results are better if I use the Adobe program? (I once used CoolEdit. Now I'm happy with GoldWave, which is easy to use, cheap and has a nice author. But I'm not sure if quality sometimes suffers more than it would with CoolEdit?)

Reply #18 – 2003-08-22 12:40:01

Quote

Now I switched to dbPowerAMP (because it supports WMA and is VERY easy to use - exactly what I need, because I have to explain that to some ordinary-user).

[edit]Questions I was going to ask here already answered in your 1st post[/edit]
If you use something less easy for *en*coding what'll be the difference for the ordinary-user (besides maybe better quality)?

Quote

dbPowerAMP uses the lame.dll - so I'm not sure if there is a lowpass filter, yet?!

I've had a quick look at DBPoweramp's options. You can use CBR presets like --alt-preset cbr 64 which have integrated resampling and lowpass but I haven't seen a possibility to convert to mono. Maybe it'd be better to use a frontend for lame.exe.

edit: part II

Quote

Would you suggest to use --alt-preset <bitrate> to create a 64kbps / 32Khz / mono file ? (is it possible?)

I'd suggest to use lame.exe (+ frontend if needed - or create a .bat file for easy use) and
1) Find out what lowpass setting you can tolerate (the lower the better to avoid later encoding artifacts) by encoding some of your files with lame.exe from commandline like this: "lame --alt-preset standard --lowpass <x> infile.wav" with <x> values between 5.0 and 10.0. If you find that e.g. <x> = 7 still sounds ok for you,
2) choose the next possible sampling rate above 2*<x> (e.g. 2*7 = 14 -> 16kHz) for resampling.
The commandline would be --alt-preset CBR 64 --lowpass 7 --resample 16 -m m.
You might want to do some testing with switches mentioned in the other thread about encoding speech.
3) Highpass won't help much to avoid artifacts - and you'd have to use an external one as lame's lowest possible highpass is far too high to be useful for speech.
4) The only thing that could be done with a sound editor is noise reduction. I'd test to find a level of noise reduction small enough to avoid artifacts - and choose a noise reduction level even lower than that (e.g. 2dB) to have some headroom for encoding.
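
Step 2 can be written down as a small lookup. The sketch below assumes the nine sampling rates defined for Layer III across MPEG-1, MPEG-2 and the unofficial MPEG-2.5 extension; the helper name pick_resample_khz is made up for illustration and is not part of LAME.

```python
# Sampling rates (kHz) available to MP3: MPEG-2.5, MPEG-2 and MPEG-1 Layer III.
MP3_RATES_KHZ = [8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48]

def pick_resample_khz(lowpass_khz, mpeg1_only=False):
    """Lowest available sampling rate above 2 * lowpass (Nyquist limit)."""
    # MPEG-1 Layer III only covers 32, 44.1 and 48 kHz.
    candidates = [r for r in MP3_RATES_KHZ if not mpeg1_only or r >= 32]
    for rate in candidates:
        if rate > 2 * lowpass_khz:
            return rate
    raise ValueError("lowpass too high for any MP3 sampling rate")

print(pick_resample_khz(7))                   # 2 * 7 = 14 -> 16
print(pick_resample_khz(7, mpeg1_only=True))  # MPEG-1 only -> 32
```

The mpeg1_only switch models the compatibility wish from the first post: if only MPEG-1 files are acceptable, 32 kHz is the floor regardless of the lowpass.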

I don't have much experience with lame + bitrates that high for speech encoding (I usually go for 32kbps abr) but probably with a good commandline (mono, resampling, lowpass) it's not necessary to perform noise reduction before encoding unless noise is just too loud for your taste.

I wouldn't use compression/limiting as it increases noise level or causes noise modulation (bumping) which is more annoying than constant noise IMO.

Cool Edit Pro 2.1 (which I use and am satisfied with) is exactly the same as Adobe Audition. Can't tell how it compares to goldwave.

Reply #19 – 2003-08-22 14:36:21

Cool Edit is a safer choice than Goldwave in terms of quality. Don't have much experience but I do know that for instance the resampling in Goldwave is pretty bad.

Reply #20 – 2003-08-25 13:49:14

Thank you, again.

Quote

If you use something less easy for *en*coding what'll be the difference for the ordinary-user (besides maybe better quality)?

I've got to explain the encoding procedure to some ordinary users, who will later perform the recording and encoding of the files (= speeches)...
dbPowerAMP is easy to use and supports both WMA and MP3. I'd like to use only one program... but if it really makes sense, I could use dbPowerAMP and RazorLAME...

Quote

1) Find out what lowpass setting you can tolerate [...]

Ok. I'll see...

Quote

(the lower [the highpass-filter value,] the better to avoid later encoding artifacts)

Again, because there is less information, which has to fit into the given kbps-limitation!?

Quote

2) choose the next possible sampling rate above 2*<x> (e.g. 2*7 = 14 -> 16kHz) for resampling.

I do not understand all the technical background, but I guess: Sample rates above '2x[value of lowpassfilter]' do not make much sense?
If so, I don't understand why @ Lame Settings For Speech they recommended

Quote

24kbps speech:
--alt-preset 24 -a --resample 22 --lowpass 7 -Z

Yet, I used 32kbps, because it still is mpeg1 Layer3 (and thus more compatible?!)..

For my requirements in terms of the quality, in my own experiments I did not like bitrates less than 64kbps. Would you also resample to 16kHz (with lowpass 7) for 64kbps - or is 32kbps all right?

Quote

I'd test to find a level of noise reduction small enough to avoid artifacts - and choose a noise reduction level even lower than that (e.g. 2dB) to have some headroom for encoding.

Sorry, I don't really understand what all this means (When I use the noise-reduction filter, I copy a part of the noise and then apply the filter. So there is no db value - see GoldWave Screenshot)

How does the noise reduction influence artifacts? What does 'headroom for encoding' mean?

Quote

it's not necessary to perform noise reduction before encoding unless noise is just too loud for your taste

Well, with noise reduction it sounds better, unless this squeaking noise appears... I'll experiment a bit.

Quote

I wouldn't use compression/limiting as it increases noise level or causes noise modulation (bumping) which is more annoying than constant noise IMO

Some sources are really bad. They really sound better if I reduce the (few) loud parts and then raise the whole level (maximize). But you are right: it only sounds better as long as you don't trade a constant noise for a non-constant one.

Quote

Cool Edit is a safer choice than Goldwave in terms of quality. Don't have much experience but I do know that for instance the resampling in Goldwave is pretty bad

I think, I'll try Cool Edit Pro 2.1 / Adobe Audition (quite expensive), too.
so far.. .lu

Reply #21 – 2003-08-25 14:57:46

Quote

Quote
(the lower [the highpass-filter value,] the better to avoid later encoding artifacts)

Again, because there is less information, which has to fit into the given kbps-limitation!?

Exactly - actually I meant "the lower the *low*pass-filter value ..."

Quote

Sample rates above '2x[value of lowpassfilter]' do not make much sense?

Sample rates *lower* than '2x[value of lowpassfilter]' do not make sense because the highest frequency that can be stored in a PCM file is 0.5 x sampling rate.
Using sampling rates much higher than '2x[value of lowpassfilter]' is unnecessary and wastes bits, as lower sampling rates need fewer bits.

Quote

If so, I don't understand why @ Lame Settings For Speech they recommended
Quote
24kbps speech:
--alt-preset 24 -a --resample 22 --lowpass 7 -Z

Maybe there's a reason for using --resample 22 instead of --resample 16 that I don't know. It should be easy to do some quick comparisons to find out which one sounds better.

Quote

Yet, I used 32kbps, because it still is mpeg1 Layer3 (and thus more compatible?!)..

Sounds reasonable. As I don't encode speechy things for others and all players I use can handle lower bitrates (mpeg2 Layer 3) I can't tell how much of a problem this could be.

Quote

For my requirements in terms of the quality, in my own experiments I did not like bitrates less than 64kbps. Would you also resample to 16kHz (with lowpass 7) for 64kbps - or is 32kbps all right?

OK: You've said you want MPEG1 so you can't resample to anything lower than 32kHz. So as 64kbps setting (ABR) I'd try
--alt-preset 64 -m m --resample 32 --lowpass 8
or even with lower lowpass.
If this sounds "too good" for you, you can go for a lower bitrate.
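
To compare several lowpass values by ear, the command line above can be generated for each candidate. A minimal Python sketch, assuming a lame executable on the PATH that accepts the 2003-era switches exactly as quoted in this thread (newer builds may spell the presets differently); the file names are placeholders.

```python
import shlex

def lame_args(infile, outfile, lowpass_khz, bitrate_kbps=64, resample_khz=32):
    """Argument list using only the switches quoted in this thread."""
    return ["lame", "--alt-preset", str(bitrate_kbps), "-m", "m",
            "--resample", str(resample_khz), "--lowpass", str(lowpass_khz),
            infile, outfile]

# One test encode per lowpass candidate; file names are placeholders.
for lp in (8, 7, 6, 5):
    cmd = lame_args("speech.wav", f"speech_lp{lp}.mp3", lp)
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to encode
```

The script only prints the commands, so nothing is encoded until the list is handed to subprocess.run.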

Quote

Sorry, I don't really understand what all this means (When I use the noise-reduction filter, I copy a part of the noise and then apply the filter. So there is no db value - see GoldWave Screenshot)

"db" seems to be CoolEdit-specific here. I guess the "Scale%" value is similar in GoldWave.

Quote

How does the noise reduction influence artifacts? What does 'headroom for encoding' mean?

Too much noise reduction introduces artifacts similar to encoding artifacts, as removing only noise without touching the signal isn't possible. Artifacts introduced by noise reduction that are not noticeable could become clearly audible during encoding. So it's important to choose a reasonably small amount of noise reduction.

Reply #22 – 2003-08-30 11:26:26

About Low/Highpass: Oops, of course you are right!

Quote

OK: You've said you want MPEG1 so you can't resample to anything lower than 32kHz. So as 64kbps setting (ABR) I'd try
--alt-preset 64 -m m --resample 32 --lowpass 8

Is '-m m' identical to '-a'? Don't I have to limit the minimal bitrate value to '-b32' (because of MPEG1)?

What do you think of the following command line for 80kbps / mono /vbr? (I like it - Which one would you use for a mono speech file?)

Code:

-b 32 -B 128 -a --abr 80 -F --resample 32 --lowpass 16 -h -c --athtype 2 -X3

thanx again.. .lu

Reply #23 – 2003-11-02 18:14:27

Still looking for the best LAME setting for 80 kbps (speech compression)... any ideas?
What do you think about
--alt-preset 80 -a --resample 32 --lowpass 10

thx
